Graduate Student: 黃昱寰 (Yu-Huan Huang)
Thesis Title: 運用殘差網路及隨機森林預測方法於加速H.266/QTMT幀內編碼 (Speeding up H.266/QTMT Intra Coding Based on the Predictions of a ResNet Model and a Random Forest Classifier)
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 杭學鳴 (Hsueh-Ming Hang), 郭天穎 (Tien-Ying Kuo), 吳怡樂 (Yi-Leh Wu), 花凱龍 (Kai-Lung Hua)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: Chinese
Number of Pages: 79
Chinese Keywords: video coding, intra coding, fast algorithm, deep learning
English Keywords: Quadtree with multi-type tree (QTMT), fast intra coding
The development of 5G communication has driven advances in new technologies such as ultra-high-definition video at 4K and above, augmented reality (AR)/virtual reality (VR), and 360-degree panoramic video. Delivering higher-fidelity immersive experiences demands more storage space and network bandwidth, and the performance of high-definition video compression plays a key role in meeting that demand. The JVET (Joint Video Exploration Team) began developing the new video compression standard H.266/VVC (Versatile Video Coding) in 2015; it is expected to be officially released in 2020 as the next-generation international video compression standard. H.266/VVC adopts the QuadTree plus Multi-Type Tree (QTMT) coding structure: below each leaf node of the original QT, a multi-type tree continues the partitioning, so that CUs are no longer restricted to squares and can take sizes suited to the local texture of the image. In addition, to match a wider variety of texture patterns, the number of intra prediction modes is increased from 35 to 67. These new techniques greatly improve the coding efficiency of H.266/VVC but also demand much higher computational complexity; experiments show that H.266 intra coding takes 18 times as long as H.265/HEVC.

This thesis studies fast mode-decision algorithms for the H.266 intra coding framework. We propose fast decisions for both partition depth and partition mode: (1) For the fast depth decision, we target 32×32 luma blocks and use a ResNetV2 architecture to extract image features and predict the maximum BTDepth of the current block; the predicted label is then mapped to a depth-range table, and depths outside the range are handled by Early Skip or Early Terminate to reduce encoding computation. (2) For the fast partition-mode decision, the same 32×32 luma blocks are used as prediction targets. To emulate a multi-label classifier, we train six binary random-forest classifiers, one per partition mode, and combine the individual predictions into a 6-digit integer string; partition modes predicted as 0, together with their subsequent subCUs, are not tested further, saving the time spent on recursive partitioning. Combining the two fast decision algorithms, experimental results show that compared with the original VTM7.0 encoder, the proposed method saves 39.16% of encoding time with only a 0.70% increase in BDBR.
The fifth-generation (5G) network enables advances in video processing techniques such as 4K and 8K UHD, augmented/virtual reality (AR/VR), and 360° video. Providing a high-fidelity immersive experience requires high-speed communication, low-latency transmission, and huge data storage capacity, all of which depend on highly efficient video coding techniques. The Joint Video Exploration Team began studying the next-generation video compression standard, H.266/VVC (Versatile Video Coding), in 2015; it is expected to be finalized in 2020. H.266/VVC adopts the QuadTree (QT) plus Multi-Type Tree (MT) block partition structure, QTMT, to encode each coding unit (CU). In addition to the recursive QT partition structure adopted in H.265/HEVC, a recursive MT partition is applied to each QT leaf node, enabling more flexible block partitions that adapt to the local texture. The number of intra prediction modes is also increased from 35 to 67 to better encode various texture patterns. These new techniques enable H.266/VVC to achieve high coding efficiency, but they also lead to very high time complexity: in terms of execution time, H.266 intra coding takes 18 times as long as H.265. In this research, to reduce the VVC encoding time, we propose one fast CU depth decision method and one fast CU coding-mode decision method. (1) For the fast CU depth decision, we select 32×32 luma blocks as training samples and use a ResNetV2 model to predict the maximum BTDepth value of a CU. The predicted depth label indexes a pre-determined lookup table that yields a depth range, and the encoder performs RDO tests only for depths within this range, reducing time complexity while maintaining coding quality. (2) For the fast CU coding-mode decision, we apply a Random Forest learning algorithm to classify luma blocks of the same size, with six binary classifiers acting together as a multi-label classifier. If a partition mode is predicted as "0", neither that partition of the CU nor its subCUs is tested further. Experiments show that the proposed fast encoding method, combining the fast CU depth decision and the fast CU coding-mode decision, reduces encoding time by 39.16% with only a 0.70% increase in BDBR compared to the default VTM7.0.
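The fast depth decision described above maps a predicted maximum-BTDepth label to a range of depths that the encoder actually tests with RDO. A minimal sketch of that lookup-and-gate logic is shown below; the table values, label set, and function name are illustrative assumptions, not the thesis's actual table.

```python
# Hypothetical sketch of the fast CU depth decision: the ResNetV2-predicted
# maximum-BTDepth label of a 32x32 luma block selects a depth range, and
# depths outside it are skipped (Early Skip) or cut off (Early Terminate).
# The table entries here are assumed for illustration only.
DEPTH_RANGE_TABLE = {
    0: range(0, 1),  # label 0 -> test BT depth 0 only (early terminate)
    1: range(0, 2),  # test depths 0-1
    2: range(1, 3),  # skip depth 0 (early skip), terminate after depth 2
    3: range(2, 4),  # skip depths 0-1, test depths 2-3
}

def should_test_depth(predicted_label: int, bt_depth: int) -> bool:
    """Return True if RDO should evaluate this BT depth for the block."""
    return bt_depth in DEPTH_RANGE_TABLE[predicted_label]
```

In an encoder integration, this predicate would guard the recursive partition search: depths below the range are entered without a full RDO test, and the recursion stops once the upper bound is reached.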
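The fast mode decision emulates a multi-label classifier with six independent random-forest binary classifiers, one per partition mode, whose 0/1 outputs are concatenated into a 6-digit string. A small sketch using scikit-learn follows; the mode names, feature layout, and function names are assumptions for illustration and not the thesis's implementation.

```python
# Illustrative sketch (not the thesis code): six binary random forests,
# one per QTMT partition mode, acting together as a multi-label classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed mode order: no-split, quadtree, binary/ternary horizontal/vertical.
MODES = ["NS", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]

def train_mode_classifiers(X, y_per_mode, n_trees=100):
    """Train one binary classifier per partition mode.
    X: (n_blocks, n_features) block features; y_per_mode: mode -> 0/1 labels."""
    return {
        m: RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(
            X, y_per_mode[m]
        )
        for m in MODES
    }

def predict_mode_string(classifiers, block_features):
    """Concatenate the six binary predictions into a 6-digit string.
    A '0' digit means that partition mode (and its subCUs) is skipped in RDO."""
    bits = [
        str(int(classifiers[m].predict(block_features[None, :])[0]))
        for m in MODES
    ]
    return "".join(bits)
```

Training one forest per mode keeps each binary problem simple and lets the class balance of each partition mode be handled independently, at the cost of six separate inference calls per block.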