Student: | 鄭崴隆 Wei-Lung Cheng |
---|---|
Thesis title: | Fast H.266 QTMT Mode Prediction based on Deep Learning Methods |
Advisor: | 陳建中 Jiann-Jone Chen |
Oral examination committee: | 杭學鳴 Hsueh-Ming Hang, 郭天穎 Tien-Ying Kuo, 鍾國亮 Kuo-Liang Chung |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
Publication year: | 2021 |
Academic year of graduation: | 109 (ROC calendar, i.e., 2020-2021) |
Language: | Chinese |
Number of pages: | 80 |
Keywords: | H.266/VVC, Deep Learning, Fast Prediction, Multiple Classification Method, Intra Coding, QTMT |
The latest video coding standard, Versatile Video Coding (VVC)/H.266, supports video coding at 4K resolution and above. To meet the demands of 5G high-speed communication applications, high-resolution media coding and communication has become a key technology. In VTM-1.0, JVET (Joint Video Experts Team) adopted the QTMT (Quad Tree and Multi-Type Tree) block partitioning structure, which adds binary tree (BT) and ternary tree (TT) splitting to the quad tree: four new split modes in total, namely horizontal and vertical binary splits (HBT, VBT) and horizontal and vertical ternary splits (HTT, VTT), so that blocks can follow the image texture more closely. HEVC/H.265 uses only QT partitioning; with four additional split modes, the exhaustive rate-distortion optimization (RDO) search for the best coding mode becomes far more expensive, and encoding takes roughly eight times as long as with HEVC/H.265. To reduce RDO complexity without degrading coding quality, this thesis studies how to use a convolutional neural network (CNN) to predict the split mode of the current coding unit (CU) in VVC intra coding. A ResNet model is trained and tested on 32×32 blocks, because the encoder starts the QTMT procedure at the 32×32 CU size; the predicted split modes reduce the number of exhaustive RDO evaluations and thus the encoding time. Two methods are proposed to raise prediction accuracy: (1) a large set of encoded images, rather than the MPEG test sequences, is used as the training set; (2) a Multiple Classification Method is designed. Experimental results on test sequences Class A to Class E show that the proposed method reduces encoding time by 60.42% with a 1.22% increase in BDBR (Bjøntegaard delta bit rate).
Keywords: H.266/VVC, Deep Learning, Fast Prediction, Multiple Classification Method, Intra Coding, QTMT
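The core idea in the abstract is to replace the exhaustive RDO search over every split choice with a search over only the modes the CNN predicts. The following is a minimal Python sketch of that idea under stated assumptions: six candidate decisions per CU (no split, QT, HBT, VBT, HTT, VTT), a hypothetical `rd_cost` callback standing in for the encoder's trial encoding, and a hypothetical top-k pruning rule (`top_k_modes`) that is not necessarily how the thesis applies its predictions. For brevity it decides a single tree level; a real encoder recurses into each child block.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple

# (x, y, width, height) of a block in luma samples
Block = Tuple[int, int, int, int]


class Split(Enum):
    """The six partition decisions considered for a CU under QTMT."""
    NO_SPLIT = "NS"
    QT = "QT"     # quad tree: four equal quarters
    BT_H = "HBT"  # horizontal binary: top and bottom halves
    BT_V = "VBT"  # vertical binary: left and right halves
    TT_H = "HTT"  # horizontal ternary: rows of 1/4, 1/2, 1/4 height
    TT_V = "VTT"  # vertical ternary: columns of 1/4, 1/2, 1/4 width


def children(block: Block, mode: Split) -> List[Block]:
    """Return the sub-blocks produced by applying `mode` to `block`."""
    x, y, w, h = block
    if mode is Split.NO_SPLIT:
        return [block]
    if mode is Split.QT:
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode is Split.BT_H:
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode is Split.BT_V:
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode is Split.TT_H:
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode is Split.TT_V:
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


def top_k_modes(probs: Dict[Split, float], k: int) -> List[Split]:
    """Hypothetical pruning rule: keep the k modes the classifier scores highest."""
    return sorted(probs, key=lambda m: probs[m], reverse=True)[:k]


def best_mode(block: Block, candidates: Sequence[Split],
              rd_cost: Callable[[Block, Split], float]) -> Tuple[Split, int]:
    """Return the candidate with the lowest RD cost and the number of RD checks spent.

    `rd_cost` stands in for the encoder's expensive trial encoding, which
    evaluates J = D + lambda * R for one partition choice.
    """
    costs = {mode: rd_cost(block, mode) for mode in candidates}
    return min(costs, key=lambda m: costs[m]), len(costs)


def toy_rd_cost(block: Block, mode: Split) -> float:
    """Toy stand-in for a real RD cost: pretends VTT is the best choice."""
    return 1.0 if mode is Split.TT_V else 2.0


if __name__ == "__main__":
    cu = (0, 0, 32, 32)
    # Exhaustive search: every one of the six decisions is trial-encoded.
    print(best_mode(cu, list(Split), toy_rd_cost))
    # Pruned search: only the two modes a (hypothetical) CNN ranks highest.
    probs = {Split.TT_V: 0.55, Split.BT_V: 0.25, Split.QT: 0.10,
             Split.NO_SPLIT: 0.05, Split.BT_H: 0.03, Split.TT_H: 0.02}
    print(best_mode(cu, top_k_modes(probs, 2), toy_rd_cost))
```

Both calls pick VTT here, but the pruned search spends two RD evaluations instead of six. The trade-off, which the BDBR figure captures, is that the truly best mode may fall outside the predicted set.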
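ResNet's defining feature is the identity shortcut in each residual block: the stacked layers learn a correction F(x) that is added back to the block's input. The numpy sketch below shows one basic block applied to a single-channel 32×32 luma block. It is a simplification under stated assumptions (one channel, no batch normalization, no downsampling, no classification head), and the function names are illustrative; it does not reproduce the network configuration used in the thesis, which the abstract does not give.

```python
import numpy as np


def conv3x3_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Cross-correlate a single-channel map with a 3x3 kernel.

    Zero padding of one sample keeps the output the same size as `x`,
    as in the 3x3 convolutions of a ResNet basic block.
    """
    padded = np.pad(x, 1)  # constant (zero) padding on every side
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def basic_residual_block(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """y = ReLU(F(x) + x), where F is conv -> ReLU -> conv.

    Because of the identity shortcut, the layers only learn a correction
    F(x) added to the input; with F = 0 the block passes its input through
    unchanged apart from the final ReLU.
    """
    residual = conv3x3_same(relu(conv3x3_same(x, k1)), k2)
    return relu(residual + x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cu = rng.random((32, 32))  # stand-in for a normalized 32x32 luma block
    k1 = 0.1 * rng.standard_normal((3, 3))
    k2 = 0.1 * rng.standard_normal((3, 3))
    print(basic_residual_block(cu, k1, k2).shape)  # (32, 32)
```

Setting the second kernel to zero makes the block reduce to ReLU(x), which is why deep stacks of such blocks remain easy to optimize: each block can start close to an identity mapping.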
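The result is reported as a BDBR increase alongside an encoding-time reduction. For reference, here is a minimal sketch of the classic Bjøntegaard delta-rate calculation (a cubic fit of log bitrate against PSNR for each RD curve, averaged over the overlapping PSNR range) and the usual time-saving percentage. The names `bd_rate` and `time_saving` are illustrative. The thesis's own evaluation script is not described in the abstract, and JVET's reporting spreadsheets may use a different interpolation, so numbers from this sketch can differ slightly from published figures.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjontegaard delta bit rate (BD-rate) of `test` relative to `anchor`, in percent.

    Each RD curve is fitted with a cubic polynomial of log(bitrate) as a
    function of PSNR, and the fits are integrated over the PSNR range the two
    curves share. Positive values mean the test needs more bits for the same
    quality. Each curve needs at least four RD points.
    """
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)
    fit_a = np.polyfit(pa, np.log(np.asarray(rate_anchor, dtype=float)), 3)
    fit_t = np.polyfit(pt, np.log(np.asarray(rate_test, dtype=float)), 3)

    # Overlapping PSNR interval of the two curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate gap, converted back to a relative bitrate change.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)


def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding-time reduction of `test` relative to `anchor`, in percent."""
    return (t_anchor - t_test) / t_anchor * 100.0


if __name__ == "__main__":
    # Made-up RD points (bitrate in kbps, Y-PSNR in dB), e.g. from four QPs.
    rates = [1000.0, 1800.0, 3200.0, 6000.0]
    psnrs = [32.0, 35.0, 38.0, 41.0]
    costlier = [r * 1.05 for r in rates]  # test needs 5% more bits everywhere
    print(round(bd_rate(rates, psnrs, costlier, psnrs), 2))  # 5.0
    print(time_saving(200.0, 80.0))  # 60.0
```

Scaling every test bitrate by a constant factor at unchanged PSNR yields exactly that factor as the BD-rate, which makes a convenient sanity check for any BD-rate implementation.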