
Graduate student: Chi-Hsun Chiang (江其勲)
Thesis title: Using Learned Model to Predict H.266/VVC Intra-frame Residual CU Split-type for Coding Speedup (以深度學習模型預測 H.266/VVC 幀內編碼殘差區塊切割模式於加速編碼)
Advisor: Jiann-Jone Chen (陳建中)
Committee members: 杭學鳴, 鍾國亮, 郭天穎, 吳怡樂
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Graduation academic year: 111
Language: Chinese
Pages: 54
Keywords: H.266/VVC Versatile Video Coding standard; Quad-Tree plus Multi-Type Tree block partitioning; intra-frame coding speedup; residual neural network; deep learning network


    With the development of 5G high-speed mobile communication technology and advancements in processor and software technology, the newest video coding standard H.266/VVC offers 8K video quality encoding, significantly enhancing the service quality of multimedia communication applications. Compared to the previous generation H.265/HEVC that adopted the QTBT (Quad-Tree Plus Binary Tree) coding structure, H.266/VVC utilizes a more flexible QTMTT (Quad-Tree plus Multi-Type Tree) block partitioning structure, leading to a further 30%-50% reduction in bit rate. This is achieved by employing an exhaustive search during the CU (Coding Unit) block partitioning process to find the optimal partitioning mode based on Rate-Distortion Optimization (RDO). Consequently, the average time complexity of H.266/VVC's intra-frame coding is approximately 18 times that of H.265/HEVC. Therefore, it becomes crucial to reduce the encoding computational complexity to meet practical application demands. In this research, a fast CU block partitioning decision method based on deep neural networks is proposed for the H.266/VVC intra-frame coding mode. Specifically, for 32×32 CUs, convolutional neural networks (CNNs) are used to learn their texture features, enabling rapid identification of the current CU's partitioning mode and accelerating intra-frame encoding. The model's input includes not only the pixel values of the current CU but also additional reference inputs such as Multi Reference Line (MRL) and the CU's planar mode residual, making the approach closely resemble the actual VVC encoding process. Experimental results demonstrate that the proposed acceleration method, when compared to H.266/VVC's official reference software (VTM-18.0), saves 54.94% of the encoding time in intra-frame coding modes, with only a slight increase of 2.67% in BDBR (Bjøntegaard Delta Bit Rate).
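The pruning idea summarized above — let a network score the candidate split types of a CU and run the expensive RDO search only on sufficiently likely candidates — can be sketched in a few lines. This is a minimal illustration under assumptions: the function name, the six QTMTT split labels, and the threshold value are illustrative, not the thesis's actual implementation or thresholds.

```python
import math

# The six split options a QTMTT encoder considers for a CU
# (no split, quad-tree, horizontal/vertical binary, horizontal/vertical ternary).
SPLIT_TYPES = ["NO_SPLIT", "QT", "BT_HOR", "BT_VER", "TT_HOR", "TT_VER"]

def prune_split_candidates(logits, threshold=0.1):
    """Return the split types the encoder should still evaluate with full RDO.

    `logits` are the network's raw scores over SPLIT_TYPES; candidates whose
    softmax probability falls below `threshold` are skipped.
    """
    m = max(logits)
    scores = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(scores)
    probs = [s / total for s in scores]
    keep = [s for s, p in zip(SPLIT_TYPES, probs) if p >= threshold]
    # Never prune everything: fall back to the single most likely candidate.
    if not keep:
        keep = [SPLIT_TYPES[probs.index(max(probs))]]
    return keep

# Example: a model strongly favouring the quad-tree split prunes the rest.
print(prune_split_candidates([0.2, 3.0, 0.1, 0.1, -1.0, -1.0]))  # → ['QT']
```

A low threshold keeps the encoder conservative (fewer skipped RDO checks, smaller BDBR loss), while a high threshold trades more coding-efficiency loss for more time saving — the trade-off the thesis tunes in its threshold-setting step.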

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1  Research Motivation and Objectives
      1.2  Problem Description and Research Method
      1.3  Thesis Organization
    Chapter 2  Background
      2.1  Video Compression
        2.1.1  H.265/HEVC and H.266/VVC
        2.1.2  Background on H.266/VVC Block Partitioning
        2.1.3  Intra Prediction
      2.2  Deep Neural Networks (DNN)
        2.2.1  Convolutional Neural Network (CNN)
        2.2.2  Residual Network (ResNet)
        2.2.3  Self-attention
        2.2.4  Integrating Self-attention and Convolution (ACmix)
        2.2.5  Activation Functions
        2.2.6  Pooling Layer
      2.3  Related Work
    Chapter 3  Fast Algorithm for H.266/VVC Intra-frame Coding
      3.1  Dataset Selection
      3.2  Deep Neural Network Architecture and Training Strategy
        3.2.1  Deep Neural Network Input Data
        3.2.2  Convolutional Neural Network Architecture
        3.2.3  Convolutional Neural Network Training Strategy
      3.3  Fast Partitioning Decision for VVC Intra-frame Coding
        3.3.1  Coding Flow of the Fast Partitioning Decision
        3.3.2  Deep Neural Network Threshold Setting
    Chapter 4  Experimental Results and Discussion
      4.1  Experimental Environment Setup
      4.2  Experimental Results
        4.2.1  Comparison with the Original VTM Coding Results
        4.2.2  PSNR and Bitrate Comparison per Class
        4.2.3  Analysis and Discussion of Experimental Results
        4.2.4  Comparison with Acceleration Methods from Other Works
    Chapter 5  Conclusion and Future Work
      5.1  Conclusion
      5.2  Future Work
    References
    Chapter 6  Appendix
      6.1  Glossary of Terms


    Full text release date: 2025/07/31 (campus network)
    Full text release date: 2025/07/31 (off-campus network)
    Full text release date: 2025/07/31 (National Central Library: Taiwan NDLTD system)