Graduate student: CHI-HSUN CHIANG (江其勲)
Thesis title: Using Learned Model to Predict H.266/VVC Intra-frame Residual CU Split-type for Coding Speedup (以深度學習模型預測H.266/VVC幀內編碼殘差區塊切割模式於加速編碼)
Advisor: Jiann-Jone Chen (陳建中)
Committee members: 杭學鳴, 鍾國亮, 郭天穎, 吳怡樂
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Publication year: 2023
Academic year: 111 (ROC calendar)
Language: Chinese
Pages: 54
Chinese keywords: H.266/VVC Versatile Video Coding standard, Quad-Tree plus Multi-Type Tree block partitioning, intra-frame coding speedup, residual neural network, deep learning network
Foreign keywords: Quad-Tree plus Multi-Type Tree block partitioning, speedup intra-frame coding, residual neural network
With the development of 5G high-speed mobile communication technology and advancements in processor and software technology, the newest video coding standard, H.266/VVC, offers 8K video quality encoding, significantly enhancing the service quality of multimedia communication applications. Compared to the quad-tree-based coding structure of the previous-generation H.265/HEVC, H.266/VVC utilizes a more flexible QTMTT (Quad-Tree plus Multi-Type Tree) block partitioning structure, leading to a further 30%–50% reduction in bit rate. This gain comes at a cost: the encoder performs an exhaustive search during the CU (Coding Unit) partitioning process to find the optimal partitioning mode under Rate-Distortion Optimization (RDO). Consequently, the average time complexity of H.266/VVC intra-frame coding is approximately 18 times that of H.265/HEVC. It is therefore crucial to reduce the encoding computational complexity to meet practical application demands. In this research, a fast CU partitioning decision method based on deep neural networks is proposed for the H.266/VVC intra-frame coding mode. Specifically, for 32×32 CUs, convolutional neural networks (CNNs) are used to learn texture features, enabling rapid identification of the current CU's partitioning mode and thereby accelerating intra-frame encoding. The model's input includes not only the pixel values of the current CU but also additional reference inputs, namely the Multi Reference Line (MRL) samples and the CU's planar-mode residual, making the approach closely follow the actual VVC encoding process. Experimental results demonstrate that, compared to H.266/VVC's official reference software (VTM-18.0), the proposed acceleration method saves 54.94% of the encoding time in intra-frame coding mode, with only a 2.67% increase in BDBR (Bjøntegaard Delta Bit Rate).
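The classifier described above can be pictured as a small image-classification CNN. The following PyTorch sketch is illustrative only: the layer sizes, the three-channel input layout (CU luma samples, planar-mode residual, an MRL-derived plane), and the six-way split-type labeling (no split, QT, BT-H, BT-V, TT-H, TT-V) are assumptions for exposition, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class SplitTypeNet(nn.Module):
    """Hypothetical sketch of a CNN that classifies the QTMTT split type
    of a 32x32 CU. Input channels (an assumption): CU luma pixels,
    planar-mode prediction residual, and an MRL-derived reference plane."""

    def __init__(self, num_split_types: int = 6):
        # Assumed 6 classes: no split, QT, BT-H, BT-V, TT-H, TT-V
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_split_types)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)          # (N, 32, 8, 8)
        return self.classifier(x.flatten(1))  # (N, num_split_types) logits

# Toy batch: four CUs, each a 3x32x32 stack of (pixels, residual, MRL plane)
batch = torch.randn(4, 3, 32, 32)
logits = SplitTypeNet()(batch)
print(logits.shape)  # torch.Size([4, 6])
```

At encode time, such a network would replace the exhaustive RDO search over split types for 32×32 CUs with a single forward pass, which is where the reported time saving comes from.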