| Field | Value |
|---|---|
| Author (研究生) | 羅睿成 Jui-Chen Lo |
| Title (論文名稱) | 應用深度學習模型以及切割模式限制之提升 H.266/VVC 快速幀間編碼方法 (Fast Inter-Frame Coding for H.266/VVC by Splitting Constraints with Deep Learning Model) |
| Advisor (指導教授) | 陳建中 Jiann-Jone Chen |
| Committee (口試委員) | 杭學鳴, 郭天穎, 高立人, 方俊才 |
| Degree (學位類別) | 碩士 Master |
| Department (系所名稱) | 電資學院 - 電機工程系 (Department of Electrical Engineering) |
| Publication year (論文出版年) | 2024 |
| Academic year (畢業學年度) | 112 |
| Language (語文別) | Chinese (中文) |
| Pages (論文頁數) | 48 |
| Chinese keywords (中文關鍵詞) | Versatile Video Coding standard (多功能視訊編碼標準), inter-frame coding (幀間編碼), split-mode constraints (切割模式限制) |
| Foreign keywords (外文關鍵詞) | H.266/VVC, QTMTT, split mode limitations |
| Views / downloads (相關次數) | Views: 91, Downloads: 0 |
Abstract:

H.266/VVC is regarded as a milestone ushering in a new era of video coding. Its primary goal is to provide users with comprehensive, high-quality coded-video services. Compared with its predecessor, H.265/HEVC, H.266/VVC delivers superior performance at the same quality level: it improves compression efficiency by roughly 30% to 50%, so encoding with H.266/VVC saves substantial transmission bandwidth and storage space while maintaining the same video quality. These advantages give viewers a better visual experience whether they stream video online or watch locally stored files. However, H.266/VVC adopts new coding tools, such as the QTMTT (Quad-Tree and Multi-Type Tree) partitioning structure, which better fits image characteristics, along with high-precision, multi-directional, and more diverse prediction modes, and it relies on Rate-Distortion Optimization (RDO) to search for the optimal trade-off between bitrate and distortion. Its encoding time is therefore 8 to 10 times that of H.265/HEVC, which makes it difficult to deploy on devices with limited computational power.

This thesis therefore proposes a CNN based on split-mode constraints to accelerate H.266/VVC inter-frame coding. The proposed LAI-CNN predicts under split-mode constraints and eliminates redundant split modes, effectively saving the exhaustive search time of Rate-Distortion Optimization. LAI-CNN covers three types: Type 1 focuses on static/low-motion blocks and performs well on Class E sequences; Type 2 targets subCUs produced by MTT splits, covers the most coding blocks, and therefore achieves the largest overall speed-up; Type 3 handles subCUs produced by QT splits, with speed-up falling between the other two types and the best distortion performance. Moreover, under the split-mode constraints, LAI-CNN has fewer choices to evaluate and lower decision complexity, which improves the model's efficiency and allows a smaller, leaner model. Experimental results show that, compared with the H.266/VVC official reference software (VTM-22.0) under the Random-Access configuration, the proposed method saves 36.35%∼49.20% of encoding time while BDBR increases by only 1.36%∼3.47%.
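The RDO search described above can be made concrete with a minimal sketch. The encoder assigns every candidate split mode a Lagrangian cost J = D + λ·R and keeps the cheapest one; the distortion/rate numbers and the λ value below are purely hypothetical, since a real encoder obtains D (e.g. SSE) and R (bits) by actually encoding each candidate.

```python
# Minimal illustration of rate-distortion optimization (RDO) over the six
# QTMTT split modes. All D/R values and lambda are made-up placeholders.

def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian RD cost: J = D + lambda * R."""
    return distortion + lam * rate_bits

def best_split_mode(candidates, lam=30.0):
    """Exhaustive RDO: evaluate every candidate and keep the cheapest."""
    return min(candidates, key=lambda c: rd_cost(c["D"], c["R"], lam))

candidates = [
    {"mode": "NO_SPLIT", "D": 5200.0, "R": 40.0},
    {"mode": "QT",       "D": 1800.0, "R": 130.0},
    {"mode": "BT_H",     "D": 2600.0, "R": 90.0},
    {"mode": "BT_V",     "D": 2500.0, "R": 95.0},
    {"mode": "TT_H",     "D": 2100.0, "R": 120.0},
    {"mode": "TT_V",     "D": 2000.0, "R": 125.0},
]
print(best_split_mode(candidates)["mode"])  # -> BT_H for these numbers
```

The 8x to 10x encoder slowdown comes largely from repeating this exhaustive evaluation recursively for every coding unit.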
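The split-mode-constraint idea can be sketched as follows: a classifier (LAI-CNN in this thesis; mocked here with fixed probabilities) scores the candidate split modes, and only the most likely ones are handed to RDO. The cumulative-probability threshold and the example probabilities are illustrative assumptions, not the thesis's actual values.

```python
# Sketch of CNN-guided split-mode pruning: rank modes by predicted
# probability and keep only enough of them to reach a confidence
# threshold, so RDO skips the rest. Threshold/probabilities are assumed.

SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]

def prune_split_modes(probs, cum_threshold=0.85):
    """Keep highest-probability modes until their cumulative probability
    reaches cum_threshold; RDO then evaluates only the kept modes."""
    ranked = sorted(zip(SPLIT_MODES, probs), key=lambda p: p[1], reverse=True)
    kept, total = [], 0.0
    for mode, p in ranked:
        kept.append(mode)
        total += p
        if total >= cum_threshold:
            break
    return kept

# Mock classifier output: confident in NO_SPLIT and BT_H, so four of the
# six RDO evaluations are skipped for this coding unit.
probs = [0.55, 0.05, 0.32, 0.04, 0.02, 0.02]
print(prune_split_modes(probs))  # -> ['NO_SPLIT', 'BT_H']
```

The reported speed-up/BDBR trade-off reflects this tension: a lower threshold prunes more aggressively (more time saved) at the risk of discarding the mode RDO would have chosen (higher BDBR).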