
Graduate Student: 羅睿成 (Jui-Chen Lo)
Thesis Title: 應用深度學習模型以及切割模式限制之提升H.266/VVC快速幀間編碼方法
(Fast Inter-Frame Coding for H.266/VVC by Splitting Constraints with Deep Learning Model)
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 杭學鳴, 郭天穎, 高立人, 方俊才
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Pages: 48
Chinese Keywords (translated): Versatile Video Coding standard, inter-frame coding, split mode constraints
Foreign Keywords: H.266/VVC, QTMTT, split mode limitations
Abstract:

H.266/VVC is regarded as a milestone ushering in a new era of video coding. Its primary goal is to provide users with comprehensive, high-quality encoded-video services. Compared with its predecessor H.265/HEVC, H.266/VVC delivers superior performance at the same quality level, achieving roughly 30% to 50% higher compression efficiency. This means that encoding video with H.266/VVC can save substantial transmission bandwidth and storage space while maintaining the same video quality, giving viewers a better visual experience whether they stream online or watch locally stored videos. However, VVC adopts new coding tools, such as the QTMTT (Quad-Tree plus Multi-Type Tree) partitioning structure, which better matches image characteristics, together with higher-precision, multi-directional, and more numerous prediction modes, and it relies on Rate-Distortion Optimization (RDO) to find the best bitrate-distortion trade-off. As a result, its encoding time is 8 to 10 times that of H.265/HEVC, which makes it difficult to deploy on devices with limited computational capability. This thesis therefore proposes a CNN-based method with split-mode constraints to accelerate H.266/VVC inter-frame coding. The proposed LAI-CNN predicts under split-mode constraints and prunes redundant split modes, effectively reducing the exhaustive RDO search time. LAI-CNN is applied in three types: Type 1 focuses on static/low-motion blocks and performs well on Class E sequences; Type 2 targets sub-CUs produced by MTT splits, covers the largest number of coding blocks, and thus achieves the greatest overall speedup; Type 3 handles sub-CUs produced by QT splits, with speedup between the other two types and the best distortion performance. Moreover, the split-mode constraints reduce the number and complexity of the decisions LAI-CNN must make, improving the model's efficiency and allowing it to be smaller and more compact. Experimental results show that, compared with the official H.266/VVC reference software (VTM-22.0) under the Random-Access configuration, the proposed method saves 36.35% to 49.20% of encoding time while BDBR increases by only 1.36% to 3.47%.
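To make the split-mode pruning idea in the abstract concrete, below is a minimal PyTorch sketch of how a small CNN's per-mode scores could gate which QTMTT split modes are handed to the RDO search. This is an illustration only, not the thesis's actual LAI-CNN: the network `TinySplitNet`, the threshold `keep_prob`, and the encoder callback `rd_cost` are hypothetical names invented for this sketch, and the real method additionally distinguishes three CU types and runs inside the VTM encoder.

```python
import torch
import torch.nn as nn

# QTMTT split modes a VVC encoder weighs for each CU (coding unit).
SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]

class TinySplitNet(nn.Module):
    """Hypothetical stand-in CNN: maps a luma CU patch to per-mode scores."""
    def __init__(self, num_modes=len(SPLIT_MODES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_modes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def prune_split_modes(model, cu_patch, keep_prob=0.2):
    """Keep only the modes the CNN scores at or above `keep_prob`.
    The top-scoring mode always survives so RDO never runs empty."""
    with torch.no_grad():
        probs = torch.softmax(model(cu_patch.unsqueeze(0)), dim=1).squeeze(0)
    keep = [m for m, p in zip(SPLIT_MODES, probs.tolist()) if p >= keep_prob]
    return keep or [SPLIT_MODES[int(probs.argmax())]]

def best_mode_by_rdo(candidates, rd_cost):
    """Exhaustive RDO restricted to the surviving candidates;
    `rd_cost` stands in for the encoder's Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=rd_cost)

if __name__ == "__main__":
    model = TinySplitNet().eval()
    cu = torch.rand(1, 32, 32)           # one normalized 32x32 luma CU
    candidates = prune_split_modes(model, cu)
    best = best_mode_by_rdo(candidates, rd_cost=lambda m: len(m))  # dummy cost
    print(candidates, "->", best)
```

The speedup comes from shrinking the candidate list before the expensive RDO evaluation; in this literature, time saving is typically reported as (T_anchor - T_proposed) / T_anchor x 100%, which matches the 36.35% to 49.20% figures quoted above.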

Table of Contents (translated):

Abstract (Chinese) · Abstract (English) · Acknowledgments · Table of Contents · List of Figures · List of Tables
Chapter 1: Introduction
  1.1 Research Motivation and Objectives
  1.2 Problem Description and Research Methods
  1.3 Thesis Organization
Chapter 2: Background
  2.1 Video Coding
    2.1.1 Predictive Coding
    2.1.2 Transform Coding
    2.1.3 Motion Estimation and Motion Compensation
    2.1.4 Quantization and Entropy Coding
  2.2 The H.266/VVC Standard
  2.3 QTMTT Partitioning Structure
  2.4 Prediction Modes
    2.4.1 Merge Mode
    2.4.2 Motion Estimation Mode
    2.4.3 Intra Mode
  2.5 Convolutional Neural Networks
    2.5.1 Convolution
    2.5.2 ReLU
    2.5.3 Pooling
  2.6 Related Work
    2.6.1 Split/No-Split Decision Algorithms
    2.6.2 Split-Mode Classification Algorithms
    2.6.3 Split-Depth Prediction Algorithms
    2.6.4 Split and Prediction Mode Decisions Based on Neighboring CUs
    2.6.5 Prediction Mode Skipping Algorithms
Chapter 3: Acceleration Algorithm for Inter-Frame Coding
  3.1 H.266/VVC Encoding Flow
  3.2 Constraints on H.266/VVC Split Conditions
  3.3 Prediction Tasks of LAI-CNN
  3.4 Training Strategy of LAI-CNN
    3.4.1 Training Data Collection
    3.4.2 LAI-CNN Training Method and Process
  3.5 Fast Partition Decision Encoding Flow
Chapter 4: Experimental Results and Discussion
  4.1 Experimental Setup
  4.2 Experimental Results
    4.2.1 Individual and Overall Comparison with the Original VTM
    4.2.2 Comparison with Other Acceleration Methods
    4.2.3 PSNR and Bitrate Comparison by Class
    4.2.4 Ablation Study
    4.2.5 Visualization of Partitioning Results
Chapter 5: Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
Chapter 6: Appendix
  6.1 Glossary of Terms


Full Text Release Date: 2026/08/01 (campus network)
Full Text Release Date: 2026/08/01 (off-campus network)
Full Text Release Date: 2026/08/01 (National Central Library: Taiwan thesis system)