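Student: Jui-Chen Lo (羅睿成)
Thesis title: Fast Inter-Frame Coding for H.266/VVC by Splitting Constraints with Deep Learning Model
Chinese title: 應用深度學習模型以及切割模式限制之提升H.266/VVC 快速幀間編碼方法
Advisor: Jiann-Jone Chen (陳建中)
Oral examination committee: 杭學鳴
Degree: Master's
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Number of pages: 48
Keywords (Chinese, translated): Versatile Video Coding standard, inter-frame coding, split mode limitations
Keywords (English): H.266/VVC, QTMTT, split mode limitations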
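Chinese abstract (translated): H.266/VVC is regarded as a milestone that ushers in a new era. Its main goal is to provide users with comprehensive, high-quality video coding services. Compared with its predecessor H.265/HEVC, H.266/VVC performs better at the same quality level; specifically, its compression efficiency is about 30% to 50% higher. This means that encoding video with H.266/VVC saves substantial transmission bandwidth and storage space while maintaining the same video quality, so viewers enjoy a better visual experience whether they stream online or watch locally stored videos. However, H.266/VVC adopts new coding tools, such as the QTMTT (Quad-Tree and Multi-Type Tree) split structure, which better fits image characteristics, and high-precision, multi-directional, and more diverse prediction modes, and it uses Rate-Distortion Optimization (RDO) to search for the best trade-off between bitrate and distortion. As a result, its encoding time is 8 to 10 times higher than that of H.265/HEVC, which makes it hard to deploy on devices with low computing power. This thesis therefore proposes a CNN based on split-mode constraints to reduce the encoding time of H.266/VVC inter-frame coding. The proposed LAI-CNN makes predictions under split-mode constraints and removes redundant split modes, effectively saving the time of the exhaustive RDO search. LAI-CNN comes in three types: Type 1 focuses on static/low-motion blocks and performs well on Class E; Type 2 targets subCUs produced by MTT splits, which cover the most coding blocks, and gives the largest overall speed-up; Type 3 targets subCUs produced by QT splits, with a speed-up between those of the other two types and the best distortion performance. Because the split-mode constraints reduce the number of options and the complexity of each decision, LAI-CNN is more efficient and can be smaller and more compact. Experimental results show that, compared with the H.266/VVC reference software (VTM-22.0) in the Random Access configuration, the proposed method reduces encoding time by 36.35% to 49.20% while BDBR increases by only 1.36% to 3.47%.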
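The abstract's central idea is that the split mode already applied to a parent CU restricts which split modes its subCUs may use, so a classifier only needs to rank the remaining candidates before RDO compares them. The following is a minimal sketch of that idea, assuming simplified QTMTT size rules, a hypothetical classifier output `probs`, and hypothetical distortion/rate tables. It is not LAI-CNN or VTM code; `MIN_CU`, `Split`, `allowed_splits`, `prune_with_classifier`, and `rdo_select` are illustrative names. The one rule taken from VVC itself is that a quadtree is never nested inside a multi-type tree.

```python
from enum import Enum
from typing import Dict, List

# Hypothetical simplification: VTM's real limits (MinQTSize, MaxMttDepth,
# MaxBtSize, MaxTtSize, picture-boundary handling, ...) are configurable and
# more detailed than this single minimum side length.
MIN_CU = 4


class Split(Enum):
    NO_SPLIT = 0
    QT = 1    # quad split into four equal squares
    BT_H = 2  # horizontal binary split: two W x H/2 halves
    BT_V = 3  # vertical binary split: two W/2 x H halves
    TT_H = 4  # horizontal ternary split: heights H/4, H/2, H/4
    TT_V = 5  # vertical ternary split: widths W/4, W/2, W/4


def allowed_splits(width: int, height: int, parent_is_mtt: bool) -> List[Split]:
    """Candidate split modes for a CU under simplified QTMTT rules."""
    modes = [Split.NO_SPLIT]
    # VVC never nests a quadtree inside a multi-type tree, so a CU produced
    # by a BT/TT split can no longer be QT-split.
    if not parent_is_mtt and width == height and width > MIN_CU:
        modes.append(Split.QT)
    if height >= 2 * MIN_CU:
        modes.append(Split.BT_H)
    if width >= 2 * MIN_CU:
        modes.append(Split.BT_V)
    # TT's outer parts are a quarter of the side, so the side needs 4 * MIN_CU.
    if height >= 4 * MIN_CU:
        modes.append(Split.TT_H)
    if width >= 4 * MIN_CU:
        modes.append(Split.TT_V)
    return modes


def prune_with_classifier(modes: List[Split], probs: Dict[Split, float],
                          threshold: float) -> List[Split]:
    """Drop modes that a (hypothetical) classifier scores below `threshold`.

    The most probable mode is always kept so RDO has at least one candidate.
    """
    best = max(modes, key=lambda m: probs.get(m, 0.0))
    return [m for m in modes if m is best or probs.get(m, 0.0) >= threshold]


def rdo_select(modes: List[Split], distortion: Dict[Split, float],
               rate: Dict[Split, float], lam: float) -> Split:
    """Pick the mode with the lowest Lagrangian cost J = D + lambda * R."""
    return min(modes, key=lambda m: distortion[m] + lam * rate[m])
```

For example, a 16×16 CU produced by a BT split gets five candidates instead of six, because QT is excluded. If the classifier assigns 0.6 to NO_SPLIT, 0.3 to BT_V, and under 0.1 to the rest, a threshold of 0.1 leaves RDO with only NO_SPLIT and BT_V to evaluate. The time saved comes from skipping the recursive RDO search of the pruned modes.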

English abstract: H.266/VVC is seen as a milestone ushering in a new era. Its primary goal is to provide users with comprehensive, high-quality video coding services. Compared with its predecessor H.265/HEVC, H.266/VVC performs better at the same quality level: its compression efficiency is about 30% to 50% higher. This means that H.266/VVC can save significant bandwidth and storage space while maintaining the same video quality, so viewers enjoy a better visual experience whether streaming online or watching locally stored videos. However, VVC adopts new coding tools, such as the QTMTT (Quad-Tree and Multi-Type Tree) split structure and higher-precision, multi-directional, and more varied prediction modes. Furthermore, it uses Rate-Distortion Optimization (RDO) to find the optimal trade-off between bitrate and distortion. As a result, its encoding time is 8 to 10 times higher than that of H.265/HEVC, making it difficult to deploy on devices with limited computational capability. This thesis therefore proposes a CNN based on split-mode constraints to reduce the encoding time of H.266/VVC inter-frame coding. The proposed LAI-CNN makes predictions under split-mode constraints and eliminates redundant split modes, saving the time spent on the exhaustive RDO search. LAI-CNN is applied in three types: Type 1 focuses on static/low-motion blocks and performs well on Class E sequences; Type 2 targets subCUs derived from MTT splits, which cover the largest number of coded blocks, and therefore achieves the highest overall acceleration; Type 3 targets subCUs derived from QT splits and offers acceleration between those of the other two types together with the best distortion performance. Moreover, the split-mode constraints reduce the number of choices and the complexity of each decision, which improves model efficiency and allows a smaller, more streamlined model. Experimental results show that, compared with VTM-22.0 under the Random Access configuration, the proposed method saves 36.35% to 49.20% of encoding time while BDBR increases by only 1.36% to 3.47%.
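The results above are reported as a time saving and a BDBR increase relative to VTM-22.0. The following is a minimal sketch of how these two figures are usually computed. It assumes the common definition TS = (T_anchor - T_test) / T_anchor × 100% and the classic Bjøntegaard BD-rate, which fits a cubic through four (PSNR, log-rate) points per curve. It is not the thesis's evaluation script, and the names `bd_rate` and `time_saving` are hypothetical. Reporting templates may use other interpolations (for example, piecewise cubic), which can shift the numbers slightly.

```python
import math
from typing import Sequence


def _lagrange_cubic(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the unique cubic through four points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_over(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    """Mean of the interpolating cubic on [lo, hi].

    Simpson's rule is exact for polynomials of degree <= 3, so a single
    panel gives the exact integral without fitting coefficients.
    """
    def f(x: float) -> float:
        return _lagrange_cubic(xs, ys, x)

    mid = 0.5 * (lo + hi)
    return (f(lo) + 4.0 * f(mid) + f(hi)) / 6.0


def bd_rate(anchor_rate: Sequence[float], anchor_psnr: Sequence[float],
            test_rate: Sequence[float], test_psnr: Sequence[float]) -> float:
    """Classic Bjontegaard delta bitrate in percent.

    Positive values mean the test encoder needs more bits than the anchor for
    the same PSNR. Natural log is used; exp() undoes it, so the result matches
    the log10 / 10** formulation.
    """
    for seq in (anchor_rate, anchor_psnr, test_rate, test_psnr):
        if len(seq) != 4:
            raise ValueError("expected four rate/PSNR points per curve")
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if lo >= hi:
        raise ValueError("PSNR ranges do not overlap")
    log_anchor = [math.log(r) for r in anchor_rate]
    log_test = [math.log(r) for r in test_rate]
    avg_diff = _mean_over(test_psnr, log_test, lo, hi) - _mean_over(anchor_psnr, log_anchor, lo, hi)
    return (math.exp(avg_diff) - 1.0) * 100.0


def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding-time saving in percent relative to the anchor encoder."""
    return (t_anchor - t_test) / t_anchor * 100.0
```

Each curve uses one rate/PSNR point per QP (for instance, the four QPs 22, 27, 32, and 37 of the JVET common test conditions). If a fast encoder produced exactly 90% of the anchor's bitrate at identical PSNRs, `bd_rate` would return -10.0. A BDBR of +1.36% therefore means roughly 1.36% more bits for the same quality.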

Table of contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Problem Description and Research Methods
1.3 Thesis Organization
Chapter 2 Background
2.1 Video Coding
2.1.1 Predictive Coding Methods
2.1.2 Transform Coding Methods
2.1.3 Motion Estimation and Motion Compensation
2.1.4 Quantization and Entropy Coding
2.2 The H.266/VVC Standard
2.3 QTMTT Partitioning Structure
2.4 Prediction Modes
2.4.1 Merge Mode
2.4.2 Motion Estimation Mode
2.4.3 Intra Mode
2.5 Convolutional Neural Networks
2.5.1 Convolution
2.5.2 ReLU
2.5.3 Pooling
2.6 Related Work
2.6.1 Split/Non-Split Classification Algorithms
2.6.2 Split Mode Classification Algorithms
2.6.3 Split Depth Prediction Algorithms
2.6.4 Split and Prediction Mode Decisions Based on Neighboring CUs
2.6.5 Prediction Mode Skipping Algorithms
Chapter 3 Acceleration Algorithm for Inter-Frame Coding
3.1 H.266/VVC Encoding Flow
3.2 Constraints on H.266/VVC Split Conditions
3.3 Prediction Task of LAI-CNN
3.4 Training Strategy of LAI-CNN
3.4.1 Training Data Collection
3.4.2 LAI-CNN Model Training Method and Process
3.5 Encoding Flow of the Fast Partition Decision
Chapter 4 Experimental Results and Discussion
4.1 Experimental Setup
4.2 Experimental Results
4.2.1 Per-Sequence and Overall Comparison with the Original VTM
4.2.2 Comparison with Acceleration Methods from Other Works
4.2.3 PSNR and Bitrate Comparison by Class
4.2.4 Ablation Study
4.2.5 Visual Analysis of Partition Maps from the Coding Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Chapter 6 Appendix
6.1 Glossary of Terms

