
Author: Yi-Cheng Lou (羅翊誠)
Title: Speed up H.266/VVC Inter Coding based on Background Analysis Method and Deep Learning (運用背景分析方法與深度學習加速H.266/VVC幀間編碼)
Advisor: Jiann-Jone Chen (陳建中)
Committee: Yi-Leh Wu (吳怡樂), Tien-Ying Kuo (郭天穎), Jiann-Jone Chen (陳建中)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year: 111 (ROC calendar)
Language: Chinese
Pages: 71
Keywords: Versatile Video Coding, Inter-Frame Coding Speed-Up, Convolutional Neural Network, Background Analysis, Affine Motion Estimation, Quad-Tree plus Multi-Type Tree (QTMT)
    With improvements in processor performance and the spread of fifth-generation (5G) high-speed mobile networks, multimedia communication technology and streaming service quality have advanced substantially. Providing high-definition video communication services, however, raises storage and network bandwidth costs, so the efficiency of high-definition video compression directly determines application performance and cost. The latest video compression standard, H.266/VVC (Versatile Video Coding), reduces the coding bitrate by 30% to 50% compared with its predecessor, H.265/HEVC (High Efficiency Video Coding). VVC's main improvements are the highly adaptive QTMT partitioning scheme, under which coding unit (CU) shapes are no longer restricted to squares, and enhanced inter-prediction tools such as the affine motion mode, the geometric partitioning mode, and higher motion-vector precision. These improvements, however, greatly increase the computational complexity of inter coding, raising the processing time by roughly a factor of 9.5. This thesis therefore studies fast encoding decision algorithms within the VVC inter-coding framework. We propose a fast partition decision and a fast background-block identification method: (1) the fast partition decision uses a convolutional neural network to predict the suitable multi-type tree (MT) partition modes for the current CU and tests only the predicted modes, avoiding the massive computation of an exhaustive rate-distortion optimization (RDO) search; (2) the fast background-block identification uses statistical analysis to relate VVC motion compensation behavior to static regions of the frame, and on that basis decides whether the current region should be tested without splitting. It also exploits the motion estimation/compensation correlation between the parent CU and the current CU to decide statistically whether affine motion estimation can be accelerated for the current background block, achieving an overall encoding speed-up. Compared with the default VTM-17.0 encoder in the Random Access inter-coding configuration, the proposed method reduces encoding time by about 36.91% with only a 1.93% BDBR increase. With the accelerated affine motion estimation also enabled, it saves about 39.26% of the encoding time with a 2.23% BDBR increase.
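
    The two-stage decision described above can be sketched as follows. This is an illustrative outline only; the mode names, function names, and the score threshold are assumptions, not identifiers from the thesis or the VTM source:

    ```python
    # Hypothetical sketch of the proposed two-stage fast split decision:
    # stage 1 prunes MT split modes with CNN confidence scores, stage 2
    # short-circuits static background CUs to the no-split test only.
    # All names and the 0.5 threshold are illustrative assumptions.

    SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]

    def fast_split_decision(cnn_scores, is_background, threshold=0.5):
        """Return the subset of split modes to test with RDO.

        cnn_scores: dict mapping each split mode to a CNN confidence score.
        is_background: result of the background-block analysis for this CU.
        """
        if is_background:
            # Static background region: test only NO_SPLIT, skipping
            # the entire split-mode RDO search for this CU.
            return ["NO_SPLIT"]
        # Keep only the split modes the network considers likely; always
        # keep NO_SPLIT as a fallback so the recursion can terminate.
        candidates = [m for m in SPLIT_MODES[1:]
                      if cnn_scores.get(m, 0.0) >= threshold]
        return ["NO_SPLIT"] + candidates
    ```

    Instead of evaluating all six partition options exhaustively, the encoder would run RDO only over the returned list, which is where the reported time saving comes from.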


    With the evolution of computing performance and high-speed (5G) networks, multimedia quality has greatly improved. High-speed processors and communication technologies enable high-quality multimedia communication and applications. Providing high-resolution video, however, incurs higher storage and network costs, so the efficiency of video compression technology dominates the performance and cost of practical applications. The newest video coding standard, H.266/VVC (Versatile Video Coding), can reduce bitrate by 30% to 50% compared with its predecessor, H.265/HEVC (High Efficiency Video Coding). The major innovations of VVC include: (1) a QTMT coding mode that allows the coder to partition a CU into blocks not limited to squares; (2) affine motion estimation (AME), the geometric partitioning mode (GPM), and 1/16-sample motion-vector precision, which improve inter-frame coding efficiency. However, this higher inter-frame coding performance costs about 9.5 times the encoding time. In this research, we study how to reduce this time complexity by designing a fast partition-mode prediction method and a fast background-region identification method: (1) a convolutional neural network predicts the QTMT partition type of a CU, eliminating exhaustive rate-distortion optimization (RDO) search operations; (2) a heuristic background-block identification method analyzes the relationship between background-region content and the inter-frame motion compensation type, so that the encoder can quickly decide whether to split a CU. In addition, the proposed fast AME analyzes the ME/MC correlations between background CUs and their parent CUs to decide whether affine ME should be accelerated. Experiments show that the proposed fast encoding algorithm (without fast AME) reduces encoding time by 36.91% with only a 1.93% BDBR increase compared with the default VTM-17.0; with the fast AME mode enabled, the system achieves a 39.26% time reduction at a 2.23% BDBR increase.
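
    The BDBR figures above follow the standard Bjøntegaard delta-bitrate metric: fit log-bitrate as a cubic polynomial of PSNR for the anchor and test encoders, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference into a percentage. A minimal sketch (the function name and four-point RD curves are illustrative):

    ```python
    import numpy as np

    def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
        """Bjontegaard delta bitrate (%) of the test curve vs. the anchor.

        Each argument is a list of 4 rate-distortion points, as in the
        common test conditions (one point per QP).
        """
        lr_a = np.log(rate_anchor)
        lr_t = np.log(rate_test)
        # Cubic fit: log-rate as a function of PSNR, per curve.
        p_a = np.polyfit(psnr_anchor, lr_a, 3)
        p_t = np.polyfit(psnr_test, lr_t, 3)
        # Integrate both fits over the overlapping PSNR interval.
        lo = max(min(psnr_anchor), min(psnr_test))
        hi = min(max(psnr_anchor), max(psnr_test))
        int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
        int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
        avg_log_diff = (int_t - int_a) / (hi - lo)
        # Positive result = test needs more bitrate at equal quality.
        return (np.exp(avg_log_diff) - 1.0) * 100.0
    ```

    A BDBR of +1.93% thus means the accelerated encoder needs about 1.93% more bitrate than default VTM-17.0 to reach the same PSNR, averaged over the tested quality range.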

    Table of Contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        1.1 Motivation and Objectives
        1.2 Problem Description and Research Approach
        1.3 Thesis Organization
    Chapter 2  Background
        2.1 Introduction to H.266/VVC
            2.1.1 H.266/VVC and H.265/HEVC
            2.1.2 H.266/VVC Coding Units (CU) and the QTMT Structure
        2.2 H.266/VVC Inter Prediction
            2.2.1 Principles of Conventional Motion Estimation
            2.2.2 Merge and Skip Modes
            2.2.3 Affine Motion Estimation (AME)
            2.2.4 Geometric Partitioning Mode (GPM)
            2.2.5 Default Random Access Configuration of H.266/VVC
        2.3 Deep Learning Frameworks
            2.3.1 Spatial Pyramid Pooling Network (SPP-Net)
            2.3.2 InceptionNet
            2.3.3 Convolutional Block Attention Module (CBAM)
        2.4 Background Subtraction Algorithms
    Chapter 3  Overview of Fast Algorithms for H.266/VVC
        3.1 Partition Depth Algorithms
        3.2 CU Split/No-Split Decision Algorithms
        3.3 CU Partition Mode Decision Algorithms
        3.4 Summary of Fast Algorithms
            3.4.1 Summary
    Chapter 4  Fast Algorithms for H.266/VVC Inter Coding
        4.1 CU Prediction and Partitioning in H.266/VVC Inter Coding
        4.2 Fast Partition Decision for H.266/VVC
            4.2.1 Training-Dataset Analysis for the Fast Partition Depth Algorithm
            4.2.2 Training Data Collection
        4.3 Neural Network Architecture and Training Method
            4.3.1 Adaptive Shape Multiple Information CNN (ASMI-CNN)
            4.3.2 Network Training
                4.3.2.1 Loss Function Design
                4.3.2.2 Training Process
            4.3.3 Fast Partition Decision Using the Neural Network
                4.3.3.1 Fast CU Partition Decision Flow
        4.4 Background-Block Analysis Acceleration Method
            4.4.1 Background-Block Analysis Acceleration Algorithm
            4.4.2 Background Area Analysis (BGA Analysis)
        4.5 Integrated Acceleration System Flow
    Chapter 5  Experimental Results and Discussion
        5.1 Experimental Setup
        5.2 Experimental Results
            5.2.1 Comparison with Original VTM and Ablation Study
            5.2.2 Analysis of Experimental Results
            5.2.3 Comparison with Other Fast H.266/VVC Inter-Coding Algorithms
    Chapter 6  Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work
    References


    Full text available from 2025/01/31 (campus network)
    Full text available from 2025/01/31 (off-campus network)
    Full text available from 2025/01/31 (National Central Library: Taiwan NDLTD system)