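Speed up H.266/VVC Intra Coding based on Statistical Method and Deep Learning | National Taiwan University of Science and Technology Theses and Dissertations System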

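Graduate student:	Ye-Guan Zhou (周業冠)
Thesis title:	Speed up H.266/VVC Intra Coding based on Statistical Method and Deep Learning (運用統計方法及深度學習於加速H.266/VVC幀內編碼)
Advisor:	Jiann-Jone Chen (陳建中)
Oral defense committee:	Hsueh-Ming Hang (杭學鳴), Tien-Ying Kuo (郭天穎), Yi-Leh Wu (吳怡樂), Kuo-Liang Chung (鍾國亮)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2022
Academic year of graduation:	110 (ROC calendar)
Language:	Chinese
Number of pages:	53
Chinese keywords:	多功能影像編碼 (Versatile Video Coding), 嵌套多類型樹的四分樹編碼 (Quad-Tree with Nested Multi-Type Tree Coding), 卷積神經網路 (Convolutional Neural Network), 幀內編碼 (Intra-Frame Coding)
English keywords:	Versatile Video Coding, H.266/VVC, Quad-Tree plus Multi-Type Tree Coding, QTMT, Convolutional Neural Network, Intra-Frame Coding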

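With faster computer processors, the widespread adoption of high-speed 5G networks, and advances in multimedia processing technology, the quality of service of multimedia communication and streaming applications has improved substantially. Examples include 4K video, augmented reality (AR), immersive virtual reality (VR), 360-degree panoramic video, and social media. However, the cost of the storage space and bandwidth needed to support these applications and systems has risen accordingly. The latest international video coding standard, H.266/VVC (Versatile Video Coding), was finalized on July 6, 2020. Compared with the previous-generation High-Efficiency Video Coding standard (H.265/HEVC), which adopts QTBT coding, it can reduce the bitrate by 30% to 50%, mainly because VVC adopts the highly adaptive QTMT coding mode, which selects among different partition modes to fit the image content more closely and thereby improves coding efficiency. As a result, the computational complexity of VVC encoding also rises to 31 times (intra-frame coding). This thesis studies how to reduce the computational complexity of VVC intra coding without degrading coding quality, so that applications can use VVC in practice to improve their quality of service. Working within the constraints of the corresponding modules in the VVC encoding flow, we propose: (1) a method that uses statistical signal features of a coding unit (CU) to quickly filter out CUs that should not be split; and (2) the design and training of a convolutional neural network model that predicts QTMT partition modes, so that only high-confidence partition modes are tested. This effectively avoids the massive computation required by the exhaustive search for the rate-distortion-optimal (Rate-Distortion Optimization, RDO) mode and speeds up encoding overall. Experimental results show that, compared with the default encoding configuration of the VVC standard, the proposed method reduces encoding time by 46.73% in intra coding mode while BDBR increases by only 1.16%. In addition, if the neural network is retrained for a specific QP, an average of 51.79% of encoding time can be saved, in which case BDBR rises slightly to 2.07%.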

The quality of service of multimedia communication has improved thanks to high-speed networks, such as 5G mobile networks, and greater CPU processing power. Applications such as 4K high-quality video communication, augmented reality (AR), immersive virtual reality (VR), 360-degree video, and social media require large storage capacity and a high-speed network environment. The newest video coding standard, H.266/VVC (Versatile Video Coding), was finalized on July 6, 2020. VVC adopts the Quad-Tree plus Multi-Type Tree (QTMT) coding mode and can reduce the bitrate by 30% to 50% compared with its predecessor, H.265/HEVC, which adopts a QTBT coding mode. Although QTMT can split a CU to better fit the image texture for efficient coding, its time complexity for intra-frame coding is as high as 31 times that of the QTBT mode. In this research, we propose to reduce the time complexity of the VVC encoder with negligible quality degradation, so that the above-mentioned applications can improve their quality of service. We propose to: (1) exploit CU signal statistics to quickly decide not to split a CU further; and (2) design and train a convolutional neural network (CNN) to predict the optimal split type for a CU, so that only a small subset of all QTMT split types requires further rate-distortion optimization (RDO). Experiments show that, for all-intra coding, the proposed speedup method saves 46.73% of execution time with a BDBR increase of only 1.16%. In addition, if the CNN model is retrained for a specific QP, the encoder saves 51.79% of execution time with a BDBR increase of 2.07%.
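
The two-step decision described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the CU signal statistic is the standard deviation (SD) of the luma samples and that the CNN outputs one probability per split mode. The mode labels, the function names, and the values of `sd_threshold` and `min_confidence` are hypothetical and chosen only for illustration.

```python
from statistics import pstdev

# Hypothetical labels for the six QTMT split decisions (no split, quad-tree,
# horizontal/vertical binary tree, horizontal/vertical ternary tree).
SPLIT_MODES = ("NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V")


def is_flat_cu(luma_block, sd_threshold):
    """Return True when the CU's luma standard deviation is below the threshold."""
    samples = [pixel for row in luma_block for pixel in row]
    return pstdev(samples) < sd_threshold


def candidate_modes(probabilities, min_confidence):
    """Keep the split modes whose predicted probability reaches min_confidence.

    Falls back to the single most likely mode so the result is never empty.
    """
    kept = [m for m, p in zip(SPLIT_MODES, probabilities) if p >= min_confidence]
    if not kept:
        best = max(range(len(SPLIT_MODES)), key=lambda i: probabilities[i])
        kept = [SPLIT_MODES[best]]
    return kept


def modes_to_test(luma_block, probabilities, sd_threshold=2.0, min_confidence=0.2):
    """Step 1: statistical early termination. Step 2: CNN-based pruning."""
    if is_flat_cu(luma_block, sd_threshold):
        return ["NO_SPLIT"]  # assumed flat: skip all split candidates
    return candidate_modes(probabilities, min_confidence)


# Illustrative inputs: a uniform 8x8 block, a textured 8x8 block, and a
# made-up probability vector standing in for the CNN output.
flat = [[128] * 8 for _ in range(8)]
textured = [[(x * 37 + y * 11) % 256 for x in range(8)] for y in range(8)]
probs = [0.05, 0.50, 0.30, 0.10, 0.03, 0.02]

print(modes_to_test(flat, probs))
print(modes_to_test(textured, probs))
```

Only the modes returned by `modes_to_test` would then go through the full RDO check, which is where the encoding time is saved. A looser `min_confidence` keeps more candidates and trades speed for a smaller BDBR increase.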

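Abstract (Chinese)
ABSTRACT
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Motivation and Objectives
2 Problem Description and Research Approach
3 Thesis Organization
Chapter 2 Background
1 Video Coding
1.1 VVC and HEVC
1.2 Intra Prediction
2 Neural Network
2.1 Convolutional Neural Network (CNN)
2.2 Activation Function
2.3 Pooling Layer and Adaptive Pooling Layer
3 Related Papers
Chapter 3 Training Data Processing and Statistics
1 Training Data
2 Data Processing
3 Data Statistics
3.1 SD Statistics
3.2 RDcost Statistics
Chapter 4 Fast Algorithm for VVC Intra Coding
1 Neural Network Architecture and Training Method
1.1 Adaptive CU CNN (ACUCNN)
1.2 Neural Network Training
2 Fast Partition Decision for Intra Coding
2.1 Encoding Flow
2.2 Use and Limitations of Statistical Thresholds
2.3 ACUCNN Speedup Method
Chapter 5 Experimental Results and Discussion
1 Experimental Setup
2 Experimental Results
2.1 Comparison with the Encoding Results of the Original VTM
2.2 Analysis of Experimental Results
2.3 Comparison with Speedup Methods in Other Literature
Chapter 6 Conclusions and Future Work
1 Conclusions
2 Future Work
References
Chapter 7 Appendix
1 Glossary of Technical Terms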

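Full-text release date: 2024/08/08 (campus network)
Full-text release date: 2024/08/08 (off-campus network)
Full-text release date: 2024/08/08 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)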
