
Graduate Student: Yu-Fong Jheng (鄭宇峯)
Thesis Title: Speed up H.266/VVC Inter Coding based on Convolutional Neural Network (運用神經網路於加速H.266/VVC幀間編碼)
Advisor: Jiann-Jone Chen (陳建中)
Committee Members: Tien-Ying Kuo (郭天穎), Yi-Leh Wu (吳怡樂)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 54
Keywords (Chinese, translated): H.266 Versatile Video Coding, QTMT block partitioning, inter-frame coding speedup, optical flow, depth prediction model
Keywords (English): H.266/VVC, QTMT CU Partition, Inter-frame Speedup Coding, Optical Flow, Depth Prediction Model
Abstract (Chinese, translated): With the widespread adoption of 5G networks, the picture quality of video communication and related streaming services has steadily increased. To provide a better user experience, the JVET international standards organization developed the next-generation video coding standard H.266/VVC, which improves compression efficiency by 30% to 50% over the previous-generation H.265/HEVC; however, formal tests show that H.266 requires 9.5 times the encoding time of H.265. To make H.266/VVC applicable to practical system designs, this thesis proposes a speedup algorithm for inter-frame coding based on the H.266/VVC standard architecture. We design a Tree-structure Depth Prediction Model (TDPModel) that uses optical flow, a binary-tree decision structure, and deep learning models to predict the partition depth of each coding tree unit (CTU) in inter-frame coding, eliminating the exhaustive search otherwise needed to find the best coding result and thereby accelerating encoding. For model training, we design a Leaky_Hinge loss so that, at inference time, the increase in BDBR can be kept small. Experimental results show that, compared with the default H.266/VVC encoder (VTM-10.0), the proposed method saves 33.198% of the inter-frame encoding time with only a 4.156% increase in BDBR. When intra-frame coding is included, the total encoding time is reduced by 25.052% with only a 2.665% increase in BDBR.
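To make the depth-prediction idea above concrete, the following minimal Python sketch shows how a predicted depth range could restrict the encoder's rate-distortion search for one CTU. All names here (predict_depth_range, rdo_search, encode_ctu) and the returned depth values are hypothetical stand-ins, not the thesis's or VTM's actual interfaces.

    from collections import namedtuple
    import random

    RDResult = namedtuple("RDResult", "depth rd_cost")

    def rdo_search(ctu_pixels, depth):
        # Stand-in for the encoder's rate-distortion search at one partition depth.
        return RDResult(depth=depth, rd_cost=random.random())

    def predict_depth_range(ctu_pixels, optical_flow):
        # Stand-in for a TDPModel-style prediction of (min_depth, max_depth) per CTU.
        return 1, 3  # dummy values; the real model infers these from flow features

    def encode_ctu(ctu_pixels, optical_flow):
        lo, hi = predict_depth_range(ctu_pixels, optical_flow)
        # Exhaustive RDO would test every possible depth; restricting the search
        # to the predicted subset is what saves encoding time.
        return min((rdo_search(ctu_pixels, d) for d in range(lo, hi + 1)),
                   key=lambda r: r.rd_cost)

    print(encode_ctu(ctu_pixels=None, optical_flow=None))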


Abstract (English): With the prevalence of 5G mobile networks, the image resolution of video communication and related streaming services has gradually increased. To provide a better user experience, the JVET (Joint Video Experts Team) developed the new-generation video coding standard H.266/VVC. Compared with the previous-generation H.265/HEVC, H.266/VVC can save about 30%~50% of the encoding bitrate, but its encoding time has been officially measured to be 9.5 times that of H.265. To make H.266/VVC feasible for practical multimedia streaming applications, we propose to speed up the inter-frame coding process within the H.266/VVC standard architecture. We design a Tree-structure Depth Prediction Model (TDPModel) that combines optical-flow information, a binary-tree decision structure, and deep learning models to predict the partition depth of each coding tree unit (CTU) in inter-frame coding, so that the coding control only needs to perform rate-distortion optimization over a subset of depths instead of the whole depth range, saving encoding time while maintaining good coding quality. For model training, we design a Leaky_Hinge loss function so that the BDBR increase remains small at inference time. Experimental results show that, compared with the H.266/VVC default configuration in VTM-10.0, the inter-frame encoding time is reduced by 33.198% while the BDBR increase is limited to 4.156%. When intra-frame coding time is included, the total encoding time is reduced by about 25.052% with a 2.665% BDBR increase.
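The record does not define the Leaky_Hinge loss. By analogy with applying a LeakyReLU-style slope to the standard hinge loss, one plausible reading is sketched below; the formulation, the margin of 1.0, and the leak factor of 0.1 are assumptions, not the thesis's exact definition.

    import numpy as np

    def leaky_hinge_loss(scores, labels, margin=1.0, leak=0.1):
        # labels in {-1, +1}; scores are raw model outputs.
        # Hinge penalty when the margin is violated; a small "leaky" slope instead
        # of a hard zero when it is satisfied, so easy samples still contribute a
        # weak gradient that discourages over-confident partition-depth decisions.
        m = margin - labels * scores
        return float(np.mean(np.where(m > 0.0, m, leak * m)))

    # Toy usage: one sample violating the margin, one satisfying it.
    print(leaky_hinge_loss(np.array([0.3, 2.0]), np.array([1.0, 1.0])))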

Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Problem Description and Research Methods
  1.3 Thesis Organization
Chapter 2 Background
  2.1 Introduction to H.266/VVC
    2.1.1 H.266/VVC and H.265/HEVC
    2.1.2 QTMT
    2.1.3 Inter-frame Prediction
  2.2 Neural Networks
    2.2.1 Convolutional Layers
    2.2.2 Activation Functions
    2.2.3 Pooling Layers
  2.3 SVM (Support Vector Machine)
  2.4 Optical Flow
Chapter 3 Related Models
  3.1 Optical Flow Models
    3.1.1 Predicting Optical Flow with Deep Learning
Chapter 4 Fast Algorithm for H.266/VVC Inter-frame Coding
  4.1 CU Partitioning Process
  4.2 Depth Definition
  4.3 Coding Parameter Modifications
    4.3.1 Reference Frame Parameters
      4.3.1.1 Default Coding Parameters
      4.3.1.2 Coding Parameter Adjustments
    4.3.2 Intra-frame Coding Period Parameter
  4.4 Training Data Collection
  4.5 System Architecture
  4.6 Model Architecture and Training
    4.6.1 Model Architecture
    4.6.2 Model Training
      4.6.2.1 Class Weights
      4.6.2.2 Loss Function
      4.6.2.3 Training the Model
Chapter 5 Experimental Results and Discussion
  5.1 Experimental Setup
  5.2 Experimental Results
    5.2.1 Comparison with the Original VTM Coding Results
    5.2.2 Comparison with a Model Trained without Leaky_Hinge
    5.2.3 Per-Class Comparison of PSNR and Bitrate
    5.2.4 Comparison with Other Speedup Methods
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Discussion of Future Research
References
Chapter 7 Appendix
  7.1 Arctangent Function 2


Full-text release date: 2024/05/04 (campus network, off-campus network, and the National Central Library Taiwan thesis and dissertation system)