Author: | 鄭宇峯 Yu-Fong Jheng |
---|---|
Thesis title: | 運用神經網路於加速H.266/VVC幀間編碼 Speed up H.266/VVC Inter Coding based on Convolutional Neural Network |
Advisor: | 陳建中 Jiann-Jone Chen |
Oral defense committee: | 郭天穎 Tien-Ying Kuo, 吳怡樂 Yi-Leh Wu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Publication year: | 2022 |
Graduation academic year: | 110 |
Language: | Chinese |
Pages: | 54 |
Chinese keywords: | H.266 多功能視訊編碼, 四分多分樹區塊分割, 加速幀間編碼, 光流, 深度預測模型 |
English keywords: | H.266/VVC, QTMT CU Partition, Inter-frame Speedup Coding, Optical Flow, Depth Prediction Model |
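As 5G networks become widespread, the picture quality of video communication and related streaming services keeps rising. To provide a better user experience, the JVET international standards group developed the new-generation video coding standard H.266/VVC, which improves compression by 30% to 50% over its predecessor H.265/HEVC. Formal testing, however, shows that H.266 needs 9.5 times the encoding time of H.265. To make H.266/VVC usable in practical system designs, this thesis proposes an acceleration algorithm for inter-frame coding built on the H.266/VVC standard architecture. It designs a Tree structure Depth Prediction Model (TDPModel) that combines optical flow, a binary tree, and deep learning models to predict the partition depth of each Coding Tree Unit in inter-frame coding, removing the exhaustive search otherwise needed to find the best coding result and thereby speeding up encoding. For model training, the thesis designs a Leaky_Hinge cost so that the trained model keeps the BDBR increase small at inference time. Experimental results show that, compared with the default H.266/VVC encoder (VTM-10.0), the method saves 33.198% of the execution time in inter-frame coding while BDBR rises by only 4.156%. When both intra-frame and inter-frame coding are counted, it saves 25.052% of the total encoding time while BDBR rises by only 2.665%.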
With the prevalence of 5G mobile networks, the image resolution of video communication and related streaming services has gradually increased. To provide a better user experience, the JVET (Joint Video Experts Team) developed a new-generation video coding standard, H.266/VVC. Compared with the previous-generation H.265/HEVC, H.266/VVC saves about 30% to 50% of the encoding bitrate, but official tests show that it requires 9.5 times the encoding time of H.265. To make H.266/VVC feasible for practical multimedia streaming applications, we propose a method that speeds up the inter-frame coding process within the H.266/VVC standard architecture. We design a Tree structure Depth Prediction Model (TDPModel) that combines optical flow information, a binary tree decision structure, and deep learning models to predict the partition depth of each coding tree unit (CTU) in inter-frame coding. The encoder then performs rate-distortion optimization over only a subset of the depth range instead of the whole range, which saves encoding time while maintaining good encoding quality. For model training, we design a Leaky_Hinge loss function so that the BDBR increase stays small when the model performs inference. Experimental results show that, compared with the default H.266/VVC configuration in VTM-10.0, the method saves 33.198% of the inter-frame encoding time while the BDBR increases by only 4.156%. When intra-frame coding time is included, the total encoding time is reduced by about 25.052% with a 2.665% BDBR increase.
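The two figures reported above, encoding-time saving and BDBR, can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: time saving as the relative reduction against the anchor encoder, and the Bjontegaard delta bitrate computed from a cubic fit of log-bitrate against PSNR. The function names `time_saving` and `bd_rate` and the sample rate/PSNR points are hypothetical and are not taken from the thesis, which may have used a different tool or variant of the calculation.

```python
import numpy as np


def time_saving(t_anchor, t_test):
    """Percentage of encoding time saved relative to the anchor encoder."""
    return (t_anchor - t_test) / t_anchor * 100.0


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bitrate in percent (cubic fit of log-rate vs. PSNR).

    A positive value means the test encoder needs more bitrate than the
    anchor for the same quality.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log(bitrate) as a cubic polynomial of PSNR for each encoder.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval the two curves share.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate difference, converted back to a percentage.
    avg_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical four-point rate/PSNR curves (for example, four QP values).
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
    anchor_psnr = [32.0, 35.0, 38.0, 41.0]
    test_rate = [r * 1.04 for r in anchor_rate]
    print("time saving (%):", time_saving(100.0, 75.0))
    print("BD-rate (%):", bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))
```

Under these definitions, a faster encoder is worthwhile when the time saving is large relative to the BDBR increase, which is the trade-off the abstract reports.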