
Author: Tzu-Hsuan Wei (魏慈諠)
Title: A Deep Learning based Post-processing Method for H.266/VVC Decoded Video Assisted by CU Partition Information (一種整合H.266/VVC區塊劃分資訊於深度學習之影像後級處理方法)
Advisor: Jiann-Jone Chen (陳建中)
Committee Members: Hsueh-Ming Hang (杭學鳴), Tien-Ying Kuo (郭天穎), Yao-Hong Tsai (蔡耀弘), Yi-Leh Wu (吳怡樂)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: Chinese
Number of Pages: 67
Keywords: Deep Learning, Post-processing, H.266/VVC, CU Partition Information, CU Depth Information, ResNet
    The new-generation Versatile Video Coding standard, H.266/VVC, adopts a QuadTree plus Multi-Type Tree (QTMT) block partitioning structure. Compared with its predecessor, it supports 4K to 16K high-resolution video, augmented/virtual reality (AR/VR), and 360° panoramic video, giving users a better visual experience. Although H.266 is highly efficient, compression introduces artifacts such as blurring that degrade visual quality, so this study investigates how to post-process H.266/VVC compressed video to further improve its quality. The proposed method feeds each decoded block, together with a mask that encodes its QTMT partition information (each split subblock filled with its average pixel value), into a deep-learning model, and trains one model per patch class. The trained models of all classes form a switchable system (switch model), so every patch can select the most suitable model for post-processing. The network uses two parallel residual networks (ResNet) to extract block features from the input image and from the mask, and relies on shortcut connections and Batch Normalization to avoid overfitting and internal covariate shift. Experimental results show that the proposed method, integrating the partition mask with switchable models to post-process H.266/VVC compressed video, raises PSNR by 0.2 dB on average for models with 64×64 input patches and three PSNR-based classes, and by about 0.14 dB for models with six average-depth classes.
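    To make the mask construction concrete, the following is a minimal NumPy sketch of the idea, assuming the decoder exposes the leaf-CU rectangles of each block; build_partition_mask and cu_rects are hypothetical names introduced for illustration, not part of the thesis code or of any VVC decoder API.

    import numpy as np

    def build_partition_mask(decoded_block, cu_rects):
        """Fill each leaf-CU region of the mask with the average pixel
        value of the decoded block over that region.

        cu_rects is an assumed list of (x, y, width, height) tuples for
        the leaf CUs produced by the QTMT split; how the decoder exposes
        this information is not specified here.
        """
        mask = np.zeros_like(decoded_block, dtype=np.float32)
        for x, y, w, h in cu_rects:
            mask[y:y + h, x:x + w] = decoded_block[y:y + h, x:x + w].mean()
        return mask

    # Example: a 64x64 block split once by QT into four 32x32 leaf CUs.
    block = np.random.randint(0, 256, (64, 64)).astype(np.float32)
    rects = [(0, 0, 32, 32), (32, 0, 32, 32), (0, 32, 32, 32), (32, 32, 32, 32)]
    mask = build_partition_mask(block, rects)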


    The newest H.266/VVC standard adopts a QTMT (QuadTree plus Multi-Type Tree) block coding structure to yield better coding performance than its predecessor, H.265/HEVC, which adopts a QT coding structure. In addition, the number of intra-prediction directions increases from 35 to 67 in H.266/VVC, which enables the prediction to better fit image textures for better coding performance. H.266/VVC can support 4K to 16K video coding, Augmented/Virtual Reality (AR/VR), and 360° panoramic video, providing a better media consumption experience.
    Although H.266/VVC is efficient, compressed video still exhibits compression artifacts. In this research, we study how to design a post-processor for the H.266/VVC system to improve perceptual video quality. We propose to utilize the QTMT block splitting information, in addition to the decompressed image blocks, to train a deep-learning model that performs video post-processing. The block splitting information is represented by a mask in which each split subblock is filled with its average pixel value. To train the post-processing model, the reconstructed image blocks and their masks are used as inputs, and the original uncompressed image blocks are used as labels. Three patch sizes, 128×128, 64×64, and 32×32, are adopted for training and testing. In addition, three class types, namely PSNR, average coding depth, and maximum coding depth, are adopted to design switchable models. The well-trained models of each class type form a switchable model, so each patch can be assigned the most suitable model for post-processing. The post-processing network comprises two ResNets designed to extract input image features and mask features, respectively. By utilizing shortcut connections and batch normalization, the system avoids overfitting and internal covariate shift. Experimental results showed that the proposed method, when adopting three switchable PSNR-based models, increases PSNR by 0.2 dB with patch size 64×64; when adopting six switchable models based on average block split depth, it increases PSNR by about 0.14 dB.
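    To illustrate the two-branch design, the following is a minimal PyTorch sketch of such a network under stated assumptions: single-channel (luma) patches, illustrative channel counts and block depths, and hypothetical class names (ResBlock, TwoBranchPostProcessor); it is not the thesis's exact configuration.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """Residual block: two conv layers with batch normalization,
        plus a shortcut connection around them."""
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
            )

        def forward(self, x):
            return torch.relu(self.body(x) + x)  # shortcut connection

    class TwoBranchPostProcessor(nn.Module):
        """Two parallel ResNet branches extract features from the decoded
        patch and from its partition mask; the fused features predict a
        residual correction that is added back to the patch."""
        def __init__(self, channels=64, num_blocks=4):
            super().__init__()
            def branch():
                return nn.Sequential(
                    nn.Conv2d(1, channels, 3, padding=1),
                    *[ResBlock(channels) for _ in range(num_blocks)],
                )
            self.image_branch = branch()
            self.mask_branch = branch()
            self.fuse = nn.Conv2d(2 * channels, 1, 3, padding=1)

        def forward(self, patch, mask):
            feats = torch.cat([self.image_branch(patch),
                               self.mask_branch(mask)], dim=1)
            return patch + self.fuse(feats)

    # Hypothetical switchable use: one model per class (e.g., six
    # average-depth classes); each patch is routed to its class's model.
    models = {c: TwoBranchPostProcessor() for c in range(6)}
    patch = torch.rand(1, 1, 64, 64)   # decoded 64x64 luma patch
    mask = torch.rand(1, 1, 64, 64)    # its partition mask
    restored = models[2](patch, mask)  # class index 2 chosen arbitrarily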

    Abstract (Chinese); Abstract; Acknowledgments; Table of Contents; List of Figures; List of Tables
    Chapter 1 Introduction: 1.1 Research Background; 1.2 Motivation and Objectives; 1.3 Method Overview; 1.4 Thesis Organization
    Chapter 2 Background and Related Techniques: 2.1 The H.266/VVC Video Coding Standard (2.1.1 H.266/VVC Coding Architecture: 2.1.1.1 Coding Unit (CU), 2.1.1.2 QTMT Structure, 2.1.1.3 QTMT Signaling, 2.1.1.4 Partitioning Rules for Boundary CUs, 2.1.1.5 CU Partitioning Restrictions against Redundancy; 2.1.2 Intra Prediction: 2.1.2.1 DC and Planar Modes, 2.1.2.2 Angular Modes, 2.1.2.3 Intra Mode Coding; 2.1.3 Differences between H.266/VVC and H.265/HEVC); 2.2 Residual Networks (ResNet): 2.2.1 ResNet v1, 2.2.2 ResNet v2; 2.3 Generative Adversarial Networks and Applications: 2.3.1 Generative Adversarial Network (GAN), 2.3.2 Super-Resolution Generative Adversarial Network (SRGAN)
    Chapter 3 H.266/VVC Post-processing with Partition Information and Residual Networks: 3.1 Using Partition Information (3.1.1 Obtaining Depths, 3.1.2 Maximum and Average Depth, 3.1.3 Mask Generation); 3.2 Dataset Selection and Preprocessing; 3.3 Post-processing System Framework (3.3.1 Model Architecture, 3.3.2 Model Training); 3.4 Switchable Models for H.266/VVC Post-processing (3.4.1 Input Frame Patches, 3.4.2 Model Switching Mechanism)
    Chapter 4 Experimental Results: 4.1 Experimental Setup; 4.2 Results and Analysis
    Chapter 5 Conclusion and Future Work: 5.1 Conclusion; 5.2 Future Work
    References

