
Graduate Student: 李承翰 (Chen-Han Lee)
Thesis Title: 運用正交卷積規範於提升深度視訊壓縮效能 (Toward Improving Deep Video Compression by Orthogonal Convolution Constraint)
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 唐政元 (Cheng-Yuan Tang), 蔡耀弘 (Yao-Hung Tsai), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 44
Keywords (Chinese): 深度視訊壓縮, 正交卷積規範, 幀間編碼, 深度學習模型
Keywords (English): Deep Video Compression, Orthogonal Convolution Constraint, Inter-frame Coding, Deep Learning Model
  • Abstract (translated from Chinese): With the spread of fifth-generation mobile networks and faster processors, multimedia communication quality has advanced from H.264, which supports 2K video coding, to H.265/HEVC with 4K support and, most recently, H.266/VVC with 8K support. The core techniques are block transform coding and motion estimation, which remove spatial redundancy and temporal redundancy, respectively. In recent years, deep learning has been applied to video coding, known as Deep Video Compression (DVC). DVC uses convolutional neural networks to extract features automatically, replacing the Discrete Cosine Transform (DCT) of traditional video coding, and can achieve higher compression ratios. However, convolutional neural networks suffer from potential drawbacks such as feature redundancy and exploding gradients. Since the DCT is orthogonal, this thesis investigates the effect of orthogonal convolutional neural networks (OCNN) on coding efficiency: an orthogonal constraint is imposed on the convolutional layers during training, and its strictness can be tuned to optimize the network. The proposed deep video compression with an orthogonal convolution constraint combines the strengths of DVC and OCNN, using the constraint to improve feature extraction in the convolutional layers of the DVC model and thereby the quality of the overall video coding system. Experimental results show a BD-BR of −5.536% and a BD-PSNR of 0.158 dB compared with OpenDVC without the orthogonal convolution constraint.
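The kernel-orthogonality idea behind OCNN can be pictured as a soft regularizer added to the training loss: flatten each output filter into a row of a matrix W and penalize how far W Wᵀ is from the identity. The following is a minimal NumPy sketch for illustration only; the function name, kernel shapes, and penalty form are our assumptions, not code from the thesis.

```python
import numpy as np

def kernel_orthogonality_penalty(weights: np.ndarray) -> float:
    """Soft kernel-orthogonality penalty ||W W^T - I||_F^2.

    weights: conv kernel of shape (out_channels, in_channels, kh, kw),
    flattened so each output filter becomes one row of W.
    """
    out_channels = weights.shape[0]
    w = weights.reshape(out_channels, -1)         # rows = flattened filters
    gram = w @ w.T                                # (out, out) Gram matrix
    identity = np.eye(out_channels)
    return float(np.sum((gram - identity) ** 2))  # squared Frobenius norm

# A filter bank with orthonormal rows incurs (numerically) zero penalty.
q, _ = np.linalg.qr(np.random.randn(9, 4))  # 4 orthonormal columns
ortho_kernel = q.T.reshape(4, 1, 3, 3)      # 4 filters of shape 1x3x3
```

In training, such a term would be scaled by a weight (the thesis tunes a µ parameter for this purpose) and added to the rate-distortion loss, so that orthogonality can be enforced more or less strictly rather than exactly.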


    Multimedia communication quality has improved with advances in communication technology, e.g., 5G, and in processing power. H.264 supports 2K video coding, while H.265/HEVC and H.266/VVC support 4K and 8K video compression, respectively. Block transform coding and motion estimation, the typical video coding methods, are used to remove spatial redundancy and temporal redundancy, respectively. Video coding has been further improved by deep learning technologies. In contrast to the traditional Discrete Cosine Transform (DCT), a Deep Video Compression (DVC) framework utilizes convolutional neural networks (CNNs) to achieve efficient compression. Although a CNN can automatically learn effective features/transformations, it suffers from feature redundancy and gradient explosion problems. In addition, the learned transformations do not guarantee orthogonality as the DCT does. In this research, we propose imposing orthogonal constraints on the convolution process to improve video compression efficiency within the DVC framework. Experiments showed that a strictly orthogonal constraint is not necessary; the orthogonal constraints should be well adjusted under different compression ratios to best improve compression efficiency. Compared with OpenDVC, which has no orthogonal convolution constraint, experimental results show a BD-BR of −5.536% and a BD-PSNR of 0.158 dB.
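The reported BD-BR and BD-PSNR figures come from the standard Bjøntegaard metric, which fits polynomials to rate-distortion points in the log-rate domain and averages the gap between the two fitted curves over their overlapping rate range. A minimal sketch of BD-PSNR follows; the cubic fit is the conventional choice, and the function name is ours, not taken from the thesis code.

```python
import numpy as np

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average PSNR gain (dB) of the test codec over the anchor:
    fit cubic polynomials in the log10-rate domain, then integrate the
    difference over the overlapping rate interval."""
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    p_a = np.polyfit(log_ra, psnr_anchor, 3)   # anchor R-D curve
    p_t = np.polyfit(log_rt, psnr_test, 3)     # test R-D curve
    lo = max(log_ra.min(), log_rt.min())       # overlapping interval
    hi = min(log_ra.max(), log_rt.max())
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    return (int_t - int_a) / (hi - lo)         # mean PSNR difference in dB
```

As a sanity check, a curve compared against itself yields 0 dB, and a curve shifted uniformly up by 1 dB yields a BD-PSNR of 1.0 dB; a positive BD-PSNR (like the 0.158 dB above) means the test codec delivers higher quality at equal bitrate.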

    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Motivation and Objectives
      1.2 Thesis Organization
    Chapter 2  Background
      2.1 Convolutional Neural Networks
        2.1.1 Convolutional Layer
        2.1.2 Stride
        2.1.3 Padding
        2.1.4 Regularization
      2.2 Orthogonal Matrices
      2.3 Autoencoder
    Chapter 3  Related Work
      3.1 Deep Video Compression
        3.1.1 Coding Architecture
        3.1.2 Motion Estimation
        3.1.3 Motion Compensation
        3.1.4 Residual
        3.1.5 Motion and Residual Compression
        3.1.6 Frame Reconstruction
        3.1.7 Loss Function
      3.2 OpenDVC
      3.3 Orthogonal Convolutional Neural Network (OCNN)
        3.3.1 Kernel Orthogonality
        3.3.2 Orthogonal Convolution
    Chapter 4  Deep Video Compression with Orthogonal Convolution Constraint
      4.1 Group of Pictures (GOP)
      4.2 System Architecture
        4.2.1 Orthogonalized Motion Compression Network
        4.2.2 Orthogonalized Residual Compression Network
      4.3 Loss Function
      4.4 Training Data
        4.4.1 Training Parameters
    Chapter 5  Experimental Results and Discussion
      5.1 Experimental Setup
      5.2 Test Data
      5.3 Experimental Procedure and µ Parameter Settings
      5.4 Experimental Results
      5.5 Gradient Observation
      5.6 Gradient Behavior During Training
    Chapter 6  Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Work
    References


    Full-text release date: 2024/08/23 (campus network, off-campus network, and National Central Library: Taiwan NDLTD system)