
Graduate Student: 孫禾 (He Sun)
Thesis Title: 運用多尺度可變形卷積動量補償之深度視訊壓縮 (Multiscale Deformable Compensation for Learned Video Compression)
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 杭學鳴 (Hsueh-Ming Hang), 郭天穎 (Tien-Ying Kuo), 吳怡樂 (Yi-Leh Wu), 鍾國亮 (Kuo-Liang Chung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: Chinese
Pages: 57
Keywords: Video Coding, Conditional Entropy, Motion Compensation, Deep Learning, Deformable Convolution


Abstract:

With the advancement of multimedia communication technology, users demand higher video quality of service (QoS), e.g., higher spatial and temporal resolutions. To provide better QoS under limited bandwidth and storage, research on designing highly efficient video coding algorithms has become very popular in recent years. With the maturation of deep learning, many studies have utilized data-driven deep learning methods to design video coding systems, in which the deep residual video coding framework is similar to traditional video coding standards. The major difference is that hand-designed function modules, chiefly residual coding and the entropy model, are replaced with modules optimized through deep learning. Recent studies on deep video compression remove the residual coding architecture altogether and adopt an entropy-focused approach that provides better side information to improve the efficiency of conditional entropy coding, a concept that fits deep networks more naturally. The entropy-focused approach is a recent proposal, and there is much room for research and improvement. Based on a deep video compression model from the literature, we propose to improve the efficiency of conditional entropy coding by optimizing motion compensation. The specific methods comprise: (1) the system adopts deformable convolution motion compensation (Deformable v2 Compensation) to improve compensation effectiveness; (2) it utilizes a multiscale optical flow reconstruction procedure (Multiscale Flow Reconstruction) to remove unnecessary downsampling; (3) since latent optical flow features cannot be warped like ordinary optical flow, we change the loss function of the Inter training phase to make the multi-stage training strategy feasible. Experiments show that the proposed method outperforms HEVC x265, the baseline model, and related previous works in compression efficiency. In addition, the proposed method retains the baseline model's advantage of fast encoding and decoding.
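
For intuition, here is a minimal sketch of motion compensation with modulated deformable convolution (DCNv2), the technique named in method (1). It assumes PyTorch with torchvision's deform_conv2d; the module layout, channel sizes, and the idea of predicting offsets and masks from decoded motion features are illustrative assumptions, not the thesis's exact architecture.

```python
# A minimal sketch (assumptions: PyTorch + torchvision >= 0.9; module
# layout and channel sizes are illustrative, not the thesis's design)
# of motion compensation with modulated deformable convolution (DCNv2).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableCompensation(nn.Module):
    def __init__(self, channels=64, kernel_size=3, offset_groups=8):
        super().__init__()
        self.padding = kernel_size // 2
        n_points = offset_groups * kernel_size * kernel_size
        # Offsets (2 values per sampling point) and modulation masks
        # (1 value per point) are predicted from decoded motion features.
        self.offset_head = nn.Conv2d(channels, 2 * n_points, 3, padding=1)
        self.mask_head = nn.Conv2d(channels, n_points, 3, padding=1)
        # Grouped kernel: in_channels per group = channels // offset_groups.
        self.weight = nn.Parameter(
            0.01 * torch.randn(channels, channels // offset_groups,
                               kernel_size, kernel_size))

    def forward(self, ref_feat, motion_feat):
        # ref_feat: features of the previously decoded frame.
        # motion_feat: decoded motion information driving the sampling.
        offset = self.offset_head(motion_feat)
        mask = torch.sigmoid(self.mask_head(motion_feat))  # v2 modulation
        # Sample the reference features at learned, content-adaptive
        # positions; the mask down-weights unreliable locations.
        return deform_conv2d(ref_feat, offset, self.weight,
                             padding=self.padding, mask=mask)

# Usage: align reference features to the current frame.
comp = DeformableCompensation()
ref = torch.randn(1, 64, 64, 96)      # decoded reference-frame features
motion = torch.randn(1, 64, 64, 96)   # decoded motion features
aligned = comp(ref, motion)           # -> (1, 64, 64, 96)
```

The modulation mask in [0, 1] is the v2 addition: it lets the network suppress unreliable sampling locations, which distinguishes DCNv2 from the original deformable convolution.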
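Method (2), multiscale flow reconstruction, can be pictured as a coarse-to-fine decoder that refines an upsampled flow estimate at each scale rather than decoding flow only at a downsampled resolution. The sketch below is a generic coarse-to-fine scheme under that assumption; the layer names, scale count, and channel width are hypothetical, not the thesis's exact MFR module.

```python
# A minimal, generic coarse-to-fine sketch of multiscale flow
# reconstruction (names and sizes are hypothetical assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFlowReconstruction(nn.Module):
    def __init__(self, channels=64, num_scales=3):
        super().__init__()
        # One small head per scale, each predicting a residual flow
        # from decoder features concatenated with the current flow.
        self.heads = nn.ModuleList(
            nn.Conv2d(channels + 2, 2, 3, padding=1)
            for _ in range(num_scales))

    def forward(self, feats):
        # feats: decoder features, coarsest first, each scale doubling
        # the spatial resolution of the previous one.
        n, _, h, w = feats[0].shape
        flow = feats[0].new_zeros(n, 2, h, w)
        for head, feat in zip(self.heads, feats):
            if flow.shape[-2:] != feat.shape[-2:]:
                # Upsample the flow field and rescale its magnitudes
                # to match the doubled resolution.
                flow = 2.0 * F.interpolate(flow, size=feat.shape[-2:],
                                           mode="bilinear",
                                           align_corners=False)
            flow = flow + head(torch.cat([feat, flow], dim=1))
        return flow  # full-resolution flow, no extra downsampling step

# Usage with three decoder scales (coarse to fine):
mfr = MultiscaleFlowReconstruction()
feats = [torch.randn(1, 64, 16, 24),
         torch.randn(1, 64, 32, 48),
         torch.randn(1, 64, 64, 96)]
flow = mfr(feats)  # -> (1, 2, 64, 96)
```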
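Regarding method (3), the thesis modifies the loss of the Inter training phase; the exact modified form is not reproduced on this record page. For context, learned video codecs are generally trained with a rate-distortion Lagrangian of the following standard form, which the Inter phase applies to predicted frames:

\[
\mathcal{L} = \lambda \cdot D(x_t, \hat{x}_t) + R(\hat{y}_t) + R(\hat{z}_t)
\]

where \(x_t\) is the current frame, \(\hat{x}_t\) its reconstruction, \(D\) a distortion measure such as MSE, \(R(\hat{y}_t)\) and \(R(\hat{z}_t)\) the estimated bit-rates of the quantized latent and hyper-latent, and \(\lambda\) the rate-distortion trade-off parameter.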

Table of Contents:
Abstract (Chinese), Abstract, Acknowledgments, Contents, List of Figures, List of Tables
Chapter 1: Introduction
  1.1 Motivation and Objectives
  1.2 Problem Description and Research Methods
  1.3 Thesis Organization
Chapter 2: Background
  2.1 Traditional Coding Architecture
  2.2 Learned Image Compression
  2.3 Learned Video Compression
    2.3.1 Residual Architecture
    2.3.2 Conditional Entropy Coding Architecture
Chapter 3: Related Models
  3.1 Optical Flow Model
  3.2 Deformable Convolution
    3.2.1 Deformable Convolution
    3.2.2 Deformable Convolution v2
Chapter 4: Video Coding Algorithm with Multiscale Deformable Convolution Motion Compensation
  4.1 Model Architecture
    4.1.1 Baseline Model
      4.1.1.1 Optical Flow Estimation
      4.1.1.2 Temporal Context Mining (TCM)
      4.1.1.3 Temporal Context Re-filling (TCR)
      4.1.1.4 Entropy Model
    4.1.2 Proposed Architecture Modifications
      4.1.2.1 Deformable Convolution Motion Compensation
      4.1.2.2 Multiscale Flow Reconstruction (MFR)
    4.1.3 I-Frame Codec
  4.2 Model Training
    4.2.1 Training and Validation Datasets
    4.2.2 Training Strategy
    4.2.3 Loss Function
    4.2.4 Training Details
Chapter 5: Experimental Results and Discussion
  5.1 Test Datasets
  5.2 Experimental Setup
    5.2.1 Hardware and Software Configuration
  5.3 Experimental Results
    5.3.1 Ablation Study
  5.4 Discussion
Chapter 6: Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
References
Chapter 7: Appendix
  7.1 Model Architecture
  7.2 Other Supplements


Full Text Release Date: 2025/03/30 (campus network)
Full Text Release Date: 2025/03/30 (off-campus network)
Full Text Release Date: 2025/03/30 (National Central Library: Taiwan Dissertations and Theses System)