| Field | Value |
|---|---|
| Student | He Sun (孫禾) |
| Thesis Title | Multiscale Deformable Compensation for Learned Video Compression (運用多尺度可變形卷積動量補償之深度視訊壓縮) |
| Advisor | Jiann-Jone Chen (陳建中) |
| Committee Members | Hsueh-Ming Hang (杭學鳴), Tien-Ying Kuo (郭天穎), Yi-Leh Wu (吳怡樂), Kuo-Liang Chung (鍾國亮) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2023 |
| Graduation Academic Year | 111 (ROC calendar) |
| Language | Chinese |
| Pages | 57 |
| Keywords | Video Coding, Conditional Entropy, Motion Compensation, Deep Learning, Deformable Convolution |
With the advancement of multimedia communication technology, users demand higher video quality of service (QoS), e.g., higher spatial and temporal resolution. To deliver better viewing quality under limited bandwidth and storage, research on highly efficient video coding algorithms has become very active. As deep learning technology matures and spreads, an increasing number of studies adopt data-driven deep learning methods to design video coding systems. The deep residual coding framework used in these works resembles traditional video compression standards; the difference is that coding modules originally designed from accumulated engineering experience, chiefly the residual coder and the entropy model, are replaced with deep-learning modules and optimized end to end. Recent work on learned video compression removes the residual coding architecture altogether and focuses on providing better side information to improve the efficiency of the conditional entropy model, a concept that fits deep networks more naturally. Since this entropy-focused approach is a recent proposal, there is still considerable room for research and improvement. Building on a deep video compression model from the literature, this thesis improves the efficiency of conditional entropy coding by optimizing motion compensation. The specific methods are: (1) adopting deformable-convolution motion compensation (Deformable v2 Compensation) to improve compensation quality; (2) applying multiscale optical flow reconstruction (Multiscale Flow Reconstruction) to remove unnecessary downsampling; and (3) because latent optical flow features cannot be warped directly like ordinary optical flow, modifying the loss function of the Inter training phase so that the multi-stage training strategy remains feasible. Experimental results show that the proposed method outperforms HEVC x265, the baseline model, and related previous works in compression efficiency. In addition, the proposed method retains the baseline model's advantage of fast encoding and decoding.
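As an illustration of contribution (1), the sketch below shows how a DCNv2-style (modulated deformable convolution) block can warp reference-frame features using offsets and masks predicted from a motion feature. This is a minimal sketch built on `torchvision.ops.DeformConv2d`; the module layout, channel counts, and the offset/mask prediction head are assumptions for illustration, not the author's exact architecture.

```python
# Illustrative sketch only: a DCNv2-style compensation block in the spirit of the thesis
# (feature-space motion compensation with modulated deformable convolution).
# Channel sizes and the offset/mask head are assumptions, not the author's design.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableCompensation(nn.Module):
    """Warp reference-frame features toward the current frame with DCNv2."""

    def __init__(self, channels: int = 64, kernel_size: int = 3, offset_groups: int = 8):
        super().__init__()
        k2 = kernel_size * kernel_size
        # Predict 2 offsets plus 1 modulation scalar per sampling point and offset group,
        # from the decoded motion feature.
        self.offset_mask_head = nn.Conv2d(channels, offset_groups * 3 * k2,
                                          kernel_size=3, padding=1)
        self.dcn = DeformConv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2)

    def forward(self, ref_feat: torch.Tensor, motion_feat: torch.Tensor) -> torch.Tensor:
        out = self.offset_mask_head(motion_feat)
        o1, o2, mask = torch.chunk(out, 3, dim=1)   # split offsets and mask channels
        offset = torch.cat([o1, o2], dim=1)          # (B, 2*G*k*k, H, W)
        mask = torch.sigmoid(mask)                   # DCNv2 modulation in [0, 1]
        return self.dcn(ref_feat, offset, mask)      # compensated features


if __name__ == "__main__":
    comp = DeformableCompensation(channels=64)
    ref = torch.randn(1, 64, 128, 128)      # features of the previously decoded frame
    motion = torch.randn(1, 64, 128, 128)   # latent motion feature (assumed input)
    print(comp(ref, motion).shape)          # torch.Size([1, 64, 128, 128])
```

The sigmoid-gated mask is what distinguishes Deformable ConvNets v2 from the original deformable convolution: it lets the network downweight unreliable sampling positions, which helps when the predicted motion is noisy.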
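Contribution (3) concerns the loss used during the Inter training phase. For context, learned video codecs are typically trained end to end with a Lagrangian rate-distortion objective; the sketch below shows the generic form only, with the distortion term and the λ value chosen as placeholders, not the thesis's modified Inter-stage loss.

```python
# Generic rate-distortion training objective L = lambda * D + R.
# The MSE distortion and lambda value are illustrative assumptions.
import torch
import torch.nn.functional as F


def rd_loss(x_hat: torch.Tensor, x: torch.Tensor, est_bits: torch.Tensor,
            num_pixels: int, lmbda: float = 256.0) -> torch.Tensor:
    distortion = F.mse_loss(x_hat, x)    # D: reconstruction error
    rate = est_bits / num_pixels         # R: estimated bits per pixel
    return lmbda * distortion + rate
```

In a multi-stage schedule the motion branch is usually trained before the full model; the thesis's observation is that when motion lives in a latent feature space rather than an explicit flow field, the Inter-stage loss must be adapted so this staged training remains feasible.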