| Field | Value |
|---|---|
| Student | He Sun (孫禾) |
| Thesis Title | Multiscale Deformable Compensation for Learned Video Compression (運用多尺度可變形卷積動量補償之深度視訊壓縮) |
| Advisor | Jiann-Jone Chen (陳建中) |
| Committee Members | Hsueh-Ming Hang (杭學鳴), Tien-Ying Kuo (郭天穎), Yi-Leh Wu (吳怡樂), Kuo-Liang Chung (鍾國亮) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2023 |
| Graduation Academic Year | 111 (ROC calendar) |
| Language | Chinese |
| Pages | 57 |
| Keywords | Video Coding, Conditional Entropy, Motion Compensation, Deep Learning, Deformable Convolution |
With the advancement of multimedia communication technology, users demand higher video quality of service (QoS), e.g., higher spatial and temporal resolution. To deliver better viewing quality under limited bandwidth and storage, research on highly efficient video coding algorithms has become very active. As deep learning technology matures and spreads, an increasing number of studies adopt data-driven deep learning methods to design video coding systems. The deep residual coding framework used in these works resembles traditional video compression standards; the difference is that coding modules originally designed from accumulated engineering experience, chiefly the residual coder and the entropy model, are replaced with deep-learning modules and optimized end to end. Recent work on learned video compression removes the residual coding architecture altogether and focuses on providing better side information to improve the efficiency of the conditional entropy model, a concept that fits deep networks more naturally. Since this entropy-focused approach is a recent proposal, there is still considerable room for research and improvement. Building on a deep video compression model from the literature, this thesis improves the efficiency of conditional entropy coding by optimizing motion compensation. The specific methods are: (1) adopting deformable-convolution motion compensation (Deformable v2 Compensation) to improve compensation quality; (2) applying multiscale optical flow reconstruction (Multiscale Flow Reconstruction) to remove unnecessary downsampling; and (3) because latent optical flow features cannot be warped directly like ordinary optical flow, modifying the loss function of the Inter training phase so that the multi-stage training strategy remains feasible. Experimental results show that the proposed method outperforms HEVC x265, the baseline model, and related previous works in compression efficiency. In addition, the proposed method retains the baseline model's advantage of fast encoding and decoding.
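As an illustration of contribution (1), the sketch below shows how a DCNv2-style (modulated deformable convolution) block can warp reference-frame features using offsets and masks predicted from a motion feature. This is a minimal sketch built on `torchvision.ops.DeformConv2d`; the module layout, channel counts, and the offset/mask prediction head are assumptions for illustration, not the author's exact architecture.

```python
# Illustrative sketch only: a DCNv2-style compensation block in the spirit of the thesis
# (feature-space motion compensation with modulated deformable convolution).
# Channel sizes and the offset/mask head are assumptions, not the author's design.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableCompensation(nn.Module):
    """Warp reference-frame features toward the current frame with DCNv2."""

    def __init__(self, channels: int = 64, kernel_size: int = 3, offset_groups: int = 8):
        super().__init__()
        k2 = kernel_size * kernel_size
        # Predict 2 offsets plus 1 modulation scalar per sampling point and offset group,
        # from the decoded motion feature.
        self.offset_mask_head = nn.Conv2d(channels, offset_groups * 3 * k2,
                                          kernel_size=3, padding=1)
        self.dcn = DeformConv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2)

    def forward(self, ref_feat: torch.Tensor, motion_feat: torch.Tensor) -> torch.Tensor:
        out = self.offset_mask_head(motion_feat)
        o1, o2, mask = torch.chunk(out, 3, dim=1)   # split offsets and mask channels
        offset = torch.cat([o1, o2], dim=1)          # (B, 2*G*k*k, H, W)
        mask = torch.sigmoid(mask)                   # DCNv2 modulation in [0, 1]
        return self.dcn(ref_feat, offset, mask)      # compensated features


if __name__ == "__main__":
    comp = DeformableCompensation(channels=64)
    ref = torch.randn(1, 64, 128, 128)      # features of the previously decoded frame
    motion = torch.randn(1, 64, 128, 128)   # latent motion feature (assumed input)
    print(comp(ref, motion).shape)          # torch.Size([1, 64, 128, 128])
```

The sigmoid-gated mask is what distinguishes Deformable ConvNets v2 from the original deformable convolution: it lets the network downweight unreliable sampling positions, which helps when the predicted motion is noisy.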
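Contribution (3) concerns the loss used during the Inter training phase. For context, learned video codecs are typically trained end to end with a Lagrangian rate-distortion objective; the sketch below shows the generic form only, with the distortion term and the λ value chosen as placeholders, not the thesis's modified Inter-stage loss.

```python
# Generic rate-distortion training objective L = lambda * D + R.
# The MSE distortion and lambda value are illustrative assumptions.
import torch
import torch.nn.functional as F


def rd_loss(x_hat: torch.Tensor, x: torch.Tensor, est_bits: torch.Tensor,
            num_pixels: int, lmbda: float = 256.0) -> torch.Tensor:
    distortion = F.mse_loss(x_hat, x)    # D: reconstruction error
    rate = est_bits / num_pixels         # R: estimated bits per pixel
    return lmbda * distortion + rate
```

In a multi-stage schedule the motion branch is usually trained before the full model; the thesis's observation is that when motion lives in a latent feature space rather than an explicit flow field, the Inter-stage loss must be adapted so this staged training remains feasible.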