
Student: Hsuan-Hung Liu (劉軒宏)
Thesis Title: Improving Variable-Bit-Rate Deep Video Coding Efficiency Using Gain Units and Motion Vector Prediction (運用增益單位與動量預測網路改進可變碼率深度視訊編碼效率)
Advisor: Jiann-Jone Chen (陳建中)
Committee Members: 杭學鳴, 郭天穎, 鍾國亮, 高立人
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar, i.e., 2022-2023)
Language: Chinese
Pages: 61
Keywords: Deep Video Coding, Conditional Entropy Coding, Quantization Parameter (QP), Deep Learning, Training Strategy, Inter-frame Coding
    The latest international video coding standard, H.266/VVC, doubles the coding efficiency of its predecessor, H.265/HEVC, providing higher-quality video services under the same storage and bandwidth constraints. Its architecture follows the Hybrid Video Coding principle, with codec modules refined continuously through accumulated design experience. In recent years, inspired by the efficiency of deep learning, many studies have replaced traditional coding architectures with network models to improve subjective and objective coding quality. Early deep residual coding systems closely resembled traditional coding; more recently, a few works have exploited the characteristics of deep networks to develop conditional entropy coding methods that use the temporal correlation of video, yielding conditional entropy models whose efficiency surpasses that of deep residual coding systems. However, a video codec must also offer multi-bitrate operation. During rate-distortion optimization (RDO), most deep video architectures train a separate model for each quantization parameter (QP), deviating from the goal of a single video encoder that serves multiple bitrates. This thesis studies how to design a single network model that realizes a variable-bitrate deep video coding system, as follows: (1) borrowing the gain unit from deep variable-rate image compression, together with the idea of allocating bits across feature channels, to improve the efficiency with which a single model provides variable bitrates; (2) proposing a motion vector prediction network based on a single reference motion vector, which further reduces the bitrate the baseline model spends on motion coding; (3) proposing two training schemes and alternating between them to further improve coding performance; (4) integrating a deep variable-rate image compression architecture to obtain a deep video compression system that controls intra-frame and inter-frame quality and bitrate simultaneously. Experimental results show that, compared with x265 (veryslow), the proposed system reduces BD-Rate by 32.406% and 66.302% for the PSNR- and MS-SSIM-optimized models, respectively. Compared with existing deep video coding architectures, this work provides multi-bitrate coding with a single model, surpassing the systems in the related literature in both coding architecture and efficiency.
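    The gain unit in (1) can be pictured as a small set of learned per-channel scaling vectors applied to the latent before quantization, with matching inverse-gain vectors applied after decoding; interpolating between two gain vectors yields bitrates between the trained rate points. Below is a minimal PyTorch sketch of this idea in the style of Cui et al. (CVPR 2021), assuming a CompressAI-style latent of shape (N, C, H, W); the class name, rate-point count, and channel width are illustrative, not the thesis implementation.

        import torch
        import torch.nn as nn

        class GainUnit(nn.Module):
            # One learned (gain, inverse-gain) vector pair per target rate
            # point; scaling the latent channel-wise before quantization lets
            # a single codec model cover several bitrates.
            def __init__(self, num_rates=4, channels=192):
                super().__init__()
                self.gain = nn.Parameter(torch.ones(num_rates, channels))
                self.inv_gain = nn.Parameter(torch.ones(num_rates, channels))

            def _mix(self, table, i, j, l):
                # Exponential interpolation g_i**l * g_j**(1-l) gives a
                # continuum of rates between trained points i and j.
                g = table[i] if j is None else table[i] ** l * table[j] ** (1.0 - l)
                return g.view(1, -1, 1, 1)  # broadcast over N, H, W

            def scale(self, y, i, j=None, l=1.0):
                return y * self._mix(self.gain, i, j, l)          # encoder side

            def unscale(self, y_hat, i, j=None, l=1.0):
                return y_hat * self._mix(self.inv_gain, i, j, l)  # decoder side

        # Illustrative use: gu = GainUnit(); y_q = torch.round(gu.scale(y, i=1))
        # at the encoder, then gu.unscale(y_q, i=1) before synthesis.

    In gain-unit schemes of this kind, each rate index is trained with its own Lagrange multiplier in the rate-distortion loss R + lambda*D, which is how the gain vectors learn to allocate bits unevenly across feature channels.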


    The latest international video coding standard, H.266/VVC, improves coding efficiency by a factor of two compared to H.265/HEVC, enabling higher-quality video services within the same storage space and bandwidth. Its architecture is based on the Hybrid Video Coding principle and continuously enhances the efficiency of codec modules through design experience. In recent years, inspired by the effectiveness of deep learning techniques, researchers have replaced traditional encoding frameworks with neural networks for deep video compression (DVC), improving both subjective and objective coding quality. While early deep residual coding systems resembled traditional coding, recent studies have introduced superior conditional entropy coding methods based on deep network characteristics and temporal video correlation, surpassing the efficiency of deep residual coding systems. However, video coding also requires multi-bitrate functionality. Most deep video architectures train multiple models for different quantization parameters (QP) during rate-distortion optimization (RDO), deviating from the goal of designing a single video encoder that provides multi-bitrate support. This thesis explores how to design a single neural network model that provides a variable-bitrate deep video coding system.
    The major contributions of this work are: (1) By adopting the concepts of gain units and feature-channel bit allocation from deep variable-bitrate image compression, the proposed system improves the efficiency of providing variable bitrates with a single model. (2) A motion vector prediction network based on a single reference motion vector is proposed to reduce the bitrate required for motion coding in the baseline model. (3) Two training methods are proposed and alternated to enhance coding efficiency. (4) Combining the deep variable-bitrate image compression model with the proposed DVC system yields a codec that can simultaneously control intra-frame and inter-frame quality and bitrate. Experimental results demonstrate the effectiveness of the proposed system: compared to x265 (veryslow), it reduces BD-Rate by 32.406% and 66.302% for the PSNR and MS-SSIM models, respectively. The proposed method provides multi-rate encoding with a single model, surpassing systems in the related literature in both encoding architecture and efficiency.
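    The BD-Rate figures above are Bjøntegaard delta rates: the average bitrate difference between two codecs over their shared quality range, computed from cubic fits of log-bitrate against quality (PSNR or MS-SSIM). A minimal sketch of the standard calculation follows; the rate-distortion points in the example call are made up for illustration, not thesis data.

        import numpy as np

        def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
            # Fit log-bitrate as a cubic polynomial of quality for each codec.
            fit_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
            fit_t = np.polyfit(psnr_test, np.log(rate_test), 3)
            lo = max(min(psnr_anchor), min(psnr_test))  # shared quality range
            hi = min(max(psnr_anchor), max(psnr_test))
            # Integrate both fits over the shared range, then average the gap.
            int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
            int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)
            avg_log_diff = (int_t - int_a) / (hi - lo)
            return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative = bitrate saving

        # Example with made-up (kbps, dB) points for anchor and test codecs:
        print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.0],
                      [800, 1600, 3200, 6400], [34.2, 36.8, 39.3, 41.2]))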

    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        1.1  Motivation and Objectives
        1.2  Problem Description and Research Methods
        1.3  Thesis Organization
    Chapter 2  Background
        2.1  Traditional Coding Architectures
        2.2  Deep Image Compression Architectures
            2.2.1  Fixed-Rate Deep Image Compression
            2.2.2  Variable-Rate Deep Image Compression
        2.3  Deep Video Compression Architectures
            2.3.1  Fixed-Rate Deep Video Compression
            2.3.2  Variable-Rate Deep Video Compression
    Chapter 3  Related Models
        3.1  Gain Units
        3.2  Motion Vector Prediction Networks
    Chapter 4  Improving the Variable-Rate Algorithm Using Gain Units and a Motion Vector Prediction Network
        4.1  Model Architecture
            4.1.1  Baseline Model
                4.1.1.1  Multiscale Flow Reconstruction (MFR)
                4.1.1.2  Deformable v2 Compensation
            4.1.2  Proposed Coding Architecture
                4.1.2.1  Proposed Coding Pipeline
                4.1.2.2  Single-reference Motion Vector Prediction (SMVP)
                4.1.2.3  Adjustable Temporal Context Re-filling (ATCR)
            4.1.3  Intra-frame Coder of the Proposed Architecture
        4.2  Model Training
            4.2.1  Training Strategy
            4.2.2  Loss Function
            4.2.3  Sequential and Random Training Schemes
            4.2.4  Training Details
        4.3  Continuous-Rate Operation
    Chapter 5  Experimental Results and Discussion
        5.1  Training and Test Datasets
        5.2  Experimental Setup
        5.3  Experimental Results
            5.3.1  Performance Comparison
            5.3.2  Continuously Variable Bitrate
            5.3.3  Model Complexity
            5.3.4  Ablation Studies and Analysis
                5.3.4.1  Number of Gain Units
                5.3.4.2  Single-reference Motion Vector Prediction
                5.3.4.3  Training Strategy
        5.4  Discussion
            5.4.1  Comparison of Different Intra-frame Coding Architectures
            5.4.2  Training Strategies and Methods
    Chapter 6  Conclusions and Future Work
        6.1  Conclusions
        6.2  Future Work
    References
    Chapter 7  Appendix
        7.1  Intra-frame Model Training Details
        7.2  Supplementary Experimental Results
            7.2.1  Supplementary Performance Comparisons
            7.2.2  Supplementary Continuously Variable Bitrate Results
            7.2.3  Visualizations of Continuously Variable Bitrate Results


    Full text available from 2025/07/31 (campus network)
    Full text available from 2025/07/31 (off-campus network)
    Full text available from 2025/07/31 (National Central Library: Taiwan NDLTD system)