| Field | Value |
|---|---|
| Graduate Student | 吳又新 You-Shin Wu |
| Thesis Title | 利用Swin Transformer深度迴圈濾波器改進VVC幀內編碼效率 (Using a Swin Transformer-Based In-Loop Filter to Improve VVC Intra-Frame Coding) |
| Advisor | 陳建中 Jiann-Jone Chen |
| Committee Members | 杭學鳴 Hsueh-Ming Hang, 郭天穎 Tien-Ying Kuo, 吳怡樂 Yi-Leh Wu |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication | 2023 |
| Academic Year | 111 |
| Language | Chinese |
| Pages | 58 |
| Keywords (Chinese) | Versatile Video Coding, Swin Transformer, intra coding, deep learning, K-means algorithm |
| Keywords (English) | Swin Transformer |
With increasing network bandwidth and advances in software and hardware, online multimedia platforms have become the mainstream channel for information delivery. The H.265/HEVC and H.266/VVC standards provide high-efficiency coding for 4K and 8K video, respectively. Within this international coding framework, many studies have adapted deep-learning methods from computer vision to improve video compression efficiency; in particular, in-loop filters have been developed for H.266/VVC to reduce the compression artifacts introduced by block decomposition and quantization. In recent years, the Transformer model, by effectively exploiting the self-attention mechanism, has surpassed recurrent neural networks (RNNs) in accuracy and efficiency; its ability to capture global image characteristics has allowed it to outperform convolutional neural networks (CNNs) in image classification, image denoising, and single-image super-resolution (SISR). This thesis proposes a deep in-loop filter built on the Swin Transformer as its core architecture, combined with a lightweight CNN, which applies feature extraction, denoising, and image enhancement to the reconstructed frames in the H.266/VVC intra-coding loop, removing compression artifacts introduced during encoding and improving overall coding quality. In addition, this study preprocesses the training database with K-means clustering to classify the training samples and select a training set with diverse image content and uniformly distributed features, avoiding the overfitting and degraded performance caused by imbalanced samples. Experimental results show that the trained SwinIF model, compared with VTM 19.0 under the default All-Intra configuration, achieves average BD-rate (Bjøntegaard Delta Bit Rate) reductions of -5.07%, -2.29%, and -4.40% on the luma (Y) and two chroma (U, V) components, respectively.
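The K-means preprocessing step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the function name `kmeans_select`, the feature representation, the cluster count `k`, and the per-cluster quota are all assumed placeholders; a real implementation would derive features from the training patches (e.g. texture statistics or embeddings) before clustering.

```python
import numpy as np

def kmeans_select(features, k, per_cluster, iters=20, seed=0):
    """Cluster (N, D) feature vectors with K-means, then draw an equal
    quota from each cluster so the training subset covers diverse,
    evenly distributed content. Returns indices of selected samples."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: deterministic and well spread out.
    centers = [features[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centers],
                      axis=0)
        centers.append(features[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    # Sample an equal number of patches from each cluster.
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        rng.shuffle(idx)
        picked.extend(idx[:per_cluster].tolist())
    return np.array(picked)
```

Sampling per cluster rather than uniformly over the whole set is what enforces the balanced feature distribution the abstract aims for.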
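The BD-rate figures quoted above compare two rate-distortion curves by fitting log-bitrate as a cubic polynomial of PSNR for each codec and integrating the gap over the overlapping quality range; a negative value means the test codec needs less bitrate for the same quality. A minimal sketch of the standard calculation (the function name and interface are illustrative):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bit-rate difference (%) of the test curve vs. the anchor,
    following Bjontegaard's log-rate cubic-fit method."""
    log_ra = np.log10(rate_anchor)
    log_rt = np.log10(rate_test)
    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, log_ra, 3)
    p_t = np.polyfit(psnr_test, log_rt, 3)
    # Integrate both fits over the overlapping PSNR interval only.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    # Convert the average log-rate gap back to a percentage.
    return (10 ** avg_log_diff - 1) * 100
```

For example, a test curve that needs exactly half the anchor's bitrate at every PSNR point yields a BD-rate of -50%.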