
Author: 李政樺 (Cheng-Hua Lee)
Thesis title: 運用深度學習法加速幀內編碼之QTMT模式判定 (Fast Intra QTMT Mode Decision based on Deep Learning Methods)
Advisor: 陳建中 (Jiann-Jone Chen)
Oral defense committee: 杭學鳴 (Hsueh-Ming Hang)
鍾國亮 (Kuo-Liang Chung)
花凱龍 (Kai-Long Hua)
郭天穎 (Tian-Ying Kuo)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Pages: 63
Keywords: H.266, QTMT, Intra, Deep Learning

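With the arrival of the 5G era and the rapid development of network and multimedia
technologies, high-quality, low-latency multimedia transmission has become mainstream.
Video coding standards have advanced from the 4K picture quality of High Efficiency
Video Coding (HEVC/H.265) to the 8K picture quality of Versatile Video Coding
(VVC/H.266). Under the coding unit (CU) block-coding structure, JVET/H.266 added
Binary Tree (BT) partitioning to the Quad Tree (QT) partitioning of HEVC/H.265,
abbreviated as QTBT. The latest generation, VVC/H.266, further adds the Ternary Tree
(TT) partition mode; QTBTTT is abbreviated as QTMT (Multi-Type Tree). Compared with
HEVC/H.265, QTBT requires 523% of the computation in the All Intra configuration,
while VVC coding needs only half the bitrate. With QTMT, VVC can partition CUs more
flexibly according to image content: the largest coding block, the coding tree unit
(CTU), is first partitioned by QT, and the QT leaf nodes can then be further
partitioned by MT. MT comprises vertical BT (BV), horizontal BT (BH), vertical TT
(TV), and horizontal TT (TH). VVC/H.266 finds the best CU partition mode through a
Rate-Distortion Optimization (RDO) procedure that exhaustively traverses all partition
modes, so its computational complexity is much higher than that of the previous
standard. This thesis proposes using a Convolutional Neural Network (CNN) to predict,
in intra coding, the partition mode of a CU after QT and BT partitioning, so that
early prediction can skip unnecessary RDO computation and comparison and reduce
computational complexity while maintaining coding quality. We use the CNN model ResNet
to predict the partition modes of 64×64, 32×32, 64×32, 32×64, 32×16, and 16×32 blocks.
The proposed method proceeds as follows: (1) build a coding-block dataset: from
standard VVC/H.266 (VTM4.0.1) bitstreams we extract the CU modes produced by the QTMT
procedure for 64×64 and 32×32 blocks, and use the original block pixel values and the
CU partition modes as input samples and labels to train a ResNet; (2) build a second
dataset: from the bitstreams we extract the modes produced by QTMT partitioning for
64×32, 32×64, 32×16, and 16×32 blocks and use them, together with the original block
data, as input to a second ResNet; (3) combine the prediction results of the two
ResNets to form a fast CU decision mode for VVC/H.266. Experimental results show that
the proposed method reduces encoding time by 74.39% at the cost of a 1.58% increase in
BDBR.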
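
The partition types named above (QT plus the four MT types BV, BH, TV, and TH) can be illustrated with a minimal Python sketch of the sub-block geometry. It assumes the usual VVC conventions that a binary split halves the block and a ternary split divides it 1:2:1, and that block sizes are written width×height. The function name `split_block`, the mode label `NS` (no split), and the tuple layout are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

# (x, y, width, height) of a block inside the CTU; a hypothetical layout.
Block = Tuple[int, int, int, int]


def split_block(x: int, y: int, w: int, h: int, mode: str) -> List[Block]:
    """Return the sub-blocks produced by one QTMT split of a w x h block."""
    if mode == "NS":  # no split: the block stays a single CU
        return [(x, y, w, h)]
    if mode == "QT":  # quad tree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BV":  # vertical binary split: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BH":  # horizontal binary split: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TV":  # vertical ternary split: widths in ratio 1:2:1
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TH":  # horizontal ternary split: heights in ratio 1:2:1
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


if __name__ == "__main__":
    for mode in ("QT", "BV", "BH", "TV", "TH"):
        sizes = [(bw, bh) for _, _, bw, bh in split_block(0, 0, 64, 64, mode)]
        print(mode, sizes)
```

Under these assumptions, a BV split of a 64×64 block yields two 32×64 blocks and a BH split yields two 64×32 blocks, which matches the rectangular block sizes listed in the abstract.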


Fifth-generation (5G) communication technology enables high-speed data
transmission. With advances in computer processing speed, video coding
standards have progressed from HEVC/H.265, which encodes 4K images, to VVC/H.266,
which encodes 8K images. In addition to the Quad Tree (QT) partition of a coding
unit (CU) used in HEVC, H.266/JVET allows Binary Tree (BT) partition, abbreviated as
QTBT, and H.266/VVC allows both BT and Ternary Tree (TT) block partition, abbreviated
as QTMT. In comparison, the time complexity of intra-coded QTBT is 523% of that of
HEVC, but QTBT can reduce the required bitrate by 50%. VVC first performs QT
on a coding tree unit (CTU) and then MT on each sub-block. QTMT has to
determine whether to adopt vertical BT (BV), horizontal BT (BH), vertical TT (TV), or
horizontal TT (TH) to partition the CU through an exhaustive Rate-Distortion
Optimization (RDO) procedure, which leads to high time complexity. To reduce the time
complexity of the QTMT procedure, we propose using a Convolutional
Neural Network (CNN) to predict the CU coding mode and so eliminate the time-consuming
RDO test procedure. We adopt a CNN model, ResNet, trained on original image blocks
and ground-truth CU modes, to predict the CU coding mode; 32×32 and 16×16 blocks
were selected to train the ResNet models. The method has three steps: (1) pixel values
and coding modes of 32×32 blocks are extracted from video bitstreams encoded by
H.266/VVC (VTM 4.0.1) to train the first ResNet; (2) pixel values and
coding modes of 16×16 blocks are extracted to train the second ResNet; (3) combining
both ResNets, the coding controller predicts the H.266/VVC CU mode from block
pixel values, so that it can terminate the procedure early or bypass
unlikely RDO processes. Experiments show that the proposed method reduces
encoding time by 74.39% while the BDBR increase is less than 2.74%.
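
The idea of bypassing unlikely RDO tests can be sketched as follows. This is an illustrative Python sketch only: the abstract does not state the actual decision rule, so the probability threshold, the mode labels, and the function names `select_rdo_candidates` and `time_saving` are all assumptions. The sketch takes per-mode probabilities (as a classifier such as a ResNet might output) and keeps only the modes worth passing to RDO; it also shows the common way an encoding-time reduction is computed from two timings.

```python
from typing import Dict, List

# Hypothetical mode labels: no split, quad tree, and the four MT splits.
MODES = ["NS", "QT", "BH", "BV", "TH", "TV"]


def select_rdo_candidates(probs: Dict[str, float],
                          keep_threshold: float = 0.2) -> List[str]:
    """Keep modes whose predicted probability reaches the threshold.

    The threshold value is an assumption for illustration. The most likely
    mode is always kept so the encoder has at least one candidate to test.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    kept = [m for m in ranked if probs[m] >= keep_threshold]
    return kept or ranked[:1]


def time_saving(t_reference: float, t_fast: float) -> float:
    """Encoding-time reduction in percent relative to the reference encoder."""
    return 100.0 * (t_reference - t_fast) / t_reference


if __name__ == "__main__":
    # Made-up probabilities for one block, not results from the thesis.
    probs = {"NS": 0.05, "QT": 0.60, "BH": 0.25,
             "BV": 0.05, "TH": 0.03, "TV": 0.02}
    print(select_rdo_candidates(probs))        # modes still sent to RDO
    print(select_rdo_candidates(probs, 0.9))   # falls back to the top mode
```

A lower threshold keeps more candidates and therefore costs more encoding time with less risk of missing the best mode, while a higher threshold does the opposite. This is the kind of trade-off that the reported time saving and BDBR increase quantify.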

Table of Contents
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Problem Description and Research Method
  1.3 Thesis Organization
Chapter 2 Background
  2.1 Introduction to the VVC/H.266 Video Coding Standard
    2.1.1 Background of VVC/H.266 Standardization
    2.1.2 Comparison of VVC/H.266 and HEVC/H.265
    2.1.3 CU Coding Structure of VVC/H.266
    2.1.4 Intra Prediction Coding
    2.1.5 Inter Prediction
  2.2 Introduction to the ResNet Convolutional Network
  2.3 Machine Learning Workflow
Chapter 3 Fast Algorithms for VVC/H.266 Coding Units
  3.1 VVC/H.266 Complexity Analysis
  3.2 Related Work on Fast CU Partition Algorithms for FVC/H.266
  3.3 Related Work on Fast CU Partition Algorithms for HEVC/H.265
  3.4 Fast VVC/H.266 Decision Method Using Convolutional Neural Networks
    3.4.1 Building the Database for the Deep Learning CNN
    3.4.2 Applying the Deep Learning ResNet
Chapter 4 Experimental Results and Discussion
  4.1 Experimental Setup
  4.2 Comparison of Experimental Results: ResNet versus Original VVC
  4.3 Comparison of This Thesis with HEVC-CNN References [9] and [22]
  4.4 Comparison of the Proposed Method with VVC/H.266 Reference [23]
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work

References
[1] An, J., et al., Quadtree plus binary tree structure integration with JEM tools.
Joint Video Exploration Team (JVET), 2016.
[2] Gweon, R.-H. and Y.-L. Lee, Early termination of CU encoding to reduce HEVC
complexity. IEICE Transactions on Fundamentals of Electronics Communications
and Computer Sciences. E95.A: p. 1215-1218, 2012.
[3] J. Kim, et al. Adaptive Coding Unit early termination algorithm for HEVC. in
2012 IEEE International Conference on Consumer Electronics (ICCE). 2012.
[4] X. Shen and L. Yu, CU splitting early termination based on weighted SVM.
EURASIP Journal on Image and Video Processing, 2013.
[5] Zhang, Y., et al., Machine Learning-Based Coding Unit Depth Decisions for Flexible
Complexity Allocation in High Efficiency Video Coding. IEEE Transactions on
Image Processing. 24(7): p. 2225-2238, 2015.
[6] Wang, Z., et al. Effective Quadtree Plus Binary Tree Block Partition Decision for
Future Video Coding. in 2017 Data Compression Conference (DCC). 2017.
[7] Correa, G., et al., Fast HEVC Encoding Decisions Using Data Mining. IEEE
Transactions on Circuits and Systems for Video Technology. 25(4): p. 660-673, 2015.
[8] Zhu, L., et al., Binary and Multi-Class Learning Based Low Complexity
Optimization for HEVC Encoding. IEEE Transactions on Broadcasting. 63(3): p.
547-561, 2017.
[9] Liu, Z., et al., CU Partition Mode Decision for HEVC Hardwired Intra Encoder
Using Convolution Neural Network. IEEE Transactions on Image Processing. 25(11):
p. 5088-5103, 2016.
[10] T. Mallikarachchi et al., Content-adaptive feature-based CU size prediction for fast
low-delay video encoding in HEVC. IEEE Trans. Circuits Syst. Video Technol. vol.
28, no. 3, pp. 693–705, Mar. 2018.
[11] L. Zhu, Y. Zhang, Z. Pan, R. Wang, S. Kwong, and Z. Peng., Binary and multi-class
learning based low complexity optimization for HEVC encoding, IEEE Trans.
Broadcast., vol. 63, no. 3, pp. 547–561, Sep. 2017.
[12] Q. Hu et al., Fast HEVC intra mode decision based on logistic regression
classification, in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast.
(BMSB), pp. 1–4, Jun. 2016.
[13] D. Liu, X. Liu, and Y. Li., Fast CU size decisions for HEVC intra frame coding
based on support vector machines, in Proc. IEEE 14th Intl Conf. Dependable, Auto.
Secure Comput. (DASC), pp. 594–597, Aug. 2016.
[14] T. Laude and J. Ostermann., Deep learning-based intra prediction mode decision for
HEVC, in Proc. Picture Coding Symp. (PCS), pp. 1–5, Dec. 2016.
[15] H.-S. Kim and R.-H. Park., Fast CU partitioning algorithm for HEVC using an
online-learning-based Bayesian decision rule, IEEE Trans. Circuits Syst. Video
Technol., vol. 26, no. 1, pp. 130–138, Jan. 2016.
[16] H. R. Tohidypour et al., Online-learning-based mode prediction method for quality
scalable extension of the high efficiency video coding (HEVC) standard, IEEE Trans.
Circuits Syst. Video Technol., vol. 27, no. 10, pp. 2204–2215, Oct. 2017.
[17] A. J. F. de Oliveira and M. S. Alencar., Online learning early skip decision method
for the HEVC inter process using the SVM-based pegasos algorithm, Electron. Lett.,
vol. 52, no. 14, pp. 1227–1229, Jul. 2016.
[18] F. Duanmu, Z. Ma, and Y. Wang., Fast mode and partition decision using machine
learning for intra-frame coding in HEVC screen content coding extension, IEEE J.
Emerg. Sel. Topics Circuits Syst., vol. 6, no. 4, pp. 517–531, Dec. 2016.
[19] S. Momcilovic, N. Roma, L. Sousa, and I. Milentijevic., Run-time machine learning
for HEVC/H.265 fast partitioning decision, in Proc. IEEE Int. Symp. Multimedia
(ISM), pp. 347–350, Dec. 2015.
[20] B. Du, W.-C. Siu, and X. Yang., Fast CU partition strategy for HEVC intra-frame
coding using learning approach via random forests, in Proc. Asia–Pacific Signal Inf.
Process. Assoc. Annu. Summit Conf. (APSIPA), pp. 1085–1090, Dec. 2015.
[21] Y. Shan and E.-H. Yang., Fast HEVC intra coding algorithm based on machine
learning and Laplacian transparent composite model, in Proc. IEEE Int. Conf.
Acoust., Speech Signal Process. (ICASSP), pp. 2642–2646, Mar. 2017.
[22] M. Xu et al., Reducing complexity of HEVC: A deep learning approach, IEEE Trans.
Image Process., vol. 27, no. 10, pp. 5044–5059, Oct. 2018.
[23] Yang, H., Shen, L., Dong, X., Ding, Q., An, P., Jiang, G., Low complexity CTU
partition structure decision and fast intra mode decision for Versatile video coding,
IEEE Trans. Circuits and Systems for Video Technology, 2019.
[24] Helle, Philipp, Oudin, Simon., Block Merging for Quadtree-Based Partitioning in
HEVC, IEEE Trans. Circuits and Systems for Video Technology, 2012.
[25] Park, Sang-hyo, Kang, Je-Won., Fast Affine Motion Estimation for Versatile Video
Coding (VVC) Encoding, IEEE Access, Oct. 2019.
[26] Choi, Young-Ju, Lee, Young-Woon, Kim, Byung-Gyu., Design of Perspective Affine
Motion Compensation for Versatile Video Coding (VVC), ISBN: 978-3-030-40604-2,
Feb. 2020.
[27] 程式前沿, 王榮剛:建立中國自主視訊技術生態 (Wang Ronggang: Building China's
Independent Video Technology Ecosystem). Retrieved July 18, 2020, from
http://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/628811/
[28] 台部落. Retrieved July 18, 2020, from
https://www.twblogs.net/a/5e50c121bd9eee21167e5c88
[29] J. Chen, Y. Ye, S. Kim., Algorithm description for Versatile Video Coding and Test
Model 5, May 21, 2019.
[30] He, K., Zhang, X., Ren, S., Sun, J., Deep Residual Learning for Image
Recognition, IEEE Conference on Computer Vision and Pattern Recognition, 2016.
