
Graduate Student: Sen Lin (林森)
Thesis Title: Utilize Optical Flow Algorithm in Improving Depth Map Estimation Performance for View Synthesis (應用光流法改進深度資訊估測效能於虛擬視角合成技術)
Advisor: Jiann-Jone Chen (陳建中)
Committee Members: Hsueh-Ming Hang (杭學鳴), Tien-Ying Kuo (郭天穎), Kuo-Liang Chung (鍾國亮)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2013
Academic Year of Graduation: 101
Language: Chinese
Pages: 90
Keywords: depth map, disparity, multiview video coding, view synthesis, free-view TV, 3DTV

The development of video coding technology has evolved the passive viewing platform of two-dimensional video into three-dimensional television (3DTV), which actively reproduces the depth and stereoscopic sense of natural scenes. In addition, multiview video coding (MVC) lets users select an arbitrary viewpoint and watch the corresponding video, known as free-view video (FVV). This thesis investigates methods for improving the coding of 3DTV integrated with FVV. In 3DTV and FVV, rendering a convincing stereoscopic effect and synthesizing, at the receiver, virtual-view images that users can switch among freely requires transmitting multiple view streams together with their corresponding depth information; the accuracy of this depth information affects both the stereoscopic imaging effect and the quality of the synthesized views. This thesis proposes applying the total variation L1-norm optical flow algorithm (TV-L1 OF), which exploits image intensity characteristics to predict the motion direction of every pixel, to improve the accuracy of depth estimation and thereby the quality of 3D video and synthesized virtual views. To reduce the computational complexity of the optical flow, an adaptive object-region refinement algorithm is proposed that saves about 44~63% of the computation time at comparable quality. Furthermore, because multiview video coding must handle far more data than conventional single-view coding and is computationally demanding, this thesis adopts distributed video coding (DVC) for multiview video transmission, targeting portable encoders and wireless video sensor networks by shifting the heavy computation to the decoder; this scheme is referred to as multiview distributed video coding (MDVC). Based on the MDVC architecture, temporal and inter-view frame correlations are integrated, and the depth estimation results above are used to improve coding quality; the system combines view synthesis, optical flow, and a feedback mechanism to improve the reliability of the side information (SI) and thus the MDVC coding quality. Experimental results show that, compared with DERS5.1, the depth information refined by the optical flow method improves the average PSNR of synthesized virtual-view images by about 0.5~1.3dB.
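The depth information the abstract refers to is commonly derived from stereo disparity and stored as an 8-bit quantized depth map. As a minimal illustration only (the focal length, baseline, and near/far planes below are made-up toy values, not parameters from this thesis), the conversion and the usual MPEG-style inverse-depth quantization can be sketched as:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length, baseline):
    """Convert a stereo disparity map (pixels) to metric depth via Z = f*B/d."""
    d = np.maximum(disparity, 1e-6)  # guard against division by zero
    return focal_length * baseline / d

def quantize_depth(depth, z_near, z_far):
    """Quantize metric depth to 8 bits on an inverse-depth scale:
    v = 255 * (1/Z - 1/Zfar) / (1/Znear - 1/Zfar)."""
    v = 255.0 * (1.0 / depth - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.clip(np.round(v), 0, 255).astype(np.uint8)

# Toy 2x2 disparity map with f = 1000 px and baseline = 0.1 m,
# so f*B = 100 and depth = 100 / disparity.
disp = np.array([[8.0, 16.0], [32.0, 64.0]])
depth = disparity_to_depth(disp, focal_length=1000.0, baseline=0.1)
print(depth)   # depths of 12.5, 6.25, 3.125, 1.5625 meters
print(quantize_depth(depth, z_near=1.0, z_far=100.0))
```

The inverse-depth scale allocates more code values to near objects, which is why depth-map confidence near object boundaries matters so much for view synthesis quality.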


The development of video technology has extended the two-dimensional video processing platform to three-dimensional (depth with color) video display (3DTV). In addition to rendering 3DTV, advanced video processing can provide multiview display so that users can watch video from arbitrary viewpoints, also called free-view video (FVV). This thesis proposes methods to improve the coding performance of 3DTV integrated with FVV. For such a system to render satisfactory arbitrary-view video, virtual-view images must be synthesized from the available multiview video signals and their depth maps, and the confidence of the depth map information is critical to good synthesis quality. We propose to utilize a total variation L1-norm optical flow algorithm, TV-L1 OF, to estimate pixel motion vectors that are smooth and accurate, yielding high-confidence depth information. To reduce the computational complexity of motion estimation with the TV-L1 OF algorithm, we propose to apply it only around object boundaries, which reduces the computation time by about 44~63%. In addition, an MVC codec must handle a much larger amount of data than a mono-view codec. We propose to utilize distributed video coding (DVC) to shift encoder complexity to the decoder for the multiview system, abbreviated MDVC. Based on the MDVC, intra- and inter-view video correlations are exploited to yield high-confidence side information. By integrating these modules, i.e., the TV-L1 OF algorithm, view synthesis, and a feedback mechanism, the overall codec performance is improved over previous methods. Experiments showed that the depth maps estimated with the TV-L1 OF algorithm improve the PSNR of synthesized images by up to 0.5~1.3dB compared with the DERS5.1 algorithm.
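The reported gains are measured in PSNR between a captured reference view and the synthesized virtual view. A minimal sketch of that metric, using hypothetical toy images rather than any sequence from the thesis, could look like:

```python
import numpy as np

def psnr(reference, synthesized, peak=255.0):
    """Peak signal-to-noise ratio: 10 * log10(peak^2 / MSE)."""
    ref = reference.astype(np.float64)
    syn = synthesized.astype(np.float64)
    mse = np.mean((ref - syn) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy 8-bit images: a uniform error of 5 gray levels gives MSE = 25.
ref = np.full((4, 4), 100, dtype=np.uint8)
syn = np.full((4, 4), 105, dtype=np.uint8)
print(round(psnr(ref, syn), 2))  # 34.15
```

Because PSNR is logarithmic, the 0.5~1.3dB improvements quoted above correspond to roughly a 10~26% reduction in mean squared error of the synthesized views.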

Abstract (Chinese) I
Abstract (English) II
Acknowledgements III
Table of Contents IV
List of Figures VII
List of Tables X
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Problem Description and Research Approach 2
1.3 Thesis Organization 4
Chapter 2 Background and Related Work 5
2.1 Stereoscopic Imaging and Free-View Video 5
2.1.1 Principles of Stereoscopic Vision 5
2.1.2 Free-View Video Architecture 6
2.2 Multiview Depth Estimation and Virtual View Synthesis 9
2.2.1 Multiview Depth Estimation 9
2.2.2 Virtual View Synthesis 11
2.3 Integrating Multiview and Distributed Video Coding 12
2.3.1 Origins of Multiview Video Coding 13
2.3.2 Distributed Video Coding 13
2.3.3 Integrated Distributed and Multiview Video Coding Architecture 16
2.4 Side Information Reconstruction from Multiview Correlation 16
2.4.1 Motion-Compensated Temporal Interpolation 17
2.4.2 Disparity-Compensated View Prediction 18
2.4.3 Perspective Transformation Model 19
2.4.4 Optical-Flow Motion Estimation 20
2.5 Simulation Tools 26
2.5.1 H.264/AVC Video Encoder 26
2.5.2 RCPT Turbo Encoder 28
2.5.3 SIFT Feature Matching 32
2.5.4 HSV Color Coding 36
Chapter 3 Improved Depth Estimation and the Multiview Distributed Video Coding System 37
3.1 Optical Flow and Depth Estimation 37
3.1.1 TV-L1 Optical Flow 38
3.1.2 Depth Estimation Improved with Optical Flow 39
3.1.3 Adaptive Object-Region Refinement Algorithm 44
3.2 The MDVC System 47
Chapter 4 Simulation Results and Comparison 53
4.1 Experimental Settings 53
4.2 Experimental Comparisons 56
4.2.1 PSNR of Synthesized Virtual-View Images 57
4.2.2 PSNR of Side-Information Images 62
4.2.3 Time-Complexity Analysis 65
4.3 Visual Results 66
4.3.1 Synthesized Virtual-View Images 66
4.3.2 Reconstructed Side-Information Images 73
4.3.3 Decoded WZ Frames 78
Chapter 5 Conclusion and Future Work 83
5.1 Conclusion 83
5.2 Future Prospects 84
5.3 Research Suggestions 85
References 87

[1] ITU-T and ISO/IEC JTC1, “Joint draft 8.0 on multi-view video coding,” JVT-AB204, Jul. 2008.
[2] Masayuki Tanimoto, “Overview of Free Viewpoint Television,” Signal Processing: Image Communication, vol. 21, no. 6, pp. 454-461, Jul. 2006.
[3] Masayuki Tanimoto, “Free Viewpoint Television - FTV”, Picture Coding Symposium 2004, Session 5, December 2004.
[4] “Call for Comments on 3DAV”, ISO/IEC JTC1/SC29/WG11, N6051, October 2003.
[5] “Reference Softwares for Depth Estimation and View Synthesis,” ISO/IEC JTC1/SC29/WG11, MPEG2008/M15377.
[6] Sang-Beom Lee, Yo-Sung Ho, "Temporally Consistent Depth Map Estimation for 3D Video Generation and Coding", May 2013
[7] Xiaoyu Xiu and Jie Liang, “An Improved Depth Map Estimation Algorithm For View Synthesis and Multiview Video Coding,” Image Processing, 2010.
[8] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Info. Theory, vol. 19, no. 4, pp. 471-480, Jul. 1973.
[9] P. Kauff et al., “Depth map creation and image-based rendering for advanced 3DTV services providing interoperability and scalability,” Signal Processing: Image Communication, vol. 22, no. 2, pp. 217–234, 2007.
[10] M. Gotfryd, K. Wegner, and M. Domański, “View synthesis software and assessment of its performance,” ISO/IEC JTC1/SC29/WG11, MPEG 2008/M15672, Hannover, Germany, July 2008.
[11] K. Muller et al., “View synthesis for advanced 3D video systems,” Eurasip Journal on Image and Video Processing, no. 438148, Nov. 2008.
[12] M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Trans. PAMI, 25(8):993-1008, 2003.
[13] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis., vol. 47, no. 1, pp. 7–42, May 2002.
[14] K. Muller, P. Merkle, “Challenges in 3D video standardization,” Visual Communications and Image Processing, 2011.
[15] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Info. Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[16] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER codec: architecture, techniques and evaluation,” in Proc. PCS, Lisbon, Portugal, Nov. 2007.
[17] M. Flierl and B. Girod, “Coding of multi-view image sequences with video sensors,” in Proc. IEEE ICIP, pp. 609-612, Oct. 2006.
[18] X. Guo, Y. Lu, F. Wu, D. Zhao, and W. Gao, “Wyner-Ziv-based multiview video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 713-724, Jun. 2008.
[19] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Iterative multiview side information for enhanced reconstruction in distributed video coding", Journal on Image Video Process., pp. 1-17, Jan. 2009.
[20] Scharstein, D. and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. Journal Computer Vision, vol.47, no.1-3, pp.7-42, 2002.
[21] Min, D. and K. Sohn, "Cost aggregation and occlusion handling with WLS in stereo matching," IEEE Trans. Image Processing, vol.17, no.8, pp.1431-1442, 2008.
[22] Vladimir Kolmogorov, "Computing Visual Correspondence with Occlusions via Graph Cut", IEEE Computer Society Conference, 2004.
[23] C. Fehn, “A 3D-TV Approach Using Depth-Image-Based Rendering (DIBR),” Image Processing, 2003.
[24] X. Artigas, E. Angeli, and L. Torres, “Side information generation for multiview distributed video coding using a fusion approach,” in Proc. NORSIG, Reykjavik, Iceland, pp. 250-253, Jun. 2007.
[25] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Fusion-based multiview distributed video coding”, in Proc. ACM VSSN, pp. 139-144, Oct. 2006.
[26] ISO/IEC MPEG and the VCEG, “JMVC (Joint Multiview Video Coding) software for the Multiview Video Coding (MVC)”, 2010.
[27] J. Ascenso, C. Brites, and F. Pereira, “Content adaptive Wyner-Ziv video coding driven by motion activity,” in Proc. IEEE ICIP, pp. 605-608, Oct. 2006.
[28] M. Flierl, A. Mavlankar, and B. Girod, “Motion and disparity compensated coding for multi-view video, ” IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 1474-1484, Nov. 2007.
[29] R. Hartley and A. Zisserman, Multiple view geometry in computer vision, 2nd Ed., Cambridge University Press, ISBN: 0-521-54051-8, 2004.
[30] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[31] R. C. Bolles and M. A. Fischler, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381-395, Jun. 1981.
[32] B. K. P. Horn and B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, vol. 17, pp.185-203, Aug. 1981.
[33] G. Bradski, A. Kaehler, Learning OpenCV: computer vision with the OpenCV library, O'Reilly Media, Inc., ISBN: 978-0-596-51613-0, 2008.
[34] Information technology-coding of audio-visual objects-part 10: advanced video coding, ISO/IEC Std 14496-10, 2003.
[35] D. Marpe, T. Wiegand, and G. J. Sullivan, “The H.264/MPEG4 advanced video coding standard and its applications,” IEEE Communications Magazine, vol. 44, no. 8, pp. 134-143, Aug. 2006.
[36] Iain E. G. Richardson, H.264 and MPEG-4 video compression: video coding for next-generation multimedia, John Wiley & Sons, Ltd. ISBN: 0-470-84837-5, 2003.
[37] D. N. Rowitch and L. B. Milstein, “On the performance of hybrid FEC/ARQ system using rate compatible punctured turbo (RCPT) codes,” IEEE Trans. Communication, vol. 48, no. 6, pp. 948-959, Jun. 2000.
[38] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. 20, no. 2, pp. 284-287, Mar. 1974.
[39] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding: turbo codes,” in Proc. IEEE ICC, pp. 1064-1070, May 1993.
[40] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proc. IEEE ICC, vol. 2, pp. 1009-1013, Jun. 1995.
[41] Ching-Hua Chen, “An Improved Block Matching and Prediction Algorithm for Multi-view Video with Distributed Video Coder,” master's thesis, NTUST, 2010.
[42] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[43] Sourimant Gael, “Depth maps estimation and use for 3DTV”, 2010.
[44] Qiuwen Zhang, “Improved multi-view depth estimation for view synthesis in 3D video”, 2011.
[45] D. Kubasov, K. Lajnef, and C. Guillemot, “A hybrid encoder/decoder rate control for Wyner-Ziv video coding with a feedback channel,” IEEE Workshop on MMSP, pp. 251-254, Oct. 2007.
[46] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” CMAP, Ecole Polytechnique, Tech. Rep. R.I. 685, 2010.
[47] C. Zach, T. Pock, and H. Bischof, “A duality based approach for realtime TV-L1 optical flow,” In 29th DAGM Symposium on Pattern Recognition, pp. 214-223, 2007.
[48] Chi-Chun Lu, “Integrate Depth and Gray-level Information of Multi-View Video to Enhance Side Information of Distributed Video Coder,” master's thesis, NTUST, 2011.
[49] Yi-Kang Hsu, “Utilize Optical Flow Algorithm in Improving Distributed Multi-view Video Codec Performance,” master's thesis, NTUST, 2012.
[50] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in European Conf. Computer Vision (ECCV), pp. 25–36, 2004.
[51] Call for proposals on multi-view video coding, ISO/IEC JTC1/SC29/WG11, N7327, Jul. 2005.
[52] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), “JSVM Software Manual,” JSVM 9.18, 19 Jun. 2009.
[53] J. Lu, X. Zhang, and L. Wu, “Distributed video coding technology based on H.264 and turbo code,” in Proc. CISP, vol. 1, pp. 516-520, May 2008.
[54] J. Slowack, J. Škorupa, S. Mys, P. Lambert, C. Grecos, and R. Van de Walle, “Flexible distribution of complexity by hybrid predictive-distributed video coding,” EURASIP Journal on Signal Processing: Image Communication, vol. 25, no. 2, pp. 94-110, Feb. 2010.
[55] A. Smolic and P. Kauff, “Interactive 3-D video representation and coding technologies,” in Proc. IEEE, vol.93, no. 1, pp. 89-110, 2005.
[56] G. Bjontegaard, “Calculation of Average PSNR Differences Between RD-Curves,” ITU-T Q6/SG16, Doc. VCEG-M33, Apr. 2001.
[57] “Free Viewpoint Television (FTV),” http://www.tanimoto.nuee.nagoya-u.ac.jp/study/FTV
[58] The official website of OpenCV: http://opencv.willowgarage.com
[59] The official website of IT++: http://sourceforge.net/apps/wordpress/itpp/
[60] The official website of LibJacket: http://www.honghutech.com/software/accelereyes-jacket/libjacket
[61] Nvidia CUDA: http://www.nvidia.com/object/cuda_home_new.html
[62] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Multiview distributed video coding with encoder driven fusion,” in Proc. European Conf. Signal Processing (EUSIPCO ’07), Poznan, Poland, Sept. 2007.
