
Author: 林俊明 (Jun-Ming Lin)
Title: 基於多模態生成對抗網路的深度影像超解析度方法 (Depth Map Upsampling via Multimodal Generative Adversarial Network)
Advisor: 花凱龍 (Kai-Lung Hua)
Committee: 花凱龍 (Kai-Lung Hua), 郭景明 (Jing-Ming Guo), 陸敬互 (Ching-Hu Lu), 鍾國亮 (Kuo-Liang Chung)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: English
Pages: 42
Chinese Keywords: 深度影像超解析度, 範圍感測器, 生成對抗網路
English Keywords: Depth map super-resolution, Range sensor, Generative adversarial network
Abstract: The task of depth map super-resolution (SR) is to predict a high-resolution (HR) depth image from a low-resolution (LR) depth image. In practice, the predicted HR depth image often suffers from a loss of sharpness at depth boundaries. In this thesis, we propose an architecture based on the generative adversarial network (GAN) to solve this problem. The proposed network learns the relationships in spatial, range, and local gradient information between LR and HR training images, which consist of depth maps and their corresponding gray-scale scene images. Because a depth map and its corresponding gray-scale scene image share similar boundaries, learning the relationship between the LR and HR gray-scale scene images further improves depth map SR performance. With the mapping function learned by our proposed GAN architecture, we can generate SR depth images with correct high- and low-frequency information from an arbitrary LR test depth image. Finally, quantitative and qualitative experimental results show that our method achieves state-of-the-art performance on depth map super-resolution.
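The abstract outlines the core mechanism: a generator upsamples an LR depth map while the corresponding gray-scale scene image, whose edges co-occur with depth boundaries, guides the reconstruction, and a discriminator penalizes implausible HR depth. Below is a minimal PyTorch sketch of that idea only, not the thesis's actual network: the branch widths, the 4x scale factor, the residual formulation, the PatchGAN-style discriminator, the L1 reconstruction term, the Adam learning rates, and the weight `lambda_adv` are all illustrative assumptions.

```python
# Minimal sketch (hypothetical configuration) of a multimodal GAN for depth SR:
# the generator fuses an upsampled LR depth map with an HR gray-scale guidance
# image; the discriminator judges local realism of depth patches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, scale=4):  # 4x upsampling is an assumed scale factor
        super().__init__()
        self.scale = scale
        # Two input branches: bicubically upsampled LR depth, HR gray-scale guide.
        self.depth_branch = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.gray_branch = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        # Fusion trunk maps concatenated features to a residual depth image.
        self.fuse = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, lr_depth, hr_gray):
        # Upsample depth to the target resolution, then predict a residual so
        # the network focuses on missing high-frequency (boundary) detail.
        up = F.interpolate(lr_depth, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        feat = torch.cat([self.depth_branch(up), self.gray_branch(hr_gray)], dim=1)
        return up + self.fuse(feat)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # PatchGAN-style critic: outputs per-patch real/fake logits.
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, depth):
        return self.net(depth)

def train_step(G, D, opt_g, opt_d, lr_depth, hr_gray, hr_depth, lambda_adv=1e-3):
    bce = nn.BCEWithLogitsLoss()
    # Discriminator update: real HR depth vs. generated depth (detached).
    fake = G(lr_depth, hr_gray).detach()
    d_real, d_fake = D(hr_depth), D(fake)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator update: L1 reconstruction plus a small adversarial term.
    sr = G(lr_depth, hr_gray)
    d_sr = D(sr)
    loss_g = F.l1_loss(sr, hr_depth) + lambda_adv * bce(d_sr, torch.ones_like(d_sr))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Smoke test with random tensors (B=2, 32x32 LR depth, 128x128 HR targets).
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
lr_depth = torch.rand(2, 1, 32, 32)
hr_gray, hr_depth = torch.rand(2, 1, 128, 128), torch.rand(2, 1, 128, 128)
print(train_step(G, D, opt_g, opt_d, lr_depth, hr_gray, hr_depth))
```

In this sketch the gray-scale image enters only as HR guidance and the adversarial term is kept small relative to the L1 term; the thesis's multimodal design and loss weighting may differ.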

Table of Contents:
Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
3 Method
3.1 Minibatch of Depth Maps and Corresponding Scene Images
3.2 Multimodal Generative Adversarial Network Architecture
3.3 Loss Function
4 Experimental Design
4.1 Dataset
4.2 Training Details
5 Experimental Result
6 Conclusions
References

