
Graduate student: 廖潁桐 (Ying-Tung Liao)
Thesis title: 基於深度域一致性之域適應於雙目深度估測方法 (Domain Adaptation for Stereo Depth Estimation via Depth Domain Consistency)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Committee members: 賴坤財 (Kuen-Tsair Lay), 丘建青 (Chien-Ching Chiu), 鍾聖倫 (Sheng-Luen Chung), 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2021
Academic year of graduation: 109 (2020–2021)
Language: Chinese
Number of pages: 63
Keywords: stereo depth estimation, convolutional neural network (CNN), discriminator, unsupervised learning, domain adaptation
To make stereo-image depth estimation networks more adaptable across datasets, this thesis proposes a depth estimation network with domain adaptation. The method adapts not only to environmental differences between datasets, but also to differences in stereo-camera viewing angle and baseline distance. Within the domain-adaptation framework, a stereo depth estimation network is trained on one labeled dataset and one unlabeled dataset, and a discriminator together with edge features helps the target dataset reduce its domain gap from the source dataset. The discriminator classifies each output by which dataset it comes from, and this judgment is fed back to the depth estimation network to complete unsupervised learning. Experimental results show good performance on public stereo datasets; compared with the prior work, the method further solves the problem that datasets with large stereo-camera baseline shifts could not be predicted accurately.


    To make stereo depth estimation more adaptable to different datasets, this thesis establishes a depth estimation network with domain adaptation. The method not only adapts to the differing environments of the datasets, but also accommodates the differences in stereo-camera viewing angles and baseline distances between them. It trains the stereo depth estimation network on one labeled dataset and one unlabeled dataset, and uses a discriminator together with edge features to help the target dataset reduce its domain gap from the source dataset. More specifically, the discriminator classifies each output to determine which dataset it came from, and the result is fed back to the depth estimation network to complete unsupervised learning.
    Compared with previous work on domain adaptation, the new method handles datasets whose stereo cameras have large baseline displacements, which previously could not be predicted accurately.
    Experimental results show that the proposed method provides satisfactory performance on the publicly available stereo datasets.
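The adversarial feedback described above (a discriminator guessing which dataset a depth map came from, with its judgment fed back to the estimator) can be sketched as a pair of binary cross-entropy objectives. This is a minimal illustration under assumed names and scalar probabilities, not the thesis's actual implementation:

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy of a single probability p against label in {0, 1}.
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))

def discriminator_loss(p_source, p_target):
    # The discriminator learns to label source-domain outputs 1
    # and target-domain outputs 0.
    return bce(p_source, 1.0) + bce(p_target, 0.0)

def adversarial_loss(p_target):
    # The depth network is rewarded when the discriminator mistakes a
    # target-domain output for a source-domain one (label 1).
    return bce(p_target, 1.0)
```

Minimizing `adversarial_loss` pushes target-domain outputs toward the source-domain distribution, which is how the domain gap is reduced without target labels.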

    摘要 (Chinese Abstract)
    Abstract
    Acknowledgment
    Table of Contents
    List of Figures
    List of Tables
    List of Acronyms
    1 Introduction
    2 Related Work
      2.1 Monocular Depth Estimation
      2.2 Stereo Depth Estimation
      2.3 Neural Architectures
      2.4 Domain Adaptation
      2.5 Summary
    3 Proposed Method
      3.1 Overall Methodology
      3.2 Stereo Depth Estimation
      3.3 Semantic Segmentation
      3.4 Domain Adaptation
      3.5 Depth Re-Scaling
        3.5.1 Real-World Scale (Structure from Motion)
        3.5.2 Intercept and Slope (Linear Regression)
      3.6 Loss Function
        3.6.1 Loss Function for Depth Estimation Models
        3.6.2 Loss Function for Domain Adaptation Models
      3.7 Summary
    4 Experimental Results and Discussion
      4.1 Stereo Datasets
        4.1.1 Carla Dataset
        4.1.2 DrivingStereo Dataset
        4.1.3 DSEC Dataset
        4.1.4 KITTI 2015 Dataset
        4.1.5 ITRI Dataset
      4.2 Experimental Setup
      4.3 Evaluation Metrics
        4.3.1 Root Mean Square Error
        4.3.2 Structural Similarity
      4.4 Experimental Results
        4.4.1 DrivingStereo Dataset
        4.4.2 DSEC Dataset
        4.4.3 KITTI Dataset
      4.5 Failure Cases and Error Analysis
        4.5.1 Different Baselines
        4.5.2 Unseen Objects in Source Domain
      4.6 Summary
    5 Conclusion and Future Works
      5.1 Conclusion
      5.2 Future Works
    References
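The depth re-scaling step named in Section 3.5.2 (intercept and slope via linear regression) can be illustrated as fitting a line between predicted depths and sparse ground truth, then applying it globally. The function name and the use of `np.polyfit` are assumptions for illustration, not the thesis's code:

```python
import numpy as np

def rescale_depth(pred, gt_sparse, mask):
    """Fit gt ~ slope * pred + intercept on pixels where sparse ground
    truth (e.g. LiDAR) exists, then apply the fit to the whole map."""
    x = pred[mask]
    y = gt_sparse[mask]
    # Degree-1 least-squares fit returns (slope, intercept).
    slope, intercept = np.polyfit(x, y, 1)
    return slope * pred + intercept
```

This corrects the global scale and offset of a relative depth map so it can be compared against metric ground truth.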


    Full text available from 2024/09/12 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)