
Author: Ming-Xun Li
Title: Semantic Instance Aided High Resolution Depth Estimation
Advisor: Yie-Tarng Chen
Committee: Yie-Tarng Chen, Hsing-Lung Chen, Ming-Bo Lin, Wen-Hsien Fang, Chin-Ya Huang
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 110
Language: English
Pages: 26
Keywords: Stereo Camera, Depth Estimation, Semantic Segmentation, Instance Segmentation

This thesis studies depth prediction from a stereo camera pair, aided by semantic and instance segmentation. In monocular and stereo depth prediction research, deep learning with convolutional neural networks has already achieved good results, yet it still falls short on object boundaries and fine details. We therefore propose a neural network architecture that takes stereo image pairs together with semantic and instance segmentation as input, so that on top of accurate depth prediction it performs better on object boundaries and details. Beyond using the stereo images and the segmentation maps for depth prediction, the network also predicts semantic and instance segmentation from the depth. Adding segmentation as input improves boundary and detail prediction, but during training the semantic and instance information may fade; the additional segmentation predictions therefore act as a constraint that preserves this information and further improves the depth estimates. Experiments on the KITTI 2015 and Carla datasets demonstrate the effectiveness of the method.
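The abstract describes adding auxiliary semantic and instance segmentation predictions as a constraint alongside the depth objective. A minimal sketch of how such a weighted multi-task objective might be combined (the loss forms, array shapes, and weights here are illustrative assumptions, not the thesis's exact formulation):

```python
import numpy as np

def depth_l1(pred, gt, mask):
    """Mean absolute disparity/depth error over valid pixels."""
    return np.abs(pred - gt)[mask].mean()

def seg_cross_entropy(logits, labels):
    """Per-pixel cross-entropy for an auxiliary segmentation head.
    logits: (H, W, C) scores; labels: (H, W) integer class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    return -log_prob[np.arange(h)[:, None], np.arange(w)[None, :], labels].mean()

def total_loss(pred_d, gt_d, valid, sem_logits, sem_gt,
               inst_logits, inst_gt, w_sem=0.1, w_inst=0.1):
    # The auxiliary segmentation terms act as a constraint so that the
    # semantic/instance cues are not lost while training for depth.
    return (depth_l1(pred_d, gt_d, valid)
            + w_sem * seg_cross_entropy(sem_logits, sem_gt)
            + w_inst * seg_cross_entropy(inst_logits, inst_gt))
```

In practice the weights `w_sem` and `w_inst` would be tuned so the auxiliary terms regularize, rather than dominate, the depth objective.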


We present a framework for high-resolution stereo depth estimation. In recent years, deep neural networks have made rapid progress in depth estimation for both monocular and stereo cameras. For new visual applications such as novel view synthesis, a high-resolution depth map is desired to render a high-quality novel view. However, the outputs of existing high-resolution depth estimation approaches still lack fine depth detail and sharp boundaries between objects. To address this issue, we propose a hierarchical stereo matching architecture that integrates semantic and instance segmentation, since semantic and instance information imposes spatial constraints on pixels, such as the shape boundaries of objects. We empirically evaluate our framework on the KITTI 2015 and Carla datasets and show the effectiveness of the proposed approach.
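One way to read the semantic and instance input augmentation described above is as channel-wise concatenation of segmentation maps with the stereo images before they enter the matching network. A minimal sketch under that assumption (the one-hot semantic encoding and instance-boundary mask are hypothetical choices, not taken from the thesis):

```python
import numpy as np

def augment_input(left_rgb, right_rgb, sem_left, sem_right,
                  inst_left, inst_right, num_classes):
    """Concatenate one-hot semantic maps and instance-boundary masks
    with the stereo RGB images along the channel axis.
    left_rgb/right_rgb: (H, W, 3) float images.
    sem_*: (H, W) integer class ids; inst_*: (H, W) 0/1 boundary masks."""
    def one_hot(ids):
        return np.eye(num_classes, dtype=np.float32)[ids]  # (H, W, C)
    left = np.concatenate(
        [left_rgb, one_hot(sem_left), inst_left[..., None]], axis=-1)
    right = np.concatenate(
        [right_rgb, one_hot(sem_right), inst_right[..., None]], axis=-1)
    return left, right  # each (H, W, 3 + num_classes + 1)
```

The extra channels give the matching network an explicit signal about where object boundaries lie, which is exactly where plain photometric matching tends to blur.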

Abstract
Acknowledgment
List of Figures
List of Tables
1 Introduction
1.1 Motivations
1.2 Summary of Thesis
1.3 Contributions
1.4 Thesis Outline
2 Related Work
2.1 Stereo Depth Estimation
2.2 Coarse-to-fine CNNs
2.3 Deep Learning for Segmentation Prediction
2.4 Multi-Task for Segmentation and Depth
3 Proposed Method
3.1 Overall Methodology
3.2 Semantic and Instance Input Augmentation
3.3 Hierarchical Depth Stereo Matching
3.4 Transfer Semantic Neural Network
3.5 Loss Functions
3.6 Novel View Synthesis
4 Experimental Results
4.1 Dataset
4.1.1 KITTI Dataset
4.1.2 CARLA Dataset
4.2 Evaluation Protocol
4.2.1 Root-Mean-Square Error
4.2.2 Absolute Relative Error
4.2.3 Three-Pixel Error
4.2.4 Structural Similarity
4.3 Experimental Results
4.3.1 Results of KITTI-2015
4.3.2 Results of CARLA
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
References
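The evaluation protocol in the outline relies on standard depth and disparity metrics. These are their commonly used definitions (the 3-px/5% outlier rule follows the usual KITTI convention; the thesis's exact thresholds may differ, and SSIM is omitted for brevity):

```python
import numpy as np

def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depth."""
    return np.sqrt(np.mean((pred - gt) ** 2))

def abs_rel(pred, gt):
    """Absolute relative error, normalized by ground-truth depth."""
    return np.mean(np.abs(pred - gt) / gt)

def three_pixel_error(pred_disp, gt_disp):
    """KITTI-style outlier rate: a pixel counts as wrong when its
    disparity error exceeds 3 px AND 5% of the ground-truth disparity."""
    err = np.abs(pred_disp - gt_disp)
    bad = (err > 3.0) & (err > 0.05 * gt_disp)
    return bad.mean()
```

RMSE and absolute relative error measure average accuracy, while the three-pixel error counts gross outliers, which is why it is the headline number on KITTI-style benchmarks.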

