Student: 李明勳 Ming-Xun Li
Thesis Title: 基於語義實例輔助實現高分辨率深度估計 (Semantic Instance Aided High Resolution Depth Estimation)
Advisor: 陳郁堂 Yie-Tarng Chen
Committee: 陳郁堂 Yie-Tarng Chen, 陳省隆 Hsing-Lung Chen, 林銘波 Ming-Bo Lin, 方文賢 Wen-Hsien Fang, 黃琴雅 Chin-Ya Huang
Degree: Master (碩士)
Department: 電資學院 - 電子工程系 Department of Electronic and Computer Engineering
Year of Publication: 2021
Academic Year: 110
Language: English
Pages: 26
Keywords (Chinese): 立體攝影機, 深度估計, 語義分割, 實例分割
Keywords (English): Stereo Camera, Depth Estimation, Semantic Segmentation, Instance Segmentation
Abstract (translated from Chinese): This thesis studies depth prediction from a stereo camera pair, aided by semantic and instance segmentation. In research on monocular and stereo depth prediction, deep learning with convolutional neural networks has already achieved good results; however, it still falls short on object boundaries and fine details. We therefore propose a neural network architecture that takes stereo image pairs together with semantic and instance segmentation as auxiliary input, so that, on top of accurate depth prediction, object boundaries and details are recovered more precisely. In addition to using the stereo images and the segmentation maps for depth prediction, the network also predicts semantic and instance segmentation from the depth. Although adding segmentation cues improves prediction of object boundaries and details, the segmentation information may be attenuated during training; we therefore include the auxiliary semantic and instance segmentation predictions as a constraint, which further improves the depth estimates. Experiments on the KITTI 2015 and Carla datasets demonstrate the effectiveness of the proposed method.
Abstract (English): We present a framework for high-resolution stereo depth estimation. In recent years, deep neural networks have made rapid progress in depth estimation for both monocular and stereo cameras. New visual applications such as novel view synthesis require a high-resolution depth map in order to render a high-quality novel view. However, the outputs of existing high-resolution depth estimation approaches still lack fine depth detail and sharp boundaries between objects. To address this issue, we propose a hierarchical stereo matching architecture that integrates semantic and instance segmentation, since the information from semantic and instance segmentation can impose spatial constraints on pixels, such as learning the shape boundaries of objects. We empirically evaluate our framework on the KITTI 2015 and Carla datasets and show the effectiveness of the proposed approach.
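The abstract describes a network that predicts depth from a stereo pair while also predicting semantic and instance segmentation as an auxiliary constraint. A natural way to read this is as a weighted multi-task objective. The following NumPy fragment is a minimal sketch only, not the thesis's actual formulation: the L1 depth term, the cross-entropy segmentation terms, and the weights `w_sem` and `w_inst` are all assumptions made here for illustration.

```python
import numpy as np

def l1_depth_loss(pred, gt):
    """Mean absolute error between predicted and ground-truth depth maps."""
    return np.mean(np.abs(pred - gt))

def cross_entropy(probs, labels, eps=1e-12):
    """Pixel-wise cross-entropy for an (H, W, C) probability map and (H, W) integer labels."""
    h, w = labels.shape
    # Pick the predicted probability of the true class at every pixel.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.mean(np.log(picked + eps))

def multitask_loss(depth_pred, depth_gt,
                   sem_probs, sem_labels,
                   inst_probs, inst_labels,
                   w_sem=0.1, w_inst=0.1):
    """Total objective: depth term plus weighted semantic and instance
    segmentation terms, which act as the auxiliary constraint."""
    return (l1_depth_loss(depth_pred, depth_gt)
            + w_sem * cross_entropy(sem_probs, sem_labels)
            + w_inst * cross_entropy(inst_probs, inst_labels))
```

Keeping the segmentation terms in the objective is what prevents the segmentation information from being washed out during training: a perfect prediction on all three tasks drives the loss to (near) zero, while errors on any task raise it.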