
Author: CHAIO-YI HU (胡蕎伊)
Thesis Title: Self-Supervised Night-time Monocular Depth Estimation Using Illumination Adjustment and Segmentation Information (Chinese title: 基於照明調整和語意分割之自監督單目夜間場景深度預測)
Advisor: Kai-Lung Hua (花凱龍)
Committee: Yung-Yao Chen (陳永耀), Yi-Hui Chen (陳宜惠), Wen-Huang Cheng (鄭文皇), Shih-Wei Sun (孫士韋)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Thesis Publication Year: 2022
Graduation Academic Year: 110 (ROC calendar, i.e., the 2021-2022 academic year)
Language: English
Pages: 31
Keywords: Semantic Segmentation, Attention, Monocular Depth Estimation, Night-time, Oxford RobotCar Driving Dataset, Image Enhancement


Mainstream methods for self-supervised monocular depth estimation are typically designed for daytime images. However, they do not work well on images captured at night because of low visibility, uneven illumination, and inconsistent lighting (e.g., flickering). We propose LightSegDepth (LSDepth), which combines a more thorough illumination adjustment with an improved feature representation based on semantic segmentation. Unlike methods that only handle low visibility in dark scenes, LSDepth also handles uneven lighting by correcting both underexposure (e.g., dark regions) and overexposure (e.g., glare). Furthermore, we supplement our model with structural context in the form of semantic segmentation features. This joint task has several advantages: (1) semantic maps provide structural continuity that prevents depth "holes"; (2) they provide additional context for object re-identification, especially across consecutive video frames with inconsistent lighting; and (3) grouping pixels semantically reduces the correspondence search space. Experimental results show that LSDepth achieves state-of-the-art performance on the Oxford RobotCar night-time driving dataset.
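The abstract does not specify how LSDepth's illumination correction is implemented. As a hedged illustration of the general idea of correcting both underexposure and overexposure, the sketch below uses a generic "dual illumination" trick: it brightens dark regions with a Retinex-style gamma adjustment of a smoothed illumination map, applies the same correction to the inverted image to tame glare, and blends the three versions with per-pixel well-exposedness weights. All function names and parameters (`gamma`, `radius`, `sigma`) are hypothetical choices for this sketch, not the thesis's module. A practical version would typically use a better illumination estimate and Laplacian-pyramid blending instead of single-scale weights.

```python
import numpy as np


def box_blur(x: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2r+1)x(2r+1) window with edge padding (2-D input)."""
    k = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    # Summed-area table with a leading zero row/column: each window sum is O(1).
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = x.shape
    total = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
             - sat[k:k + h, :w] + sat[:h, :w])
    return total / (k * k)


def correct_underexposure(img: np.ndarray, gamma: float = 0.6,
                          radius: int = 7, eps: float = 1e-3) -> np.ndarray:
    """Retinex-style brightening of an RGB image in [0, 1] (hypothetical params)."""
    # Crude illumination map: per-pixel max over RGB, then spatially smoothed.
    illum = np.clip(box_blur(img.max(axis=2), radius), eps, 1.0)
    # (img / illum) * illum**gamma == img * illum**(gamma - 1);
    # with gamma < 1 this lifts dark regions much more than bright ones.
    out = img * (illum ** (gamma - 1.0))[..., None]
    return np.clip(out, 0.0, 1.0)


def correct_overexposure(img: np.ndarray, **kw) -> np.ndarray:
    """Dual trick: overexposed areas of img are underexposed areas of 1 - img."""
    return 1.0 - correct_underexposure(1.0 - img, **kw)


def well_exposedness(img: np.ndarray, sigma: float = 0.2) -> np.ndarray:
    """Exposure-fusion style weight favoring intensities near mid-gray (0.5)."""
    w = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))
    return w.prod(axis=2)


def dual_exposure_correction(img: np.ndarray) -> np.ndarray:
    """Fuse original, brightened, and darkened versions with normalized weights."""
    candidates = [img, correct_underexposure(img), correct_overexposure(img)]
    weights = np.stack([well_exposedness(c) for c in candidates]) + 1e-12
    weights /= weights.sum(axis=0, keepdims=True)
    fused = sum(w[..., None] * c for w, c in zip(weights, candidates))
    return np.clip(fused, 0.0, 1.0)
```

For example, on a synthetic frame whose left half is dark (0.05) and whose right half is glare (0.97), the fused output brightens the dark half and pulls the glare half down toward mid-gray, which is the behavior the abstract describes for uneven night-time lighting.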

Table of Contents:
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
   2.1. Self-Supervised Depth Estimation
   2.2. Night-time Self-Supervised Depth Estimation
3. Method
   3.1. Self-Supervised Depth Estimation
   3.2. Illumination Correction Module
   3.3. Segmentation Information Sharing Module
4. Experiments
   4.1. Dataset
   4.2. Implementation Details
   4.3. Depth Estimation Performance
   4.4. Ablation Study
5. Conclusions
References


Full-text release date: 2025/09/27 (intranet)
Full-text release date: 2052/09/27 (Internet)
Full-text release date: 2052/09/27 (National Library)