
Author: 胡蕎伊 (CHAIO-YI HU)
Thesis title: Self-Supervised Night-time Monocular Depth Estimation Using Illumination Adjustment and Segmentation Information
Advisor: 花凱龍 (Kai-Lung Hua)
Committee members: 陳永耀 (Yung-Yao Chen), 陳宜惠 (Yi-Hui Chen), 鄭文皇 (Wen-Huang Cheng), 孫士韋 (Shih-Wei Sun)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110
Language: English
Pages: 31
Keywords: Semantic Segmentation, Attention, Monocular Depth Estimation, Night-time, Oxford RobotCar Driving Dataset, Image Enhancement


Abstract: Mainstream methods for self-supervised monocular depth estimation are typically applied to daytime images. However, these methods do not work well for images taken at night due to low visibility, uneven illumination, and inconsistent lighting (e.g., flickering). We propose LightSegDepth (LSDepth), which involves a more thorough illumination adjustment and an improved feature representation with semantic segmentation. In contrast to methods that only handle low visibility in dark scenes, LSDepth also handles uneven lighting by correcting both underexposure (e.g., dark regions) and overexposure (e.g., glare). Furthermore, we supplement our model with structural context in the form of semantic segmentation features. This joint task has multi-fold advantages: (1) semantic maps provide structural continuity that prevents depth "holes"; (2) segmentation provides additional context for object re-identification, especially across lighting-inconsistent consecutive video frames; (3) semantically grouped pixels reduce the correspondence search space. Experimental results show that LSDepth achieves state-of-the-art performance on the Oxford RobotCar night driving dataset.
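The abstract builds on the standard self-supervised training signal for monocular depth: a source frame is warped into the target view using the predicted depth and relative camera pose, and the network is penalized for photometric disagreement. As a point of reference only, here is a minimal sketch of the common SSIM + L1 reprojection objective used by mainstream methods in this line of work; the function names, pooling window, and 0.85 weighting are illustrative assumptions, not the thesis's code.

```python
# Minimal sketch (assumption, not the thesis's code) of the SSIM + L1
# photometric reprojection loss that self-supervised depth methods minimize.
import torch
import torch.nn.functional as F

def dssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale structural dissimilarity over 3x3 windows (B, C, H, W)."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, warped, alpha=0.85):
    """Weighted SSIM + L1 penalty between the target frame and a source
    frame warped into the target view via predicted depth and pose."""
    l1 = (target - warped).abs()
    return (alpha * dssim(target, warped) + (1 - alpha) * l1).mean()
```

At night this loss is unreliable wherever pixels are crushed to black or blown out by glare, which is exactly the failure mode the abstract's illumination correction and segmentation cues are meant to address.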

Contents:
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
   2.1. Self-Supervised Depth Estimation
   2.2. Night-time Self-Supervised Depth Estimation
3. Method
   3.1. Self-Supervised Depth Estimation
   3.2. Illumination Correction Module
   3.3. Segmentation Information Sharing Module
4. Experiments
   4.1. Dataset
   4.2. Implementation Details
   4.3. Depth Estimation Performance
   4.4. Ablation Study
5. Conclusions
References
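The Method chapter above lists an Illumination Correction Module (3.2) which, per the abstract, fixes both underexposure and overexposure before depth is estimated. The thesis's module is learned and its design is not given in this record; purely for intuition, a hand-crafted dual-exposure correction in the same spirit might look like the sketch below, where every name and the fusion weighting are illustrative assumptions.

```python
# Illustrative dual-exposure correction (assumption, not the thesis's module):
# brighten dark regions on the image itself, tame glare by enhancing the
# inverted image, then fuse the two estimates per pixel.
import numpy as np

def gamma_enhance(img, gamma=0.6):
    """Apply a gamma curve to an image in [0, 1]; gamma < 1 brightens."""
    return np.clip(img, 0.0, 1.0) ** gamma

def dual_exposure_correct(img, gamma=0.6):
    """Correct underexposure directly and overexposure via the inverted
    image, fusing with weights that trust the better-exposed estimate."""
    under_fixed = gamma_enhance(img, gamma)              # recovers dark regions
    over_fixed = 1.0 - gamma_enhance(1.0 - img, gamma)   # suppresses glare
    w = np.clip(img, 0.0, 1.0)    # bright pixels lean on the glare-fixed image
    return (1.0 - w) * under_fixed + w * over_fixed
```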


Full text available from 2025/09/27 (campus network)
Full text available from 2052/09/27 (off-campus network)
Full text available from 2052/09/27 (National Central Library: Taiwan Dissertation and Thesis System)