
Graduate Student: 陳品光 (Pin-Kuang Chen)
Thesis Title: Monocular Depth Estimation on Hazy Scenes Based on Two-stage Data Augmentation (基於二階段資料擴增之實現在霧霾情境之單眼深度估計)
Advisor: 林昌鴻 (Chang-Hong Lin)
Committee Members: 林昌鴻 (Chang-Hong Lin), 陳維美 (Wei-Mei Chen), 陳永耀 (Yung-Yao Chen), 王煥宗 (Huan-Chun Wang), 林敬舜 (Ching-Shun Lin)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 64
Keywords: Stereo Vision, Monocular Depth Estimation, Deep Learning, Unsupervised Learning, Data Augmentation

With the recent success of deep learning across many fields, research on monocular depth estimation has made great strides, and as autonomous driving becomes increasingly widespread, monocular depth estimation is beginning to be applied to it. Although deep learning networks have indeed raised the accuracy of monocular depth estimation well beyond traditional machine learning approaches, considerable room for improvement remains in practical applications. This thesis therefore proposes a two-stage data augmentation method. In the first stage, we process the training data so that it simulates the difficulties encountered in real-world applications, providing the deep learning network with more varied training data; this improves prediction accuracy, reduces the overfitting that arises under limited data, and strengthens the robustness and generalization of the network. In the second stage, we use the model strengthened in the first stage to predict the depth of the training data, and then use the atmospheric scattering model to generate haze that approximates real scenes, providing the network with training data for different weather phenomena and thereby improving the model's ability to adapt to different environments. To verify the effectiveness of this method, we use the atmospheric scattering model to generate hazy test data from the KITTI stereo 2015 dataset, evaluate on both the original KITTI stereo 2015 dataset and the generated hazy images, and confirm that two-stage data augmentation gives the model the ability to predict depth in different environments.


With the recent success of deep learning, research on monocular depth estimation has also made significant progress. At the same time, owing to the increasing popularity of autonomous driving, monocular depth estimation is being applied to that domain as well. However, while deep learning based methods have achieved substantial improvements over traditional machine learning based methods, they are still far from satisfying the needs of real-life applications. In this thesis, we propose a two-stage data augmentation method. In stage one, we augment the training images so that they resemble real-world scenarios, improving the performance of the networks by providing more variety in the training data. In stage two, we use the model trained in stage one to predict depth from the training data, and then use the atmospheric scattering model to generate hazy scenes close to real-world scenarios. This provides the deep learning network with training data for different weather phenomena and trains the model's ability to recognize different environments. To verify the effectiveness of our method, we use the atmospheric scattering model to generate hazy test data from the KITTI stereo 2015 dataset, and evaluate on both the original KITTI stereo 2015 dataset and the generated hazy images. The results show that the model gains the ability to predict depth in different environments.
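The abstract names the augmentation operations and the atmospheric scattering model without giving formulas. The following is a minimal NumPy/SciPy sketch, not the thesis code: it applies stage-one style photometric augmentations (blur, flip, gamma, brightness, color shift, as listed in Section 3.2 of the table of contents) to a stereo pair, then renders synthetic haze with the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)), where t(x) = exp(-beta d(x)), as used by HazeRD [19]. All function names, parameter ranges, and default values below are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def augment_stage_one(left, right, rng=None):
    """Photometric augmentation sketch for a stereo pair (H x W x 3 floats in [0, 1])."""
    rng = rng or np.random.default_rng()
    # Random horizontal flip; the two views are also swapped so that the
    # mirrored right image serves as the new left view.
    if rng.random() < 0.5:
        left, right = right[:, ::-1, :], left[:, ::-1, :]
    # Random Gaussian blur (sigma range is an illustrative assumption).
    if rng.random() < 0.5:
        sigma = rng.uniform(0.5, 1.5)
        left = gaussian_filter(left, sigma=(sigma, sigma, 0))
        right = gaussian_filter(right, sigma=(sigma, sigma, 0))
    # Random gamma, brightness, and per-channel color shift, applied
    # identically to both views (ranges are illustrative assumptions).
    gamma = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(0.5, 2.0)
    shift = rng.uniform(0.8, 1.2, size=3)
    left = np.clip((left ** gamma) * brightness * shift, 0.0, 1.0)
    right = np.clip((right ** gamma) * brightness * shift, 0.0, 1.0)
    return left, right

def synthesize_haze(clean, depth, beta=0.01, airlight=0.8):
    """Stage-two sketch: I(x) = J(x) t(x) + A (1 - t(x)), t(x) = exp(-beta d(x)).
    `depth` holds per-pixel distances in meters, e.g. predictions from the
    stage-one model; `beta` controls haze density, `airlight` the ambient light A."""
    t = np.exp(-beta * depth)[..., np.newaxis]  # transmission map, shape (H, W, 1)
    return clean * t + airlight * (1.0 - t)

In a stage-two pipeline of the kind the abstract describes, one would run the stage-one model on each training image to obtain `depth`, then call synthesize_haze on the clean image with several beta values to cover different haze densities.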

Abstract (Chinese)
ABSTRACT
LIST OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTIONS
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Organization
CHAPTER 2 RELATED WORKS
  2.1 Related Methods
    2.1.1 Supervised Monocular Depth Estimation Methods
    2.1.2 Unsupervised Monocular Depth Estimation Methods
  2.2 ResNet [15]
CHAPTER 3 PROPOSED METHOD
  3.1 Depth Estimation Concept
  3.2 Data Augmentation
    3.2.1 Blur
    3.2.2 Random Flip Left-to-right
    3.2.3 Random Color Jitter
      3.2.3.1 Random Gamma Transform
      3.2.3.2 Random Brightness
    3.2.4 Random Color Shift
    3.2.5 Haze Image Generation
CHAPTER 4 IMPLEMENTATION
  4.1 Preprocessing
  4.2 Network Architecture
    4.2.1 Training Loss
CHAPTER 5 EXPERIMENTAL RESULTS
  5.1 Experimental Environment
  5.2 KITTI Database [22]
  5.3 Evaluation Methods
  5.4 Performance Evaluation
    5.4.1 Evaluation Results of Data Augmentation I
    5.4.2 Evaluation Results of Data Augmentation II
      5.4.2.1 Performance Evaluation on KITTI Stereo 2015
      5.4.2.2 Performance Evaluation on Hazy Image from KITTI 2015
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
  6.1 Conclusions
  6.2 Future Works
REFERENCES

REFERENCES
[1] National Highway Traffic Safety Administration, "Asleep at the wheel - A national compendium of efforts to eliminate drowsy driving," 2017.
[2] R. W. Wolcott and R. M. Eustice, "Visual localization within lidar maps for automated urban driving," in IEEE/RSJ international conference on intelligent robots and systems, 2014, pp. 176-183.
[3] K. Yoneda, H. Tehrani, T. Ogawa, N. Hukuyama, and S. Mita, "Lidar scan feature for localization with highly precise 3-D map," in IEEE intelligent vehicles symposium proceedings, 2014, pp. 1345-1350.
[4] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction," in International conference on artificial neural networks, 2011, pp. 52-59.
[5] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," in IEEE transactions on geoscience and remote sensing, 2016, pp. 6232-6251, vol. 54, no. 10.
[6] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," in Advances in neural information processing systems, 2014, pp. 2366-2374.
[7] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGB-D images," in European conference on computer vision, 2012, pp. 746-760.
[8] D. Eigen and R. Fergus, "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 2650-2658.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv preprint arXiv:1409.1556.
[11] Y. Cao, Z. Wu, and C. Shen, "Estimating depth from monocular images as classification using deep fully convolutional residual networks," in IEEE transactions on circuits and systems for video technology, 2017, pp. 3174-3182, vol. 28, no. 11.
[12] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, "Deeper depth prediction with fully convolutional residual networks," in Fourth international conference on 3D vision, 2016, pp. 239-248.
[13] R. Garg, V. Kumar BG, G. Carneiro, and I. Reid, "Unsupervised CNN for single view depth estimation: Geometry to the rescue," in European conference on computer vision, 2016, pp. 740-756.
[14] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 270-279.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[16] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, "Imagenet: A large-scale hierarchical image database," in IEEE conference on computer vision and pattern recognition, 2009, pp. 248-255.
[17] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International conference on machine learning, 2015.
[18] V. Nair and G. Hinton, "Rectified linear units improve restricted boltzmann machines," in International conference on machine learning, 2010.
[19] Y. Zhang, L. Ding, and G. Sharma, "Hazerd: An outdoor scene dataset and benchmark for single image dehazing," in IEEE international conference on image processing, 2017, pp. 3205-3209.
[20] R. Hartley and A. Zisserman, "Multiple view geometry in computer vision," Cambridge University Press, 2003, ISBN: 0-511-18618-5.
[21] M. Jaderberg, K. Simonyan, and A. Zisserman, "Spatial transformer networks," in Advances in neural information processing systems, 2015, pp. 2017-2025.
[22] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in IEEE conference on computer vision and pattern recognition, 2012, pp. 3354-3361.
[23] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," in Journal of Big Data, 2019, vol. 6.
[24] S. T. Barnard and M. A. Fischler, "Computational stereo," in ACM computing surveys, 1982, pp. 553-572, vol. 14, no. 4.
[25] G. H. Joblove and D. Greenberg, "Color spaces for computer graphics," in Proceedings of the 5th annual conference on computer graphics and interactive techniques, 1978, pp. 20-25.
[26] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
[27] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," in IEEE transactions on image processing, 2004, pp. 600-612, vol. 13, no. 4.
[28] "Python Website." [Online]. [https://www.python.org/] (accessed July, 2020).
[29] "Tensorflow Website." [Online]. [https://www.tensorflow.org/] (accessed July, 2020).
[30] M. Menze and A. Geiger, "Object scene flow for autonomous vehicles," in Proceedings of the IEEE Conference on computer vision and pattern recognition, 2015, pp. 3061-3070.
[31] L. Ladicky, J. Shi, and M. Pollefeys, "Pulling things out of perspective," in Proceedings of the IEEE Conference on computer vision and pattern recognition, 2014, pp. 89-96.

Full Text Release Date: 2025/08/04 (campus network)
Full Text: not authorized for public release (off-campus network)
Full Text: not authorized for public release (National Central Library: Taiwan thesis system)