
Graduate Student: 蔡博文 (Bo-Wen Tsai)
Thesis Title: 基於深度學習重建YouTube車禍影片三維場景之研究
3D Scene Reconstruction from YouTube Car Accident Videos Using Deep Neural Networks
Advisor: 陳郁堂 (Yie-Tarng Chen)
Committee Members: 陳郁堂 (Yie-Tarng Chen)
方文賢 (Wen-Hsien Fang)
陳省隆 (Hsing-Lung Chen)
林銘波 (Ming-Bo Lin)
呂政修 (Jenq-Shiou Leu)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 36
Keywords (Chinese): 距離估測, 物件偵測, 物件追蹤, 車道分割
Keywords (English): Distance Estimation, Object Detection, Object Tracking, Lane Segmentation
  • With the development of automotive technology, car accident prevention has received increasing attention. In recent years, more and more groups have begun to collect and analyze car accident data. Among all sources of driving information, the most readily available are the dashcam recordings uploaded in large numbers to YouTube. However, unlike an on-board computer, a YouTube video does not record driving speed or the distances to other vehicles, and even the parameters of the recording camera (focal length, focal point, camera intrinsics) are unavailable. Therefore, in this thesis we leverage state-of-the-art techniques in object tracking, lane detection, and perspective transformation to reconstruct the 3D scene around a moving vehicle. Specifically, car accident videos from YouTube are mapped onto a bird's-eye-view coordinate map that carries real-world distance information. To this end, we develop a new depth estimation method that recovers depth information from a lane-line heuristic. If a video cannot provide accurate lane-line information, the proposed method does not perform well.
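    The mapping to a bird's-eye-view coordinate map described above is an inverse perspective transform. The sketch below shows such a warp with OpenCV; it is an illustration under assumptions rather than the thesis implementation, and the file name, region-of-interest corner points, and output size are all hypothetical.

        # Minimal inverse-perspective-mapping sketch (assumptions, not the
        # thesis implementation): warp a trapezoidal road region of a
        # front-view frame onto a top-down (bird's-eye) plane.
        import cv2
        import numpy as np

        def to_bird_view(frame, src_pts, out_size=(400, 600)):
            """Warp a front-view frame onto a bird's-eye-view plane."""
            w, h = out_size
            # Destination corners listed in the same order as src_pts:
            # bottom-left, bottom-right, top-right, top-left.
            dst_pts = np.float32([[0, h], [w, h], [w, 0], [0, 0]])
            M = cv2.getPerspectiveTransform(np.float32(src_pts), dst_pts)
            return cv2.warpPerspective(frame, M, (w, h))

        # Hypothetical ROI corners for a 1280x720 frame; in the thesis the
        # region of interest is selected automatically around the lane lines.
        frame = cv2.imread("front_view.jpg")  # placeholder file name
        roi = [(120, 720), (1160, 720), (740, 450), (540, 450)]
        if frame is not None:
            bird = to_bird_view(frame, roi)
            cv2.imwrite("bird_view.jpg", bird)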


    With the development of self-driving cars, car accident detection and prevention based on deep learning technologies have received attention in recent years. To this end, collecting and analyzing car accident datasets becomes a critical issue, and car accident videos from YouTube provide a valuable resource. Hence, the objective of this research is to develop a tool for automatically reconstructing 3D scenes from massive collections of car accident videos. Specifically, we attempt to estimate the depth information of moving vehicles from front-view images. However, videos from YouTube do not provide intrinsic and extrinsic camera parameters, such as the focal length and focal point, which are critical for 3D scene reconstruction. Consequently, existing depth estimation approaches based on structure from motion and deep learning cannot be used in our case. To fill this gap, in this thesis we leverage state-of-the-art approaches in object tracking, lane detection, and inverse perspective transformation to develop a novel depth estimation approach that restores depth information with a lane-line heuristic. First, we use an object detector and an object tracker to extract moving objects, and then apply Mask R-CNN to detect lane markings in the front-view images. Subsequently, the inverse perspective transform is used to generate a bird-view image, where a lane-line heuristic estimates the depth information for each vehicle. Furthermore, to accelerate 3D scene reconstruction for massive numbers of videos, we also investigate a new approach to automatically select the region of interest for the inverse perspective transform. The proposed approach depends on precise lane-line information.
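    To make the lane-line heuristic concrete, the sketch below converts pixel measurements in the bird-view map into metres by comparing the pixel length of one dashed lane marking with an assumed physical stripe length. The 4.0 m stripe length and all coordinates are illustrative assumptions, not values taken from the thesis.

        # Lane-line heuristic for distance estimation in the bird-view map
        # (a sketch under assumptions, not the thesis implementation).

        LANE_STRIPE_METRES = 4.0  # assumed real-world length of one dashed marking

        def metres_per_pixel(stripe_top_y, stripe_bottom_y,
                             stripe_len_m=LANE_STRIPE_METRES):
            """Scale factor derived from one lane stripe in the bird-view image."""
            stripe_len_px = abs(stripe_bottom_y - stripe_top_y)
            return stripe_len_m / stripe_len_px

        def vehicle_distance_m(vehicle_y, ego_y, scale_m_per_px):
            """Longitudinal distance (m) from the ego position to a vehicle box."""
            return abs(ego_y - vehicle_y) * scale_m_per_px

        # Hypothetical bird-view pixel coordinates.
        scale = metres_per_pixel(stripe_top_y=300, stripe_bottom_y=380)
        print(vehicle_distance_m(vehicle_y=150, ego_y=600, scale_m_per_px=scale))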

    Chinese Abstract  iii
    Abstract  iv
    Acknowledgment  v
    Table of Contents  vi
    List of Figures  viii
    1 Introduction  1
    2 Related Work  3
    2.1 Object Detection  3
    2.2 Multiple Object Tracking  3
    2.3 Object Segmentation  4
    2.4 Unsupervised Depth Estimation  4
    3 Proposed 3D Scene Reconstruction Method  5
    3.1 Lane Lines Segmentation from Front View Images  6
    3.1.1 Lane Segmentation: Mask R-CNN  6
    3.1.2 Color Filter  7
    3.1.3 Lane Segmentation Refinement  7
    3.1.4 Lane Line Extrapolation  9
    3.1.5 Outlier Removal  9
    3.2 Inverse Perspective Transformation  10
    3.3 Region-of-Interest Selection  14
    3.4 Distance Estimation in a Bird View Map  16
    4 Experimental Result and Analysis  17
    4.1 Datasets  17
    4.1.1 Berkeley Deep Drive - Drivable Area  17
    4.1.2 Industrial Technology Research Institute (ITRI) Dataset  18
    4.1.3 YouTube Car Accident Dataset  18
    4.2 Distance Estimation in a Bird View Map  20
    4.3 Finding Lane Lines from Front View Images  22
    4.4 Outlier Removal  24
    4.5 Region-of-Interest Selection  27
    5 Conclusion  34
    References  35

    Full text not available for download.
    Full text release date: 2021/08/19 (campus network)
    Full text release date: 2024/08/19 (off-campus network)
    Full text release date: 2024/08/19 (National Central Library: Taiwan theses and dissertations system)