
Author: Ferdyan Dannes Krisandika
Thesis Title: Vehicle Trajectory Parsing from YouTube Accident Videos for Self-driving Cars
Advisors: Yie-Tarng Chen, Wen-Hsien Fang
Committee Members: Jian-Qing Qiu, Xian-Sheng Hong, Kuen-Tsair Lay, Yie-Tarng Chen, Wen-Hsien Fang
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 60
Keywords: Object detection, Object tracking, Ego-motion, Optical flow, Relative scale
Autonomous driving systems (ADS) must be trained on a dataset, but current public datasets only provide videos of normal vehicle behaviour. To resolve this dilemma, a variety of accident trajectory data, derived from a variety of traffic accident videos, is needed so that an ADS can be properly trained to be aware of its surrounding environment. In light of this, we propose a novel method that combines object detection, object tracking, depth estimation, lane segmentation, and 3D geometry to generate accident trajectories. Combining the lane segmentation of the drivable area in front of the cars, the estimated depth, the detected cars, and an image perspective transform allows us to convert the front-view images into bird's-eye-view images and re-map the tracked object positions. Then, using optical flow and ego-motion prediction, we re-map the tracked vehicles from image coordinates into real-world coordinates to obtain accurate positions. Simulations show that the proposed method can accurately generate the accident trajectories of the crashed vehicle.


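The coordinate re-mapping step described in the abstract can be illustrated with a minimal sketch: back-project a tracked vehicle's pixel position through an estimated depth value and the camera intrinsics, then apply the frame's ego-motion (camera pose) to place it in real-world coordinates. The intrinsics, pixel position, depth value, and pose below are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch of re-mapping a tracked vehicle from image coordinates
# to world coordinates using depth and ego-motion (illustrative only).

import numpy as np

def backproject_to_camera(u, v, depth, K):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def camera_to_world(p_cam, R_wc, t_wc):
    """Transform a camera-frame point into world coordinates given the
    ego pose (rotation R_wc, translation t_wc) of the current frame."""
    return R_wc @ p_cam + t_wc

# Example: a tracked vehicle whose bounding-box bottom centre lies at
# pixel (640, 400) with an estimated depth of 12.5 m, assuming a
# hypothetical pinhole camera and an identity ego pose for frame 0.
K = np.array([[718.0,   0.0, 640.0],
              [  0.0, 718.0, 360.0],
              [  0.0,   0.0,   1.0]])
p_cam = backproject_to_camera(640, 400, 12.5, K)
p_world = camera_to_world(p_cam, np.eye(3), np.zeros(3))
print(p_world)  # vehicle position in world coordinates for this frame
```

Repeating this per frame with the estimated ego pose of each frame yields the vehicle's trajectory in a fixed world frame, which is the form of output the abstract describes.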

Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 Object Detection
  2.2 Object Tracker
  2.3 Image Segmentation
  2.4 Depth Estimation
  2.5 Trajectory Generation
  2.6 Monocular 3D Object Detection
  2.7 Path Planning
  2.8 Summary
3 Proposed Method
  3.1 Overall Pipeline
  3.2 Trajectory Generation
    3.2.1 Ego-motion and Depth Estimation
    3.2.2 Distance Estimation
    3.2.3 Trajectory Generation
  3.3 Object Speed and Orientation Estimation
    3.3.1 Orientation Estimation
    3.3.2 Speed Estimation
  3.4 Path Planning
  3.5 Summary
4 Experimental Results
  4.1 Dataset
  4.2 Distance Estimation
    4.2.1 Lane Line Detection
    4.2.2 Region of Interest Selection
    4.2.3 Vehicle Distance Estimation
  4.3 Trajectory Generation
    4.3.1 Ego-car Speed Estimation
    4.3.2 Accident Trajectory Generation
  4.4 Speed and Orientation Estimation
  4.5 Path Planning
  4.6 Summary
5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
References
Appendix A: Dataset
Biography


Full-text release date: 2025/08/20 (campus network)
Full-text release date: 2025/08/20 (off-campus network)
Full-text release date: 2025/08/20 (National Central Library: Taiwan Theses and Dissertations System)