
Graduate Student: Jun-Yan Wu (吳俊延)
Thesis Title (English): Enhancing Vehicle Trajectory Parsing in Self-Driving Cars through a Multi-Task Visual Perception Network
Thesis Title (Chinese): 用於自動駕駛汽車的多任務視覺感知網路進行車輛軌跡解析
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Ming-Bo Lin (林銘波), Wen-Hsien Fang (方文賢), Jenq-Shiou Leu (呂政修), Kuen-Tsair Lay (賴坤財)
Degree: Master
Department: Department of Electronic and Computer Engineering (電資學院 電子工程系)
Year of Publication: 2023
Academic Year of Graduation: 111 (AY 2022–2023)
Language: English
Number of Pages: 54
Keywords (Chinese): 自我運動、深度信息、對象檢測、車道分割、RoI 選擇、距離估計、對象方向、自我汽車速度、絕對尺度、軌跡生成、多任務視覺感知網路、YOLOv7
Keywords (English): ego motion, depth information, object detection, lane segmentation, RoI selection, distance estimation, object orientation, ego car speed, absolute scale, trajectory generation, Multi-Task Visual Perception Network, YOLOv7
Abstract (Chinese): 軌跡解析在使用來自互聯網的事故視頻重建自動駕駛模擬的極端情況方面發揮著至關重要的作用。然而,實現高保真度的準確軌跡解析面臨著一些挑戰,包括自我運動、深度信息、對象檢測、車道分割、RoI 選擇、距離估計、對象方向、自我汽車速度、絕對尺度和軌跡生成。先前的研究需要為每個任務使用單獨的神經網路,從而導致系統繁瑣並增加了軌跡解析的執行時間。基於 OmniDet 的進步,本研究引入了多任務視覺感知網路,通過減少神經網路的數量來簡化系統。此外,最先進的目標檢測技術 YOLOv7 被納入多任務視覺感知網路中,以提高目標檢測的準確性。實驗結果表明,新的實現提供了一個高效可靠的軌跡解析系統。


Abstract (English): Trajectory parsing plays a crucial role in reconstructing corner cases for self-driving simulations using accident videos sourced from the Internet. However, achieving accurate trajectory parsing with high fidelity poses several challenges, including ego motion, depth information, object detection, lane segmentation, RoI selection, distance estimation, object orientation, ego car speed, absolute scale, and trajectory generation. Previous research required a separate neural network for each task, resulting in a cumbersome system and increased execution time for trajectory parsing. Building on the advances of OmniDet, this study introduces a Multi-Task Visual Perception Network that streamlines the system by reducing the number of neural networks. In addition, the state-of-the-art object detector YOLOv7 is incorporated into the network to improve object detection accuracy. Experimental results demonstrate that the new implementation offers an efficient and reliable trajectory parsing system.
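To make the shared-backbone idea in the abstract concrete, the sketch below shows, in PyTorch, how one encoder can be computed once per frame and reused by separate detection, depth, and lane-segmentation heads, which is how a single multi-task network can replace several per-task networks. This is a minimal illustrative sketch only, not the thesis implementation and not OmniDet's or YOLOv7's actual code; all module names, channel widths, and head designs are assumptions.

```python
# Illustrative sketch of a shared-encoder multi-task perception network.
# Not the thesis's architecture; layer choices are placeholders.
import torch
import torch.nn as nn


class SharedEncoder(nn.Module):
    """Small convolutional backbone whose features are reused by every task head."""

    def __init__(self, in_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)


class MultiTaskPerceptionNet(nn.Module):
    """One forward pass yields detection, depth, and lane-segmentation outputs."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.encoder = SharedEncoder()
        # Detection head: per-location class scores plus 4 box offsets (YOLO-style placeholder).
        self.det_head = nn.Conv2d(64, num_classes + 4, kernel_size=1)
        # Depth head: one-channel dense depth map.
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)
        # Lane-segmentation head: binary lane mask logits.
        self.lane_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.encoder(x)  # shared computation done once per frame
        return {
            "detection": self.det_head(feats),
            "depth": self.depth_head(feats),
            "lane": self.lane_head(feats),
        }


if __name__ == "__main__":
    model = MultiTaskPerceptionNet()
    outputs = model(torch.randn(1, 3, 256, 512))  # dummy image batch
    print({name: tuple(t.shape) for name, t in outputs.items()})
```

Because the costly backbone pass is shared across tasks, replacing several single-task networks with one such model reduces both the model count and the per-frame execution time, which is the efficiency claim made in the abstract.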

Table of Contents

摘要 (Chinese Abstract)
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
  2.1 Works of Autonomous Driving System
    2.1.1 Ego Motion
    2.1.2 Depth Information
    2.1.3 Object Detection
    2.1.4 Lane Segmentation
    2.1.5 RoI Selection (Region of Interest Selection) for Parameter Estimation with Inverse Perspective Transform
    2.1.6 Lane Lines Detection
    2.1.7 Distance Estimation
    2.1.8 Ego-car Speed
    2.1.9 Trajectory Generation
  2.2 Multi-Task Model Using Surround View Fisheye Images
  2.3 YOLOv7
    2.3.1 SPPCSPC
    2.3.2 ELAN
    2.3.3 PANet
3 Proposed Method
  3.1 Model Architecture
  3.2 Loss Function
  3.3 Summary
4 Experimental Results
  4.1 Object Detection
    4.1.1 Datasets and Training Environment
    4.1.2 Experimental Results
  4.2 Lane Segmentation
    4.2.1 Datasets and Training Environment
    4.2.2 Experimental Results
  4.3 Testing Results of Autonomous Driving System
  4.4 Summary
5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
References

References

[1] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, T. Darrell et al., "BDD100K: A diverse driving video database with scalable annotation tooling," arXiv preprint arXiv:1805.04687, vol. 2, no. 5, p. 6, 2018.
[2] V. R. Kumar, S. Yogamani, H. Rashed, G. Sistu, C. Witt, I. Leang, S. Milz, and P. Mäder, "OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2830–2837, 2021.
[3] V. Casser, S. Pirk, R. Mahjourian, and A. Angelova, "Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8001–8008.
[4] J. M. Facil, B. Ummenhofer, H. Zhou, L. Montesano, T. Brox, and J. Civera, "CAM-Convs: Camera-aware multi-scale convolutions for single-view depth," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11826–11835.
[5] V. R. Kumar, S. A. Hiremath, M. Bach, S. Milz, C. Witt, C. Pinard, S. Yogamani, and P. Mäder, "FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving," in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 574–581.
[6] C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, "Digging into self-supervised monocular depth estimation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3828–3838.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
[8] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464–7475.
[9] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768.
[10] H. Zhao, J. Jia, and V. Koltun, "Exploring self-attention for image recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10076–10085.
[11] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[12] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
[13] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-style ConvNets great again," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13733–13742.
[14] S. Yogamani, C. Hughes, J. Horgan, G. Sistu, P. Varley, D. O'Dea, M. Uřičář, S. Milz, M. Simon, K. Amende et al., "WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9308–9318.
[15] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 3354–3361.
[16] A. Kendall, Y. Gal, and R. Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7482–7491.
[17] Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich, "GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks," in International Conference on Machine Learning. PMLR, 2018, pp. 794–803.
[18] S. Chennupati, G. Sistu, S. Yogamani, and S. A. Rawashdeh, "MultiNet++: Multi-stream feature aggregation and geometric loss strategy for multi-task learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[19] S. Liu, E. Johns, and A. J. Davison, "End-to-end multi-task learning with attention," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1871–1880.
[20] M. Guo, A. Haque, D.-A. Huang, S. Yeung, and L. Fei-Fei, "Dynamic task prioritization for multitask learning," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 270–287.
