
Graduate Student: Chung-Yen Tsai (蔡忠諺)
Thesis Title: Multi-View Fusion for 3D Object Detection in Autonomous Vehicles (基於多視圖融合之三維物件偵測及其在自動駕駛之應用)
Advisors: Yie-Tarng Chen (陳郁堂) and Wen-Hsien Fang (方文賢)
Oral Defense Committee: Chien-Ching Chiu (丘建青), Kuen-Tsair Lay (賴坤財), and Gee-Sern Hsu (徐繼聖)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107 (2018–2019)
Language: Chinese
Pages: 63
Keywords: multi-view, fast depth completion, fusion, 3D IoU loss, 3D object detection, autonomous vehicles
In this thesis, we propose a novel architecture that performs vehicle detection from three different views: RGB-D images produced by a color camera, together with the front view (FV) and the bird's eye view (BEV) generated from the light detection and ranging (LIDAR) point cloud. The front view is processed with a fast depth completion (FDC) algorithm that converts the sparse depth input into a denser depth map, compensating for the information lost in the sparse projection so that the network can extract richer and more accurate features. In addition, this thesis proposes a position-sensitive multi-model (PSM) fusion scheme for combining heterogeneous features, which lets the network learn by itself which information is most valuable to retain during fusion; this effectively reduces the information loss incurred when fusing different features and improves the accuracy of subsequent predictions. Furthermore, this thesis adopts a new three-dimensional intersection-over-union (IoU) loss function, optimized jointly with the other existing loss functions, for bounding-box regression. Since the IoU directly reflects prediction accuracy, the 3D IoU loss informs the network of the quality of its current bounding-box predictions and provides immediate feedback during training, improving detection accuracy on hard samples while also reducing false positives. Compared with other methods on the KITTI dataset, the results show that the proposed approach outperforms other state-of-the-art methods.
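To make the depth-completion step concrete, below is a minimal sketch of morphology-based densification in the spirit of the FDC step described above. The function name `complete_depth`, the kernel sizes, and the depth-inversion heuristic are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch only: densify a sparse LIDAR depth map with
# grey-scale morphology. Kernel sizes and thresholds are assumptions.
import numpy as np
from scipy import ndimage

def complete_depth(sparse_depth: np.ndarray, max_depth: float = 100.0) -> np.ndarray:
    """Densify a sparse depth map; zeros mark missing pixels."""
    depth = sparse_depth.astype(np.float32)
    valid = depth > 0.1
    # Invert valid depths so max-based dilation favors nearer points,
    # which tend to occlude farther ones after projection to the image.
    depth[valid] = max_depth - depth[valid]
    # Spread valid pixels into small neighboring holes.
    depth = ndimage.grey_dilation(depth, size=(5, 5))
    # Fill larger remaining holes with a coarser second pass.
    holes = depth < 0.1
    depth[holes] = ndimage.grey_dilation(depth, size=(15, 15))[holes]
    # Smooth lightly, then undo the inversion; still-empty pixels stay 0.
    depth = ndimage.median_filter(depth, size=5)
    return np.where(depth > 0.1, max_depth - depth, 0.0)
```

The inversion trick reflects a common heuristic in LIDAR depth completion: where returns at different depths project close together, the nearer surface should win.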


    In this thesis, we propose a novel architecture that fuses RGB-D images with the front view (FV), refined by fast depth completion, and the bird's eye view (BEV), both derived from the light detection and ranging (LIDAR) point cloud, for 3D object detection in autonomous vehicles. In contrast to previous works, a fast depth completion algorithm is invoked to infer a dense depth map from the sparse depth input. A position-sensitive multi-model (PSM) fusion scheme is also proposed to learn to retain translation-variant information in the fusion process. Moreover, a 3D intersection-over-union (IoU) loss function is employed for joint optimization of the box parameters, enhancing detection accuracy on hard samples and reducing false positives. Simulation results show that the new approach provides superior performance over state-of-the-art works on the widely used KITTI dataset.
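As a concrete illustration of how an IoU-based loss gives the network direct feedback on box quality, here is a minimal PyTorch sketch for axis-aligned 3D boxes. The thesis's loss operates on oriented bounding boxes (Section 3.5.3), so the (x1, y1, z1, x2, y2, z2) corner encoding and the -log(IoU) form used here are simplifying assumptions for illustration.

```python
import torch

def axis_aligned_3d_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU for 3D boxes encoded as (x1, y1, z1, x2, y2, z2), shape (N, 6)."""
    lo = torch.maximum(a[:, :3], b[:, :3])       # intersection min corner
    hi = torch.minimum(a[:, 3:], b[:, 3:])       # intersection max corner
    inter = (hi - lo).clamp(min=0).prod(dim=1)   # zero if boxes are disjoint
    vol_a = (a[:, 3:] - a[:, :3]).prod(dim=1)
    vol_b = (b[:, 3:] - b[:, :3]).prod(dim=1)
    return inter / (vol_a + vol_b - inter + 1e-7)

def iou_loss_3d(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Because IoU directly reflects localization quality, minimizing
    # -log(IoU) penalizes poor boxes heavily and rewards tight ones.
    return -torch.log(axis_aligned_3d_iou(pred, target) + 1e-7).mean()
```

In training, such a term would be added to the classification and regression losses so that the box parameters are jointly optimized, as the abstract describes.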

Table of Contents:

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1  Introduction
    1.1 3D Object Detection
    1.2 Motivation
    1.3 Contributions
    1.4 Thesis Organization
Chapter 2  Background Review
    2.1 LIDAR
    2.2 One-Stage 3D Object Detection
    2.3 Two-Stage 3D Object Detection
    2.4 3D Regression-Based Detection
    2.5 3D Object Detection Benchmarks
    2.6 Summary
Chapter 3  Multi-View Fusion 3D Object Detection
    3.1 Method Overview
    3.2 Feature Maps
        3.2.1 Bird's Eye View
        3.2.2 RGB-D
        3.2.3 Front View
    3.3 Full-Resolution Feature Extraction
    3.4 3D Region Proposal Network
        3.4.1 3D Anchors
        3.4.2 3D Regions of Interest
        3.4.3 1 × 1 Convolutional Layers
        3.4.4 Position-Sensitive Multi-Model Fusion Scheme
        3.4.5 3D Proposals
        3.4.6 Loss Function
    3.5 Object Detection Network
        3.5.1 3D Bounding Box Encoding
        3.5.2 Orientation Vector
        3.5.3 3D IoU
        3.5.4 Prediction Results
        3.5.5 Loss Function
    3.6 Optimizer
    3.7 Summary
Chapter 4  Simulation Results and Discussion
    4.1 Training Environment
    4.2 Comparison Among Our Own Variants
    4.3 Comparison with State-of-the-Art Methods
        4.3.1 Visual Comparisons
        4.3.2 Error Analysis
    4.4 Summary
Chapter 5  Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
Appendix 1: KITTI Dataset Samples
References

Full Text Available: 2024/08/13 (campus network)
Full Text Available: 2024/08/13 (off-campus network)
Full Text Available: 2024/08/13 (National Central Library: Taiwan NDLTD system)