
Graduate Student: 賴朝祥 (Chao-Hsiang Lai)
Thesis Title: 利用三維重疊率損失函數以優化三維物件偵測網路及其在自駕車之應用
(Optimized 3D Object Detection Networks Using 3D Intersection-over-Union Loss Function for Autonomous Driving)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Committee Members: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen), 賴坤財 (Kuen-Tsair Lay), 丘建青 (Chien-Ching Chiu)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106 (2017-2018)
Language: English
Pages: 59
Keywords: 3D object detection, 3D region proposal network, multimodal object detector, multi-feature aggregation, 3D IoU loss
Abstract: This thesis focuses on advancing state-of-the-art 3D object detection and localization for autonomous driving. An autonomous vehicle must operate in a highly unpredictable and dynamic environment, so a robust perception system is essential. This work presents a novel architecture that leverages aggregated features from both LIDAR point clouds and RGB images and is composed of two subnetworks: a 3D region proposal network (3D-RPN) and a second-stage detection network. The input features are represented in two views, the front view (FV) and the bird's eye view (BEV). Unlike other state-of-the-art methods, the FV feature representation fuses information from RGB images with the projected LIDAR point clouds. The 3D-RPN not only performs multimodal feature fusion on full-resolution feature maps but also generates reliable 3D object proposals. With these proposals, the second-stage detection network performs accurate oriented 3D bounding box regression and category classification of objects in 3D space. Furthermore, a precise 3D intersection-over-union (IoU) loss is employed for joint optimization of the box parameters, which not only improves detection performance on hard samples but also reduces false positives.
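The 3D IoU loss named in the title penalizes the overlap between the predicted and ground-truth boxes directly, instead of regressing each box parameter with an independent term. As a rough illustration only, the sketch below computes a differentiable 1 - IoU loss for the simplified axis-aligned case; the thesis regresses oriented boxes (which additionally involves the yaw angle), and the corner-based box encoding, function names, and PyTorch usage here are assumptions for exposition, not the author's implementation.

```python
import torch

def axis_aligned_3d_iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """3D IoU for axis-aligned boxes encoded as
    (x_min, y_min, z_min, x_max, y_max, z_max), shape (N, 6).
    Simplified stand-in for the oriented-box IoU used in the thesis."""
    # Overlap extent along each axis; clamp to zero when boxes are disjoint.
    lo = torch.max(boxes_a[:, :3], boxes_b[:, :3])
    hi = torch.min(boxes_a[:, 3:], boxes_b[:, 3:])
    intersection = (hi - lo).clamp(min=0).prod(dim=1)

    vol_a = (boxes_a[:, 3:] - boxes_a[:, :3]).prod(dim=1)
    vol_b = (boxes_b[:, 3:] - boxes_b[:, :3]).prod(dim=1)
    union = vol_a + vol_b - intersection
    return intersection / union.clamp(min=1e-7)

def iou_loss_3d(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # L = 1 - IoU couples position and size in a single scalar objective.
    return (1.0 - axis_aligned_3d_iou(pred, gt)).mean()
```

Because the intersection term depends on every coordinate at once, the gradient of this single scalar adjusts all box parameters jointly, which is the property the abstract credits for better performance on hard samples and fewer false positives.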



Table of Contents:
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
List of Acronyms
1 Introduction
  1.1 3D Object Detection
  1.2 Motivations
  1.3 Contributions
  1.4 Thesis Outline
2 Related Work
  2.1 3D Single-stage Detectors
  2.2 3D Two-stage Detectors
  2.3 2D Proposals with 3D Regression Detectors
  2.4 3D Object Detection Benchmark
3 Proposed Method
  3.1 Overall Methodology
  3.2 Feature Maps Representation
    3.2.1 Bird's Eye View
    3.2.2 Front View
  3.3 Full Resolution Feature Extractor
  3.4 3D Region Proposal Network
    3.4.1 3D Anchor Generation
    3.4.2 3D RoI Align
    3.4.3 Bottleneck Layer
    3.4.4 3D Proposal Generation
    3.4.5 Loss Function
  3.5 Second Stage Detection Network
    3.5.1 3D Bounding Box Encoding
    3.5.2 Explicit Orientation Vector Regression
    3.5.3 Precise 3D Intersection-over-Union Regression
    3.5.4 Final Prediction Generation
    3.5.5 Loss Function
  3.6 Training Details
  3.7 Summary
4 Experimental Results
  4.1 Evaluation Results
  4.2 Ablation Studies
  4.3 Qualitative Results
  4.4 Summary
5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
Appendix A: Example images from the dataset
References

