| Graduate Student: | 賴朝祥 Chao-Hsiang Lai |
|---|---|
| Thesis Title: | Optimized 3D Object Detection Networks Using 3D Intersection-over-Union Loss Function for Autonomous Driving (利用三維重疊率損失函數以優化三維物件偵測網路及其在自駕車之應用) |
| Advisors: | 方文賢 Wen-Hsien Fang, 陳郁堂 Yie-Tarng Chen |
| Oral Examination Committee: | 方文賢 Wen-Hsien Fang, 陳郁堂 Yie-Tarng Chen, 賴坤財 Kuen-Tsair Lay, 丘建青 Chien-Ching Chiu |
| Degree: | Master |
| Department: | Department of Electronic and Computer Engineering |
| Year of Publication: | 2018 |
| Academic Year: | 106 |
| Language: | English |
| Pages: | 59 |
| Keywords: | 3D object detection, 3D region proposal network, multimodal object detector, multi-feature aggregation, 3D IoU loss |
This thesis focuses on advancing state-of-the-art 3D object detection and localization for autonomous driving. An autonomous vehicle must operate in a highly unpredictable and dynamic environment, so a robust perception system is essential. This work presents a novel architecture that leverages aggregated features from both LIDAR point clouds and RGB images, and is composed of two subnetworks: a 3D region proposal network (3D-RPN) and a second-stage detection network. The input features are represented in two views, the front view (FV) and the bird's eye view (BEV). Unlike other state-of-the-art methods, the FV feature representation fuses the information of RGB images with the projected LIDAR point clouds. The 3D-RPN not only performs multimodal feature fusion on full-resolution feature maps but also generates reliable 3D object proposals. Given these proposals, the second-stage detection network performs accurate oriented 3D bounding box regression and category classification of objects in 3D space. Furthermore, a precise 3D intersection-over-union (IoU) loss is employed to jointly optimize the box parameters, which not only improves detection performance on hard samples but also reduces false positives.
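The 3D IoU loss mentioned above can be illustrated with a minimal sketch. The thesis regresses oriented 3D boxes, whose overlap requires polygon clipping; for the simplified axis-aligned case shown here, the intersection volume has a closed form. The function names and the `(x1, y1, z1, x2, y2, z2)` box encoding are illustrative assumptions, not the thesis's actual parameterization.

```python
import numpy as np

def iou_3d(box_a, box_b):
    """3D IoU for axis-aligned boxes encoded as (x1, y1, z1, x2, y2, z2).

    Illustrative only: oriented boxes (as used in the thesis) need a
    rotated-polygon intersection in the BEV plane instead.
    """
    # Intersection extents along each axis, clamped at zero for disjoint boxes.
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def iou_loss(box_pred, box_gt):
    # A common IoU-based regression loss: perfect overlap gives zero loss.
    return 1.0 - iou_3d(box_pred, box_gt)
```

Because the loss couples all box parameters through a single overlap volume, a gradient step on it trades off center, size, and extent errors jointly, rather than optimizing each coordinate independently as a per-parameter smooth-L1 loss would.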