
Graduate Student: 賴朝祥 (Chao-Hsiang Lai)
Thesis Title: 利用三維重疊率損失函數以優化三維物件偵測網路及其在自駕車之應用
(Optimized 3D Object Detection Networks Using 3D Intersection-over-Union Loss Function for Autonomous Driving)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Committee Members: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen), 賴坤財 (Kuen-Tsair Lay), 丘建青 (Chien-Ching Chiu)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106 (2017-2018)
Language: English
Pages: 59
Keywords: 3D object detection, 3D region proposal network, multimodal object detector, multi-feature aggregation, 3D IoU loss
Abstract: This thesis focuses on advancing state-of-the-art 3D object detection and localization for autonomous driving. An autonomous vehicle must operate in a highly unpredictable and dynamic environment, so a robust perception system is essential. This work presents a novel architecture that leverages aggregated features from both LIDAR point clouds and RGB images and is composed of two subnetworks: a 3D region proposal network (3D-RPN) and a second-stage detection network. The input features are represented in two views, the front view (FV) and the bird's eye view (BEV). Unlike other state-of-the-art methods, the FV feature representation fuses information from RGB images with the projected LIDAR point clouds. The 3D-RPN not only performs multimodal feature fusion on full-resolution feature maps but also generates reliable 3D object proposals. With these proposals, the second-stage detection network performs accurate oriented 3D bounding box regression and category classification of objects in 3D space. Furthermore, a precise 3D intersection-over-union (IoU) loss is employed for joint optimization of the box parameters, which not only improves detection performance on hard samples but also reduces false positives.
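The 3D IoU loss named in the title penalizes the overlap between the predicted and ground-truth boxes directly, instead of regressing each box parameter with an independent term. As a rough illustration only, the sketch below computes a differentiable 1 - IoU loss for the simplified axis-aligned case; the thesis regresses oriented boxes (which additionally involves the yaw angle), and the corner-based box encoding, function names, and PyTorch usage here are assumptions for exposition, not the author's implementation.

```python
import torch

def axis_aligned_3d_iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """3D IoU for axis-aligned boxes encoded as
    (x_min, y_min, z_min, x_max, y_max, z_max), shape (N, 6).
    Simplified stand-in for the oriented-box IoU used in the thesis."""
    # Overlap extent along each axis; clamp to zero when boxes are disjoint.
    lo = torch.max(boxes_a[:, :3], boxes_b[:, :3])
    hi = torch.min(boxes_a[:, 3:], boxes_b[:, 3:])
    intersection = (hi - lo).clamp(min=0).prod(dim=1)

    vol_a = (boxes_a[:, 3:] - boxes_a[:, :3]).prod(dim=1)
    vol_b = (boxes_b[:, 3:] - boxes_b[:, :3]).prod(dim=1)
    union = vol_a + vol_b - intersection
    return intersection / union.clamp(min=1e-7)

def iou_loss_3d(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # L = 1 - IoU couples position and size in a single scalar objective.
    return (1.0 - axis_aligned_3d_iou(pred, gt)).mean()
```

Because the intersection term depends on every coordinate at once, the gradient of this single scalar adjusts all box parameters jointly, which is the property the abstract credits for better performance on hard samples and fewer false positives.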



Table of Contents:
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
List of Acronyms
1 Introduction
  1.1 3D Object Detection
  1.2 Motivations
  1.3 Contributions
  1.4 Thesis Outline
2 Related Work
  2.1 3D Single-stage Detectors
  2.2 3D Two-stage Detectors
  2.3 2D Proposals with 3D Regression Detectors
  2.4 3D Object Detection Benchmark
3 Proposed Method
  3.1 Overall Methodology
  3.2 Feature Maps Representation
    3.2.1 Bird's Eye View
    3.2.2 Front View
  3.3 Full Resolution Feature Extractor
  3.4 3D Region Proposal Network
    3.4.1 3D Anchor Generation
    3.4.2 3D RoI Align
    3.4.3 Bottleneck Layer
    3.4.4 3D Proposal Generation
    3.4.5 Loss Function
  3.5 Second Stage Detection Network
    3.5.1 3D Bounding Box Encoding
    3.5.2 Explicit Orientation Vector Regression
    3.5.3 Precise 3D Intersection-over-Union Regression
    3.5.4 Final Prediction Generation
    3.5.5 Loss Function
  3.6 Training Details
  3.7 Summary
4 Experimental Results
  4.1 Evaluation Results
  4.2 Ablation Studies
  4.3 Qualitative Results
  4.4 Summary
5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
Appendix A: Example images from the dataset
References

