Graduate student: 涂博翔 (Po-Hsiang Tu)
Thesis title: 三維物件偵測和自駕車應用 (3D Object Detection Network for Autonomous Vehicles)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Committee members: 呂政修 (Jenq-Shiou Leu), 陳省隆 (Hsing-Lung Chen), 林銘波 (Ming-Bo Lin)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar; 2019–2020)
Language: Chinese
Pages: 57
Chinese keywords: 多視圖、三維重疊損失函數、三維物件偵測、生成光學雷達、自動駕駛
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous vehicles
In this thesis, we use three types of information for vehicle detection. The first is RGB-D data obtained by combining RGB camera images with depth and height information; the second is a bird's-eye view generated from LiDAR data; and the third is a point-cloud map, also generated from LiDAR data. The network is optimized with group normalization and weight standardization, and the loss function uses a 3D overlap ratio (3D IoU), which lets the network learn vehicle position information effectively. We also replace the LiDAR data in the KITTI dataset with generated pseudo-LiDAR data: new LiDAR-like data is produced from the images by estimating the depth corresponding to each image and combining the original image with the generated depth information to form a pseudo-LiDAR map. Although this generated pseudo-LiDAR is less accurate than real LiDAR, our method outperforms detector architectures that use only RGB images, so camera-generated pseudo-LiDAR can serve as an additional source of information for future autonomous-driving applications.
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous driving.
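The pseudo-LiDAR generation described above, turning an image's estimated depth map into a point cloud, can be sketched by back-projecting each pixel through the pinhole camera model. This is a minimal illustration, not the thesis's implementation; the function name and the toy intrinsics (`fx`, `fy`, `cx`, `cy`) are assumptions for the example.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) pseudo-LiDAR point cloud.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Pixels with non-positive depth (no valid estimate) are discarded.
    """
    h, w = depth.shape
    # u varies along columns, v along rows (default 'xy' meshgrid indexing)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

# Toy example: a 2x2 depth map with one invalid (zero-depth) pixel.
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
cloud = depth_to_pseudo_lidar(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
# cloud contains 3 points, one per pixel with positive depth
```

In practice the depth map would come from a monocular or stereo depth estimator, and the resulting points would be transformed from the camera frame into the LiDAR frame using the dataset's calibration before being fed to a point-cloud detector.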
In this thesis, we use three types of information to detect vehicles. The first is RGB-D data obtained by combining RGB images with depth and height information; the second is a bird's-eye view generated from the LiDAR data; and the third is a point-cloud map, also generated from the LiDAR data. The network is optimized with group normalization and weight standardization, and a three-dimensional intersection-over-union (3D IoU) term in the loss function lets it learn vehicle position information effectively. We also replace the original LiDAR data in the KITTI dataset with generated pseudo-LiDAR data: new LiDAR-like data is produced from an image by estimating its corresponding depth map and combining the original image with the generated depth information. Although the generated pseudo-LiDAR is less accurate than the original LiDAR, it is denser and more informative, and our method performs better than detector architectures that use only RGB images. Therefore, in autonomous-driving applications, pseudo-LiDAR generated from camera images can serve as an additional source of information.
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous vehicles.
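The 3D IoU loss mentioned above can be illustrated in the simplified axis-aligned case. The thesis's actual formulation handles rotated 3D boxes, which is more involved; this sketch, with function names of my own choosing, only shows the principle of using overlap volume as a regression signal.

```python
def axis_aligned_3d_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax)."""
    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    # Intersection volume: product of per-axis overlap lengths (clamped at 0).
    inter = 1.0
    for i in range(3):
        lo = max(box_a[i], box_b[i])
        hi = min(box_a[i + 3], box_b[i + 3])
        inter *= max(0.0, hi - lo)

    union = volume(box_a) + volume(box_b) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(box_pred, box_gt):
    # A common IoU-based regression loss: 1 - IoU, zero for a perfect match.
    return 1.0 - axis_aligned_3d_iou(box_pred, box_gt)
```

Unlike per-coordinate L1 or smooth-L1 regression, an IoU-based loss couples all box parameters through a single overlap measure, which is why it helps the detector localize vehicles as whole boxes rather than as independent coordinates.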