
Student: 涂博翔 (Po-Hsiang Tu)
Thesis Title: 三維物件偵測和自駕車應用 (3D Object Detection Network for Autonomous Vehicles)
Advisors: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Committee Members: 呂政修 (Jenq-Shiou Leu), 陳省隆 (Hsing-Lung Chen), 林銘波 (Ming-Bo Lin)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 57
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous vehicles

Abstract (Chinese):
In this thesis, we use three types of information for vehicle detection: first, RGB-D data obtained by combining the RGB images of a color camera with depth and height information; second, a bird's-eye view generated from LiDAR data; and third, a point-cloud map, also generated from the LiDAR data. We apply group normalization and weight standardization to optimize the network, and compute the loss with a three-dimensional intersection-over-union (3D IoU) measure, which learns the vehicle position information effectively. We also replace the LiDAR data in the KITTI dataset with generated pseudo-LiDAR data: new LiDAR-like data are produced from the images by estimating the corresponding depth for each image and combining the original image with the generated depth information to obtain a pseudo-LiDAR map. Although the generated pseudo-LiDAR is less accurate than the original LiDAR, our detector outperforms architectures that use only RGB images; therefore, for future autonomous-driving applications, pseudo-LiDAR generated from cameras can serve as an additional source of information.
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous driving.
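To make the pseudo-LiDAR step above concrete, the following is a minimal sketch of how a predicted depth map can be back-projected into a LiDAR-like point cloud through a pinhole camera model. The function name, the intrinsic values, and the use of NumPy are illustrative assumptions, not the exact pipeline of the thesis.

    import numpy as np

    def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
        # Back-project a depth map (H x W, in metres) into an N x 3 point cloud.
        # Illustrative sketch assuming a simple pinhole camera model.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
        z = depth
        x = (u - cx) * z / fx                             # camera-frame X
        y = (v - cy) * z / fy                             # camera-frame Y
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]                   # keep pixels with valid (positive) depth

    # Usage with a dummy 4 x 4 depth map and KITTI-like intrinsics (values assumed)
    dummy_depth = np.full((4, 4), 10.0)
    cloud = depth_to_pseudo_lidar(dummy_depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
    print(cloud.shape)  # (16, 3)

In practice, the resulting points would then be rendered into the bird's-eye-view and pillar representations described above before being fed to the detector.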


Abstract (English):
In this thesis, we use three types of information to detect vehicles. The first is RGB-D information obtained by combining RGB images with depth and height information; the second is a bird's-eye view generated from the LiDAR data; and the third is a point-cloud map, also generated from the LiDAR data. We apply group normalization and weight standardization to optimize the network, and use a three-dimensional intersection-over-union (3D IoU) loss to learn the vehicle position information effectively. We also replace the original LiDAR data in the KITTI dataset with generated pseudo-LiDAR data: new LiDAR-like data are produced from the images by estimating the corresponding depth and combining the original images with the estimated depth to obtain the pseudo-LiDAR map. Although the generated pseudo-LiDAR is less accurate than the original LiDAR, it is denser and more informative. Compared with detector architectures that use only RGB images, our method performs better; therefore, in autonomous-driving applications, pseudo-LiDAR generated from camera images can serve as an additional source of information.
Keywords: multi-view, 3D IoU loss, 3D object detection, pseudo-LiDAR, autonomous vehicles.
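As a simplified illustration of the 3D overlap measure behind the IoU-based loss mentioned above, the sketch below computes the 3D IoU of two axis-aligned boxes; the detector in the thesis handles rotated (yawed) boxes, whose intersection additionally requires a bird's-eye-view polygon overlap that is omitted here. The function name and box encoding are assumptions for illustration only.

    import numpy as np

    def axis_aligned_3d_iou(box_a, box_b):
        # Boxes are (x_min, y_min, z_min, x_max, y_max, z_max); axis-aligned only.
        mins = np.maximum(box_a[:3], box_b[:3])           # intersection lower corner
        maxs = np.minimum(box_a[3:], box_b[3:])           # intersection upper corner
        inter = np.prod(np.clip(maxs - mins, 0.0, None))  # zero if boxes do not overlap
        vol_a = np.prod(box_a[3:] - box_a[:3])
        vol_b = np.prod(box_b[3:] - box_b[:3])
        return inter / (vol_a + vol_b - inter)

    # Two car-sized boxes offset by 1 m along x: overlap 9.6 m^3, union 16.0 m^3
    a = np.array([0.0, 0.0, 0.0, 4.0, 2.0, 1.6])
    b = np.array([1.0, 0.0, 0.0, 5.0, 2.0, 1.6])
    print(axis_aligned_3d_iou(a, b))  # 0.6

A loss of the form 1 - IoU can then be minimized so that predicted boxes are pulled toward the ground-truth boxes in all three dimensions at once.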

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1  Introduction
  1.1 3D Object Detection
  1.2 Motivation
  1.3 Contributions
  1.4 Thesis Outline
Chapter 2  Background Review
  2.1 LiDAR
  2.2 Single-Stage 3D Object Detection
  2.3 Two-Stage 3D Object Detection
  2.4 3D Regression-Based Detection
  2.5 3D Object Detection Benchmarks
  2.6 Image Depth Estimation
  2.7 Summary
Chapter 3  Multi-View Fusion 3D Object Detection
  3.1 Method Overview
  3.2 Feature Maps
    3.2.1 Bird's-Eye View
    3.2.2 RGB-D
    3.2.3 Pillar Map
  3.3 Feature Extraction
  3.4 3D Region Proposal Network
    3.4.1 3D Anchors
    3.4.2 3D Regions of Interest
    3.4.3 1 × 1 Convolutional Layer
    3.4.4 3D Proposals
    3.4.5 Loss Function
  3.5 Second-Stage Detection Network
    3.5.1 3D Bounding Box Encoding
    3.5.2 Orientation Vector
    3.5.3 3D IoU
    3.5.4 Loss Function
  3.6 Pseudo-LiDAR Generation
  3.7 Summary
Chapter 4  Simulation Results and Discussion
  4.1 Training Environment
  4.2 Comparison Among Our Own Methods
  4.3 Comparison with State-of-the-Art Methods
  4.4 Presentation of Results
      Error Analysis
  4.5 Summary
  4.6 Comparison with the New Version of Pseudo-LiDAR
Chapter 5  Conclusions and Future Work
  5.1 Conclusions and Future Work
Appendix 1: KITTI Dataset Samples
References

Full text available from: 2025/08/24 (campus network)
Full text available from: 2025/08/24 (off-campus network)
Full text available from: 2025/08/24 (National Central Library: Taiwan Theses and Dissertations System)