
Graduate Student: Chung-Yen Tsai (蔡忠諺)
Thesis Title: Multi-View Fusion for 3D Object Detection in Autonomous Vehicles (基於多視圖融合之三維物件偵測及其在自動駕駛之應用)
Advisors: Yie-Tarng Chen (陳郁堂) and Wen-Hsien Fang (方文賢)
Oral Defense Committee: Chien-Ching Chiu (丘建青), Kuen-Tsair Lay (賴坤財), and Gee-Sern Hsu (徐繼聖)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107 (2018–2019)
Language: Chinese
Pages: 63
Keywords: multi-view, fast depth completion, fusion, 3D IoU loss, 3D object detection, autonomous vehicles
In this thesis, we propose a novel architecture that performs vehicle detection from three different views: RGB-D images produced by a color camera, together with the front view (FV) and the bird's eye view (BEV) generated from the light detection and ranging (LIDAR) point cloud. The front view is processed with a fast depth completion (FDC) algorithm that converts the sparse depth input into a denser depth map, compensating for the information lost in the sparse projection so that the network can extract richer and more accurate features. In addition, this thesis proposes a position-sensitive multi-model (PSM) fusion scheme for combining heterogeneous features, which lets the network learn by itself which information is most valuable to retain during fusion; this effectively reduces the information loss incurred when fusing different features and improves the accuracy of subsequent predictions. Furthermore, this thesis adopts a new three-dimensional intersection-over-union (IoU) loss function, optimized jointly with the other existing loss functions, for bounding-box regression. Since the IoU directly reflects prediction accuracy, the 3D IoU loss informs the network of the quality of its current bounding-box predictions and provides immediate feedback during training, improving detection accuracy on hard samples while also reducing false positives. Compared with other methods on the KITTI dataset, the results show that the proposed approach outperforms other state-of-the-art methods.
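To make the depth-completion step concrete, below is a minimal sketch of morphology-based densification in the spirit of the FDC step described above. The function name `complete_depth`, the kernel sizes, and the depth-inversion heuristic are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch only: densify a sparse LIDAR depth map with
# grey-scale morphology. Kernel sizes and thresholds are assumptions.
import numpy as np
from scipy import ndimage

def complete_depth(sparse_depth: np.ndarray, max_depth: float = 100.0) -> np.ndarray:
    """Densify a sparse depth map; zeros mark missing pixels."""
    depth = sparse_depth.astype(np.float32)
    valid = depth > 0.1
    # Invert valid depths so max-based dilation favors nearer points,
    # which tend to occlude farther ones after projection to the image.
    depth[valid] = max_depth - depth[valid]
    # Spread valid pixels into small neighboring holes.
    depth = ndimage.grey_dilation(depth, size=(5, 5))
    # Fill larger remaining holes with a coarser second pass.
    holes = depth < 0.1
    depth[holes] = ndimage.grey_dilation(depth, size=(15, 15))[holes]
    # Smooth lightly, then undo the inversion; still-empty pixels stay 0.
    depth = ndimage.median_filter(depth, size=5)
    return np.where(depth > 0.1, max_depth - depth, 0.0)
```

The inversion trick reflects a common heuristic in LIDAR depth completion: where returns at different depths project close together, the nearer surface should win.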


    In this thesis, we propose a novel architecture that fuses RGB-D images with the front view (FV), refined by fast depth completion, and the bird's eye view (BEV), both derived from the light detection and ranging (LIDAR) point cloud, for 3D object detection in autonomous vehicles. In contrast to previous works, a fast depth completion algorithm is invoked to infer a dense depth map from the sparse depth input. A position-sensitive multi-model (PSM) fusion scheme is also proposed to learn to retain translation-variant information in the fusion process. Moreover, a 3D intersection-over-union (IoU) loss function is employed for joint optimization of the box parameters, enhancing detection accuracy on hard samples and reducing false positives. Simulation results show that the new approach provides superior performance over state-of-the-art works on the widely used KITTI dataset.
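As a concrete illustration of how an IoU-based loss gives the network direct feedback on box quality, here is a minimal PyTorch sketch for axis-aligned 3D boxes. The thesis's loss operates on oriented bounding boxes (Section 3.5.3), so the (x1, y1, z1, x2, y2, z2) corner encoding and the -log(IoU) form used here are simplifying assumptions for illustration.

```python
import torch

def axis_aligned_3d_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU for 3D boxes encoded as (x1, y1, z1, x2, y2, z2), shape (N, 6)."""
    lo = torch.maximum(a[:, :3], b[:, :3])       # intersection min corner
    hi = torch.minimum(a[:, 3:], b[:, 3:])       # intersection max corner
    inter = (hi - lo).clamp(min=0).prod(dim=1)   # zero if boxes are disjoint
    vol_a = (a[:, 3:] - a[:, :3]).prod(dim=1)
    vol_b = (b[:, 3:] - b[:, :3]).prod(dim=1)
    return inter / (vol_a + vol_b - inter + 1e-7)

def iou_loss_3d(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Because IoU directly reflects localization quality, minimizing
    # -log(IoU) penalizes poor boxes heavily and rewards tight ones.
    return -torch.log(axis_aligned_3d_iou(pred, target) + 1e-7).mean()
```

In training, such a term would be added to the classification and regression losses so that the box parameters are jointly optimized, as the abstract describes.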

Table of Contents:

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1  Introduction
    1.1 3D Object Detection
    1.2 Motivation
    1.3 Contributions
    1.4 Thesis Organization
Chapter 2  Background Review
    2.1 LIDAR
    2.2 One-Stage 3D Object Detection
    2.3 Two-Stage 3D Object Detection
    2.4 3D Regression-Based Detection
    2.5 3D Object Detection Benchmarks
    2.6 Summary
Chapter 3  Multi-View Fusion 3D Object Detection
    3.1 Method Overview
    3.2 Feature Maps
        3.2.1 Bird's Eye View
        3.2.2 RGB-D
        3.2.3 Front View
    3.3 Full-Resolution Feature Extraction
    3.4 3D Region Proposal Network
        3.4.1 3D Anchors
        3.4.2 3D Regions of Interest
        3.4.3 1 × 1 Convolutional Layers
        3.4.4 Position-Sensitive Multi-Model Fusion Scheme
        3.4.5 3D Proposals
        3.4.6 Loss Function
    3.5 Object Detection Network
        3.5.1 3D Bounding Box Encoding
        3.5.2 Orientation Vector
        3.5.3 3D IoU
        3.5.4 Prediction Results
        3.5.5 Loss Function
    3.6 Optimizer
    3.7 Summary
Chapter 4  Simulation Results and Discussion
    4.1 Training Environment
    4.2 Comparison Among Our Own Variants
    4.3 Comparison with State-of-the-Art Methods
        4.3.1 Visual Comparisons
        4.3.2 Error Analysis
    4.4 Summary
Chapter 5  Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
Appendix 1: KITTI Dataset Samples
References

Full Text Available: 2024/08/13 (campus network)
Full Text Available: 2024/08/13 (off-campus network)
Full Text Available: 2024/08/13 (National Central Library: Taiwan NDLTD system)