| Field | Value |
|---|---|
| Graduate Student | 許華哲 (Hua-Che Hsu) |
| Thesis Title | 運用多尺度錐體點網路進行三維物件偵測之研究 (Study of Applying Multi-Scale Frustum PointNets to 3D Object Detection) |
| Advisors | 呂政修 (Jenq-Shiou Leu), 陳維美 (Wei-Mei Chen) |
| Oral Defense Committee | 林淵翔 (Yuan-Hsiang Lin), 林昌鴻 (Chang Hong Lin), 呂政修 (Jenq-Shiou Leu), 陳維美 (Wei-Mei Chen) |
| Degree | Master (碩士) |
| Department | College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering |
| Publication Year | 2020 |
| Academic Year of Graduation | 108 |
| Language | Chinese |
| Pages | 41 |
| Chinese Keywords | 多尺度機制, 錐體點網路, 三維物件偵測, 點雲, 點網路, 自駕車 |
| English Keywords | Multi-Scale Mechanism, Frustum PointNet, 3D Object Detection, Point Clouds, PointNet, Self-Driving Car |
| Access Counts | Views: 347, Downloads: 2 |
Abstract: Self-driving cars will be one of the most important technologies of the coming years. Until automated driving reaches full autonomy, the driver must remain available for assisted control; besides human error, a major cause of traffic accidents involving self-driving cars is failure of the perception system the vehicle uses to recognize its surroundings. Like a human driver, an autonomous vehicle needs three-dimensional vision, and a variety of 3D object detection methods already exist. In this thesis, we propose an improvement to an existing point-cloud-based 3D object detection method. Building on the Frustum PointNets model, which predicts from point clouds, we scale every 3D coordinate point in the cloud about the origin (0, 0, 0) by factors of 0.125, 1 (unchanged), and 8. Each of the three scaled point clouds passes through its own three-layer convolutional neural network to extract features, and the three feature sets are then concatenated into one. Each scale uses a matching convolution kernel size: the smaller-scale cloud uses smaller kernels, and the larger-scale cloud uses larger kernels. For the smaller-scale cloud, the kernel size increases layer by layer; for the larger-scale cloud, it decreases layer by layer; for the unchanged cloud, it stays the same in every layer. Much as a human observes an object at different magnifications, this scheme extracts useful and complementary features from the differently scaled point clouds. Using the evaluation protocol provided by the KITTI benchmark to compare the model before and after the improvement, the results show that our model achieves better detection precision than the original.
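The multi-scale pipeline the abstract describes — scale the cloud about the origin by 0.125, 1, and 8, extract features per scale with a per-scale kernel-size schedule, then concatenate — can be sketched as below. This is a dependency-free illustration under stated assumptions, not the thesis code: the real model uses learned convolution kernels, whereas a toy averaging convolution stands in here, and all names, kernel schedules shown, and the per-point distance feature are illustrative assumptions.

```python
# Illustrative sketch of the multi-scale feature extraction described in the
# abstract. All identifiers are hypothetical; the conv stack is a stand-in
# (a 'same'-padded 1-D moving average), not the thesis's learned CNN.

SCALES = [0.125, 1.0, 8.0]          # scale factors applied about the origin
KERNEL_SCHEDULES = {                # assumed kernel size per conv layer, per scale
    0.125: [1, 3, 5],               # smaller scale: kernel grows layer by layer
    1.0:   [3, 3, 3],               # unchanged scale: kernel fixed in every layer
    8.0:   [5, 3, 1],               # larger scale: kernel shrinks layer by layer
}

def scale_points(points, s):
    """Scale every (x, y, z) point about the origin (0, 0, 0) by factor s."""
    return [(x * s, y * s, z * s) for (x, y, z) in points]

def conv1d(values, k):
    """Toy 'same'-padded 1-D average convolution with kernel size k."""
    half = k // 2
    padded = [values[0]] * half + values + [values[-1]] * half
    return [sum(padded[i:i + k]) / k for i in range(len(values))]

def extract_features(points, kernels):
    """Run a per-point signal (here: distance from origin) through the stack."""
    feat = [(x * x + y * y + z * z) ** 0.5 for (x, y, z) in points]
    for k in kernels:
        feat = conv1d(feat, k)
    return feat

def multi_scale_features(points):
    """Concatenate the three per-scale feature sets into one feature vector."""
    out = []
    for s in SCALES:
        out.extend(extract_features(scale_points(points, s), KERNEL_SCHEDULES[s]))
    return out

cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
print(len(multi_scale_features(cloud)))  # 3 scales x 4 points = 12 features
```

The design choice mirrored here is that each scale gets its own extraction path with its own kernel schedule, so the concatenated vector carries complementary views of the same object at different magnifications.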