
Author: Yo-Cheng Chen (陳祐丞)
Thesis Title: LiDAR Pedestrian/Vehicle Detection Using 2D Projections and Transformer Networks (基於二維投影與Transformer網路之光達人車偵測)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Wen-Huang Cheng (鄭文皇), Kai-Lung Hua (花凱龍), Yan-Fu Kuo (郭彥甫), Jun-Cheng Chen (陳駿丞), Neng-Hao Yu (余能豪)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 36
Keywords (Chinese): LiDAR, Point Cloud, 3D Object Detection
Keywords (English): LiDAR, Point Cloud, 3D Object Detection, Transformer

Self-driving systems require an object detection module that can react appropriately to the environment. A multi-sensor setup can already deliver a robust object detection system, but we still need sensors that can cope with non-optimal conditions (for example, optical cameras struggle in harsh weather). We therefore develop a pedestrian/vehicle detection system that relies on LiDAR alone. However, LiDAR returns only sparse point clouds for distant objects. To mitigate this sparsity, we project point clusters onto 2D images and dilate them; unlike previous methods, we retain 3D information in the 2D images rather than using only the cluster's silhouette or shape. Furthermore, we process the LiDAR point cloud output with Transformer networks, which we find highly effective. Results show that our method outperforms previous pedestrian detection approaches and also achieves high performance when extended to joint pedestrian/vehicle detection.
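
As a rough, hypothetical sketch of the projection-and-dilation step described above (the grid size, channel choice, and dilation kernel are illustrative assumptions, not the thesis's actual parameters), a point cluster could be rasterized into an image whose pixels carry depth and height values rather than a binary silhouette:

    # Hypothetical projection of one LiDAR point cluster to a 2D image.
    # Pixels store 3D attributes (depth, height) instead of 0/1 occupancy,
    # and a grey dilation fills gaps left by sparse, distant clusters.
    import numpy as np
    from scipy import ndimage

    def project_cluster(points, size=64):
        # points: (N, 3) array with columns x (depth), y (lateral), z (height)
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        # Map the cluster's lateral/height extents onto pixel coordinates.
        u = ((y - y.min()) / (np.ptp(y) + 1e-6) * (size - 1)).astype(int)
        v = ((z.max() - z) / (np.ptp(z) + 1e-6) * (size - 1)).astype(int)
        img = np.zeros((size, size, 2), dtype=np.float32)
        img[v, u, 0] = x   # channel 0 keeps each point's depth
        img[v, u, 1] = z   # channel 1 keeps each point's height
        # Dilation thickens sparse pixels so distant clusters stay connected.
        for c in range(img.shape[-1]):
            img[..., c] = ndimage.grey_dilation(img[..., c], size=(3, 3))
        return img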


Self-driving systems need an object detection module to sense the environment and respond accordingly. Multiple sensors can be combined to build a robust detection method, but fallback systems are still needed in case non-optimal conditions affect some sensors (e.g., bad weather for optical cameras). Hence, we explore pedestrian detection using LiDAR only. The problem with LiDAR is that it produces a sparse point cloud for distant objects. To alleviate sparsity, we project point clusters to a 2D image, but unlike previous works, we retain some 3D information in the projection instead of using only the general silhouette or shape of the cluster. Furthermore, we use Transformer networks to process the points in the LiDAR output, which we have found to perform well on point clouds. We show that our method compares favorably even against other recent pedestrian detection methods.
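
As a minimal sketch of feeding raw point clusters to a Transformer (the dimensions, class-token design, and class set below are assumptions for illustration, not the network the thesis describes), each point can be embedded as a token and the cluster classified by a standard Transformer encoder:

    # Illustrative point-cluster classifier built on a standard Transformer
    # encoder (assumed design, not the thesis architecture).
    import torch
    import torch.nn as nn

    class PointTransformerClassifier(nn.Module):
        def __init__(self, d_model=128, n_heads=4, n_layers=2, n_classes=3):
            super().__init__()
            self.embed = nn.Linear(3, d_model)         # per-point (x, y, z) embedding
            self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)  # e.g., pedestrian / vehicle / background

        def forward(self, pts):                        # pts: (B, N, 3) point clusters
            tok = self.embed(pts)
            cls = self.cls.expand(pts.size(0), -1, -1)
            out = self.encoder(torch.cat([cls, tok], dim=1))
            return self.head(out[:, 0])                # classify from the class token

    # Usage: logits = PointTransformerClassifier()(torch.randn(2, 100, 3))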

Table of Contents:
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Illustrations
1 Introduction
2 Related Work
3 Methodology
3.1 Region Proposal
3.2 2D Image Projection
3.3 Network Architecture
3.4 Feature Integration
4 Result and Discussion
4.1 Implementation Details
4.2 Experimental Result
4.3 Ablation Study
4.4 Visualization Result
5 Conclusion
References

Full-Text Release Date: 2026/02/03 (campus network)
Full-Text Release Date: 2031/02/03 (off-campus network)
Full-Text Release Date: 2031/02/03 (National Central Library: Taiwan Master's and Doctoral Theses System)