| Field | Value |
|---|---|
| Student | 王淯銓 (Yu-Quan Wang) |
| Thesis title | Multi-modality 3D Object Detection Based on Semantic Feature Embedding With Possibility Guidance |
| Advisor | 陳永耀 (Yung-Yao Chen) |
| Committee | 呂政修 (Jenq-Shiou Leu), 林郁修 (Yu-Hsiu Lin), 阮聖彰 (Shanq-Jang Ruan) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering |
| Publication year | 2022 |
| Academic year | 110 |
| Language | Chinese |
| Pages | 59 |
| Keywords (Chinese) | point cloud, LiDAR, 3D object detection algorithm, multi-modal perception algorithm |
| Keywords (English) | Point cloud, Multi-modality object detection, LiDAR, 3D object detection |
In recent years, the release of many large-scale public 3D object detection datasets has driven major advances in object perception in both industry and academia. However, the sparsity of point clouds is unavoidable, and 3D object detection algorithms frequently require downsampling operations, which further damage the already sparse structure of foreground objects, so the model cannot effectively extract foreground features during training. In addition, some background objects share similar shapes with target objects, so LiDAR-only perception algorithms often report false positives. To address these problems, we propose a multi-modality feature encoder as our backbone. It contains semantically enhanced point clouds that help classify objects with similar geometric features, and a possibility-guided farthest point sampling method that, during layer-to-layer downsampling, samples as many foreground points as possible while preserving the diversity of the point set, giving the model more foreground information during training and strengthening its ability to distinguish foreground from background. In addition, we apply virtual point sampling in the feature refinement network to greatly improve the utilization of image features, fusing semantic features into the model once more for feature refinement and alleviating the problem of insufficient features for small and distant objects. Experimental results show that our model achieves good detection performance on the KITTI validation set, with an average mAP of 77.23% across all tasks; in particular, for the cyclist category, the proposed method achieves the best detection results compared with state-of-the-art 3D object detection methods.
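The semantic enhancement of the point cloud described above follows the general point-decoration idea used in PointPainting-style fusion: each LiDAR point is projected into the camera image and the per-pixel semantic scores at that location are appended to the point's features. The sketch below is a minimal illustration of that idea under assumed inputs (a LiDAR-to-pixel projection matrix and a per-pixel score map), not the thesis's exact formulation:

```python
import numpy as np

def paint_points(points_xyz, sem_scores, proj):
    """Append image semantic scores to each LiDAR point.

    points_xyz : (N, 3) LiDAR points
    sem_scores : (H, W, C) per-pixel class scores from a segmentation net
    proj       : (3, 4) camera projection matrix (LiDAR frame -> pixels)
    Returns (N, 3 + C) painted points; points that project outside the
    image (or lie behind the camera) receive all-zero scores.
    """
    n = points_xyz.shape[0]
    h, w, c = sem_scores.shape
    homo = np.hstack([points_xyz, np.ones((n, 1))])       # homogeneous coords (N, 4)
    uvw = homo @ proj.T                                    # projected (u*z, v*z, z)
    valid = uvw[:, 2] > 1e-6                               # in front of the camera
    uv = np.zeros((n, 2), dtype=np.int64)
    uv[valid] = (uvw[valid, :2] / uvw[valid, 2:3]).astype(np.int64)
    inside = valid & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                   & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    painted = np.zeros((n, c))
    painted[inside] = sem_scores[uv[inside, 1], uv[inside, 0]]
    return np.hstack([points_xyz, painted])
```

The appended scores give geometrically similar foreground and background points different feature vectors, which is what lets the detector separate shape-ambiguous objects.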
In recent years, the release of many large-scale public 3D datasets has driven significant progress in 3D perception technology in both industry and academia. However, sparsity, an inherent property of point clouds, remains an unavoidable issue for perception. In particular, downsampling operations are frequently used in 3D modules, which further degrades the quality of aggregated features because few foreground points are sampled, resulting in ineffective model training. Besides, some background objects have shapes similar to those of target objects, causing LiDAR-based methods to constantly report false positives. To address the above problems, we propose a multi-modality feature encoder as our backbone, which contains semantically augmented points that help the model distinguish objects of similar shape, and possibility-guided farthest point sampling that samples as many foreground points as possible while maintaining the diversity of the point set. Moreover, we propose virtual point sampling to greatly increase the utilization of image features, alleviating the scarcity of features for small and distant objects. Our model is evaluated on the KITTI validation set, achieving an average mAP of 77.23%, comparable to state-of-the-art methods; in particular, it outperforms other methods in the cyclist category.
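The possibility-guided farthest point sampling mentioned above can be sketched as follows. This is a minimal illustration, assuming each point carries a foreground score (e.g., from a semantic segmentation head) that is blended with the usual farthest-point distance criterion; the function name and the specific weighting scheme are illustrative, not the thesis's exact formulation:

```python
import numpy as np

def possibility_guided_fps(points, scores, n_samples, alpha=1.0):
    """Farthest point sampling biased toward likely-foreground points.

    points    : (N, 3) array of xyz coordinates
    scores    : (N,) foreground probabilities in [0, 1]
    n_samples : number of points to keep
    alpha     : strength of the foreground bias (0 -> plain FPS)
    Returns indices of the selected points.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # squared distance from every point to its nearest selected point
    min_dist = np.full(n, np.inf)
    # start from the most confident foreground point
    selected[0] = int(np.argmax(scores))
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        dist = np.einsum("ij,ij->i", diff, diff)
        min_dist = np.minimum(min_dist, dist)
        # weight the farthest-point criterion by the foreground score:
        # points that are both far from the selected set (diversity)
        # and likely foreground win
        crit = min_dist * (1.0 + alpha * scores)
        crit[selected[:i]] = -np.inf  # never re-select a point
        selected[i] = int(np.argmax(crit))
    return selected
```

With `alpha = 0` this reduces to standard farthest point sampling; increasing `alpha` trades some spatial diversity for foreground coverage, which matches the abstract's goal of keeping more foreground points alive through each downsampling layer.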