
Graduate Student: Yu-Quan Wang (王淯銓)
Thesis Title (Chinese): 3D Object Detection Technology Based on Visual Semantics and LiDAR (基於視覺語意與光達的三維物件偵測技術)
Thesis Title (English): Multi-modality 3D Object Detection Based on Semantic Feature Embedding With Possibility Guidance
Advisor: Yung-Yao Chen (陳永耀)
Committee Members: Jenq-Shiou Leu (呂政修), Yu-Hsiu Lin (林郁修), Shanq-Jang Ruan (阮聖彰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: Chinese
Pages: 59
Chinese Keywords: point cloud, LiDAR, 3D object detection algorithm, multi-modality perception algorithm
English Keywords: Point cloud, Multi-modality object detection, LiDAR, 3D object detection
Access Count: 361 views, 0 downloads
In the past few years, the release of many large-scale public 3D object detection datasets has driven major progress in object perception technology in both industry and academia. However, the sparsity of point clouds is an unavoidable problem, and 3D object detection algorithms frequently require downsampling operations that further damage the structure of already-sparse foreground objects, so the model cannot effectively extract foreground features during training. In addition, because some background objects share similar shapes with target objects, LiDAR-only perception algorithms often detect false positives. To address these problems, we propose a multi-modality feature encoder as our training backbone. It contains semantically augmented point clouds that help classify objects with similar geometric features, and a possibility-guided farthest point sampling method that, during the downsampling between layers, samples as many foreground points as possible while preserving the diversity of the point set, so that more foreground information participates in training and the model's ability to distinguish foreground from background is strengthened. In addition, we use virtual point sampling in the feature refinement network to greatly increase the utilization of image features, fusing semantic features into the model once more for feature refinement and alleviating the problem of insufficient features for small and distant objects. The experimental results show that our model achieves good detection performance on the KITTI validation set, with an average mAP of 77.23% over all tasks; in particular, in the cyclist category the proposed method achieves the best detection results compared with state-of-the-art 3D object detection techniques.
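The "semantically augmented point cloud" described above follows the general point-painting idea: each raw LiDAR point is projected into the camera image and decorated with the per-class scores of a 2D semantic segmentation network, so that geometrically similar objects become separable by appearance. Below is a minimal Python/NumPy sketch of that step; the function name, tensor shapes, and calibration convention are illustrative assumptions, not the thesis's actual implementation.

```python
# A minimal sketch (not the author's exact code) of semantic point
# augmentation in the PointPainting style: every LiDAR point is
# projected into the image and decorated with the per-pixel class
# scores of a 2D semantic segmentation network. Names, shapes, and
# the calibration convention are illustrative assumptions.
import numpy as np

def paint_points(points, seg_scores, lidar_to_img):
    """points:       (N, 3) LiDAR xyz coordinates
    seg_scores:   (H, W, C) softmax output of a 2D segmentation net
    lidar_to_img: (3, 4) projection matrix from LiDAR to image plane
    returns:      (N, 3 + C) points with semantic scores appended"""
    h, w, c = seg_scores.shape
    # Homogeneous coordinates, then project onto the image plane.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    uvz = pts_h @ lidar_to_img.T                                # (N, 3)
    uv = uvz[:, :2] / np.clip(uvz[:, 2:3], 1e-6, None)          # pixel coords
    # Keep only points that land inside the image with positive depth.
    valid = (uvz[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    scores = np.zeros((points.shape[0], c), dtype=seg_scores.dtype)
    u = uv[valid, 0].astype(int)
    v = uv[valid, 1].astype(int)
    scores[valid] = seg_scores[v, u]  # nearest-pixel score lookup
    return np.hstack([points, scores])
```

Points that fall outside the image simply keep zero semantic scores, so the downstream network can still consume a fixed-width feature vector.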


In recent years, the release of many large-scale public 3D datasets has driven significant progress in 3D perception in both industry and academia. However, sparsity, an inherent property of point clouds, remains an unavoidable issue for perception. In particular, the downsampling operations frequently used in 3D modules further degrade the quality of aggregated features, because few foreground points survive sampling, which makes model training ineffective. Besides, some background objects are similar in shape to target objects, which causes LiDAR-only methods to frequently report false positives. To address these problems, we propose a multi-modality feature encoder as our backbone. It contains semantically augmented points that help the model distinguish objects of similar shape, and possibility-guided farthest point sampling that retains as many foreground points as possible while maintaining the diversity of the point set. Moreover, we propose virtual point sampling to greatly increase the utilization of image features, which alleviates the scarcity of features for small and distant objects. Evaluated on the KITTI validation set, our model achieves an average mAP of 77.23%, comparable to state-of-the-art methods, and outperforms them in the cyclist category.
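The abstract does not spell out the possibility-guided farthest point sampling in detail; a common formulation (cf. SASA-style semantics-guided sampling) re-weights the usual farthest-point criterion with each point's predicted foreground probability, so that downsampling keeps spatial diversity while preferring foreground points. The sketch below illustrates that formulation; fg_prob, gamma, and the exact weighting are assumptions for illustration, not the thesis's definitive algorithm.

```python
# A minimal sketch of possibility-guided farthest point sampling
# (P-FPS), assuming it weights the farthest-point criterion by a
# predicted foreground probability. `gamma` trades off sampling
# diversity against foreground preference; its exact form in the
# thesis is not given here, so treat this as an illustration.
import numpy as np

def possibility_guided_fps(points, fg_prob, n_samples, gamma=1.0):
    """points:    (N, 3) candidate point coordinates
    fg_prob:   (N,) predicted foreground probability per point
    n_samples: number of points to keep
    returns:   (n_samples,) indices of the sampled points"""
    n = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    # Start from the most confident foreground point.
    selected[0] = int(np.argmax(fg_prob))
    # Running minimum squared distance from each point to the selected set.
    min_dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum('ij,ij->i', diff, diff))
        # Plain FPS would take argmax(min_dist); P-FPS re-weights the
        # distance with the foreground probability so that, among
        # similarly spread-out candidates, foreground points win.
        criterion = (fg_prob ** gamma) * min_dist
        criterion[selected[:i]] = -1.0  # never re-pick a point
        selected[i] = int(np.argmax(criterion))
    return selected
```

With gamma = 0 this degenerates to ordinary distance-based FPS, which is one natural way to read the thesis's claim that foreground points are favored "under the condition of maintaining the diversity of the point set".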

Table of Contents
Front matter: Advisor Recommendation, Committee Approval, Acknowledgments, Abstract (Chinese), Abstract (English), Table of Contents, List of Figures, List of Tables
Chapter 1 Introduction: 1.1 Preface; 1.2 Research Motivation; 1.3 Contributions
Chapter 2 Related Work: 2.1 Image-based 3D Object Detection; 2.2 LiDAR-based 3D Object Detection; 2.3 Multi-modality 3D Object Detection
Chapter 3 Method: Framework Overview; 3.1 Multi-modality Features Encoder (MFE); 3.1.1 Point-wise Augmentation (PA); 3.1.2 Possibility-guided Farthest Point Sampling (P-FPS); 3.2 Virtual Point Sample (VPS); 3.2.1 Virtual Point Generation; 3.2.2 Fusion of Point and Image Features; 3.3 Loss Functions
Chapter 4 Experiments: 4.1 Datasets and Evaluation Metrics; 4.1.1 2D Pre-training Dataset; 4.1.2 Joint Training Dataset; 4.2 Implementation Details; 4.2.1 Network Architecture; 4.2.2 Training and Inference; 4.3 Performance Evaluation; 4.4 Ablation Studies (balance-coefficient table missing); 4.4.1 Contribution of Each Component; 4.4.2 P-FPS Parameter Tuning and Comparison with Other Variants; 4.4.3 Analysis of Virtual Point Sampling; 4.4.4 Performance at Different Distances Compared with SOTA Methods; 4.5 Analysis of Visualization Results
Chapter 5 Conclusion and Future Work: 5.1 Conclusion; 5.2 Future Work
References

List of Figures
Figure 1. A LiDAR-only model confusing objects of similar shape
Figure 2. Pipelines of various fusion methods
Figure 3. Operation of the Set Abstraction module
Figure 4. Visual comparison of point clouds with different point counts
Figure 5. Pipeline of previous point-wise fusion methods
Figure 6. Raw point cloud projected onto image positions
Figure 7. Simplified model architecture
Figure 8. Visualization of virtual points and projected raw points in the RPN
Figure 9. Bird's-eye view of virtual point generation within proposal boxes
Figure 10. Precision over 40 recall positions
Figure 11. Effect of different [symbol missing in source] values on performance
Figure 12. Comparison of different FPS sampling strategies
Figure 13. Detection result comparison (1)
Figure 14. Detection result comparison (2)
Figure 15. Detection result comparison (3)

List of Tables
Table 3-1. Algorithmic details of the multi-modality encoder
Table 4-1. Comparison of our method with other SOTA methods on the KITTI validation set
Table 4-2. Contribution of each component to model performance
Table 4-3. Effectiveness of virtual point sampling
Table 4-4. Performance of Virtual Point Sample at different distances


Full text available from 2027/08/09 (campus network)
Full text available from 2027/08/09 (off-campus network)
Full text available from 2027/08/09 (National Central Library: Taiwan NDLTD system)