
Author: Vo Hoang Chuong
Title: Semantic Segmentation of Large-Scale Point Clouds with Self-Attentive Pooling and Multi-Scale Pyramid Module
Advisor: Jing-Ming Guo
Committee: Gee-Sern Hsu, Nai-Jian Wang, Ching-Chun Huang
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111
Language: English
Pages: 67
Keywords: large-scale point clouds, local feature aggregation, feature fusion module, multi-scale pyramid module

Abstract: In this thesis, we address the challenge of efficiently performing semantic segmentation on large-scale 3D point clouds. Many current algorithms are limited in their ability to handle large point clouds because they rely on expensive sampling techniques or computationally intensive pre- and post-processing procedures. To address this issue, we present an improved version of RandLA-Net, called RandLASAMP-Net, a neural architecture that infers per-point semantics on massive point clouds both effectively and efficiently. We observe that the local feature aggregation modules in RandLA-Net, although computationally and memory efficient, still rely on a basic attention mechanism. To improve local feature aggregation for point cloud segmentation, we propose integrating a self-attention mechanism: by allowing the model to attend to different parts of the point cloud, we increase its ability to extract meaningful features from even small and unstructured sets of points. In addition, we introduce a multi-scale pyramid module adapted to the U-Net-shaped point cloud architecture, which helps the model exploit features from the encoding stage more effectively. These contributions aim to improve performance on the semantic segmentation task, and we evaluate their effectiveness through a series of experiments. Our goal is a model that handles large-scale point clouds efficiently while still achieving high segmentation accuracy.
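The self-attentive pooling idea the abstract describes can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention applied to the K nearest-neighbour features of one point, followed by pooling into a single aggregated feature; it is not the thesis's actual module, and the weight matrices `Wq`, `Wk`, `Wv`, the layer sizes, and the final mean-pool step are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attentive_pool(neigh_feats, Wq, Wk, Wv):
    """Pool K neighbour features of shape (K, d) into one d-dim vector.

    Each neighbour attends to every other neighbour via scaled
    dot-product self-attention; the refined features are then
    averaged into one aggregated feature for the centre point.
    """
    Q = neigh_feats @ Wq                              # queries, (K, d)
    K_ = neigh_feats @ Wk                             # keys,    (K, d)
    V = neigh_feats @ Wv                              # values,  (K, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K_.T / np.sqrt(d), axis=-1)    # (K, K) attention weights
    refined = attn @ V                                # context-aware features, (K, d)
    return refined.mean(axis=0)                       # pooled feature, (d,)

rng = np.random.default_rng(0)
K, d = 16, 8
feats = rng.normal(size=(K, d))                       # K neighbour features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
pooled = self_attentive_pool(feats, Wq, Wk, Wv)
print(pooled.shape)                                   # (8,)
```

In contrast, the "basic attention" in RandLA-Net scores each neighbour independently; letting neighbours attend to one another, as above, is what allows context from the whole local set to shape each score.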

Table of Contents:
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
ABBREVIATIONS AND SYMBOLS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
  1.1 Background
  1.2 Research Objective
  1.3 Research Scope and Assumptions
  1.4 Research Methodology
  1.5 Research Outline
CHAPTER 2: LITERATURE REVIEW
  2.1 The first impact of deep learning on point clouds for 3D segmentation
  2.2 Deep hierarchical feature learning on point clouds
  2.3 Efficient semantic segmentation in point clouds integrating the random sampling technique
  2.4 Self-attention mechanism
  2.5 Multi-scale Pyramid Module
CHAPTER 3: METHODOLOGY
  3.1 Local Feature Aggregation
    3.1.1 Local Spatial Encoding
    3.1.2 Self-attentive pooling
  3.2 Dilated Residual Block
  3.3 Multi-scale Pyramid Module (MPM) and Feature Fusion Module (FFM)
  3.4 RandLASAMP-Net
    3.4.1 Model input data
    3.4.2 Encoding stages
    3.4.3 Decoding stages
    3.4.4 Multi-scale Pyramid Module (MPM) and Feature Fusion Module (FFM)
    3.4.5 Final semantic prediction output
    3.4.6 Loss function
CHAPTER 4: EXPERIMENTAL RESULTS AND EVALUATIONS
  4.1 Experimental Setup
  4.2 Datasets
    4.2.1 SemanticKITTI
    4.2.2 KITTI-360
  4.3 Evaluation metric
  4.4 Model Results and Analysis
    4.4.1 Model Results
    4.4.2 Model Analysis
  4.5 Ablation study
    4.5.1 Substituting self-attentive pooling with simple attentive/max/mean/sum pooling
    4.5.2 Ablation study on the Multi-scale Pyramid Module
  4.6 Result Interpretation
CHAPTER 5: CONCLUSION AND FUTURE WORKS
  5.1 Conclusion
  5.2 Future works
REFERENCES

