
Author: Vo Hoang Chuong
Title: Semantic Segmentation of Large-Scale Point Clouds with Self-Attentive Pooling and Multi-Scale Pyramid Module
Advisor: Jing-Ming Guo
Committee: Gee-Sern Hsu, Nai-Jian Wang, Ching-Chun Huang
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111
Language: English
Pages: 67
Keywords: large-scale point clouds, local feature aggregation, feature fusion module, multi-scale pyramid module

Abstract: In this thesis, we address the challenge of efficiently performing semantic segmentation on large-scale 3D point clouds. Many current algorithms are limited in their ability to handle large point clouds because they rely on expensive sampling techniques or computationally intensive pre- and post-processing procedures. To address this issue, we present an improved version of RandLA-Net, called RandLASAMP-Net, a neural architecture that infers per-point semantics on massive point clouds both effectively and efficiently. We observe that the local feature aggregation modules in RandLA-Net, although computationally and memory efficient, still rely on a basic attention mechanism. To improve local feature aggregation for point cloud segmentation, we propose integrating a self-attention mechanism: by allowing the model to attend to different parts of the point cloud, we increase its ability to extract meaningful features from even small and unstructured sets of points. In addition, we introduce a multi-scale pyramid module adapted to the U-Net-shaped point cloud architecture, which helps the model exploit features from the encoding stage more effectively. These contributions aim to improve performance on the semantic segmentation task, and we evaluate their effectiveness through a series of experiments. Our goal is a model that handles large-scale point clouds efficiently while still achieving high segmentation accuracy.
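The self-attentive pooling idea the abstract describes can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention applied to the K nearest-neighbour features of one point, followed by pooling into a single aggregated feature; it is not the thesis's actual module, and the weight matrices `Wq`, `Wk`, `Wv`, the layer sizes, and the final mean-pool step are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attentive_pool(neigh_feats, Wq, Wk, Wv):
    """Pool K neighbour features of shape (K, d) into one d-dim vector.

    Each neighbour attends to every other neighbour via scaled
    dot-product self-attention; the refined features are then
    averaged into one aggregated feature for the centre point.
    """
    Q = neigh_feats @ Wq                              # queries, (K, d)
    K_ = neigh_feats @ Wk                             # keys,    (K, d)
    V = neigh_feats @ Wv                              # values,  (K, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K_.T / np.sqrt(d), axis=-1)    # (K, K) attention weights
    refined = attn @ V                                # context-aware features, (K, d)
    return refined.mean(axis=0)                       # pooled feature, (d,)

rng = np.random.default_rng(0)
K, d = 16, 8
feats = rng.normal(size=(K, d))                       # K neighbour features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
pooled = self_attentive_pool(feats, Wq, Wk, Wv)
print(pooled.shape)                                   # (8,)
```

In contrast, the "basic attention" in RandLA-Net scores each neighbour independently; letting neighbours attend to one another, as above, is what allows context from the whole local set to shape each score.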

Table of Contents:
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
ABBREVIATIONS AND SYMBOLS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
  1.1 Background
  1.2 Research Objective
  1.3 Research Scope and Assumptions
  1.4 Research Methodology
  1.5 Research Outline
CHAPTER 2: LITERATURE REVIEW
  2.1 The first impact of deep learning on point clouds for 3D segmentation
  2.2 Deep hierarchical feature learning on point clouds
  2.3 Efficient semantic segmentation in point clouds integrating the random sampling technique
  2.4 Self-attention mechanism
  2.5 Multi-scale Pyramid Module
CHAPTER 3: METHODOLOGY
  3.1 Local Feature Aggregation
    3.1.1 Local Spatial Encoding
    3.1.2 Self-attentive pooling
  3.2 Dilated Residual Block
  3.3 Multi-scale Pyramid Module (MPM) and Feature Fusion Module (FFM)
  3.4 RandLASAMP-Net
    3.4.1 Model input data
    3.4.2 Encoding stages
    3.4.3 Decoding stages
    3.4.4 Multi-scale Pyramid Module (MPM) and Feature Fusion Module (FFM)
    3.4.5 Final semantic prediction output
    3.4.6 Loss function
CHAPTER 4: EXPERIMENTAL RESULTS AND EVALUATIONS
  4.1 Experimental Setup
  4.2 Datasets
    4.2.1 SemanticKITTI
    4.2.2 KITTI-360
  4.3 Evaluation metric
  4.4 Model Results and Analysis
    4.4.1 Model Results
    4.4.2 Model Analysis
  4.5 Ablation study
    4.5.1 Substituting self-attentive pooling with simple attentive/max/mean/sum pooling
    4.5.2 Ablation study on the Multi-scale Pyramid Module
  4.6 Result Interpretation
CHAPTER 5: CONCLUSION AND FUTURE WORKS
  5.1 Conclusion
  5.2 Future works
REFERENCES

