
Graduate student: Yu-Hsuan Lai (賴右軒)
Thesis title: A Throughput-Optimized Accelerator for Submanifold Sparse Convolutional Networks (一種應用於子流形稀疏卷積神經網路之針對流通量優化的硬體加速器)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee members: Shanq-Jang Ruan (阮聖彰), Yen-Jen Chang (張延任), Ming-Bo Lin (林銘波), Pei-Jun Lee (李佩君)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111 (2022–2023)
Language: English
Pages: 70
Keywords: Point clouds; Submanifold sparse convolutional networks; Systolic array; Hardware accelerators; Field Programmable Gate Array (FPGA)


    The 3D point cloud plays a crucial role in deep learning-based vision tasks by providing precise spatial and depth information, making it increasingly important across applications. However, the sparse nature of 3D point clouds poses computational challenges. Researchers have explored the Submanifold Sparse Convolutional Network (SSCN) for processing point cloud data while preserving its sparsity. Nevertheless, existing Convolutional Neural Network (CNN) accelerators handle SSCNs inefficiently, prompting recent studies to develop accelerators dedicated to point cloud networks. This thesis presents a throughput-optimized hardware accelerator for SSCNs that processes sparse 3D point clouds efficiently. The proposed accelerator achieves a 2.51× improvement in throughput density over previous works, demonstrating its effectiveness for point cloud processing.
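
    To make the targeted operation concrete, the following minimal NumPy sketch illustrates the semantics of a submanifold sparse convolution as commonly formulated in the SSCN literature. It is illustrative only; the function name submanifold_conv3d and the coordinate hash map are this sketch's own, not the thesis's implementation. The defining property it shows is that outputs are produced only at active input sites, and each output gathers contributions only from active neighbors, so the input's sparsity pattern never dilates across layers.

    import numpy as np

    def submanifold_conv3d(coords, feats, weights):
        """Submanifold sparse 3D convolution on an active-voxel list.

        coords:  (N, 3) integer voxel coordinates of the N active sites
        feats:   (N, C_in) input features at those sites
        weights: (K, K, K, C_in, C_out) kernel weights, K odd (e.g. 3)

        Outputs exist only at the N active input sites, and each site
        accumulates contributions only from active neighbors, so the
        input's sparsity pattern is preserved exactly.
        """
        coords = np.asarray(coords)
        K = weights.shape[0]
        r = K // 2
        # Coordinate -> row-index hash map; the hardware analogue is a
        # "rule table" pairing each kernel offset with (input, output) rows.
        index = {tuple(map(int, c)): i for i, c in enumerate(coords)}
        out = np.zeros((len(coords), weights.shape[-1]),
                       dtype=np.result_type(feats, weights))
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    w = weights[dx + r, dy + r, dz + r]  # (C_in, C_out)
                    pairs = []  # (output row, neighboring input row)
                    for i, (x, y, z) in enumerate(coords.tolist()):
                        neighbor = (x + dx, y + dy, z + dz)
                        if neighbor in index:
                            pairs.append((i, index[neighbor]))
                    if not pairs:
                        continue
                    dst, src = map(list, zip(*pairs))
                    # Dense matmul on gathered rows, then scatter-accumulate;
                    # dst rows are unique within one offset, so "+=" is safe.
                    out[dst] += feats[src] @ w
        return out

    Under this formulation, each kernel offset reduces to a gather, a dense matrix multiplication, and a scatter-accumulate. That regular gather-compute-scatter dataflow is what a dedicated accelerator can pipeline, as the chapter outline below (rule table generation unit, gather and scatter units, computation unit) suggests.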

    RECOMMENDATION FORM
    COMMITTEE FORM
    ABSTRACT (CHINESE)
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1  INTRODUCTION
      1.1 Advances and Challenges in 3D Point Cloud Processing for Deep Learning Applications
      1.2 Challenges in Specialized Accelerators for Point Cloud Networks and Submanifold Sparse Convolutional Networks
      1.3 Contribution of This Thesis
      1.4 Organization of This Thesis
    CHAPTER 2  BACKGROUND
      2.1 Point Cloud
      2.2 Voxelization
      2.3 Convolutional Neural Networks
      2.4 Submanifold Sparse Convolutional Networks
    CHAPTER 3  RELATED WORKS
      3.1 Systolic Array
      3.2 Hardware Accelerators for Neural Networks
    CHAPTER 4  ARCHITECTURE
      4.1 Architectural Overview
      4.2 Rule Table Generation Unit
      4.3 Gather Unit and Scatter Unit
      4.4 Computation Unit
    CHAPTER 5  EVALUATION
      5.1 Evaluation Setup
      5.2 Analysis of Rule Table Generation
      5.3 Benchmark Results
      5.4 Comparison With Previous Works
    CHAPTER 6  CONCLUSION
    REFERENCES
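
    Sections 3.1 and 4.4 of the outline indicate that the dense per-offset matrix multiplications are mapped onto a systolic array. As general background, the sketch below simulates an output-stationary systolic matrix multiply at cycle granularity; the output-stationary dataflow and the array dimensions are assumptions for illustration, not the thesis's design.

    import numpy as np

    def systolic_matmul(A, B):
        """Cycle-level model of an output-stationary systolic array.

        Computes C = A @ B for A (M x K) and B (K x N). PE (i, j) keeps
        a running partial sum of C[i, j]; rows of A stream in from the
        left and columns of B from the top, each skewed by one cycle, so
        the matching pair A[i, k], B[k, j] reaches PE (i, j) at cycle
        t = i + j + k.
        """
        M, K = A.shape
        K2, N = B.shape
        assert K == K2, "inner dimensions must match"
        C = np.zeros((M, N), dtype=np.result_type(A, B))
        total_cycles = M + N + K - 2  # cycles until the array drains
        for t in range(total_cycles):
            for i in range(M):
                for j in range(N):
                    k = t - i - j  # operand index at PE (i, j) this cycle
                    if 0 <= k < K:
                        C[i, j] += A[i, k] * B[k, j]
        return C

    # Quick check against NumPy's dense matmul.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 8))
    B = rng.standard_normal((8, 5))
    assert np.allclose(systolic_matmul(A, B), A @ B)

    Once the array is full, every processing element performs one multiply-accumulate per cycle with only nearest-neighbor data movement, which is why throughput-oriented CNN and SSCN accelerators favor this structure.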


    Full-text release date: 2025/07/31 (campus network, off-campus network, and National Central Library Taiwan NDLTD system)