
Graduate Student: Yi-Chen Tsai (蔡易蓁)
Thesis Title: An Area-optimized and Time-efficient Indexing Module for N:M Structured Sparse Convolutional Neural Network Accelerators (一種應用於N:M架構稀疏卷積神經網路加速器之面積優化與高效率的索引模組電路)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee Members: Shanq-Jang Ruan (阮聖彰), Ming-Bo Lin (林銘波), Pei-Jun Lee (李佩君), Yen-Jen Chang (張延任)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Publication Year: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Pages: 72
Keywords: Convolutional neural networks (CNNs), SIMD-like CNN accelerators, CNN sparsity compression, N:M sparsity architecture, Indexing module
Abstract (translated from Chinese):
    In recent years, convolutional neural networks (CNNs) have performed remarkably well across a wide range of applications. To bring their massive and complex convolution workloads to portable edge devices, designing hardware architectures tailored to CNN computation has become an important research topic. Beyond hardware design, the model itself can be pruned to fit the memory constraints of edge devices. To this end, NVIDIA proposed a hardware-friendly N:M structured sparse model and implemented it on the A100 GPU, achieving a 2x inference speedup over the dense model while maintaining high accuracy. Building on the hardware friendliness and high accuracy of this structured sparsity, we apply N:M structured sparse models to a SIMD-like (Single Instruction, Multiple Data) accelerator and design an indexing module and a dedicated encoder that extract only the necessary computation pairs, saving execution time and reducing power consumption. Compared with previous related work, the N:M structured sparse models used for verification in this thesis achieve both high accuracy and high sparsity, and the proposed architecture additionally reduces the synthesized hardware area by 37.16%.
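As a rough illustration of the N:M structured sparsity described above (a minimal NumPy sketch, not the thesis's actual training procedure): in every group of M consecutive weights, only the N largest-magnitude values are kept, e.g. 2:4 as on the A100. The function name `prune_n_m` is hypothetical.

```python
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m,
    zeroing the rest (the N:M structured-sparsity pattern)."""
    w = weights.reshape(-1, m)                        # group consecutive weights
    # positions of the (m - n) smallest magnitudes in each group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)      # clear the smallest entries
    return (w * mask).reshape(weights.shape)

w = np.array([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.03, 0.6])
print(prune_n_m(w))   # exactly two nonzeros survive in each group of four
```

Because every group of M holds exactly N nonzeros, the pattern is regular enough for hardware to exploit, unlike unstructured pruning.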


    The use of convolutional neural networks (CNNs) has attracted growing research attention in recent years owing to their outstanding performance on a wide range of tasks. Deploying CNNs on portable edge devices is an emerging challenge, making it necessary to design dedicated hardware that can afford their complicated computations. CNN models are pruned to fit within memory limits; in particular, NVIDIA proposed the N:M fine-grained structured sparsity format, which achieves a 2x speedup on the A100 GPU while maintaining high accuracy compared to dense models. Given the promise of this structured pruning format, we apply layer-wise N:M structured sparse models to a SIMD-like accelerator and design an indexing module and an encoder that detect the necessary computation pairs, saving execution time for hardware acceleration. Compared with previous studies, the proposed design has the lowest hardware requirement while the N:M sparse models used for verification retain both high accuracy and high sparsity.
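The idea of detecting "necessary computation pairs" can be sketched in NumPy under a simple assumption (this is an illustrative model, not the thesis's actual compression format or circuit; the function names are hypothetical): a 2:4-pruned weight vector is stored as its nonzero values plus 2-bit positions within each group of four, and those positions select the matching activations so that zeroed multiplications are skipped.

```python
import numpy as np

def compress_2_4(pruned):
    """Pack a 2:4-pruned weight vector into (values, indices):
    only the two nonzeros per group of four are kept, together with
    their 2-bit positions inside the group (assumes exactly two
    nonzeros per group)."""
    groups = pruned.reshape(-1, 4)
    idx = np.array([np.flatnonzero(g) for g in groups])   # (G, 2) positions
    vals = np.take_along_axis(groups, idx, axis=1)        # (G, 2) nonzero weights
    return vals, idx

def sparse_dot(vals, idx, activations):
    """Use the stored indices to fetch only the activations that pair
    with kept weights -- the 'necessary computation pairs' -- instead
    of multiplying through the zeros."""
    acts = activations.reshape(-1, 4)
    paired = np.take_along_axis(acts, idx, axis=1)
    return float(np.sum(vals * paired))
```

For example, `sparse_dot` over the pruned vector `[0.9, 0, 0.4, 0, -0.7, 0, 0, 0.6]` and activations `0..7` performs only four multiplications yet matches the dense dot product, which is the source of the execution-time saving the abstract refers to.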

    Table of Contents:
    Recommendation Form
    Committee Form
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Background of CNN Accelerators
      1.2 Model Compression Methods for CNN
      1.3 Challenges of Previous Works
      1.4 Contribution of This Thesis
      1.5 Organization
    Chapter 2  Background
      2.1 Convolutional Neural Networks
      2.2 Remarkable CNN Models
    Chapter 3  Related Work
      3.1 Sparsity in Activations and Weights
      3.2 N:M Fine-Grained Structured Sparse Neural Networks
        3.2.1 STE
        3.2.2 SR-STE
      3.3 Layer-wise N:M Sparsity Network
      3.4 Indexing Module
      3.5 Indexing Method
    Chapter 4  N:M Sparsity Structure Based Indexing Module
      4.1 Compression Format of the N:M Sparsity Model
      4.2 Hardware Architecture
        4.2.1 Indexing Module
        4.2.2 Encoder
    Chapter 5  Experimental Results
      5.1 Software Analyzing Experiments
        5.1.1 N:M Sparsity CNN Model Verification and Training
        5.1.2 Sparsity of Activations in N:M Sparsity CNN Models
        5.1.3 Execution Time Reduction Analysis
        5.1.4 Load Imbalance Alleviation Analysis
      5.2 Hardware Design Results
    Chapter 6  Conclusions
    References


    Full-Text Release Date: 2025/07/31 (campus network)
    Full-Text Release Date: 2025/07/31 (off-campus network)
    Full-Text Release Date: 2025/07/31 (National Central Library: Taiwan NDLTD system)