Graduate Student: 張薏萱 (Yi-Hsuan Chang)
Thesis Title: 設計與實現一個植基於FPGA之手寫數字辨識系統之CNN加速器 (Design and Implementation of an FPGA-Based CNN Accelerator for Handwritten Digits Recognition Systems)
Advisor: 林銘波 (Ming-Bo Lin)
Oral Defense Committee: 陳郁堂 (Yie-Tarng Chen), 林書彥 (Shu-Yen Lin), 蔡政鴻 (Jeng-Hung Tsai)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science (電資學院 - 電子工程系)
Publication Year: 2021
Graduating Academic Year: 109
Language: Chinese
Pages: 68
Chinese Keywords: 卷積神經網路 (CNN)、加速器、現場可程式邏輯陣列 (FPGA)、手寫辨識、深度學習
English Keywords: convolutional neural network (CNN), accelerator, FPGA, handwriting recognition, deep learning
Views: 505; Downloads: 23
In recent years, deep-learning technology has gradually permeated daily life, with convolutional neural networks (CNNs) accounting for the largest share of applications. Among these, image recognition and classification are the most widespread and best reflect the maturity of CNNs. As convolutional models have matured, they have also grown increasingly complex, and the number of parameters used during computation keeps rising. How to design, under limited resources, an accelerator with high accuracy, high speed, and low resource usage will therefore be key to the broad adoption of CNNs.
To achieve this goal, this thesis proposes a CNN accelerator implemented on a field-programmable gate array (FPGA). Training data are first extracted from the MNIST digit set, the model is trained with Google Colaboratory to determine the parameters yielding the best accuracy, and the complete CNN accelerator is then implemented on an FPGA.
The proposed accelerator has been implemented and verified on a Xilinx Virtex-5 series FPGA (XC5VLX330T). It achieves an accuracy of 97.89% at an operating frequency of 125 MHz, with an average running time of 10.87 µs, on average 83.13 µs faster than a CPU and 17.13 µs faster than a GPU. It occupies 49,162 LUTs (50% of the total LUT resources), 2,720 LUTs for memory (10% of the total), and 41,099 registers (45% of the total register resources), uses 122 DSP blocks (95% of the total DSP resources), and has an average fan-out of 3.42.
In recent years, deep-learning technology has gradually affected people's daily lives, especially through applications of convolutional neural networks (CNNs). Among these applications, image recognition and classification are the most popular and best reflect the maturity of CNNs. As CNN models mature, they become more and more complicated and thereby need more and more parameters in the calculation process. As a consequence, how to design a high-accuracy, high-speed CNN accelerator with low resource usage will be a key factor for CNNs to be widely used in the future.
To achieve the above-mentioned objective, this thesis proposes a CNN accelerator based on a field-programmable gate array (FPGA). To this end, Google Colaboratory is used to train the model with training data extracted from the MNIST digit set, and the parameters yielding the best accuracy are then used to implement the required CNN accelerator.
The proposed CNN accelerator has been implemented and verified on an FPGA device from the Xilinx Virtex-5 series (XC5VLX330T). The results show an accuracy of 97.89% at a clock frequency of 125 MHz, with an average running time of 10.87 µs, which is 83.13 µs faster than a CPU and 17.13 µs faster than a GPU. The CNN accelerator occupies 49,162 LUTs (50% of the total LUT resources), uses 2,720 LUTs for memory (10% of the total memory LUT resources), and employs 41,099 registers (45% of the total register resources), along with 122 DSP blocks (95% of the total DSP resources), with an average fan-out of 3.42.
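The training flow described above can be sketched in a few lines of Keras, as it would run in Google Colaboratory. This is a minimal illustration only, assuming a small two-convolution-layer network: the abstract does not state the thesis's exact architecture, so the layer sizes and hyperparameters here are hypothetical.

```python
# Minimal sketch of the training flow in the abstract: a small CNN for
# 28x28 MNIST digits built with Keras (e.g. in Google Colaboratory).
# The layer sizes are illustrative assumptions, not the thesis's design.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                # one grayscale channel
    keras.layers.Conv2D(8, 3, activation="relu"),  # 3x3 convolution kernels
    keras.layers.MaxPooling2D(),                   # 2x2 pooling halves each dim
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),  # one class per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would use the real MNIST data, e.g.:
#   (x_tr, y_tr), (x_te, y_te) = keras.datasets.mnist.load_data()
#   model.fit(x_tr[..., None] / 255.0, y_tr, epochs=5,
#             validation_data=(x_te[..., None] / 255.0, y_te))
# The best-accuracy parameters would then be fixed and mapped onto the FPGA.

# Sanity check on a zero batch: the network emits a probability
# distribution over the ten digit classes.
probs = model.predict(np.zeros((1, 28, 28, 1)), verbose=0)
print(probs.shape)  # (1, 10)
```

After training, the learned weights are the "parameters with the best accuracy" that the hardware design consumes; quantizing them for fixed-point FPGA arithmetic is a separate step not shown here.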