
Graduate Student: 張薏萱 (Yi-Hsuan Chang)
Thesis Title: 設計與實現一個植基於FPGA之手寫數字辨識系統之CNN加速器 (Design and Implementation of an FPGA-Based CNN Accelerator for Handwritten Digit Recognition Systems)
Advisor: 林銘波 (Ming-Bo Lin)
Committee Members: 陳郁堂 (Yie-Tarng Chen), 林書彥 (Shu-Yen Lin), 蔡政鴻 (Jeng-Hung Tsai)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (2020-2021)
Language: Chinese
Pages: 68
Keywords: convolutional neural network (CNN), accelerator, field-programmable gate array (FPGA), handwriting recognition, deep learning
    In recent years, deep learning has gradually influenced people's daily lives, and convolutional neural networks (CNNs) account for the largest share of its applications, among which image recognition and classification are the most widespread and best reflect how mature CNNs have become. As convolution techniques have matured, CNN models have grown increasingly complex and use ever more parameters in their computations. Consequently, how to design a high-accuracy, high-speed, low-resource accelerator with limited resources will be the key to the broad adoption of CNNs.
    To achieve this goal, this thesis proposes a CNN accelerator implemented on a field-programmable gate array (FPGA). Training data are first taken from the MNIST digit dataset and the model is trained on Google Colaboratory to find the parameters yielding the best accuracy; the complete CNN accelerator is then implemented on an FPGA.
    The proposed accelerator has been implemented and verified on a Xilinx Virtex-5 series FPGA (XC5VLX330T). It achieves an accuracy of 97.89% at a clock frequency of 125 MHz, with an average computation time of 10.87 µs, on average 83.13 µs faster than a CPU and 17.13 µs faster than a GPU. It occupies 49,162 LUTs (50% of the total), uses 2,720 LUTs as memory (10% of the total), 41,099 registers (45% of the total), and 122 DSP blocks (95% of the total), with an average fan-out of 3.42.
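    For reference, the absolute CPU and GPU times implied by the differences quoted above follow from simple arithmetic:

        t_CPU = 10.87 µs + 83.13 µs = 94.00 µs  (speedup 94.00 / 10.87 ≈ 8.6x)
        t_GPU = 10.87 µs + 17.13 µs = 28.00 µs  (speedup 28.00 / 10.87 ≈ 2.6x)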


    In recent years, deep learning technology has gradually affected the daily life of human beings, especially through applications of convolutional neural networks (CNNs). Among these, image recognition and classification are the most popular and best reflect the mature development of CNNs. As CNNs continue to mature, their models become more and more complicated and thus need more and more parameters in the calculation process. As a consequence, how to design a high-accuracy, high-speed CNN accelerator with low resource usage will be the key factor for CNNs to be widely used in the future.
    To achieve the above-mentioned objective, this thesis proposes a CNN accelerator on the basis of a field-programmable gate array (FPGA). To do this, Google Colaboratory is used to train the model with training data extracted from the MNIST digit set, and the parameters with the best accuracy are then used to implement the required CNN accelerator.
    The proposed CNN accelerator has been implemented and verified with an FPGA device from the Xilinx Virtex-5 series (XC5VLX330T). The results show that the accuracy is 97.89% at a 125 MHz clock, and the average running time is 10.87 µs, which is faster than the CPU by 83.13 µs and than the GPU by 17.13 µs. The CNN accelerator occupies 49,162 LUTs, accounting for 50% of the total LUT resources; uses 2,720 LUTs for memory, accounting for 10% of the total memory LUT resources; employs 41,099 registers, accounting for 45% of the total register resources; and uses 122 DSP blocks, accounting for 95% of the total, with an average fan-out of 3.42.
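    The training flow described above can be illustrated with a minimal Keras sketch of the kind one would run in Google Colaboratory. The layer sizes and hyperparameters below are illustrative assumptions, not the thesis's actual model:

        # Minimal sketch of the described flow: train a small LeNet-style
        # CNN on MNIST with Keras (e.g., in Google Colaboratory). The exact
        # architecture and hyperparameters are assumptions for illustration.
        import tensorflow as tf
        from tensorflow.keras import layers, models

        # Load the MNIST handwritten digit dataset (28x28 grayscale images).
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train = x_train[..., None].astype("float32") / 255.0
        x_test = x_test[..., None].astype("float32") / 255.0

        # Hypothetical LeNet-like model: convolution -> max pooling -> dense.
        model = models.Sequential([
            layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
            layers.MaxPooling2D(2),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(10, activation="softmax"),  # one output per digit class
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

        # The trained weights would then be quantized to fixed point and
        # exported as constants for the FPGA implementation.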

    Abstract (Chinese); Abstract; Acknowledgements; Table of Contents; List of Figures; List of Tables
    Chapter 1  Introduction
      1.1 Research Motivation
      1.2 Research Direction
      1.3 Organization of the Thesis
    Chapter 2  Background
      2.1 Deep Learning
      2.2 Hardware Accelerators
      2.3 Introduction to the MNIST Handwritten Digit Dataset
      2.4 CNN Terminology
        2.4.1 Convolution
        2.4.2 Convolution Kernels
        2.4.3 Stride
        2.4.4 Padding
        2.4.5 Bias
        2.4.6 Pooling
        2.4.7 Activation Functions
        2.4.8 Fully Connected Layers
        2.4.9 One-Hot Decoder
      2.5 Classic CNN Architectures
        2.5.1 LeNet
        2.5.2 AlexNet
        2.5.3 VGGNet
    Chapter 3  CNN Architecture Analysis and Hardware Design
      3.1 Data Analysis
        3.1.1 Convolution
        3.1.2 Fully Connected Layers
      3.2 Model Construction
        3.2.1 LeNet
        3.2.2 AlexNet
        3.2.3 Model 1
        3.2.4 Model 2
        3.2.5 Model 3
        3.2.6 Model 4
    Chapter 4  CNN Accelerator Module Design
      4.1 CNN Accelerator Architecture
        4.1.1 Convolution Module
        4.1.2 Adder-Tree Adder
        4.1.3 Max-Pooling Module
        4.1.4 ReLU Module
        4.1.5 Fully Connected Layer Module
        4.1.6 One-Hot Decoder Module
        4.1.7 Main Controller Module
    Chapter 5  FPGA Implementation and Result Analysis
      5.1 Design and Implementation Flow
      5.2 Testing and Verification
      5.3 Simulation Waveforms
      5.4 Analysis of FPGA Implementation Results
    Chapter 6  Conclusion
    References
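    The module chain listed under Chapter 4 (convolution, adder-tree accumulation, max pooling, ReLU, fully connected layer, one-hot decoder) can be summarized with a small NumPy reference model. All shapes and weights below are illustrative assumptions, not the thesis's actual design:

        import numpy as np

        def conv2d(img, kernel, bias=0.0):
            """Valid 2-D convolution; each output pixel is a sum of products,
            which the hardware accumulates with an adder-tree adder."""
            kh, kw = kernel.shape
            oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
            out = np.empty((oh, ow))
            for i in range(oh):
                for j in range(ow):
                    out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel) + bias
            return out

        def max_pool(x, size=2):
            """Non-overlapping max pooling with a size x size window."""
            h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
            return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

        def one_hot(scores):
            """One-hot decoder: a single 1 at the index of the largest score."""
            out = np.zeros_like(scores)
            out[np.argmax(scores)] = 1.0
            return out

        # Toy forward pass with random data and weights.
        rng = np.random.default_rng(0)
        img = rng.random((28, 28))                      # one MNIST-sized input
        feat = np.maximum(max_pool(conv2d(img, rng.random((3, 3)))), 0.0)  # ReLU
        w = rng.random((10, feat.size))                 # fully connected weights
        print(one_hot(w @ feat.ravel()))                # one-hot digit prediction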


    Full text available from 2026/01/13 (off-campus network)
    Full text available from 2026/01/13 (National Central Library: Taiwan NDLTD system)