
Author: Chung-Kuang Tai (戴崇光)
Title: Design and Implementation of a CNN Accelerator for Handwritten Digit Recognition (設計與實現一個卷積神經網路手寫數字辨識加速器)
Advisor: Ming-Bo Lin (林銘波)
Committee members: Cheng-Hung Tsai (蔡政鴻), Shu-Yen Lin (林書彥), Yie-Tarng Chen (陳郁堂)
Degree: Master
Department: Department of Electronic and Computer Engineering (電資學院 - 電子工程系)
Year of publication: 2023
Academic year of graduation: 111
Language: Chinese
Number of pages: 68
Keywords: FPGA, ASIC, accelerator, convolutional neural network, handwritten digit recognition

    In recent years, artificial intelligence (AI) has made remarkable advances in many fields. Among these, applications of convolutional neural networks (CNNs), such as face recognition, object detection, and image recognition, are closely tied to everyday life. As neural network models grow increasingly complex and large, their computations require reading a huge number of parameters. How to reduce data accesses while maintaining high computational efficiency in a resource-limited environment has therefore become a major challenge.
    To address this problem, this thesis designs a CNN accelerator, using the MNIST dataset as an example. The accelerator adopts a line-buffer architecture and reduces the number of weight parameters by removing redundant layers, without degrading accuracy. To further reduce resource usage, 8-bit fixed-point arithmetic replaces 32-bit floating-point arithmetic. In addition, the architecture combines parallel and pipelined processing to maximize data sharing and improve circuit efficiency.
    The completed accelerator has been verified both on an FPGA device and with an ASIC standard-cell library. For the FPGA version, a Xilinx Virtex-7 xc7vx330tffv device is used; the design occupies 16,067 LUTs, 14,257 registers, and 368 DSP blocks, with a maximum operating frequency of 100 MHz. For the ASIC version, the tsmc 0.18-µm standard-cell library is used; the resulting chip area is 4177.42 µm × 4174.24 µm, equivalent to 789,281 logic gates, also with a maximum operating frequency of 100 MHz. Both implementations achieve an accuracy of 98.23% on the MNIST test set.
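    As a rough software model of the line-buffer architecture mentioned above (a minimal sketch only: the 3x3 window size, row-major streaming order, and Python modelling are illustrative assumptions, not the RTL used in the thesis), the following generator buffers just two image rows and still produces every 3x3 convolution window:

    def sliding_3x3_windows(image):
        """Generate 3x3 windows from an image that arrives one row at a time.

        Only the two most recently completed rows are kept, which is the
        essence of a line buffer: a 3x3 convolution can start as soon as the
        third row streams in, without storing the whole feature map.
        """
        line_buffer = []                              # at most two buffered rows
        for row in image:
            if len(line_buffer) == 2:                 # enough rows for a window
                top, mid = line_buffer
                for col in range(len(row) - 2):
                    yield [top[col:col + 3],
                           mid[col:col + 3],
                           row[col:col + 3]]
            line_buffer = (line_buffer + [row])[-2:]  # shift in the new row

    # Example: stream a 5x5 image and print all nine 3x3 windows.
    image = [[r * 5 + c for c in range(5)] for r in range(5)]
    for window in sliding_3x3_windows(image):
        print(window)

    In hardware, the same role is typically played by two on-chip row buffers feeding a 3x3 register window, so each input pixel only needs to be fetched from memory once.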

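    Likewise, the replacement of 32-bit floating-point by 8-bit fixed-point arithmetic can be illustrated with the short NumPy sketch below; the Q-format (six fractional bits), round-to-nearest, and saturation used here are illustrative assumptions, not the quantization scheme actually adopted in the thesis:

    import numpy as np

    FRAC_BITS = 6                                 # assumed Q1.6 format

    def quantize_int8(x, frac_bits=FRAC_BITS):
        """Convert float32 values to signed 8-bit fixed point with saturation."""
        q = np.round(x * (1 << frac_bits))        # scale and round to nearest
        return np.clip(q, -128, 127).astype(np.int8)

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=(3, 3)).astype(np.float32)   # float32 weights
    x = rng.normal(scale=0.5, size=(3, 3)).astype(np.float32)   # float32 pixels

    wq = quantize_int8(w)                         # 8-bit weights
    xq = quantize_int8(x)                         # 8-bit activations

    # Integer multiply-accumulate, as a hardware MAC array would compute it.
    # Each product carries 2*FRAC_BITS fractional bits, so the accumulated
    # result is rescaled once at the end.
    acc = int(np.sum(wq.astype(np.int32) * xq.astype(np.int32)))
    fixed_point = acc / float(1 << (2 * FRAC_BITS))
    float_reference = float(np.sum(w * x))
    print(f"float32: {float_reference:.4f}   int8 fixed point: {fixed_point:.4f}")

    Only the 8-bit multipliers and an integer accumulator have to be implemented in hardware; with a power-of-two scale the final rescaling reduces to a constant shift.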
    Table of contents:
    Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Figures; List of Tables
    Chapter 1  Introduction
        1.1 Research Motivation; 1.2 Research Direction; 1.3 Thesis Organization
    Chapter 2  Convolutional Neural Networks
        2.1 Deep Learning; 2.2 Convolutional Neural Networks
        2.3 Basic CNN Architecture (2.3.1 Convolution Layer; 2.3.2 Convolution Kernel; 2.3.3 Stride; 2.3.4 Padding; 2.3.5 Bias; 2.3.6 Pooling Layer; 2.3.7 Activation Functions; 2.3.8 Flatten Layer; 2.3.9 Fully Connected Layer)
    Chapter 3  CNN Model Analysis and Design
        3.1 Classic CNN Models (3.1.1 LeNet; 3.1.2 AlexNet; 3.1.3 VGGNet)
        3.2 MNIST Handwritten Digit Dataset; 3.3 Model Construction; 3.4 Model Quantization
    Chapter 4  Design and Implementation of the CNN Inference Accelerator
        4.1 Pin Definitions of the CNN Accelerator
        4.2 CNN Accelerator Architecture (4.2.1 Convolution Module; 4.2.2 ReLU Module; 4.2.3 Max-Pooling Module; 4.2.4 Fully Connected Layer Module; 4.2.5 Result Comparison Module)
        4.3 Pipelined Data Path (4.3.1 Layer 1; 4.3.2 Layer 2; 4.3.3 Layer 3)
        4.4 Control Path (4.4.1 Convolution Controller; 4.4.2 Max-Pooling Controller)
    Chapter 5  Implementation and Result Analysis
        5.1 Testing and Verification
        5.2 FPGA Design and Implementation (5.2.1 Behavioral Simulation; 5.2.2 Synthesis Results; 5.2.3 Post-Synthesis Simulation; 5.2.4 Layout Results; 5.2.5 Post-Implementation Simulation)
        5.3 ASIC Design and Implementation (5.3.1 Design Compiler Synthesis Results; 5.3.2 Post-Synthesis Simulation Waveforms; 5.3.3 IC Compiler Layout Results; 5.3.4 Post-Implementation Simulation Waveforms)
        5.4 Result Analysis and Comparison
    Chapter 6  Conclusion
    Chapter 7  References

