| Field | Value |
|---|---|
| Graduate Student | 戴崇光 (Chung-Kuang Tai) |
| Thesis Title | 設計與實現一個卷積神經網路手寫數字辨識加速器 (Design and Implementation of a CNN Accelerator for Handwritten Digit Recognition) |
| Advisor | 林銘波 (Ming-Bo Lin) |
| Committee Members | 蔡政鴻 (Cheng-Hung Tsai), 林書彥 (Shu-Yen Lin), 陳郁堂 (Yie-Tarng Chen) |
| Degree | Master |
| Department | Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2023 |
| Academic Year | 111 (2022-23) |
| Language | Chinese |
| Pages | 68 |
| Keywords | FPGA, ASIC, accelerator, convolutional neural network, handwritten digit recognition |
In recent years, artificial intelligence (AI) has made remarkable progress in many fields. Applications of convolutional neural networks in particular are closely tied to our daily lives, such as face recognition, object detection, and image recognition. As neural network models have grown increasingly complex and large, their computation requires reading a large number of parameters. How to reduce data accesses in a resource-constrained environment while maintaining highly efficient computation has therefore become a major challenge.
To address this problem, this thesis designs a convolutional neural network accelerator, using the MNIST dataset as an example. The accelerator adopts a line-buffer architecture and, by eliminating redundant computation layers, reduces the number of weight parameters without affecting accuracy. In addition, to further reduce resource usage, 8-bit fixed-point arithmetic replaces 32-bit floating-point arithmetic. Furthermore, the accelerator architecture employs both parallel and pipelined processing to maximize data sharing and improve circuit execution efficiency.
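As a rough software illustration of the 8-bit fixed-point scheme mentioned above, the sketch below quantizes 32-bit floating-point weights to signed 8-bit values. The fraction-bit split (`frac_bits=6`) and function names are illustrative assumptions, not the exact format chosen in the thesis:

```python
import numpy as np

def quantize_int8(w, frac_bits=6):
    """Quantize float32 weights to signed 8-bit fixed point.

    Values are scaled by 2**frac_bits, rounded, and clipped to the
    int8 range [-128, 127]. frac_bits=6 is an illustrative choice.
    """
    scale = 1 << frac_bits
    return np.clip(np.round(w * scale), -128, 127).astype(np.int8)

def dequantize_int8(q, frac_bits=6):
    """Map the int8 codes back to approximate float32 values."""
    return q.astype(np.float32) / (1 << frac_bits)
```

With 6 fraction bits the representable range is [-2.0, 1.984375] in steps of 1/64, so the worst-case rounding error inside that range is 1/128; hardware multiplies then operate on cheap 8-bit integers instead of 32-bit floats.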
The completed accelerator architecture was verified with both an FPGA device and an ASIC standard-cell library. For FPGA verification, it was implemented on a Xilinx Virtex-7 xc7vx330tffv device, using 16,067 LUTs, 14,257 registers, and 368 DSP blocks, with a maximum operating frequency of 100 MHz. For ASIC verification, the TSMC 0.18-µm standard-cell library was used; the chip area is 4177.42 µm × 4174.24 µm, equivalent to about 789,265 logic gates, with a maximum operating frequency of 100 MHz. Both versions achieve a recognition accuracy of 98.23% on the MNIST test set.
In recent years, artificial intelligence (AI) has made remarkable advances in many fields. Among these, applications of convolutional neural networks (CNNs) are particularly close to our daily lives, such as face recognition, object detection, and image recognition. As neural network models have become increasingly complex and large, the number of parameters that must be read during computation has grown enormously. How to reduce data accesses while maintaining high-efficiency computation in a resource-limited environment has therefore become a major challenge.
To tackle this problem, this thesis designs a CNN accelerator, using the MNIST dataset as an example. The accelerator significantly reduces the number of parameters, without noticeably affecting accuracy, through the use of line buffers and the removal of redundant layers. In addition, to further reduce hardware cost, 8-bit fixed-point arithmetic is employed instead of 32-bit floating-point arithmetic. Moreover, both parallel and pipelined processing are applied to the accelerator to maximize data sharing and improve system performance.
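The line-buffer technique mentioned above can be modeled in software roughly as follows: pixels arrive one per cycle in raster order, only kernel_height − 1 full image rows are buffered, and each output is produced as soon as its window is complete. This is a minimal Python sketch of a streaming 3×3 convolution; all array sizes and names are illustrative, not taken from the thesis:

```python
import numpy as np

def conv2d_line_buffer(image, kernel):
    """Streaming 2D convolution (cross-correlation) with a line buffer.

    Mimics an FPGA dataflow: instead of holding the whole image, only
    k-1 previous rows are stored, plus a k x k sliding window of pixels.
    """
    k = kernel.shape[0]                     # square kernel, e.g. 3
    h, w = image.shape
    line_buf = np.zeros((k - 1, w))         # the k-1 most recent rows
    window = np.zeros((k, k))               # current k x k pixel window
    out = np.zeros((h - k + 1, w - k + 1))

    for r in range(h):
        for c in range(w):
            pixel = image[r, c]
            # shift the window left and insert the new column:
            # older rows come from the line buffer, the newest from input
            window[:, :-1] = window[:, 1:]
            window[:-1, -1] = line_buf[:, c]
            window[-1, -1] = pixel
            # push the new pixel through the line buffer at this column
            line_buf[:-1, c] = line_buf[1:, c]
            line_buf[-1, c] = pixel
            # emit an output once a full k x k window is available
            if r >= k - 1 and c >= k - 1:
                out[r - k + 1, c - k + 1] = np.sum(window * kernel)
    return out
```

The storage cost is (k − 1) image rows plus a k × k register window rather than the full feature map, which is why line buffers suit memory-limited FPGA and ASIC designs.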
The resulting accelerator has been verified with both FPGA and ASIC technologies. For FPGA verification, a Xilinx Virtex-7 device (xc7vx330tffv) is employed; the design consumes 16,067 LUTs, 14,257 registers, and 368 DSP blocks, and can operate at up to 100 MHz. For ASIC verification, the TSMC 0.18-µm standard-cell library is used; the resulting chip area is 4177.42 µm × 4174.24 µm, equivalent to 789,281 gates, and the maximum operating frequency is also 100 MHz. Both implementations achieve an accuracy of 98.23% on the MNIST test set.