| Field | Value |
|---|---|
| Graduate Student | 陳泓銘 (Hong-Ming Chen) |
| Thesis Title | 基於CIFAR-10資料集之CNN模型推斷硬體加速器 (A Hardware Accelerator for CNN Model Inference on CIFAR-10 Dataset) |
| Advisor | 林銘波 (Ming-Bo Lin) |
| Committee Members | 林書彥, 蔡政鴻, 陳郁堂 |
| Degree | Master |
| Department | Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2020 |
| Graduation Academic Year | 109 |
| Language | Chinese |
| Pages | 70 |
| Keywords (Chinese) | 卷積神經網路 (convolutional neural network), 硬體加速器 (hardware accelerator), 平行化運算 (parallel computation) |
| Keywords (English) | Hardware Accelerator |
Artificial-intelligence applications are developing rapidly across industries, which means ever-larger volumes of data will demand more accurate and more timely analysis. This calls not only for advances in neural-network algorithms, but also for powerful computing platforms capable of handling the growing computational load; various hardware accelerators, such as GPUs and ASICs, have emerged in response. FPGAs in particular, with their high degree of parallelism, flexibility for modification, and favorable power consumption, are an excellent choice for implementing neural networks [1]. This thesis builds a convolutional neural network on an FPGA based on the CIFAR-10 dataset.
In this thesis, we analyze the feasibility of parallelizing the convolution operation and compare candidate circuit architectures for implementing it. We also tune the degree of parallelism applied to the convolution in light of the pipelined design. To raise overall throughput and hardware utilization, we developed three design versions for performance comparison: an unpipelined design, a pipelined design, and a double-buffered design.
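The convolution parallelism analyzed above can be pictured with a short software sketch. The function below is a hypothetical model for illustration, not the thesis's RTL: output channels are processed in groups of `p`, the way `p` parallel multiply-accumulate (MAC) trees on an FPGA would share one input window, so `p` acts like a hardware unrolling factor.

```python
def conv2d_parallel(ifmap, weights, p=4):
    """Direct 2D convolution over a multi-channel input.

    Output channels are computed in groups of `p` to mimic a hardware
    unrolling factor: every channel in a group reuses the same k x k x c_in
    input window, as `p` parallel MAC trees would (illustrative sketch).
    """
    c_in, h, w = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    c_out, k = len(weights), len(weights[0][0])
    oh, ow = h - k + 1, w - k + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for base in range(0, c_out, p):          # one channel group per pass
        for y in range(oh):
            for x in range(ow):
                # All p "MAC trees" of this group share one input window.
                for co in range(base, min(base + p, c_out)):
                    acc = 0.0
                    for ci in range(c_in):
                        for dy in range(k):
                            for dx in range(k):
                                acc += (weights[co][ci][dy][dx]
                                        * ifmap[ci][y + dy][x + dx])
                    out[co][y][x] = acc
    return out
```

Since every group reuses the same window, a larger `p` multiplies throughput at the cost of `p` copies of the MAC hardware, which is the area-versus-speed trade-off the architecture comparison explores.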
Finally, the three proposed CNN model inference hardware accelerator versions were implemented on a Xilinx Virtex-6 xc6vlx195t-3ff784 device. They use 37,511, 37,353, and 78,923 LUTs and 10,795, 20,910, and 21,869 registers, respectively, and operate at maximum clock frequencies of 47 MHz, 143 MHz, and 100 MHz.
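The double-buffered version can be pictured as a ping-pong buffer pair: while the datapath computes on one buffer, the next input tile is loaded into the other. The sketch below is a software analogue under assumed `load`/`compute` callbacks; in hardware the two operations proceed concurrently, whereas here they simply alternate to show the scheduling.

```python
def run_double_buffered(tiles, load, compute):
    """Ping-pong (double) buffering, software analogue.

    `load` models the memory transfer into an on-chip buffer and
    `compute` models the datapath consuming a filled buffer. In hardware
    the load of tile i+1 overlaps the compute of tile i; this sketch only
    shows which buffer each operation targets on each step.
    """
    results = []
    bufs = [None, None]
    bufs[0] = load(tiles[0])                     # prefetch the first tile
    for i in range(len(tiles)):
        cur = i % 2                              # active (compute) buffer
        if i + 1 < len(tiles):
            bufs[1 - cur] = load(tiles[i + 1])   # fill the idle buffer
        results.append(compute(bufs[cur]))       # drain the active buffer
    return results
```

Because the transfer latency is hidden behind computation, throughput approaches that of the compute path alone, at the cost of duplicating the buffer storage, which is consistent with the third version's larger LUT count.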
[1] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient Processing of Deep Neural Networks: A Tutorial and Survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
[2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, Jan. 2015.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proceedings of Adv. Neural Inf. Process. Syst., vol. 25, pp. 1097-1105, Dec. 2012.
[4] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167, Mar. 2015.
[5] R. Krishnamoorthi, “Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper,” arXiv:1806.08342, Jun. 2018.
[6] M. Lin, Q. Chen, and S. Yan, “Network in Network,” arXiv:1312.4400, Mar. 2014.
[7] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A Dynamically Configurable Coprocessor for Convolutional Neural Networks,” ACM SIGARCH Computer Architecture News, vol. 38, no. 3, pp. 247-257, Jun. 2010.
[8] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-Based Accelerator Design for Deep Convolutional Neural Networks,” in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, pp. 161-170, Feb. 2015.
[9] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network,” in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, pp. 26-35, Feb. 2016.
[10] CIFAR-10 Dataset
https://www.cs.toronto.edu/~kriz/cifar.html
[11] Keras Official Site
https://keras.io/
[12] Compact CNN Accelerator IP Core - Lattice Semiconductor
https://www.latticesemi.com/en/Products/DesignSoftwareAndIP/IntellectualProperty/IPCore/IPCores04/compactcnn
[13] D. G. Bailey, Design for Embedded Image Processing on FPGAs, Wiley-IEEE Press, 2011.
[14] Altera DE2-115 User Manual - Terasic
https://www.terasic.com.tw/