
Graduate Student: 陳泓銘 (Hong-Ming Chen)
Thesis Title: 基於CIFAR-10資料集之CNN模型推斷硬體加速器
(A Hardware Accelerator for CNN Model Inference on CIFAR-10 Dataset)
Advisor: 林銘波 (Ming-Bo Lin)
Committee Members: 林書彥, 蔡政鴻, 陳郁堂
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of Publication: 2020
Graduation Academic Year: 109 (ROC calendar)
Language: Chinese
Pages: 70
Keywords (Chinese): 卷積神經網路 (convolutional neural network), 硬體加速器 (hardware accelerator), 平行化運算 (parallelized computation)
Keywords (English): Hardware Accelerator
    The application of artificial intelligence is developing rapidly across all industries, which means that ever larger volumes of data will require more accurate, real-time analysis. This demands not only advances in neural-network algorithms but also powerful computing platforms capable of handling the increasingly heavy computational load, and hardware accelerators of various kinds, such as GPUs and ASICs, have emerged in response. FPGAs in particular, with their very high degree of parallelism, flexibility for easy modification, and favorable power consumption, are an excellent choice for implementing neural networks [1]. This thesis builds a convolutional neural network on an FPGA based on the CIFAR-10 dataset.
    In this thesis, we analyze the feasibility of parallelizing the convolution operation and compare circuit architectures for implementing it. We also adjust the degree of parallelism applied to the convolution operation under a pipelined design. To raise overall throughput and hardware utilization, we develop three design versions for performance comparison: an unpipelined design, a pipelined design, and a double-buffered design.
    Finally, the three proposed versions of the CNN model inference hardware accelerator have been implemented on a Xilinx Virtex-6 xc6vlx195t-3ff784 device. They use 37,511, 37,353, and 78,923 LUTs and 10,795, 20,910, and 21,869 registers, respectively, and operate at maximum clock frequencies of 47 MHz, 143 MHz, and 100 MHz.
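    To make the parallelization discussed above concrete, the sketch below shows the standard convolution loop nest in C. All names and dimensions (conv2d, IN_CH, OUT_CH, IMG, K) are illustrative assumptions, not the thesis's actual layer parameters. In hardware, the kernel-window and input-channel loops can be fully unrolled into a multiply-accumulate tree and the output-channel loop spread across processing elements; how far each loop is unrolled is the per-layer degree-of-parallelism trade-off examined here.

```c
/* Minimal sketch of a 3x3 convolution loop nest. Dimensions are
 * illustrative assumptions, not taken from the thesis. */
#include <stdint.h>

#define IN_CH  3   /* input feature-map channels (CIFAR-10 RGB)  */
#define OUT_CH 16  /* output feature-map channels (illustrative) */
#define IMG    32  /* CIFAR-10 images are 32x32 pixels           */
#define K      3   /* kernel size (illustrative)                 */

void conv2d(const int8_t in[IN_CH][IMG][IMG],
            const int8_t w[OUT_CH][IN_CH][K][K],
            int32_t out[OUT_CH][IMG - K + 1][IMG - K + 1])
{
    for (int oc = 0; oc < OUT_CH; oc++)                 /* spread across PEs  */
        for (int r = 0; r <= IMG - K; r++)
            for (int c = 0; c <= IMG - K; c++) {
                int32_t acc = 0;
                for (int ic = 0; ic < IN_CH; ic++)      /* unrollable on FPGA */
                    for (int kr = 0; kr < K; kr++)      /* unrollable         */
                        for (int kc = 0; kc < K; kc++)  /* unrollable         */
                            acc += in[ic][r + kr][c + kc] * w[oc][ic][kr][kc];
                out[oc][r][c] = acc;
            }
}
```

    Fully unrolling the three inner loops would require IN_CH x K x K parallel multipliers per output channel, so the resource/throughput balance must be chosen per layer; this is the kind of trade-off the three design versions explore.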


    With the rapid development of artificial intelligence (AI) applications across industries, the need for more accurate and more timely analysis of large volumes of data has emerged. This calls not only for advances in neural-network algorithms but also for stronger computing platforms to handle such complex workloads. Therefore, many kinds of hardware accelerators based on GPUs, ASICs, and FPGAs have been proposed over the last decade. Among these, FPGAs’ high parallelism, reconfigurability, and power efficiency make them an appropriate choice for implementing neural networks [1]. In this thesis, we therefore propose an FPGA-based CNN for the CIFAR-10 dataset.
    We analyze the potential for parallelism in convolution and discuss the related architectures. Additionally, following pipelined-design considerations, we customize the degree of parallelism for each convolutional layer. To increase overall throughput and hardware utilization, we implement three design versions for performance comparison: an unpipelined design, a pipelined design, and a double-buffering design.
    Finally, the three proposed CNN model inference hardware accelerator architectures have been implemented on a Xilinx Virtex-6 xc6vlx195t-3ff784 device. The numbers of LUTs used are 37,511, 37,353, and 78,923; the numbers of registers used are 10,795, 20,910, and 21,869. The three designs operate at maximum frequencies of 47 MHz, 143 MHz, and 100 MHz, respectively.
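    As a rough illustration of the double-buffering version, the ping-pong scheme below overlaps data movement with computation: while the datapath consumes one buffer, the next tile is loaded into the other, and the roles swap each iteration. This is a minimal C sketch under assumed interfaces (load_tile, compute_tile, TILE, and N_TILES are hypothetical, not the thesis's actual design); in the real hardware the load and compute run concurrently, which sequential C can only annotate.

```c
/* Minimal ping-pong (double-buffering) sketch. Interfaces below are
 * hypothetical placeholders, not the thesis's actual modules. */
#include <stdint.h>
#include <stddef.h>

#define TILE    1024  /* bytes per feature-map tile (illustrative) */
#define N_TILES 8     /* number of tiles to process (illustrative) */

extern void load_tile(int8_t *dst, size_t tile_idx);  /* DMA-style fill */
extern void compute_tile(const int8_t *src);          /* CNN layer work */

void run_double_buffered(void)
{
    static int8_t buf[2][TILE];
    load_tile(buf[0], 0);                    /* prime the first buffer */
    for (size_t t = 0; t < N_TILES; t++) {
        int cur = t & 1;
        if (t + 1 < N_TILES)
            load_tile(buf[cur ^ 1], t + 1);  /* fetch the next tile...     */
        compute_tile(buf[cur]);              /* ...while this one computes
                                                (truly overlapped only with
                                                DMA or dedicated hardware) */
    }
}
```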

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Research Direction
      1.3 Thesis Organization
    Chapter 2 Convolutional Neural Networks
      2.1 The CIFAR-10 Dataset
      2.2 Basic Architecture
        2.2.1 Convolutional Layer
        2.2.2 Activation Function Layer
        2.2.3 Max-Pooling Layer
        2.2.4 Dropout Regularization Layer
        2.2.5 Flatten Layer
        2.2.6 Fully Connected Layer
      2.3 Modifications to the Training Model Architecture
        2.3.1 Padding
        2.3.2 Batch Normalization Layer
        2.3.3 Global Pooling Layer
        2.3.4 Model Comparison
        2.3.5 Final Model
      2.4 Model Inference Architecture
        2.4.1 Handling of the Dropout Layer
        2.4.2 Handling of the Softmax Activation Function Layer
        2.4.3 8-bit Model Inference
    Chapter 3 Design Analysis and Considerations
      3.1 Parallelization Analysis of the Convolution Operation
      3.2 Dataflow Architecture Analysis
      3.3 Dataflow Pipelining Considerations
    Chapter 4 CNN Model Inference Hardware Accelerator Architecture and Analysis
      4.1 Overall Architecture
      4.2 Overall Architecture
      4.3 Component Architectures
        4.3.1 Convolutional-Layer Processing Unit
        4.3.2 Activation-Function-Layer Processing Unit
        4.3.3 Pooling-Layer Processing Unit
        4.3.4 Fully-Connected-Layer Processing Unit
        4.3.5 Result-Comparison-Layer Processing Unit
      4.4 Datapath
        4.4.1 Pipelined Datapath
        4.4.2 Double-Buffering Architecture
      4.5 Control Path
        4.5.1 Main Controller
        4.5.2 Layer Controllers
    Chapter 5 FPGA Implementation and Result Analysis
      5.1 FPGA Design and Verification Flow
      5.2 FPGA Design Results
      5.3 Performance Analysis and Results
      5.4 FPGA Verification
        5.4.1 Verification Platform
        5.4.2 Verification System
        5.4.3 User Operation Flow
        5.4.4 Verification Results
    Chapter 6 Conclusions and Future Work
    References

    [1] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient Processing of Deep Neural Networks: A Tutorial and Survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
    [2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, Jan. 2015.
    [3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, Dec. 2012.
    [4] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167, Mar. 2015.
    [5] R. Krishnamoorthi, “Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper,” arXiv:1806.08342, Jun. 2018.
    [6] M. Lin, Q. Chen, and S. Yan, “Network in Network,” arXiv:1312.4400, Mar. 2014.
    [7] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A Dynamically Configurable Coprocessor for Convolutional Neural Networks,” ACM SIGARCH Computer Architecture News, vol. 38, no. 3, pp. 247-257, Jun. 2010.
    [8] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-Based Accelerator Design for Deep Convolutional Neural Networks,” in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, pp. 161-170, Feb. 2015.
    [9] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network,” in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, pp. 26-35, Feb. 2016.
    [10] CIFAR-10 Dataset, https://www.cs.toronto.edu/~kriz/cifar.html
    [11] Keras Official Site, https://keras.io/
    [12] Compact CNN Accelerator IP Core, Lattice Semiconductor, https://www.latticesemi.com/en/Products/DesignSoftwareAndIP/IntellectualProperty/IPCore/IPCores04/compactcnn
    [13] D. G. Bailey, Design for Embedded Image Processing on FPGAs, Wiley-IEEE Press, 2011.
    [14] Altera DE2-115 User Manual, Terasic Technologies, https://www.terasic.com.tw/
