
Author: 李文傑
WEN-JIE LI
Thesis Title: 基於FPGA之深度卷積神經網路之高能效威諾格拉德一維最小化濾波器演算法
Energy-Efficient One-Dimension Winograd’s Minimal Filter Algorithm for Deep Convolutional Neural Networks based on FPGA
Advisor: 阮聖彰
Shanq-Jang Ruan
Committee: 阮聖彰
Shanq-Jang Ruan
林昌鴻
Chang Hong Lin
林淵翔
Yuan-Hsiang Lin
劉一宇
Yi-Yu Liu
Degree: 碩士
Master
Department: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
Thesis Publication Year: 2019
Graduation Academic Year: 107
Language: English
Pages: 41
Keywords (in Chinese): 卷積神經網路、硬體加速器、空間架構、行固定資料流、威諾格拉德演算法、現場可程式邏輯閘陣列
Keywords (in other languages): Convolutional Neural Networks, Hardware Accelerator, Spatial Architecture, Row Stationary Dataflow, Winograd’s Algorithm, FPGA

Abstract (in Chinese, translated): Recently, state-of-the-art convolutional neural networks (CNNs) have been widely applied in many deep neural network (DNN) models. As DNN models become more accurate, both the amount of computation and the required data bandwidth increase significantly. To address these problems, we implement an FPGA-based hardware accelerator for CNNs. In this thesis, we propose a spatial network-on-chip (NoC) architecture based on the row stationary dataflow and employ a fast convolution algorithm to reduce the number of multiplications and the data bandwidth simultaneously. Experimental results on the VGG-16 CNN model with a batch size of three show that, compared with related work, the proposed design reduces the total computation by 1.497 times, the on-chip memory bandwidth by 1.07 times, and the off-chip memory bandwidth by 1.46 times.


State-of-the-art convolutional neural networks (CNNs) have been widely applied in many deep neural network (DNN) models. As models become more accurate, both the amount of computation and the data bandwidth increase significantly. This thesis presents the design and implementation of an FPGA-based CNN accelerator. The proposed design combines the row stationary dataflow over a network-on-chip (NoC) with a fast convolution algorithm in the processing elements to reduce the amount of computation and the data bandwidth simultaneously. Experimental results on the convolutional layers of VGG-16 with a batch size of three show that the proposed design is more energy efficient than the state-of-the-art work: it improves the total GOPs of the algorithm by 1.50 times and reduces the on-chip and off-chip memory bandwidth by 1.07 and 1.46 times, respectively, compared with prior work.
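The fast convolution algorithm named in the title is Winograd's one-dimensional minimal filtering algorithm. Its F(2,3) form computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution, which is consistent with the roughly 1.5× computation reduction reported in the abstract. A minimal Python sketch (illustrative only — not the thesis's fixed-point FPGA implementation) of F(2,3) checked against direct convolution:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap FIR over 4 inputs, using 4 multiplications.

    Direct computation of the same two outputs needs 6 multiplications,
    so the multiplication count drops by 6/4 = 1.5x.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The four Winograd products; the filter-side sums are constants per
    # filter and are precomputed once in a hardware implementation.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # Inverse transform: only additions/subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Direct 3-tap correlation producing the same two outputs (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

# The two methods agree on any input tile, e.g.:
print(winograd_f23([1, 2, 3, 4], [1, 0, -1]))  # same as direct_conv(...)
```

In the accelerator described above, each processing element would evaluate these transforms on input tiles, trading the saved multipliers for a few extra adders; the filter-side transform is computed once per filter, so its cost is amortized across the feature map.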

ABSTRACT (IN CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
CHAPTER 1 INTRODUCTION
  1.1 Introduction of Deep Neural Networks
  1.2 Challenges of Existing Works
  1.3 Contributions
  1.4 Organization of This Thesis
CHAPTER 2 BACKGROUND
  2.1 Convolutional Neural Networks
CHAPTER 3 RELATED WORKS
  3.1 Data Quantization
  3.2 Winograd's Algorithm
  3.3 Dataflow Topology
CHAPTER 4 ARCHITECTURE
  4.1 Overview
  4.2 Processing Element and Algorithm
  4.3 Network on Chip and Dataflow
CHAPTER 5 EXPERIMENTAL RESULTS
  5.1 Resource Usage
  5.2 Performance Evaluation
CHAPTER 6 CONCLUSION
REFERENCE

[1] Long, Jonathan and Shelhamer, Evan and Darrell, Trevor, Fully convolutional networks for semantic segmentation, IEEE Computer Vision and Pattern Recognition (2015) 3431-3440.
[2] Zhang, Kaipeng and Zhang, Zhanpeng and Li, Zhifeng and Qiao, Yu, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters (2016) 1499-1503.
[3] Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems (2012) 1097-1105.
[4] Szegedy, Christian and Liu, Wei and Jia, Yangqing and Sermanet, Pierre and Reed, Scott and Anguelov, Dragomir and Erhan, Dumitru and Vanhoucke, Vincent and Rabinovich, Andrew, Going deeper with convolutions, IEEE Computer Vision and Pattern Recognition (2015) 1-9.
[5] Mathieu, Michael and Henaff, Mikael and LeCun, Yann, Fast training of convolutional networks through FFTs, International Conference on Learning Representations (2013).
[6] Zlateski, Aleksandar and Jia, Zhen and Li, Kai and Durand, Fredo, A Deeper Look at FFT and Winograd Convolutions, Conference on Systems and Machine Learning (2018).
[7] Tahmid Abtahi and Amey M. Kulkarni and Tinoosh Mohsenin, Accelerating convolutional neural network with FFT on tiny cores, IEEE International Symposium on Circuits and Systems (2017) 1-4.
[8] Simonyan, Karen and Zisserman, Andrew, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (2014).
[9] Dai, Jifeng and Li, Yi and He, Kaiming and Sun, Jian, R-fcn: Object detection via region-based fully convolutional networks, Neural Information Processing Systems (2016) 379-387.
[10] Christian Szegedy and Vincent Vanhoucke and Sergey Ioffe and Jonathon Shlens and Zbigniew Wojna, Rethinking the Inception Architecture for Computer Vision, IEEE Computer Vision and Pattern Recognition (2015) 2818-2826.
[11] Szegedy, Christian and Ioffe, Sergey and Vanhoucke, Vincent and Alemi, Alexander A, Inception-v4, inception-resnet and the impact of residual connections on learning, Association for the Advancement of Artificial Intelligence (2017).
[12] Chen, Yu-Hsin and Emer, Joel and Sze, Vivienne, Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks, International Symposium on Computer Architecture (2016) 367-379.
[13] Verhelst, Marian and Moons, Bert, Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to IoT and edge devices, IEEE Solid-State Circuits Magazine (2017) 55-65.
[14] Lukas Cavigelli and Luca Benini, Origami: A 803-GOp/s/W Convolutional Network Accelerator, IEEE Transactions on Circuits and Systems for Video Technology (2016) 2461-2475.
[15] Du, Li and Du, Yuan and Li, Yilei and Su, Junjie and Kuan, Yen-Cheng and Liu, Chun-Chen and Chang, Mau-Chung Frank, A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things, IEEE Transactions on Circuits and Systems (2017) 198-208.
[16] Peemen, Maurice and Setio, Arnaud AA and Mesman, Bart and Corporaal, Henk, Memory-centric accelerator design for convolutional neural networks, International Conference on Computer Design (2013) 13-19.
[17] Du, Zidong and Fasthuber, Robert and Chen, Tianshi and Ienne, Paolo and Li, Ling and Luo, Tao and Feng, Xiaobing and Chen, Yunji and Temam, Olivier, ShiDianNao: Shifting vision processing closer to the sensor, International Symposium on Computer Architecture (2015) 29-104.
[18] Albericio, Jorge and Judd, Patrick and Hetherington, Tayler and Aamodt, Tor and Jerger, Natalie Enright and Moshovos, Andreas, Cnvlutin: Ineffectual-neuron-free deep neural network computing, International Symposium on Computer Architecture (2016) 1-13.
[19] Judd, Patrick and Delmas, Alberto and Sharify, Sayeh and Moshovos, Andreas, Cnvlutin2: Ineffectual-activation-and-weight-free deep neural network computing (2017), [online] Available: https://arxiv.org/abs/1705.00125.
[20] Chen, Tianshi and Du, Zidong and Sun, Ninghui and Wang, Jia and Wu, Chengyong and Chen, Yunji and Temam, Olivier, Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning, Architectural Support for Programming Languages and Operating Systems (2014).
[21] Yunji Chen and Tao Luo and Shaoli Liu and Shijin Zhang and Liqiang He and Jia Wang and Ling Li and Tianshi Chen and Zhiwei Xu and Ninghui Sun and Olivier Temam, DaDianNao: A Machine-Learning Supercomputer, IEEE/ACM International Symposium on Microarchitecture (2014) 609-622.
[22] Yu-Hsin Chen and Tien-Ju Yang and Joel S. Emer and Vivienne Sze, Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices, IEEE Emerging and Selected Topics in Circuits and Systems (2018) 292-308.
[23] Moons, Bert and Verhelst, Marian, An energy-efficient precision-scalable ConvNet processor in 40-nm CMOS, IEEE Journal of Solid-State Circuits (2016) 903-914.
[24] Han, Song and Mao, Huizi and Dally, William J, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, International Conference on Learning Representations (2015).
[25] Song Han and Xingyu Liu and Huizi Mao and Jing Pu and Ardavan Pedram and Mark Horowitz and William J. Dally, EIE: Efficient Inference Engine on Compressed Deep Neural Network, International Symposium on Computer Architecture (2016) 243-254.
[26] Andrew Lavin and Scott Gray, Fast Algorithms for Convolutional Neural Networks, IEEE Computer Vision and Pattern Recognition (2015) 4013-4021.
[27] Zlateski, Aleksandar and Jia, Zhen and Li, Kai and Durand, Fredo, A Deeper Look at FFT and Winograd Convolutions, Systems and Machine Learning (2018).
[28] Vincent, Kevin and Stephano, Kevin and Frumkin, Michael and Ginsburg, Boris and Demouth, Julien, On improving the numerical stability of winograd convolutions, International Conference on Learning Representations (2017).
[29] Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel S, Efficient processing of deep neural networks: A tutorial and survey, Proceedings of the IEEE (2017) 2295–2329.
[30] Yu-Hsin Chen and Tushar Krishna and Joel S. Emer and Vivienne Sze, Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, IEEE Journal of Solid-State Circuits (2016) 127-138.

Full text public date: 2024/08/15 (intranet)
Full text public date: not authorized for publication (Internet)
Full text public date: not authorized for publication (National Library)