
Graduate Student: Dong-Sheng Yang (楊東昇)
Thesis Title: Unified Energy-Efficient Reconfigurable MAC for Dynamic Convolutional Neural Network Based on Winograd Algorithm
Advisor: Shanq-Jang Ruan (阮聖彰)
Oral Defense Committee: Pei-Jun Lee (李佩君), Chung-An Shen (沈中安), Tsung-Han Tsai (蔡宗漢)
Degree: Master
Department: 電資學院 – Department of Electronic and Computer Engineering
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: English
Pages: 70
Chinese Keywords: 卷積神經網路, 卷積神經網路加速器, 處理單元, 動態卷積, 威諾格拉德演算法
Keywords: convolutional neural networks, CNN accelerators, processing element, dynamic convolution, Winograd algorithm
Hits: 337; Downloads: 0
    In recent years, convolutional neural networks (CNNs) have been widely applied in many fields, such as face detection, vehicle re-identification, and speech recognition. As CNN models grow ever larger, their computational demands increase rapidly. To relieve this enormous computational pressure, many CNN accelerators implemented on field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) have been proposed in past research. In addition, because billions of multiply-and-accumulate (MAC) operations incur substantial energy consumption in the logic of CNNs, many prior studies have focused on the design of the MAC unit to improve the energy efficiency of the overall system. This thesis proposes an energy-efficient reconfigurable MAC unit and a high-performance CNN accelerator based on the Winograd minimal filtering algorithm (WMFA). Experimental results show that the proposed Winograd processing element (PE) reduces power by 8% to 12% compared with the baseline, and improves DSP efficiency by 1.48x to 6.82x compared with previous CNN accelerators.
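The computational pressure described in the abstract can be made concrete by counting the MAC operations of a single convolutional layer. The layer dimensions below are illustrative assumptions, not figures taken from the thesis:

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """MAC count for a standard convolutional layer with a k x k kernel."""
    # Each output activation needs k*k*c_in multiply-accumulates,
    # and there are c_out * h_out * w_out output activations.
    return k * k * c_in * c_out * h_out * w_out

# Example: one 3x3 layer with 64 input / 128 output channels on a
# 56x56 output map already costs ~0.23 billion MACs per inference.
print(conv_macs(3, 64, 128, 56, 56))  # 231211008
```

Summed over the tens of layers in a modern CNN, this is how the per-inference MAC count reaches into the billions, which is why MAC-unit energy dominates the logic power budget.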


    Convolutional Neural Networks (CNNs) have been widely utilized in many fields such as face detection, vehicle re-identification, and speech recognition. To relieve the tremendous computational pressure of ever-larger CNN models, many CNN accelerators have been proposed. Moreover, many works focus on the design of the multiply-and-accumulate (MAC) unit, since billions of MAC computations induce enormous energy consumption in the logical operations of CNNs. In this thesis, an energy-efficient reconfigurable MAC unit and a high-performance CNN accelerator based on the Winograd minimal filtering algorithm (WMFA) are proposed. Experimental results show that the proposed Winograd processing element (PE) achieves 8% to 12% power improvement, and DSP efficiency is improved in the range of 1.19x–5.46x compared with previous works.
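The WMFA mentioned in the abstract reduces the number of multiplications at the cost of extra additions. As a minimal sketch (not the thesis's hardware design), the 1-D Winograd transform F(2,3) computes two outputs of a 3-tap filter using 4 multiplications, where direct convolution needs 6:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap filter g over 4 inputs d,
    using 4 multiplications (m1..m4) instead of 6."""
    # Filter transform (computed once per filter, reused across tiles)
    g0 = g[0]
    g1 = (g[0] + g[1] + g[2]) / 2
    g2 = (g[0] - g[1] + g[2]) / 2
    g3 = g[2]
    # The 4 multiplications
    m1 = (d[0] - d[2]) * g0
    m2 = (d[1] + d[2]) * g1
    m3 = (d[2] - d[1]) * g2
    m4 = (d[1] - d[3]) * g3
    # Output transform: only additions/subtractions
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Direct 1-D sliding-window filtering for reference (6 multiplies)."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

print(winograd_f23([1, 2, 3, 4], [1, 1, 1]))  # [6.0, 9.0]
print(direct_conv([1, 2, 3, 4], [1, 1, 1]))   # [6, 9]
```

In 2-D, the same idea (F(2x2, 3x3)) cuts the multiplications per output tile from 36 to 16, which is the saving a Winograd PE exploits in hardware.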

    Chapter 1 Introduction
      1.1. Background of the accelerator
      1.2. Fast convolution algorithm
      1.3. Challenges of previous works
      1.4. Contribution of this thesis
      1.5. Organization
    Chapter 2 Backgrounds
      2.1. Convolutional Neural Networks
      2.2. WMFA
    Chapter 3 Related work
      3.1. CNN accelerators with fast algorithms
      3.2. Effective CNN models
    Chapter 4 Dataflow
      4.1. The hybrid output stationary data flow
    Chapter 5 Architecture
      5.1. Hardware architecture overview
      5.2. The design of Winograd PE
      5.3. The reconfigurable unit for MACs
      5.4. The accumulation unit
    Chapter 6 Evaluation
      6.1. Experimental setup
      6.2. The analysis for redundant computations
      6.3. Power analysis for Winograd PE
      6.4. Performance analysis
      6.5. Comparison with previous CNN accelerators
      6.6. Comparison with designs based on WMFA
    Chapter 7 Conclusion


    Full-text release date: 2025/08/13 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)