
Student: Hui-Wen Liu (劉惠文)
Thesis Title: The Design and Implementation of a Configurable Hardware Accelerator Based on the Lightweight Neural Network (基於輕量化神經網路模型之可調式硬體加速器設計與實現)
Advisor: Chung-An Shen (沈中安)
Committee Members: Chin-Hsien Wu (吳晉賢), Chang-Hong Lin (林昌鴻), Yung-Yao Chen (陳永耀), Chung-An Shen (沈中安)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: English
Pages: 36
Keywords (Chinese): 卷積神經網絡、深度可分離卷積、特殊應用積體電路加速器、高吞吐量、低複雜度
Keywords (English): Convolutional Neural Network (CNN), Depthwise Separable Convolution (DSC), Application Specific Integrated Circuit (ASIC) Accelerator, High Throughput, Low Complexity
Views: 227; Downloads: 0


    This work proposes a configurable hardware accelerator based on MobileNetV3, a lightweight neural network built on depthwise separable convolution. To the best of our knowledge, this is the first operation flow and hardware accelerator architecture designed for MobileNetV3, which achieves the highest accuracy among recent lightweight convolutional neural networks. The proposed accelerator can be configured for the Large and Small models of MobileNetV3 with different width multipliers, according to the application's requirements for accuracy and frames per second (FPS). Exploiting the structural features of the depthwise separable convolutions and inverted residuals in MobileNetV3, we propose an operation flow that effectively improves throughput. In addition, based on the proposed operation flow, the processing elements in the accelerator can share the necessary registers to achieve low hardware complexity. The proposed processing element array can process convolutions of various shapes across different convolution layers while achieving the highest parallelization and data reuse. Finally, the implemented circuit is synthesized with TSMC 90 nm technology, and its performance and area complexity are evaluated based on post-synthesis estimates. Experimental results show that our architecture reaches 197.7 FPS with a hardware complexity of 5392 KGEs for MobileNetV3-Large. Compared with the state-of-the-art MobileNet-based accelerator, the proposed architecture delivers 1.75× the FPS with only 89.6% of the hardware complexity.
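    As background for the depthwise separable convolutions the abstract builds on, the following sketch (purely illustrative, not the thesis's accelerator design; the layer shape is a hypothetical example) compares the multiply-accumulate (MAC) count of a standard convolution with that of its depthwise separable factorization into a depthwise convolution followed by a 1×1 pointwise convolution:

    ```python
    # Illustrative MAC-count comparison: standard convolution versus the
    # depthwise separable factorization used by the MobileNet family.
    # The layer dimensions below are a made-up example, not from the thesis.

    def standard_conv_macs(h, w, cin, cout, k):
        # Each of the h*w output positions computes cout dot products,
        # each over a k x k x cin receptive field.
        return h * w * cout * k * k * cin

    def depthwise_separable_macs(h, w, cin, cout, k):
        depthwise = h * w * cin * k * k   # one k x k filter per input channel
        pointwise = h * w * cout * cin    # 1x1 convolution mixes channels
        return depthwise + pointwise

    if __name__ == "__main__":
        h = w = 56; cin = cout = 128; k = 3   # hypothetical layer shape
        std = standard_conv_macs(h, w, cin, cout, k)
        dsc = depthwise_separable_macs(h, w, cin, cout, k)
        print(f"standard: {std:,} MACs, separable: {dsc:,} MACs, "
              f"reduction: {std / dsc:.1f}x")
    ```

    For this example shape the factorization needs roughly 8.4× fewer MACs; this reduction, together with the very different shapes of the depthwise and pointwise stages, is what motivates a processing element array that handles both stages with high parallelization and data reuse.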

    Table of Contents:
    Abstract (Chinese); Abstract (English); Table of Contents; Figures; Tables
    I. Introduction
    II. Background
        2.1 An Overview of Convolutional Neural Network
        2.2 The CNN Model and Depthwise Separable Convolution
        2.3 The Structure of MobileNetV3
        2.4 Related Works
            2.4.1 Accelerator Based on Conventional Convolution
            2.4.2 Accelerator Based on Depthwise Separable Convolution
    III. The Proposed Convolution Operation Flow and Timing Schedules
        3.1 Analysis of Operation Flow for Related Work
        3.2 Proposed Operation Flow
    IV. The Proposed Architecture for the Accelerator Based on MobileNetV3
        4.1 The Overview of the Accelerator
        4.2 Architecture and Operation of Processing Element
            4.2.1 Operation of Processing Element
        4.3 Memory Module and Switching Module
    V. Experimental Results and Comparisons
        5.1 Implementation Results
        5.2 Comparisons with Related Literature
    VI. Conclusion
    References


    Full text available from 2025/08/20 (campus network only)
    Full text not authorized for release (off-campus network)
    Full text not authorized for release (National Digital Library of Theses and Dissertations in Taiwan)