
Graduate Student: Yu-Xian Chen (陳鈺憲)
Thesis Title: A Throughput-optimized Channel-oriented Processing Element Array for Convolutional Neural Networks
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee Members: Shanq-Jang Ruan (阮聖彰), Chung-An Shen (沈中安), Pei-Jun Lee (李佩君), Tsung-Han Tsai (蔡宗漢)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of Publication: 2020
Academic Year of Graduation: 108
Language: English
Pages: 70
Keywords: Channel-oriented data pattern, Convolutional neural networks (CNNs), CNN accelerators, Processing element (PE) array, PE utilization


    Over the past decade, significant developments have taken place in the field of deep learning. State-of-the-art convolutional neural networks (CNNs), a branch of deep learning, have been increasingly applied in fields such as image classification, speech recognition, and natural language processing. Because of the high computational complexity of CNNs, many works have proposed CNN accelerators to address this issue. In particular, several recent works have focused on the specialized design of the processing element (PE) array in CNN accelerators to achieve energy efficiency and high throughput. In this thesis, a throughput-optimized PE array for CNNs based on the channel-oriented data pattern is proposed. The PEs in the proposed array are fully interconnected, which makes the array scalable. In addition, by exploiting the channel-oriented data pattern, the array can process convolutions with filters of any size while maximizing PE utilization. Compared with previous works, the proposed design improves throughput density by 1.22× on AlexNet and 1.25× on VGG-16.
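    The channel-oriented idea in the abstract can be illustrated with a minimal software sketch. The assumption here (not confirmed by this record) is that each output pixel is accumulated one multiply-accumulate at a time with the channel dimension in the inner loops, so the same MAC structure serves any filter size; the function name and loop order are hypothetical and do not represent the thesis's actual dataflow or hardware mapping.

    ```python
    # Illustrative sketch only: one plausible reading of a "channel-oriented"
    # convolution, where an output pixel is built up as a flat stream of MACs
    # over channels and filter taps, independent of the filter's spatial size.
    # All names here are hypothetical, not the thesis's actual design.

    def conv2d_channel_oriented(ifmap, weights):
        """ifmap: [C][H][W], weights: [C][K][K]; one output channel, stride 1."""
        C = len(ifmap)
        H, W = len(ifmap[0]), len(ifmap[0][0])
        K = len(weights[0])
        OH, OW = H - K + 1, W - K + 1
        out = [[0.0] * OW for _ in range(OH)]
        for oy in range(OH):
            for ox in range(OW):
                acc = 0.0
                # The inner loops walk filter taps and channels; a PE doing
                # this accumulation stays busy regardless of K, which is the
                # property that keeps utilization high for any filter size.
                for ky in range(K):
                    for kx in range(K):
                        for c in range(C):
                            acc += ifmap[c][oy + ky][ox + kx] * weights[c][ky][kx]
                out[oy][ox] = acc
        return out
    ```

    In hardware terms, such a flat MAC stream is what would let a fixed PE array be time-shared across filter sizes instead of being shaped around one specific K.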

    Chapter 1 Introduction
    Chapter 2 Background
    Chapter 3 Related Works
    Chapter 4 Mapping Strategy
    Chapter 5 Proposed Architecture
    Chapter 6 Evaluation
    Chapter 7 Conclusion


    Full text available from 2025/08/13 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)