
Graduate Student: Ding-Hung Han (韓鼎紘)
Thesis Title: ENGINER: Energy-efficient Multiple Neural Network Inference Engine for Configurable ReRAM-based Accelerator
Advisor: Ya-Shu Chen (陳雅淑)
Committee Members: Ya-Shu Chen (陳雅淑), Jen-Wei Hsieh (謝仁偉), Chin-Hsien Wu (吳晉賢), Hsueh-Wen Tseng (曾學文)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111 (2022–2023)
Language: English
Pages: 48
Keywords: Neural network acceleration, Processing-in-memory, ReRAM crossbar array, ADC
Abstract: Resistive random-access memory (ReRAM) has emerged as a promising solution for accelerating the inference of deep neural networks (DNNs) through processing-in-memory (PIM) operations. However, implementing multiple neural network inferences on such an accelerator suffers from resource allocation and scheduling issues. In this study, we first propose a weight duplication mechanism that allocates appropriate resources to each network by considering the data dependencies between layers and the resource contention between networks, aiming to minimize the latency of the given multi-network inference. To further improve energy efficiency, we perform run-time operation splitting using configurable analog-to-digital converters (ADCs) to minimize energy consumption. Experimental results demonstrate that the proposed approach yields substantial energy savings and latency reductions across various deep neural network workloads: ENGINER reduces latency by up to 99% while saving 95% of energy compared to the state-of-the-art ReRAM-based accelerator.
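To make the first mechanism concrete, the following is a minimal Python sketch of the greedy intuition behind weight duplication, under simplifying assumptions: a network's pipeline throughput is set by its slowest layer (the data dependency between layers), and all networks draw spare crossbars from one shared pool (the resource contention between networks). The Layer class, the latency model, and duplicate_weights are hypothetical illustrations, not the thesis's actual algorithm.

from dataclasses import dataclass
import heapq

@dataclass
class Layer:
    network: str      # which DNN this layer belongs to
    ops: int          # work the layer must perform (e.g., OU activations)
    copies: int = 1   # weight duplicates mapped onto crossbars

    def latency(self) -> float:
        # With k duplicates, k inputs are processed in parallel,
        # so per-layer latency shrinks roughly by a factor of k.
        return self.ops / self.copies

def duplicate_weights(layers: list[Layer], spare_crossbars: int) -> None:
    # Greedily hand each spare crossbar to whichever layer currently
    # has the highest latency (heapq is a min-heap, so negate).
    heap = [(-layer.latency(), i) for i, layer in enumerate(layers)]
    heapq.heapify(heap)
    for _ in range(spare_crossbars):
        _, i = heapq.heappop(heap)
        layers[i].copies += 1
        heapq.heappush(heap, (-layers[i].latency(), i))

layers = [Layer("net-A", 4096), Layer("net-A", 1024), Layer("net-B", 2048)]
duplicate_weights(layers, spare_crossbars=5)
print([(l.network, l.copies, round(l.latency())) for l in layers])

The second mechanism, run-time operation splitting, trades latency for energy: shrinking an operation unit (OU) narrows the bitline partial-sum range, so a configurable ADC can convert at a lower resolution, and SAR-style conversion energy grows roughly exponentially with resolution. The sketch below, again only an illustrative cost model, picks the cheapest OU height that still meets a latency budget; adc_bits, the 4 ** bits energy term, and the candidate heights are assumptions, not measured numbers from the thesis.

import math

def adc_bits(ou_rows: int, cell_bits: int = 2) -> int:
    # Resolution needed to resolve the worst-case partial sum of
    # ou_rows multi-level cells on one bitline (simplified).
    return math.ceil(math.log2(ou_rows)) + cell_bits

def split_operation(total_rows: int, latency_budget: int,
                    candidate_heights=(128, 64, 32, 16)):
    # Smaller OUs need more passes (latency) but cheaper conversions.
    best = None
    for h in candidate_heights:
        passes = math.ceil(total_rows / h)
        if passes > latency_budget:
            continue  # too slow under the current schedule
        energy = passes * (4 ** adc_bits(h))  # toy per-conversion cost
        if best is None or energy < best[1]:
            best = (h, energy)
    return best

print(split_operation(total_rows=512, latency_budget=16))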

Table of Contents:
1 Introduction
2 Background and Motivation
  2.1 Background
    2.1.1 ReRAM-based Accelerator Architecture
    2.1.2 Data Dependency of Neural Network
    2.1.3 A Configurable ReRAM-based Accelerator
  2.2 Motivation
    2.2.1 Resource Waste from Data Dependency
    2.2.2 Multiple Neural Network Challenges
3 Energy-Efficient Engine for a Configurable ReRAM-Based PIM Architecture
  3.1 Multiple Neural Network with Downscale OU
  3.2 Adaptive Duplication Engine
  3.3 OU Adjustment under Duplication
4 Performance Evaluation
  4.1 Experimental Setup
  4.2 Experimental Results
5 Related Work
6 Conclusion
References


Full-text release date: 2028/08/30 (campus network)
Full-text release date: 2028/08/30 (off-campus network)
Full-text release date: 2028/08/30 (National Central Library: Taiwan NDLTD system)