
Graduate Student: Yang-Lin Zheng (鄭揚霖)
Thesis Title: Towards Energy-Efficient Neural Network Acceleration on ReRAM-Based Processing-in-Memory Architecture
Advisor: Ya-Shu Chen (陳雅淑)
Committee Members: Chin-Hsien Wu (吳晉賢), Pi-Cheng Hsiu (修丕承), Shao-Yun Fang (方劭云)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019/20)
Language: Chinese
Pages: 60
Keywords: Neural network acceleration, Processing-in-memory, ReRAM crossbar array
    The characteristics of resistive random-access memory (ReRAM) make computation inside memory possible, and ReRAM is regarded as a solution for accelerating neural network computation. To handle the massive number of vector-matrix multiplications in neural networks, peripheral circuits organize large numbers of ReRAM cells into a crossbar that forms a highly parallel processing-in-memory architecture. However, these peripheral circuits consume substantial power. To improve the energy efficiency of ReRAM-based processing-in-memory neural network computation, this thesis proposes a configurable ReRAM crossbar with dynamic reference voltage scaling; on top of the proposed crossbar, it further provides a run-time engine that schedules processing-in-memory requests and sets the operation-unit size according to the data-level parallelism of each neural network layer. Experimental results show that the proposed approaches save substantial energy while maintaining neural network performance.
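The abstract's central operation, vector-matrix multiplication inside a ReRAM crossbar, can be sketched numerically: weights are stored as cell conductances, inputs are applied as word-line voltages, and each bit-line current accumulates the voltage-conductance products (Ohm's law summed by Kirchhoff's current law). A minimal illustration of this analog dot product; the matrix sizes and values below are hypothetical, not taken from the thesis:

```python
import numpy as np

# Hypothetical 4x3 weight matrix mapped to cell conductances (siemens).
G = np.array([[1e-6, 2e-6, 3e-6],
              [2e-6, 1e-6, 1e-6],
              [3e-6, 3e-6, 2e-6],
              [1e-6, 2e-6, 2e-6]])

# Input activations applied as word-line voltages (volts).
V = np.array([0.2, 0.1, 0.3, 0.4])

# Each bit-line current I_j = sum_i V_i * G[i, j]:
# Ohm's law per cell, Kirchhoff's current law per bit line.
I = V @ G

print(I)  # one current per bit line, i.e. one output per weight-matrix column
```

In a real crossbar these bit-line currents are digitized by ADCs at the array periphery, which is exactly the power-hungry circuitry the thesis targets.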


    Resistive random-access memory (ReRAM) offers a potential solution for accelerating the inference of deep neural networks (DNNs) by performing arithmetic operations in memory, an approach referred to as processing-in-memory (PIM).
    However, the peripheral circuits that organize ReRAM cells into a crossbar to handle the highly data-parallel vector-matrix multiplications in neural networks consume significant power.
    To provide energy-efficient neural network acceleration on a ReRAM-based PIM architecture, we propose a configurable ReRAM crossbar with dynamic reference voltage scaling, together with a run-time engine that schedules PIM requests and assigns the operation-unit size based on the data-level parallelism of each layer.
    Experimental results show that the proposed approaches deliver significant energy savings while maintaining performance.
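The run-time engine's core decision, matching the operation-unit size to each layer's data-level parallelism, can be illustrated abstractly. This is a hypothetical simplification, not the thesis design: the function name, the set of available unit widths, and the per-layer widths are all assumptions. The idea is simply that narrow layers should activate fewer peripheral circuits than wide ones:

```python
# Hypothetical available operation-unit widths (crossbar columns per PIM request).
UNIT_SIZES = (32, 64, 128, 256)

def pick_unit_size(layer_output_width: int) -> int:
    """Choose the smallest unit width that covers the layer's
    data-level parallelism, so idle peripherals stay unpowered."""
    for size in UNIT_SIZES:
        if size >= layer_output_width:
            return size
    return UNIT_SIZES[-1]  # widest unit for very wide layers

# Hypothetical per-layer output widths of a small CNN.
layers = {"conv1": 20, "conv2": 50, "fc1": 500, "fc2": 10}
plan = {name: pick_unit_size(width) for name, width in layers.items()}
print(plan)  # {'conv1': 32, 'conv2': 64, 'fc1': 256, 'fc2': 32}
```

A real engine would also weigh scheduling constraints and the energy cost of reconfiguration, which this sketch omits.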

    Table of Contents:
    Introduction
    Background and Motivation
    A Configurable ReRAM Xbar for PIM
    Run-time Engine for a Configurable ReRAM-Based PIM Architecture
    Experiment Result
    Related Work
    Conclusion
    References


    Full-text release date: 2025/08/28 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)