
Author: Ruo-Chi Hsu (徐若綺)
Title: 應用於AI加速器之低功耗動態存取記憶體之自適應權重位元重要性感知刷新技術
(Adaptive Weight Significance-Aware Refresh Techniques for Low-Power and Reliable Design of DRAMs in AI Accelerators)
Advisor: Shyue-Kung Lu (呂學坤)
Committee Members: 李進福, 許鈞瓏, 洪進華, 王乃堅, 呂學坤
Degree: Master
Department: College of Electrical Engineering and Computer Science – Department of Electrical Engineering
Year of Publication: 2023
Academic Year of Graduation: 112 (ROC calendar)
Language: Chinese
Pages: 56
Keywords (Chinese): 動態隨機存取記憶體, 錯誤修正碼, 刷新功率消耗
Keywords (English): DRAM, Error Correction Code, Refresh Power

Dynamic Random Access Memory (DRAM) is widely used in modern electronic products, including the rapidly developing neural network hardware of recent years, because of its high density, long lifetime, and low cost. Owing to the structure of the DRAM cell itself, however, periodic refresh operations are required to keep the stored data correct. As memory capacities grow, the energy consumed by refresh and the resulting performance degradation become increasingly severe.
Although DRAM must be refreshed periodically, not every cell needs such frequent refreshing: only a very small number of cells have short data retention times, yet they dictate the refresh rate of the entire memory. This thesis therefore proposes adaptive weight significance-aware refresh techniques for low-power DRAM. An adaptive error correction code (ECC) scheme selects, according to each cell's retention time, whether a stronger or a weaker ECC is applied, and a bit-sensitivity analysis of the weight data identifies which weight bits most need ECC protection. With the protection capability raised by the adaptive ECC, even when a data retention error occurs, the decoder locates the erroneous bit positions and the error is corrected by complementing those bits, so the refresh period can be extended and the refresh power consumption reduced.
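As a minimal illustration of this complement-based correction step (a sketch only, not the thesis's actual decoder; the word width and interface are assumptions), once the decoder reports the erroneous bit positions, correction reduces to flipping those bits:

```python
def correct_word(word: int, error_positions: list[int]) -> int:
    """Complement (flip) the bits that the ECC decoder flagged as
    erroneous, restoring the original data word."""
    for pos in error_positions:
        word ^= 1 << pos  # XOR with a one-hot mask inverts that bit
    return word

# Example: a retention fault flipped bit 6 of the stored byte.
stored = 0b1011_0100
faulty = stored ^ (1 << 6)
assert correct_word(faulty, [6]) == stored
```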
The hardware circuits of the proposed low-power design technique were implemented, and a simulator built on a deep learning framework was developed to evaluate the technique on different neural networks. Experiments show that, within a 0.5% accuracy loss, LeNet-5, SqueezeNet, and ResNet-18 can reduce the refresh power consumption by nearly 98%.


Dynamic Random Access Memory (DRAM) is widely used as the main memory of electronic systems, including the fast-developing AI accelerators for neural networks, owing to its high density, fast access speed, and low cost. However, because of DRAM's inherent structure, periodic refresh operations are required to prevent data loss and preserve data integrity. Refresh power consumption increases with DRAM capacity and eventually dominates the system's power consumption. Many techniques have been proposed to reduce refresh power, including partitioning the DRAM into different zones and using error correction codes (ECCs). As DRAM technology scales down, the growing bit error rate forces ECC-based techniques to adopt stronger protection levels, which incurs significant decoding latency and storage overhead.
In this thesis, we propose adaptive weight significance-aware refresh techniques that mitigate DRAM refresh power consumption and reduce storage overhead in AI accelerators. Based on our bit-significance analysis, we partition each weight into left and right nibbles and apply ECC only to the more significant left nibble, which has a greater impact on inference accuracy. We adopt two ECC protection levels: the stronger ECC-2 is applied to the infrequent leaky words, while all other words use the default ECC-1. In addition, the ECC decoder is decoupled into error detection and error correction procedures; by first identifying the number of faults in a codeword, we can choose the most suitable decoding path and thereby reduce the decoding latency.
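The following sketch outlines this policy in Python; the function names, the leaky-word set, and the decoder interfaces are illustrative assumptions, not the thesis's actual hardware design:

```python
def split_weight(weight_byte: int) -> tuple[int, int]:
    """Partition an 8-bit weight into its more significant left nibble
    (ECC-protected) and its less significant right nibble (unprotected)."""
    return (weight_byte >> 4) & 0xF, weight_byte & 0xF

def protection_level(word_addr: int, leaky_words: set[int]) -> str:
    """Leaky words (short retention time) get the stronger ECC-2;
    every other word keeps the default ECC-1."""
    return "ECC-2" if word_addr in leaky_words else "ECC-1"

def decode(codeword, detect, correct_single, correct_multi):
    """Decoupled decoding: run error detection first, then dispatch to
    the cheapest correction path for the number of faults found."""
    n_faults = detect(codeword)          # detection-only pass
    if n_faults == 0:
        return codeword                  # fast path: nothing to correct
    if n_faults == 1:
        return correct_single(codeword)  # single-error correction path
    return correct_multi(codeword)       # full (ECC-2) correction path
```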
We used a deep learning framework to evaluate the accuracy of different DNN models. Experimental results show that the proposed techniques significantly extend the refresh period and reduce the refresh power consumption by around 98% within a 0.5% accuracy loss across the evaluated DNN models.
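The evaluation loop could be approximated as below; this is a hypothetical PyTorch sketch of retention-fault injection into quantized weight bits (the quantization scheme, the per-bit failure probability, and all names are assumptions; the thesis's simulator is not reproduced here):

```python
import torch

def inject_retention_faults(weights_q: torch.Tensor, p_fault: float,
                            right_nibble_bits=(0, 1, 2, 3)) -> torch.Tensor:
    """Emulate retention failures under an extended refresh period on
    8-bit quantized weights (an integer tensor, e.g. torch.int64).
    Each weight is hit with probability p_fault, flipping one random
    right-nibble bit; left-nibble bits are assumed to be restored by
    the ECC and are therefore never corrupted here."""
    hit = torch.rand(weights_q.shape) < p_fault
    # pick one random unprotected bit position for every weight
    pos = torch.tensor(right_nibble_bits)[
        torch.randint(len(right_nibble_bits), weights_q.shape)]
    flip = torch.where(hit, torch.pow(2, pos), torch.zeros_like(pos))
    return torch.bitwise_xor(weights_q, flip)
```

The faulted weights would then be dequantized and a normal inference pass run to measure the accuracy loss against the fault-free baseline.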

Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Thesis Organization 3
Chapter 2 Fundamentals of Deep Learning 4
2.1 Neural Network Architectures 4
2.2 Fully Connected Neural Networks 7
2.3 Convolutional Neural Networks 7
Chapter 3 DRAM Fundamentals and Testing 12
3.1 Basic DRAM Architecture and Characteristics 12
3.1.1 Basic Cell Structure and Characteristics 12
3.1.2 DRAM Circuit Architecture and Basic Operations 15
3.2 Built-In Self-Test for DRAM 19
3.2.1 DRAM Fault Models 19
3.2.2 Test Algorithms 20
3.2.3 Built-In Self-Test Techniques 22
3.3 Error Checking and Correction Code Techniques 23
3.3.1 Hamming Code 24
3.3.2 Modified Hamming Code 25
3.3.3 BCH (Bose–Chaudhuri–Hocquenghem) Code 25
3.4 Related Refresh Power Reduction Techniques 27
Chapter 4 Adaptive Weight Significance-Aware Refresh Techniques for Low-Power DRAM 31
4.1 Bit-Significance Analysis of Neural Network Weights 31
4.2 Concept of the Low-Power Design Technique 34
4.2.1 Refresh Period Selection Flow 36
4.2.2 Read and Write Operation Flows 38
4.3 Hardware Architecture of the Low-Power Design Technique 41
Chapter 5 Experimental Results 45
5.1 Deep Learning Framework and DNN Model Settings 45
5.2 Refresh Power Consumption Analysis 45
5.3 Decoding Performance Analysis 47
5.4 Hardware Cost Analysis 49
5.5 VLSI Implementation 51
Chapter 6 Conclusions and Future Work 53
6.1 Conclusions 53
6.2 Future Work 53
References 54

