
Graduate Student: Yi-Jheng Wu (吳義正)
Thesis Title: Weight-Length Reduction and Remapping Techniques for Enhancing Reliability of Flash Memories of DNN Accelerators
Advisor: Shyue-Kung Lu (呂學坤)
Committee Members: 李建模, 黃俊郎, 黃錫瑜, 王乃堅, 呂學坤
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Pages: 80
Keywords: DNN Accelerators, Flash Memories, Reliability, Remapping Techniques, Weight-Length Reduction

In recent years, deep neural networks (DNNs) have advanced rapidly and found ubiquitous application in domains such as smart homes, facial recognition, and autonomous driving. DNN models are trained on vast amounts of data to reach target accuracy levels, and the substantial amount of weight data produced by training must be stored in non-volatile memory. Flash memory is a suitable storage device for these weights owing to its low power consumption, scalability, and high performance. However, although advances in manufacturing processes give flash memory higher storage density at lower cost, they also degrade its reliability and endurance.
If errors occur in the weight data stored in flash memory, they introduce inexact computation results and reduce inference accuracy. This thesis first proposes a weight-length reduction technique that compresses weights without compromising accuracy. A flag bit categorizes each weight as either a Reducible Weight (RW) or an Irreducible Weight (IRW); the space saved by shortening RWs is reallocated to the check bits of the adopted Error Correction Code (ECC). Moreover, since a flash-memory codeword usually contains many weights, the proportion of RWs may differ from codeword to codeword, which affects the compression achieved in each one. An address remapping technique is therefore also presented so that RWs and IRWs are evenly distributed across all codewords. This ensures that each codeword can accommodate a similar number of check bits, thereby enhancing the error correction capability of every codeword and increasing the reliability of the flash memory.
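As a rough illustration of these two ideas, a minimal Python sketch follows. It assumes 8-bit two's-complement weights whose upper four bits can be dropped whenever they form a sign-extension run, and a fixed 16 weight slots per codeword; these parameters, the reducibility rule, and all function names (is_reducible, encode, decode, remap) are hypothetical stand-ins, not the thesis's actual encoding or control-word algorithm.

from typing import List, Tuple

WEIGHT_BITS = 8            # assumed quantized weight width
REDUCED_BITS = 4           # assumed width of the droppable upper run
WEIGHTS_PER_CODEWORD = 16  # assumed weight slots per flash codeword

def is_reducible(w: int) -> bool:
    # Treat a weight as reducible when its upper bits are a pure
    # sign-extension run, i.e. they all equal the MSB of the kept nibble.
    u = w & ((1 << WEIGHT_BITS) - 1)
    upper = u >> REDUCED_BITS                # bits 7..4
    sign = (u >> (REDUCED_BITS - 1)) & 1     # bit 3
    return upper == (0xF if sign else 0x0)

def encode(w: int) -> Tuple[int, int]:
    # (flag, payload): flag = 1 marks an RW kept in 4 bits with its
    # sign-extension run dropped; flag = 0 marks an IRW kept in 8 bits.
    return (1, w & 0xF) if is_reducible(w) else (0, w & 0xFF)

def decode(flag: int, payload: int) -> int:
    # Inverse of encode: re-create the dropped sign-extension run.
    if flag == 0:
        return payload
    sign = (payload >> (REDUCED_BITS - 1)) & 1
    return payload | (0xF0 if sign else 0x00)

def remap(weights: List[int]) -> List[List[int]]:
    # Deal RWs, then IRWs, round-robin over the codewords so each codeword
    # receives a near-equal mix of both kinds, and hence frees a similar
    # amount of room for ECC check bits.
    rws  = [w for w in weights if is_reducible(w)]
    irws = [w for w in weights if not is_reducible(w)]
    n_cw = -(-len(weights) // WEIGHTS_PER_CODEWORD)   # ceiling division
    codewords: List[List[int]] = [[] for _ in range(n_cw)]
    for i, w in enumerate(rws + irws):
        codewords[i % n_cw].append(w)
    return codewords

Dealing the RWs first and then the IRWs in round-robin order gives every codeword a share of each class that differs by at most one, which is what lets each codeword reserve a similar amount of space for check bits.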
We also implement an address demapping circuit that restores the weights to their original order and combine it with BCH (Bose-Chaudhuri-Hocquenghem) error correction circuitry. Furthermore, a simulator is developed on the PyTorch deep learning framework to evaluate the proposed address remapping algorithm on different DNN models. Experimental results show that for the MLP, LeNet, and AlexNet models, when the bit error rate (BER) reaches 1.00E-02, the inference accuracies with BCH19 are 27%, 11.02%, and 0.22%, respectively. After applying our approach, the error correction capabilities can be enhanced to BCH158, BCH141, and BCH72, and the inference accuracies consequently improve to 90.63%, 96.29%, and 98.87%. Compared with designs providing the same error correction capability, the hardware overhead is reduced by 86.56%, 30.4%, and 8.65% for the MLP, LeNet, and AlexNet models, respectively.
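To see why raising the BCH correction capability t matters at this error rate, the following sketch runs a simplified fault-injection experiment in the spirit of the one described above. It assumes an idealized decoder that corrects a codeword exactly when it holds at most t raw bit errors; the 4096-bit codeword size and codeword count are invented for illustration and are not the thesis's configuration.

import random

def inject_and_correct(codeword_bits: int, ber: float, t: int,
                       n_codewords: int, rng: random.Random) -> int:
    """Count the raw bit errors that survive idealized BCH-t correction."""
    residual = 0
    for _ in range(n_codewords):
        # Independent bit flips at probability `ber` within one codeword.
        flips = sum(rng.random() < ber for _ in range(codeword_bits))
        if flips > t:    # beyond the correction capability: nothing is fixed
            residual += flips
    return residual

rng = random.Random(0)                      # fixed seed for repeatability
for t in (19, 72, 141, 158):                # capabilities quoted above
    bad = inject_and_correct(codeword_bits=4096, ber=1e-2, t=t,
                             n_codewords=500, rng=rng)
    print(f"t = {t:3d}: {bad} uncorrected bit errors")

In this simplified model a 4096-bit codeword sees about 41 raw bit flips on average at BER 1.00E-02, so t = 19 leaves most codewords uncorrectable, while t = 72 or more removes essentially all errors, mirroring the accuracy jump reported above.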

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Organization
Chapter 2 Fundamentals, Testing, and Repair Techniques of Flash Memories
  2.1 Fundamentals of Flash Memory
  2.2 Flash Memory Operations
    2.2.1 Program Operation
    2.2.2 Read Operation
    2.2.3 Erase Operation
  2.3 Flash Memory Architectures
    2.3.1 NAND Flash Memory
    2.3.2 NOR Flash Memory
  2.4 Flash Memory Testing
    2.4.1 Functional Fault Models
      2.4.1.1 Fault Models of Conventional Memories
      2.4.1.2 Flash-Specific Fault Models
    2.4.2 Test Algorithms
    2.4.3 Test Flow
  2.5 Flash Memory Repair Techniques
    2.5.1 Built-In Self-Repair
    2.5.2 Error Correction Codes
      2.5.2.1 Hamming Codes
      2.5.2.2 BCH Codes
      2.5.2.3 Low-Density Parity-Check Codes
Chapter 3 Fundamentals of Deep Learning
  3.1 Neurons and Deep Neural Network Architectures
  3.2 Fully Connected Neural Networks
  3.3 Convolutional Neural Networks
  3.4 Fault-Tolerant Design for Deep Learning
    3.4.1 Basic Concepts of Fault Tolerance in Deep Learning
    3.4.2 Related Work on Error Tolerance Techniques for Neural Networks
      3.4.2.1 Fault-Tolerant Design
      3.4.2.2 Error-Masking Techniques and Applications
Chapter 4 Weight-Length Reduction and Address Remapping Techniques for Flash Memories of DNN Accelerators
  4.1 Weight-Length Reduction Technique
    4.1.1 Analysis of Consecutive Weight Bits
    4.1.2 Basic Concept of Weight-Length Reduction
  4.2 Address Remapping Technique
    4.2.1 Basic Concept of Address Remapping
    4.2.2 Address Remapping Example
    4.2.3 Control-Word Generation Flow
  4.3 Hardware Architecture of Weight-Length Reduction and Address Remapping
Chapter 5 Experimental Results
  5.1 Experimental Setup
  5.2 Accuracy Analysis
  5.3 Reliability Analysis
  5.4 Hardware Cost Analysis
  5.5 VLSI Implementation
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
References

Full text available from 2027/01/29 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan NDLTD system)