
Graduate Student: Cheng-You Shi (施丞祐)
Thesis Title: Design-for-Testability and Built-In Self-Test Techniques for Systolic Array Based AI Accelerators (陣列架構人工智慧加速器之可測試性設計與內建自我測試技術)
Advisor: Shyue-Kung Lu (呂學坤)
Committee Members: Jin-Fu Li (李進福), Nai-Jian Wang (王乃堅), Jin-Hua Hong (洪進華)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電資學院 - 電機工程系)
Thesis Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: Chinese
Pages: 69
Chinese Keywords: 人工智慧加速器、脈動陣列、可測試性設計、內建自我測試
English Keywords: AI Accelerator, Systolic Array, Design-for-Testability, Built-In Self-Test
Abstract:
    Due to the high computational density required by deep learning, specialized hardware is necessary to accelerate its computations, and in recent years the performance of deep-learning inference on edge devices has risen steadily. However, as process technology scales down, the number of faults also increases. Both transient and permanent faults can directly degrade deep-learning performance, and permanent faults have an even greater impact than transient ones. This thesis therefore proposes effective design-for-testability (DFT) techniques for deep-learning hardware accelerators.
    This thesis focuses on the systolic array inside Google's Tensor Processing Unit (TPU) and proposes built-in self-test (BIST) and built-in self-diagnosis (BISD) techniques. For self-test, the test vectors of a single cell are propagated through the array's data-flow paths to all cells, achieving full controllability of the array. The test responses are then compressed by the proposed reconfigurable multiple-input signature register (RMISR) inside each cell, and the compressed signatures are shifted out for checking, achieving full observability of the array. Consequently, 100% fault coverage is achieved. For self-diagnosis, after self-test completes, the values in the RMISRs are shifted into the diagnosis circuit to identify the locations of faulty cells. The proposed BIST and BISD circuit architectures together incur a hardware overhead of about 5%.
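    As a rough illustration of the mechanism the abstract describes, the Python sketch below models generic MISR-based signature compaction and signature-comparison diagnosis for a systolic array: every cell compacts its test responses into a signature, and a faulty cell is located by comparing each cell's signature against the fault-free (golden) one. The 8-bit register width, feedback taps, 3x3 array size, and injected fault are illustrative assumptions only; the thesis's reconfigurable RMISR structure and its shift paths are not modeled here.

# Minimal sketch of signature-based BIST/BISD for a systolic array.
# NOT the thesis's RMISR design: width, taps, array size, and the
# injected fault below are assumptions made for illustration.

WIDTH = 8
TAPS = [7, 3, 2, 1]            # assumed feedback polynomial taps
MASK = (1 << WIDTH) - 1

def misr_step(state: int, response: int) -> int:
    """One MISR clock: shift with XOR feedback, then XOR in the response."""
    fb = 0
    for t in TAPS:
        fb ^= (state >> t) & 1
    return (((state << 1) | fb) & MASK) ^ (response & MASK)

def signature(responses) -> int:
    """Compact a stream of per-cycle test responses into one signature."""
    s = 0
    for r in responses:
        s = misr_step(s, r)
    return s

# Identical test vectors reach every cell via the array data flow, so a
# fault-free cell always produces the same response stream.
golden_stream = [0x12, 0x34, 0x56, 0x78]
golden_sig = signature(golden_stream)

# 3x3 array; inject a stuck bit into the response stream of cell (1, 2).
streams = {(r, c): list(golden_stream) for r in range(3) for c in range(3)}
streams[(1, 2)][1] |= 0x01

# Diagnosis: shift out each cell's signature and flag any mismatch.
faulty = [cell for cell, st in streams.items() if signature(st) != golden_sig]
print("faulty cells:", faulty)   # -> [(1, 2)]

    Because compaction is lossy, a faulty response stream can in principle alias to the golden signature; the thesis analyzes this aliasing probability in Section 5.2.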

Table of Contents:
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Background and Motivation
    1.2 Thesis Organization
Chapter 2 Principles and Applications of AI Accelerators
    2.1 Introduction to Deep Neural Networks
    2.2 Introduction to AI Accelerators
        2.2.1 DianNao [26]
        2.2.2 Tensor Processing Unit (TPU) [6]
    2.3 AI Accelerator Dataflows
    2.4 Applications of AI Accelerators to Neural Networks
Chapter 3 Design-for-Testability and Fault-Tolerance Techniques for AI Accelerators
    3.1 DFT Techniques for AI Accelerators
    3.2 Fault-Tolerance Techniques for AI Accelerators
Chapter 4 C-Testability-Based DFT, Built-In Self-Test, and Self-Diagnosis Techniques for AI Accelerators
    4.1 Review of C-Testability
    4.2 DFT, BIST, and BISD Techniques for AI Accelerators
        4.2.1 DFT Techniques for AI Accelerators
        4.2.2 BIST Techniques for AI Accelerators
        4.2.3 BISD Techniques for AI Accelerators
Chapter 5 Experimental Results
    5.1 Fault Evaluation
    5.2 Aliasing Probability Analysis
    5.3 Test Time Analysis
    5.4 Hardware Overhead Analysis
    5.5 Design Flow
        5.5.1 Experimental Environment
        5.5.2 VLSI Implementation
Chapter 6 Conclusions and Future Work
    6.1 Conclusions
    6.2 Future Work
References

    [1] W. A. Wulf and S. A. McKee, “Hitting the Memory Wall: Implications of the Obvious,” ACM SIGARCH Comput. Archit. News, vol. 23, no. 1, pp. 20–24, Mar. 1995.
    [2] M. Valle, “Analog VLSI Implementation of Artificial Neural Networks with Supervised on-Chip Learning,” Analog Integr. Circuits Signal Process., vol. 33, pp. 263–287, Dec. 2002.
    [3] M. Bouvier et al., “Spiking Neural Networks Hardware Implementations and Challenges: A Survey,” ACM J. Emerg. Technol. Comput. Syst., vol. 15, no. 2, pp. 1–35, Apr. 2019.
    [4] Y. Chen et al., “A Survey of Accelerator Architectures for Deep Neural Networks,” Engineering, vol. 6, no. 3, pp. 264–274, Mar. 2020.
    [5] S. Mittal, “A Survey on Optimized Implementation of Deep Learning Models on the Nvidia Jetson Platform,” Journal of Syst. Archit., vol. 97, pp. 428-442, Aug. 2019.
    [6] N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit,” in Proc. International Symp. on Comput. Archit., pp. 1-12, June 2017.
    [7] P. J. Bannon, “Accelerated Mathematical Engine,” U.S. Patent 0 026 078 A1, Sep. 20, 2017.
    [8] Y. H. Chen, T. J. Yang, J. Emer, and V. Sze, “Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices,” IEEE Journal on Emerg. and Selected Topics in Circuits and Syst., vol. 9, no. 2, pp. 292-308, June 2019.
    [9] M. Sankaradas et al., “A Massively Parallel Coprocessor for Convolutional Neural Networks,” in Proc. IEEE International Conference on Application-specific Syst., Archit. and Processors, pp. 53-60, Aug. 2009.
    [10] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A Dynamically Configurable Coprocessor for Convolutional Neural Networks,” in Proc. International Symp. on Comput. Archit., pp. 247-257, June 2010.
    [11] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, “Origami: A Convolutional Network Accelerator,” in Proc. Great Lakes Symp. on VLSI, pp. 199-204, May 2015.
    [12] J. J. Zhang, T. Gu, K. Basu, and S. Garg, “Analyzing and Mitigating the Impact of Permanent Faults on A Systolic Array Based Neural Network Accelerator,” in Proc. IEEE VLSI Test Symp. (VTS), pp. 1-6, Apr. 2018.
    [13] S. Kundu, S. Banerjee, A. Raha, S. Natarajan, and K. Basu, “Toward Functional Safety of Systolic Array-Based Deep Learning Hardware Accelerators,” IEEE Trans. on Very Large Scale Integration (VLSI) Syst., pp. 485-498, Mar. 2021.
    [14] U. S. Solangi, M. Ibtesam, M. A. Ansari, J. Kim, and S. Park, “Test Architecture for Systolic Array of Edge-Based AI Accelerator,” IEEE Access, vol. 9, pp. 96700-96710, July 2021.
    [15] Y. Chen, Y. Xie, L. Song, F. Chen, and T. Tang, “A Survey of Accelerator Architectures for Deep Neural Networks,” Engineering, vol. 6, no. 3, pp. 264–274, Mar. 2020.
    [16] A. Samajdar, Y. Zhu, P. N. Whatmough, M. Mattina, and T. Krishna, “Scale-Sim: Systolic CNN Accelerator Simulator,” Comput. Research Repository, vol. abs/1811.02883, Feb. 2019.
    [17] J. J. Zhang, K. Basu, and S. Garg, “Fault-Tolerant Systolic Array Based Accelerators for Deep Neural Network Execution,” IEEE Design & Test, vol. 36, no. 5, pp. 44-53, Oct. 2019.
    [18] C. Schorn, A. Guntoro, and G. Ascheid, “Accurate Neuron Resilience Prediction for a Flexible Reliability Management in Neural Network Accelerators,” in Proc. Design, Automation & Test in Europe Conference & Exhibition, pp. 979-984, Apr. 2018.
    [19] C. Liu et al., “HyCA: A Hybrid Computing Architecture for Fault-Tolerant Deep Learning,” IEEE Trans. on Comput. Aided Design of Integr. Circuits and Syst., vol. 41, no. 10, pp. 3400-3413, Oct. 2022.
    [20] S. Lee et al., “Hybrid Test Access Mechanism for Multiple Identical Cores,” in Proc. International SoC Design Conference (ISOCC), pp. 365-366, Nov. 2021.
    [21] A. Chaudhuri, C. Liu, X. Fan, and K. Chakrabarty, “C-Testing of AI Accelerators,” in Proc. IEEE 29th Asian Test Symp. (ATS), Dec. 2020.
    [22] E. Ozen and A. Orailoglu, “Low-Cost Error Detection in Deep Neural Network Accelerators with Linear Algorithmic Checksums,” Journal of Electronic Testing, vol. 36, no. 6, pp. 703–718, Dec. 2020.
    [23] C. W. Wu and P. R. Cappello, “Easily Testable Iterative Logic Arrays,” IEEE Trans. on Comput., vol. 39, no. 5, pp. 640-652, May 1990.
    [24] S. K. Lu, J. C. Wang, and C. W. Wu, “C-testable Design Techniques for Iterative Logic Arrays,” IEEE Trans. on Very Large Scale Integration (VLSI) Syst., vol. 3, no. 1, pp. 146-152, Mar. 1995.
    [25] J. Kim and S. M. Reddy, “On the Design of Fault-Tolerant Two-Dimensional Systolic Arrays for Yield Enhancement,” IEEE Trans. on Comput., vol. 38, pp. 515–525, Apr. 1989.
    [26] Y. Chen, T. Chen, Z. Xu, N. Sun, and O. Temam, “DianNao Family: Energy-Efficient Hardware Accelerators for Machine Learning,” Communications of the ACM, vol. 59, pp. 105-112, Oct. 2016.
    [27] A. Avizienis et al., “Basic Concepts and Taxonomy of Dependable and Secure Computing,” IEEE Trans. Depend. Sec. Comput., vol. 1, no. 1, pp. 11–33, Jan. 2004.
    [28] H. Lee, J. Kim, J. Park, and S. Kang, “STRAIT: Self-Test and Self-Recovery for AI Accelerator,” IEEE Trans. on Comput. Aided Design of Integr. Circuits and Syst., Jan. 2023.
    [29] F. Su, C. Liu, and H. G. Stratigopoulos, “Testability and Dependability of AI Hardware: Survey, Trends, Challenges, and Perspectives,” IEEE Design & Test, vol. 40, no. 2, pp. 8-44, Apr. 2023.
    [30] L. T. Wang, C. W. Wu, and X. Wen, “VLSI Test Principles and Architectures: Design for Testability,” Elsevier, 2006.

    Full-Text Release Date: 2026/08/07 (campus network)
    Full-Text Release Date: not authorized for public release (off-campus network)
    Full-Text Release Date: not authorized for public release (National Central Library: Taiwan NDLTD system)