Graduate student: 施丞祐
Cheng-You Shi
Thesis title: 陣列架構人工智慧加速器之可測試性設計與內建自我測試技術
Design-for-Testability and Built-In Self-Test Techniques for Systolic Array Based AI Accelerators
Advisor: 呂學坤
Shyue-Kung Lu
Oral defense committee: 李進福
Jin-Fu Li
Nai-Jian Wang
Jin-Hua Hong
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 69
Chinese keywords: 人工智慧加速器, 脈動陣列, 可測試性設計, 內建自我測試
English keywords: AI Accelerator, Systolic Array, Design-for-Testability, Built-In Self-Test
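Because machine learning (ML) is computationally intensive, dedicated ML hardware is needed to accelerate it, and in recent years the performance of ML inference on edge computing devices has steadily improved. As process technology scales down, however, the number of faults also rises. Both transient faults and permanent faults directly degrade ML performance, and permanent faults have a greater impact than transient faults. This thesis therefore provides effective design-for-testability (DFT) techniques for ML hardware.
This thesis targets the systolic array inside the Tensor Processing Unit (TPU) proposed by Google and proposes built-in self-test (BIST) and built-in self-diagnosis (BISD) techniques. For BIST, the test vectors of a single cell are delivered to all cells through the array's dataflow, which achieves controllability of the array. The test responses are compressed by a reconfigurable multiple-input signature register (RMISR) inside each cell, and the results are shifted out for checking, which satisfies observability of the array. Fault coverage can reach 100%. For BISD, after BIST finishes, the values in the RMISRs are shifted into the diagnosis circuit to locate the faulty cells. The hardware cost of the proposed BIST and BISD circuit architectures is about 5%.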

Deep learning is computationally intensive, so specialized hardware is needed to accelerate it. In recent years, the performance of deep learning inference on edge devices has steadily improved. However, as process technology scales down, the number of faults increases. Both transient and permanent faults can directly degrade the performance of deep learning, and permanent faults have a greater impact than transient faults. Testing deep learning hardware accelerators has therefore become an important issue, and this thesis proposes effective design-for-testability (DFT) techniques for them.
This thesis focuses on the systolic array within Google's Tensor Processing Unit (TPU) and proposes built-in self-test (BIST) and built-in self-diagnosis (BISD) techniques. For self-test, the test vectors for a single cell are propagated to all cells along the array's dataflow paths, which gives full controllability of the array. The test responses are then compressed by the proposed reconfigurable multiple-input signature register (RMISR) within each cell, and the compressed results are shifted out for checking, which gives full observability of the array. Consequently, 100% fault coverage can be achieved. For self-diagnosis, once the self-test is complete, the values in the RMISRs are shifted into the diagnosis circuit to identify the locations of faulty cells. The proposed BIST and BISD circuit architectures have an overall hardware overhead of about 5%.
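
The self-test and diagnosis flow described in the abstract can be illustrated with a small behavioral model. The sketch below is not the thesis's RMISR or diagnosis circuit: the 8-bit signature width, the feedback polynomial, the simplified MAC cell, the stuck-at-0 output fault, and every function name are assumptions chosen for illustration. It applies the same test patterns to every cell, compresses each cell's responses in a per-cell MISR, and flags cells whose signature differs from the fault-free one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

WIDTH = 8     # assumed signature width; the abstract does not give the RMISR size
TAPS = 0x1D   # assumed feedback polynomial x^8 + x^4 + x^3 + x^2 + 1
MASK = (1 << WIDTH) - 1

Pattern = Tuple[int, int, int]  # (activation, weight, partial sum) for one cell
Cell = Tuple[int, int]          # (row, column) position in the array


def misr_step(state: int, response: int) -> int:
    """Advance the MISR one clock: LFSR shift, then XOR in the cell response."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & MASK
    if msb:
        state ^= TAPS
    return state ^ (response & MASK)


def signature(responses: Sequence[int]) -> int:
    """Compress a response sequence into one signature, starting from state 0."""
    state = 0
    for response in responses:
        state = misr_step(state, response)
    return state


def mac_cell(a: int, w: int, acc: int, stuck_at0_bit: Optional[int] = None) -> int:
    """Hypothetical 8-bit multiply-accumulate cell with an optional stuck-at-0 output bit."""
    out = (a * w + acc) & MASK
    if stuck_at0_bit is not None:
        out &= ~(1 << stuck_at0_bit) & MASK
    return out


def run_bist(rows: int, cols: int, patterns: Sequence[Pattern],
             faults: Dict[Cell, int]) -> Dict[Cell, int]:
    """Apply identical patterns to every cell and return each cell's signature."""
    signatures = {}
    for r in range(rows):
        for c in range(cols):
            fault = faults.get((r, c))
            responses = [mac_cell(a, w, acc, fault) for a, w, acc in patterns]
            signatures[(r, c)] = signature(responses)
    return signatures


def diagnose(signatures: Dict[Cell, int], golden: int) -> List[Cell]:
    """Return the positions of cells whose signature differs from the golden one."""
    return sorted(pos for pos, sig in signatures.items() if sig != golden)


# Illustrative patterns only; a real test set is derived from the cell's fault list.
patterns = [(0xFF, 0xFF, 0x00), (0xAA, 0x55, 0x00), (0x55, 0xAA, 0xFF), (0x00, 0x00, 0x01)]
golden = signature([mac_cell(a, w, acc) for a, w, acc in patterns])
signatures = run_bist(4, 4, patterns, faults={(1, 2): 0})
print(diagnose(signatures, golden))  # [(1, 2)]
```

The model skips the cycle-by-cycle propagation of patterns through the array and the serial shifting of signatures, so it shows only why per-cell signatures make a faulty cell both detectable and locatable. With more patterns than the signature width, distinct error sequences can map to the same signature, which is the aliasing effect analysed in Section 5.2 of the thesis.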

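Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Organization
Chapter 2 Principles and Applications of AI Accelerators
2.1 Introduction to Deep Neural Networks
2.2 Introduction to AI Accelerators
2.2.1 DianNao [26]
2.2.2 Tensor Processing Unit (TPU) [6]
2.3 Dataflow of AI Accelerators
2.4 Applications of AI Accelerators in Neural Networks
Chapter 3 Design-for-Testability and Fault-Tolerant Design Techniques for AI Accelerators
3.1 Design-for-Testability Techniques for AI Accelerators
3.2 Fault-Tolerant Design Techniques for AI Accelerators
Chapter 4 Design-for-Testability, Built-In Self-Test, and Built-In Self-Diagnosis Techniques for AI Accelerators Based on C-Testability
4.1 Review of C-Testability
4.2 Design-for-Testability, Built-In Self-Test, and Built-In Self-Diagnosis Techniques for AI Accelerators
4.2.1 Design-for-Testability Techniques for AI Accelerators
4.2.2 Built-In Self-Test Techniques for AI Accelerators
4.2.3 Built-In Self-Diagnosis Techniques for AI Accelerators
Chapter 5 Experimental Results
5.1 Fault Evaluation
5.2 Aliasing Probability Analysis
5.3 Test Time Analysis
5.4 Hardware Overhead Analysis
5.5 Design Flow
5.5.1 Experimental Environment
5.5.2 VLSI Implementation
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References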

