
Graduate student: 陳聖文 (SHEN-WEN CHEN)
Thesis title: The Design and Verification of a Low-Power Data Cache Architecture for 32-bit Embedded Microprocessors (original Chinese title: 相容32位元嵌入式微處理器之低功率資料快取記憶體架構設計與驗證)
Advisor: 林銘波 (Ming-Bo Lin)
Oral defense committee: 詹景裕 (Ching-Yuh Jan), 楊兆華 (Zhao-Hua Yang), 陳郁堂 (Yie-Tarng Chen), 呂紹偉 (Shao-Wei Leu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., 2010-2011)
Language: Chinese
Number of pages: 66
Keywords: low-power design, data cache memory, way prediction

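Advances in deep-submicron process technology have kept hardware growing in line with Moore's Law, and the speed gap between the processor and main memory has widened as a result. To narrow this gap, cache design has become a key factor in processor performance; however, the cache also adds considerable power consumption to the chip. In typical embedded systems, the cache accounts for a substantial share of total power. Effectively reducing cache power consumption has therefore become an active topic in embedded system design.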
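In this thesis, we examine the details of conventional cache design and, by observing and analyzing the access patterns the processor generates when accessing data memory, propose a low-power data cache architecture called the partial way-predicting cache. The architecture combines two techniques, tag-access skipping and the MRU way-predicting cache, into a switchable way-predicting architecture. The predictive circuit is split into two operating modes, predict and non-predict, and an assist unit helps the circuit decide when to switch prediction on. This gives the circuit a small amount of decision-making ability, so that prediction is triggered only when a predict-hit is likely.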
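Tag skipping is commonly formulated as follows: when an access falls in the same cache line as the immediately preceding access, that line is known to be resident in the way identified last time, so the tag array does not need to be read again. The following Python sketch counts tag-array way reads for an address trace with and without this rule. The function name `tag_reads_with_skipping`, the 32-byte line, and the 4-way organization are illustrative assumptions rather than parameters taken from the thesis, and the sketch assumes no refill, snoop, or invalidation can evict the line between two consecutive accesses.

```python
def tag_reads_with_skipping(trace, line_bytes=32, ways=4):
    """Return (baseline, with_skipping) tag-array way reads for an address trace.

    Hypothetical helper: assumes the line touched by the previous access is
    still resident (no eviction or invalidation between consecutive accesses).
    """
    baseline = with_skipping = 0
    prev_line = None
    for addr in trace:
        line = addr // line_bytes
        baseline += ways              # conventional lookup reads every way's tag
        if line != prev_line:
            with_skipping += ways     # new line: a full tag lookup is required
        # Same line as before: reuse the way identified last time, skip the tags.
        prev_line = line
    return baseline, with_skipping


# Word-sequential accesses over 1 KiB: 256 accesses touching 32 lines.
print(tag_reads_with_skipping(range(0, 1024, 4)))  # → (1024, 128)
```

Sequential data accesses (array walks, stack frames) are where this rule pays off; a strided or random trace touches a new line on almost every access and gains little.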
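Experimental results show that the proposed method eliminates about 76% of the performance loss of the conventional MRU way-predicting cache while achieving a low-power effect of about 43%, close to that of the conventional way-predicting cache. The work also introduces the concept of an AST (assist-trigger) circuit architecture on which future researchers can build. The completed low-power data cache has been integrated with Proto3-ARM9TM, the AMBA bus, and peripheral IP, and implemented with the TSMC 0.18 μm cell library. The core area is 1.963 × 1.955 mm², the total chip area is 2.501 × 2.497 mm², and the average power consumption is 147 mW at an operating frequency of 129 MHz.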


Deep-submicron technology has allowed hardware to keep growing as predicted by Moore's Law, and as integrated-circuit manufacturing advances, the speed gap between processors and DRAM devices keeps widening. To bridge this gap, the design of the cache memory between the processor and main memory has become critical. However, the cache itself consumes considerable power and accounts for a large part of the total power dissipation of an embedded system. Consequently, low-power cache design has become an important issue for embedded systems.
In this thesis, a low-power data cache architecture called the partial way-predicting (assist-trigger) architecture is proposed. It improves on the conventional cache architecture based on the observed access behavior of the data cache. To this end, two low-power techniques, the way-predicting cache with an MRU table and the tag-skipping technique, are combined. The resulting cache can operate in two modes: predict and non-predict. By using an assist unit to switch the cache into prediction mode only at opportune moments, the architecture effectively avoids large numbers of prediction misses.
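The abstract does not detail the assist unit's decision logic, so the following Python sketch only illustrates the general predict/non-predict idea under stated assumptions. It models a 4-way set-associative data cache with LRU replacement whose MRU way serves as the prediction: in predict mode only the MRU way is probed first and a prediction miss costs a second probe of the remaining ways, while in non-predict mode all ways are probed in parallel. The assist unit is assumed here to be a 2-bit saturating counter that tracks whether the MRU way would have hit; the class name `PartialWayPredictingCache`, the counter and its threshold, and the cache geometry are hypothetical choices, not the thesis's design. Way reads stand in as a rough proxy for dynamic energy, and extra cycles as a proxy for the performance penalty.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    accesses: int = 0
    hits: int = 0
    way_reads: int = 0     # tag+data way activations: rough proxy for dynamic energy
    extra_cycles: int = 0  # second-cycle probes caused by prediction misses


class PartialWayPredictingCache:
    """Behavioral sketch: 4-way set-associative cache with an MRU way predictor
    that is enabled only when a (hypothetical) assist counter says so."""

    def __init__(self, num_sets=64, ways=4, line_bytes=32, threshold=2):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        # Each set keeps its tags ordered from most- to least-recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.assist_counter = 0      # hypothetical 2-bit saturating counter (0..3)
        self.threshold = threshold   # predict mode when counter >= threshold
        self.stats = Stats()

    def access(self, addr):
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        lru = self.sets[index]
        mru_tag = lru[0] if lru else None
        hit = tag in lru
        predict_mode = self.assist_counter >= self.threshold
        s = self.stats
        s.accesses += 1
        s.hits += hit

        if predict_mode:
            s.way_reads += 1                   # probe only the MRU way first
            if mru_tag != tag:                 # prediction miss or cache miss
                s.way_reads += self.ways - 1   # probe the remaining ways next cycle
                s.extra_cycles += 1
        else:
            s.way_reads += self.ways           # conventional parallel probe of all ways

        # The parallel lookup also reveals whether the MRU way hit, so the
        # counter can be trained in both modes; it saturates in [0, 3].
        if mru_tag == tag:
            self.assist_counter = min(self.assist_counter + 1, 3)
        else:
            self.assist_counter = max(self.assist_counter - 1, 0)

        # LRU update: move the tag to the front, evicting the LRU tag on a miss.
        if hit:
            lru.remove(tag)
        elif len(lru) == self.ways:
            lru.pop()
        lru.insert(0, tag)
        return hit


cache = PartialWayPredictingCache()
for addr in range(0, 4096, 4):   # word-sequential trace: 1024 accesses, 128 lines
    cache.access(addr)
print(cache.stats)  # → Stats(accesses=1024, hits=896, way_reads=1414, extra_cycles=127)
```

On this sequential trace the model reads 1414 ways instead of the 4096 a purely parallel lookup would need, at the cost of one extra cycle at each of the 127 line changes after the first. Keeping the assist counter trained in non-predict mode is what lets the sketch re-enter predict mode without ever paying a misprediction to find out.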
Compared with the conventional 4-way way-predicting cache, the proposed partial way-predicting architecture reduces the performance penalty by about 76% at the cost of a 57% increase in power. The AST (assist-trigger) architecture concept is also offered as a basis for future research. In addition, the proposed data cache has been integrated with Proto3-ARM9TM, an AMBA 2.0 system, and related peripherals previously designed in our lab, and implemented with the TSMC 0.18-μm cell library. The resulting Proto3-ARM9TM has a core area of 1.963 × 1.955 mm² and a total chip area of 2.501 × 2.497 mm². Its average power consumption is 147 mW at an operating frequency of 129 MHz.
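As a quick sanity check, the reported implementation figures can be turned into two derived quantities. This is simple arithmetic on the stated numbers, not results reported by the thesis: the core occupies roughly 61% of the total chip area, and dividing the average power by the clock frequency gives the average energy per clock cycle.

```latex
\begin{aligned}
A_{\text{core}} &= 1.963 \times 1.955 \approx 3.838\ \text{mm}^2, \qquad
A_{\text{chip}} = 2.501 \times 2.497 \approx 6.245\ \text{mm}^2,\\
\frac{A_{\text{core}}}{A_{\text{chip}}} &\approx 0.61,\\
E_{\text{cycle}} &= \frac{P_{\text{avg}}}{f_{\text{clk}}}
  = \frac{147\ \text{mW}}{129\ \text{MHz}} \approx 1.14\ \text{nJ per cycle}.
\end{aligned}
```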

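Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Related Work on Low-Power Caches
  1.3 Low-Power Data Cache Architecture
Chapter 2 Operation of the Conventional Data Cache
  2.1 Concepts
  2.2 Addressing
  2.3 Cache Block Mapping Schemes
  2.4 Replacement Policy
  2.5 Write Policy
  2.6 Allocation Policy
  2.7 Example of a Four-Way Set-Associative Cache
  2.8 Cache Power Consumption Analysis
Chapter 3 Related Work on Low-Power Caches
  3.1 Low-Power Cache Architectures
    3.1.1 Concepts
    3.1.2 Low-Power Developments for Direct-Mapped Caches
    3.1.3 Developments in Set-Associative Cache Architectures
    3.1.4 Developments in Split Cache Architectures
    3.1.5 Conclusion
  3.2 Cache Architecture and Prediction
    3.2.1 Prediction Criteria (Prediction for What)
    3.2.2 Prediction Methods
    3.2.3 Development of Way-Predicting Cache Architectures
    3.2.4 Conclusion
Chapter 4 Partial Way-Predicting Data Cache Architecture
  4.1 Predictive Circuit Architecture
  4.2 Conventional Way-Predicting Cache Circuit
  4.3 Trigger Timing
  4.4 Evolution of the Trigger Circuit
  4.5 The Hit-Stealing Concept
  4.6 AST (Assist) Circuit Architecture
Chapter 5 Experimental Results
  5.1 Experimental Framework
  5.2 Experimental Setup
    5.2.1 Benchmark Programs
    5.2.2 Generating Data Memory Access Trace Files
    5.2.3 Data Cache Experiment Types and EDA Environment
  5.3 Experimental Results
Chapter 6 FPGA Verification and Cell-Library Implementation
  6.1 FPGA Verification
  6.2 Cell-Library Implementation
    6.2.1 Implementation and Verification Flow
    6.2.2 Synthesis
    6.2.3 DFT and ATPG
    6.2.4 Automatic Placement and Routing
Chapter 7 Conclusion
References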

