Graduate student: | 陳聖文 SHEN-WEN CHEN |
---|---|
Thesis title: | 相容32位元嵌入式微處理器之低功率資料快取記憶體架構設計與驗證 The Design and Verification of a Low-Power Data Cache Architecture for 32-bit Embedded Microprocessors |
Advisor: | 林銘波 Ming-Bo Lin |
Oral defense committee: | 詹景裕 Ching-Yuh Jan, 楊兆華 Zhao-Hua Yang, 陳郁堂 Yie-Tarng Chen, 呂紹偉 Shao-Wei Leu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 66 |
Chinese keywords: | 低功率設計 (low-power design), 資料快取記憶體 (data cache memory), 路徑預測 (way prediction) |
Foreign-language keywords: | Low-power design, data cache memory, way-predict |
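Advances in deep-submicron process technology have let hardware keep growing in line with Moore's Law, and the speed gap between the processor and main memory has kept widening as a result. To narrow this gap, cache memory design has become key to processor speed; however, cache memory also adds considerable power consumption to the chip. In a typical embedded system, cache memory accounts for a substantial share of the total power consumption. How to reduce cache power consumption effectively is therefore a prominent topic in embedded system design and its future development.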
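In this thesis, we examine the design details of conventional cache memory and, by observing and analyzing the behavior patterns of the processor when it accesses data memory, propose a low-power data cache architecture called the partial way-predicting cache. The architecture combines two techniques, tag memory access skipping and the MRU way-predicting cache, and develops them into a switchable way-predicting architecture. The prediction circuit is split into two operating modes, predict and non-predict, and an assist unit helps the circuit decide when to enable prediction. This gives the circuit a small degree of decision-making ability, so that prediction is triggered only when a predict-hit is more likely.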
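Experimental results show that the proposed method reduces the performance loss of the conventional MRU way-predicting cache by about 76% while achieving a power reduction of about 43%, close to that of the conventional way-predicting cache. It also offers the concept of an AST assist-type circuit architecture on which later researchers can build. The completed low-power data cache has been integrated with Proto3-ARM9TM, the AMBA bus, and peripheral IP, and implemented with the TSMC 0.18 μm cell library. The core area is 1.963 × 1.955 mm², the overall chip area is 2.501 × 2.497 mm², and the average power consumption is 147 mW at an operating frequency of 129 MHz.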
Deep-submicron technology has sustained the rapid growth of hardware predicted by Moore's Law. The speed gap between processors and DRAM devices keeps widening as integrated circuit manufacturing advances. To bridge this gap, the design of the cache memory between the processor and main memory has become critical. However, cache memory also consumes considerable power, which accounts for a large part of the total power dissipation of an embedded system. Consequently, low-power cache memory design has become an important issue for embedded systems.
In this thesis, a low-power data cache architecture, called the partial way-predicting (assist-trigger) architecture, is proposed. It improves on the conventional cache architecture by exploiting the access behavior of the data cache memory. Two low-power techniques are combined: the way-predicting cache with an MRU table, and tag skipping. The resulting cache operates in two modes, predict and non-predict. An assist unit triggers the prediction mode only when it is likely to pay off, so the architecture effectively avoids a large number of prediction misses.
Compared with a conventional 4-way way-predicting cache, the proposed partial way-predicting architecture reduces the performance penalty by about 76% at the cost of a 57% power increase. We also offer the AST (assist-trigger) architecture concept to those who are interested. In addition, the proposed data cache has been integrated with Proto3-ARM9TM, an AMBA 2.0 system, and related peripherals previously designed in our lab, and implemented with the TSMC 0.18-μm cell library. The resulting Proto3-ARM9TM has a core area of 1.963 × 1.955 mm² and a whole-chip area of 2.501 × 2.497 mm². The average power consumption is 147 mW at an operating frequency of 129 MHz.
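The predict and non-predict modes described in the abstract can be illustrated with a small behavioral model. The sketch below is not the thesis design: the abstract does not say how the assist unit decides when to trigger prediction, so the per-set 2-bit confidence counter, its threshold, the cache geometry (`SETS`, `LINE_BYTES`), and the victim choice on a miss are all assumptions made for illustration. Tag skipping is not modeled. The model only counts how many ways are read and how many lookup cycles are spent, as a rough proxy for energy and performance penalty.

```python
from dataclasses import dataclass, field

WAYS = 4          # 4-way set-associative, as in the abstract's comparison
SETS = 64         # assumed geometry, not taken from the thesis
LINE_BYTES = 32   # assumed line size, not taken from the thesis


@dataclass
class CacheSet:
    tags: list = field(default_factory=lambda: [None] * WAYS)
    mru: int = 0         # most recently used way (the MRU table entry)
    confidence: int = 0  # hypothetical assist-unit state: 2-bit saturating counter


class PartialWayPredictCache:
    def __init__(self, threshold=2):
        self.sets = [CacheSet() for _ in range(SETS)]
        self.threshold = threshold  # assumed trigger level for predict mode
        self.way_reads = 0          # number of ways activated (energy proxy)
        self.cycles = 0             # lookup cycles (miss refill time not modeled)

    def access(self, addr):
        line = addr // LINE_BYTES
        s = self.sets[line % SETS]
        tag = line // SETS

        if s.confidence >= self.threshold:
            # Predict mode: probe only the MRU way first.
            self.way_reads += 1
            self.cycles += 1
            if s.tags[s.mru] == tag:
                s.confidence = min(s.confidence + 1, 3)
                return "predict-hit"
            # Prediction miss: probe the remaining ways in an extra cycle.
            self.way_reads += WAYS - 1
            self.cycles += 1
        else:
            # Non-predict mode: read all ways in parallel, as a conventional cache does.
            self.way_reads += WAYS
            self.cycles += 1

        if tag in s.tags:
            way, result = s.tags.index(tag), "hit"
        else:
            # Assumed victim choice: the way after the MRU way.
            way, result = (s.mru + 1) % WAYS, "miss"
            s.tags[way] = tag

        # Assist unit update: gain confidence only when the MRU way was the right one.
        s.confidence = min(s.confidence + 1, 3) if way == s.mru else 0
        s.mru = way
        return result


if __name__ == "__main__":
    cache = PartialWayPredictCache()
    for _ in range(5):
        print(cache.access(0x1000))
    print(cache.way_reads, cache.cycles)
```

In this model a set that is repeatedly accessed at the same line switches into predict mode after a few accesses and then reads one way per access instead of four, while a set with scattered accesses stays in non-predict mode and never pays the extra cycle of a prediction miss.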