
Author: 黃逸杰
Yi-Chieh Huang
Thesis title: 適用於32位元嵌入式微處理器之低功率指令快取記憶體架構設計與驗證
The Design and Verification of a Low-Power Instruction Cache Architecture for 32-bit Embedded Microprocessors
Advisor: 林銘波
Ming-Bo Lin
Committee members: 詹景裕
Gene-Eu Jan
陳郁堂
Yie-Tarng Chen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2009
Academic year of graduation: 98
Language: Chinese
Number of pages: 66
Keywords (Chinese): 快取記憶體, 低功率
Keywords (English): cache memory, low-power
    As VLSI process technology advances, the speed gap between processors and main memory keeps widening. Cache memory has become an indispensable bridge across this gap, but it also consumes considerable power and accounts for a substantial share of the total power dissipation of a typical embedded system. Reducing cache power consumption has therefore become an important issue in embedded system design.
    In this thesis, we propose a low-power instruction cache architecture. It improves the conventional cache architecture in both the horizontal and the vertical dimension, and it builds on an analysis of how the processor accesses instruction memory. The architecture combines three techniques: a filter cache with block-range detection, data memory sub-banking, and tag memory access skipping with data memory enable control. Together they reduce the number of accesses to the L1 cache and, when the L1 cache is accessed, keep the number of tag memory and data memory accesses as low as possible, which lowers power consumption.
    Experimental results show that the proposed architecture reduces power consumption by about 55% compared with a conventional 4-way set-associative instruction cache, without degrading memory system performance. The proposed instruction cache has been integrated with the Proto3-ARM9TM, an AMBA 2.0 bus, and related peripheral IP previously designed in our lab, and it has been implemented and verified on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with the TSMC 0.18 μm cell library. The FPGA implementation uses 10302 LUTs and runs at a maximum frequency of 29 MHz. The cell-library implementation has a core area of 2.200 × 2.218 mm² and a total chip area of 3.036 × 3.028 mm²; its average power consumption is 147 mW at an operating frequency of 129 MHz.
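    The abstract names tag-access skipping and data-memory enable control among its techniques but does not describe how they work. The sketch below is a generic illustration of that family of ideas, not the thesis's actual design: it assumes that a fetch falling in the same cache line as the previous fetch can reuse the already-known way, so no tag is read and only one data way is enabled. The line size, the function name, and the simplification that hits and misses cost the same are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from the thesis.
LINE_BYTES = 16   # assumed cache line size in bytes
L1_WAYS = 4       # 4-way set-associative baseline, as named in the abstract


def count_l1_array_reads(trace, skip_same_line=True):
    """Count tag-array and data-array reads for an instruction-address trace.

    Conventional access: every fetch reads all tag ways and all data ways.
    With skip_same_line: a fetch in the same line as the previous fetch
    reads no tags and enables only the one data way that is already known.
    Hits and misses are not distinguished (a deliberate simplification).
    """
    tag_reads = 0
    data_reads = 0
    last_line = None
    for addr in trace:
        line = addr // LINE_BYTES
        if skip_same_line and line == last_line:
            data_reads += 1          # way is known: enable a single data way
        else:
            tag_reads += L1_WAYS     # compare the tag of every way
            data_reads += L1_WAYS    # conventional parallel data read
        last_line = line
    return tag_reads, data_reads


# Eight sequential 4-byte instruction fetches starting at address 0.
trace = [4 * i for i in range(8)]
print(count_l1_array_reads(trace, skip_same_line=False))  # (32, 32)
print(count_l1_array_reads(trace, skip_same_line=True))   # (8, 14)
```

    On this purely sequential trace the number of array reads drops sharply, which is the kind of saving such techniques target. The 55% figure in the abstract comes from the thesis's own experiments and cannot be derived from this toy model.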

    Chapter 1 Introduction
        1.1 Motivation
        1.2 Related Work on Low-Power Cache Memory
        1.3 Overview of the Proposed Low-Power Instruction Cache Architecture
        1.4 Organization of the Thesis
    Chapter 2 Operation and Architecture of Conventional Cache Memory
        2.1 Concept
        2.2 Addressing
        2.3 Cache Block Mapping Schemes
        2.4 Replacement Policies
        2.5 Write Policies
        2.6 Example of a 4-Way Set-Associative Cache
        2.7 Cache Power Consumption Analysis
    Chapter 3 Related Work on Low-Power Cache Memory
        3.1 History-Prediction Filter Instruction Cache Architecture
            3.1.1 Concept
            3.1.2 Prediction Mechanism and Algorithm
            3.1.3 Hardware Architecture
            3.1.4 Conclusion
        3.2 Partial Tag Comparison Cache Architecture
            3.2.1 Principle
            3.2.2 Operation
            3.2.3 Organization
            3.2.4 Timing Issues
            3.2.5 Conclusion
    Chapter 4 Proposed Low-Power Instruction Cache Architecture
        4.1 Filter Cache with Block-Range Detection
        4.2 Data Memory Sub-Banking
        4.3 Tag Memory Access Skipping and Data Memory Enable Control
    Chapter 5 Experimental Results
        5.1 Experimental Framework
        5.2 Experimental Setup
            5.2.1 Benchmark Programs
            5.2.2 Test Program Generation Flow
            5.2.3 Instruction Memory Access Trace File Generation Flow
            5.2.4 Instruction Cache Experiment Types and EDA Environment
        5.3 Experimental Results
    Chapter 6 FPGA Verification and Cell-Library Implementation
        6.1 FPGA Verification
            6.1.1 Verification Flow
            6.1.2 FPGA Design Implementation and Verification Environment
            6.1.3 FPGA Verification Procedure and Test Program Flow
            6.1.4 FPGA Synthesis Results
        6.2 Cell-Library Implementation
            6.2.1 Cell-Library Design Implementation and Verification Flow
            6.2.2 Synthesis
            6.2.3 DFT and ATPG
            6.2.4 Automatic Layout
    Chapter 7 Conclusions
    References

