
Author: Jhih-Jhong Fang (方志中)
Thesis title: The Design and Verification of an ARMv4T Instruction Set Architecture Compatible Microprocessor IP
Advisor: Ming-Bo Lin (林銘波)
Committee members: Yie-Tarng Chen (陳郁堂), Gene Eu Jan (詹景裕)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2009
Academic year of graduation: 97
Language: Chinese
Number of pages: 120
Keywords (translated from Chinese): pipeline, processor, hazard, forwarding, branch cache
Keywords (English): ARMv4T, processor, hazard, forwarding, BTC

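In this thesis, we design and implement Proto3-ARM9TM, a microprocessor intellectual property (IP) compatible with the ARMv4T instruction set architecture. To improve on the performance of the Proto-ARM9M processor [25], the processor architecture and its major modules are redesigned. In the architecture, the number of pipeline stages is increased from 5 to 9, the register file is built from SRAM instead of D flip-flops, a pipelined parallel multiply-accumulator replaces the iterative multiply-accumulator, and a barrel shifter replaces the DesignWare module. The following mechanisms are added to reduce the impact of hazards: two groups of data forwarding paths, two-stage exception detection, a branch target cache, and 2-bit dynamic branch prediction. On the instruction set side, the coprocessor instructions and the Thumb instruction set are additionally implemented. The thesis also covers the design and implementation of a coprocessor interface for Proto3-ARM9TM. Compared with the Proto-ARM9M processor, the FPGA operating frequency rises from 21 MHz to 45.3 MHz, the IPC on the same test programs rises from 0.47 to 0.7, and overall performance increases by 221.58%.
The Proto3-ARM9TM system integrates the AMBA bus and peripheral IPs, and is implemented both on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with a TSMC 0.18 μm cell library. The FPGA implementation consumes 9728 LUTs and reaches a maximum operating frequency of 31 MHz. The cell-library implementation has a core area of 1.400×1.393 mm² and a total chip area of 2.280×2.274 mm², with an average power consumption of 83 mW at an operating frequency of 134 MHz.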


In this thesis, an ARMv4T instruction set architecture compatible microprocessor IP (Intellectual Property), Proto3-ARM9TM, is proposed. To improve the performance of the Proto-ARM9M processor [25], we redesign the architecture of the processor and its major modules. In the processor design, the number of pipeline stages is increased from 5 to 9, the register file is built from SRAMs instead of D flip-flops, the multiply-accumulator is built with a pipelined parallel multiplier instead of an iterative multiplier, and a barrel shifter replaces the DesignWare shifter block. We also employ the following mechanisms to reduce the effects of hazards: two groups of forwarding paths, two stages of exception detection, a 128x60 branch target cache, and a 2-bit branch prediction scheme. Beyond the base instruction set, we implement both the coprocessor instructions and the Thumb instruction set. In addition, a coprocessor interface for the Proto3-ARM9TM processor is designed and implemented. Compared with the Proto-ARM9M processor, the operating frequency is increased from 21 MHz to 45.3 MHz on the same FPGA platform, the IPC is increased from 0.47 to 0.7 on the same set of test programs, and the overall performance is increased by 221.58%.
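The performance figure quoted above can be roughly cross-checked from the two other numbers in the same sentence. The sketch below assumes that performance is modeled as clock frequency times IPC; the abstract does not state the formula used in the thesis, and the function name `speedup` is only illustrative.

```python
def speedup(f_old_mhz: float, ipc_old: float,
            f_new_mhz: float, ipc_new: float) -> float:
    """Ratio of new to old throughput.

    Assumption (not stated in the abstract): throughput is
    proportional to clock frequency multiplied by IPC.
    """
    return (f_new_mhz * ipc_new) / (f_old_mhz * ipc_old)


# Figures quoted in the abstract: Proto-ARM9M vs. Proto3-ARM9TM.
ratio = speedup(f_old_mhz=21.0, ipc_old=0.47, f_new_mhz=45.3, ipc_new=0.7)
increase_percent = (ratio - 1.0) * 100.0
print(f"speedup = {ratio:.4f}x, increase = {increase_percent:.2f}%")
```

With the rounded inputs this gives an increase of about 221.3%, close to the reported 221.58%. The small gap is plausibly due to rounding of the quoted IPC values, though the abstract does not say so.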
The Proto3-ARM9TM system, together with the AMBA bus and its related peripherals, is implemented and verified both on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with a TSMC 0.18 μm cell library. When realized on the FPGA, the Proto3-ARM9TM system consumes 9728 LUTs and operates at a maximum frequency of 31 MHz. When realized with the cell library, the Proto3-ARM9TM has a core area of 1.400×1.393 mm² and a total chip area of 2.280×2.274 mm². The average power consumption is 83 mW at an operating frequency of 134 MHz.
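As a rough illustration of how a branch target cache combined with 2-bit prediction behaves, here is a minimal behavioral sketch. Only the entry count (128) comes from the abstract. The direct-mapped organization, the tag/target/counter fields, the word-aligned indexing, and all class and method names are assumptions made for this example, and do not describe the 60-bit entry layout or the RTL in the thesis.

```python
from dataclasses import dataclass
from typing import Optional

BTC_ENTRIES = 128  # entry count from the abstract; everything else is assumed


@dataclass
class BTCEntry:
    tag: int      # upper address bits identifying the branch
    target: int   # last observed branch target address
    counter: int  # 2-bit saturating counter: 0..1 not taken, 2..3 taken


class BranchTargetCache:
    """Hypothetical direct-mapped BTC with per-entry 2-bit counters."""

    def __init__(self, entries: int = BTC_ENTRIES) -> None:
        self.entries = entries
        self.table = [None] * entries

    def _index_and_tag(self, pc: int):
        # Assumes 4-byte aligned ARM-state instructions; Thumb state
        # (2-byte instructions) would need a different shift.
        word = pc >> 2
        return word % self.entries, word // self.entries

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None to predict fall-through."""
        index, tag = self._index_and_tag(pc)
        entry = self.table[index]
        if entry is not None and entry.tag == tag and entry.counter >= 2:
            return entry.target
        return None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the cache with the resolved outcome of a branch."""
        index, tag = self._index_and_tag(pc)
        entry = self.table[index]
        if entry is None or entry.tag != tag:
            if taken:
                # Allocate on a taken branch, starting at "weakly taken".
                self.table[index] = BTCEntry(tag, target, 2)
            return
        if taken:
            entry.counter = min(3, entry.counter + 1)
            entry.target = target
        else:
            entry.counter = max(0, entry.counter - 1)


btc = BranchTargetCache()
btc.update(0x8000, taken=True, target=0x7FF0)  # first taken branch allocates
print(btc.predict(0x8000) == 0x7FF0)           # now predicted taken
```

The saturating counter means a single misprediction moves a strongly taken entry to weakly taken rather than flipping the prediction outright, which suits loop-closing branches that are taken on every iteration except the last.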

Chapter 1 Introduction
1.1 Research Motivation
1.2 Organization of Chapters
Chapter 2 Introduction to ARM Microprocessors
2.1 Current State of ARM Development
2.2 ARM Programmer's Model
2.3 Thumb Programmer's Model
2.4 Processor State Switching
2.5 ARM Processor Addressing
2.6 ARM Instruction Set Architecture
2.7 Thumb Instruction Set Architecture
Chapter 3 Proto3-ARM9TM Processor System Architecture
3.1 Introduction to ARM9TDMI
3.2 Introduction to ARM9TDMI
3.3 Features of Proto3-ARM9TM
3.4 Architecture Planning and Datapath
3.4.1 Proto-ARM9M Processor Datapath
3.4.2 Proto-ARM9M Processor Timing
3.4.3 Proto2-ARM9M Processor Datapath
3.4.4 Proto2-ARM9M Processor Timing
3.4.5 Proto3-ARM9TM Processor Datapath
3.4.6 Proto3-ARM9TM Processor Timing
3.5 Processor Core Clock Control
Chapter 4 Controller and Datapath Design of Proto3-ARM9TM
4.1 Instruction Controller
4.2 Thumb Instruction Decoder
4.3 Arithmetic and Logic Unit (ALU)
4.4 Shifter
4.5 Multiply-Accumulator (MAC)
4.6 Program Status Register (PSR)
4.7 Register File
4.8 Coprocessor Interface
Chapter 5 Hazard Handling in Proto3-ARM9TM
5.1 Structural Hazard
5.2 Control Hazard
5.3 Data Hazard
5.4 Proto3-ARM9TM Exception Handling
5.4.1 Introduction to ARM Processor Exception Modes
5.4.2 Exception and Interrupt Handling
5.5 Dynamic Branch Prediction
Chapter 6 Proto3-ARM9TM Silicon IP Design
6.1 RTL Design Verification Flow
6.2 Verification Flow of the Proto3-ARM9TM System IP
Chapter 7 FPGA Verification and Result Analysis
7.1 FPGA Verification Flow
7.2 FPGA Design Implementation and Verification Environment
7.4 FPGA Synthesis Results
7.5 Performance Evaluation
Chapter 8 Cell Library Implementation and Performance Evaluation
8.1 Implementation and Verification Flow of the Cell-Library Design
8.2 Synthesis
8.3 DFT and ATPG
8.4 Automated Layout
Chapter 9 Conclusion
References

[1] Steve Furber, ARM System-on-Chip Architecture, Addison-Wesley, 2000.
[2] http://www.arm.com
[3] ARM Architecture Reference Manual, ARM Ltd., 2000.
[4] AMBA Specification (Rev 2.0), ARM Ltd., 1999.
[5] ARM Instruction Set Quick Reference Card, ARM Ltd., 1999.
[6] ARM9TDMI Technical Reference Manual, ARM Ltd., 2000.
[7] ARM9E-S Technical Reference Manual, ARM Ltd., 2000.
[8] ARM926EJ-S Technical Reference Manual, ARM Ltd., 2000.
[9] Simon Segars, “The ARM9 Family – High Performance Microprocessors for Embedded Applications”, IEEE International Conference on Computer Design 1999 (ICCD’99).
[10] A. D. Booth, “A Signed Binary Multiplication Technique”, Quart. J. Mech. Appl. Math., vol. 4, pp. 236-240, 1951.
[11] Milos D. Ercegovac, Tomas Lang, Digital Arithmetic, San Francisco, Morgan Kaufmann Publisher, 2004.
[12] Israel Koren, Computer Arithmetic Algorithms, New Jersey, Prentice Hall, 1994.
[13] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, New York, Oxford University Press, 2000.
[14] Michael D. Ciletti, Advanced Digital Design with the Verilog HDL, Pearson.
[15] Ulrich Golze, VLSI Chip Design with the Hardware Description Language VERILOG, Springer, 1996.
[16] Ming-Bo Lin, VLSI System Design Lecture Notes, 2006.
[17] Ming-Bo Lin, FPGA System Design and Practice Lecture Notes, 2006.
[18] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware / Software interface, 3rd ed., Morgan Kaufmann, 2003.
[19] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., Morgan Kaufmann, 2003.
[20] D. S. Dawoud, “Modified Booth Algorithm for Higher Radix Fixed-Point Multiplication”, IEEE Transactions on Electronic Computers, pp. 95-100, 1997.
[21] Homayoon Sam and Arupratan Gupta, “A Generalized Multibit Recoding of Two’s complement Binary Numbers and its Proof with Application in Multiplier Implementations”, IEEE Transactions on Electronic Computers, pp.1006-1015, 1990.
[22] L. Dadda, “Some Schemes for Parallel Multipliers”, Alta Frequenza, vol. 34, pp. 346-356, March 1965.
[23] C. S. Wallace, “A Suggestion for a Fast Multiplier”, IEEE Transactions on Electronic Computers, pp. 14-17, Feb. 1964.
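[24] Ming-Bo Lin (林銘波), Digital System Design: Principles, Practice, and ASIC Implementation, 3rd ed., Chuan Hwa Book Co., Ltd. (全華科技圖書股份有限公司), 2002 (in Chinese).
[25] 林晉禾, The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Microprocessor IP, Master's thesis, Department of Electronic Engineering, National Taiwan University of Science and Technology, 2005 (in Chinese).
[26] 詹勝祥, The Design and Verification of an AMBA 2.0 Compatible Bus Controller IP, Master's thesis, Graduate Institute of Electronic Engineering, National Taiwan University of Science and Technology, 2007 (in Chinese).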
