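| Graduate student: | 方志中 Jhih-Jhong Fang |
|---|---|
| Thesis title: | ARMv4T指令集架構相容之微處理器智財設計與驗證 The Design and Verification of an ARMv4T Instruction Set Architecture Compatible Microprocessor IP |
| Advisor: | 林銘波 Ming-Bo Lin |
| Oral defense committee: | 陳郁堂 Yie-Tarng Chen, 詹景裕 Gene Eu Jan |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 120 |
| Keywords (Chinese): | pipeline, processor, hazard, forwarding, branch cache memory |
| Keywords (English): | ARMv4T, processor, hazard, forwarding, BTC |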
In this thesis, we design and implement Proto3-ARM9TM, a microprocessor IP (Intellectual Property) compatible with the ARMv4T instruction set architecture. To improve on the performance of the Proto-ARM9M processor [25], we redesign the processor architecture and its major modules. In the architecture, the number of pipeline stages is increased from 5 to 9, the register file is built from SRAMs instead of D flip-flops, the multiply-accumulator uses a pipelined parallel multiplier instead of an iterative one, and a barrel shifter replaces the DesignWare shifter block. The following mechanisms reduce the effects of hazards: two sets of data forwarding paths, two-stage exception detection, a 128×60 branch target cache, and a 2-bit dynamic branch prediction scheme. On the instruction-set side, the coprocessor instructions and the Thumb instruction set are also implemented. In addition, a coprocessor interface for the Proto3-ARM9TM processor is designed and implemented. Compared with the Proto-ARM9M processor, the FPGA operating frequency increases from 21 MHz to 45.3 MHz, the IPC on the same set of test programs increases from 0.47 to 0.7, and the overall performance increases by 221.58%.
The Proto3-ARM9TM system, which integrates the AMBA bus and related peripheral IPs, is implemented and verified on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with the TSMC 0.18 μm cell library. In the FPGA implementation, the Proto3-ARM9TM system consumes 9728 LUTs and operates at a maximum frequency of 31 MHz. In the cell-library implementation, the core area is 1.400×1.393 mm² and the whole chip area is 2.280×2.274 mm². The average power consumption is 83 mW at an operating frequency of 134 MHz.
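The abstract names a 2-bit branch prediction scheme but does not describe its internals. The following is a minimal Python sketch of the textbook 2-bit saturating-counter predictor, offered only as an illustration of the general technique and not as the thesis's actual hardware design. The class and function names, the initial counter state, the indexing by low PC bits, and the example address `0x8000` are all assumptions; the table size of 128 simply echoes the number of branch target cache entries quoted above.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (hypothetical model, not the thesis RTL)."""

    def __init__(self, entries=128):
        self.entries = entries
        # Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken".
        # Starting at 1 (weakly not taken) is an assumption.
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumed indexing: ARM-state instructions are word aligned,
        # so drop the two low address bits before taking the modulo.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_loop_branch(iterations, passes):
    """Count mispredictions for one loop-closing branch executed over several passes."""
    predictor = TwoBitPredictor()
    pc = 0x8000  # hypothetical branch address
    mispredictions = 0
    for _ in range(passes):
        for i in range(iterations):
            # The loop-back branch is taken on every iteration except the last.
            taken = i < iterations - 1
            if predictor.predict(pc) != taken:
                mispredictions += 1
            predictor.update(pc, taken)
    return mispredictions


if __name__ == "__main__":
    print(run_loop_branch(iterations=10, passes=3))
```

Because two consecutive wrong outcomes are needed to flip the prediction, a single loop exit does not disturb the "taken" prediction for the next pass through the loop, which is the usual motivation for using two bits rather than one.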
[1] Steve Furber, ARM System-on-Chip Architecture, Addison-Wesley, 2000.
[2] http://www.arm.com
[3] ARM Architecture Reference Manual, ARM Ltd., 2000.
[4] AMBA Specification (Rev 2.0), ARM Ltd., 1999.
[5] ARM Instruction Set Quick Reference Card, ARM Ltd., 1999.
[6] ARM9TDMI Technical Reference Manual, ARM Ltd., 2000.
[7] ARM9E-S Technical Reference Manual, ARM Ltd., 2000.
[8] ARM926EJ-S Technical Reference Manual, ARM Ltd., 2000.
[9] Simon Segars, “The ARM9 Family – High Performance Microprocessors for Embedded Applications”, IEEE International Conference on Computer Design 1999 (ICCD’99).
[10] A. D. Booth, “A signed binary multiplication technique”, Quart. J. Mech. Appl. Math., vol. 4, pp. 236-240, 1951.
[11] Milos D. Ercegovac, Tomas Lang, Digital Arithmetic, San Francisco, Morgan Kaufmann Publishers, 2004.
[12] Israel Koren, Computer Arithmetic Algorithms, New Jersey, Prentice Hall, 1994.
[13] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, New York, Oxford, 2000.
[14] Michael D. Ciletti, Advanced Digital Design with the Verilog HDL, Pearson.
[15] Ulrich Golze, VLSI Chip Design with the Hardware Description Language VERILOG, Springer, 1996.
[16] Ming-Bo Lin, VLSI System Design Lecture Notes, 2006.
[17] Ming-Bo Lin, FPGA System Design and Practice Lecture Notes, 2006.
[18] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware / Software interface, 3rd ed., Morgan Kaufmann, 2003.
[19] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., Morgan Kaufmann, 2003.
[20] D. S. Dawoud, “Modified Booth Algorithm for Higher Radix Fixed-Point Multiplication”, IEEE Transactions on Electronic Computers, pp. 95-100, 1997.
[21] Homayoon Sam and Arupratan Gupta, “A Generalized Multibit Recoding of Two’s Complement Binary Numbers and its Proof with Application in Multiplier Implementations”, IEEE Transactions on Electronic Computers, pp. 1006-1015, 1990.
[22] L. Dadda, “Some schemes for parallel multipliers”, Alta Frequenza, vol. 34, pp. 346-356, March 1965.
[23] C. S. Wallace, “A Suggestion for a Fast Multiplier”, IEEE Transactions on Electronic Computers, pp. 14-17, Feb. 1964.
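[24] 林銘波 (Ming-Bo Lin), 數位系統設計:原理、實務與ASIC實現 [Digital System Design: Principles, Practices, and ASIC Realization], 3rd ed., Chuan Hwa Book Co., 2002.
[25] 林晉禾, “ARM v4指令集架構相容之微處理器智財設計與驗證” [The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Microprocessor IP], Master's thesis, Department of Electronic Engineering, National Taiwan University of Science and Technology, 2005.
[26] 詹勝祥, “AMBA 2.0 之相容匯流排控制器智財設計與驗證” [The Design and Verification of an AMBA 2.0 Compatible Bus Controller IP], Master's thesis, Graduate Institute of Electronic Engineering, National Taiwan University of Science and Technology, 2007.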