| Author: | 黃之鴻 Chih-hung Huang |
|---|---|
| Thesis Title: | 一個32位元多執行緒CPU架構研究與實現 (The Design and Verification of a 32-bit Multithreading CPU Architecture) |
| Advisor: | 林銘波 Ming-bo Lin |
| Committee: | 陳郁堂 Yie-tarng Chen; 陳維美 Wei-mei Chen |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering |
| Thesis Publication Year: | 2013 |
| Graduation Academic Year: | 101 |
| Language: | Chinese |
| Pages: | 65 |
| Keywords (in Chinese): | 中央處理單元, ARM, 精簡指令集, 多執行緒 |
| Keywords (in other languages): | CPU, ARM, RISC, multithreading |
This thesis designs and implements HT-ARM9TM, a microprocessor intellectual property (IP) core compatible with the ARMv4T instruction set architecture. To improve the performance of the Proto3-ARM9TM processor, the work focuses on multithreading. Building on the basic Proto3-ARM9TM architecture, we designed a thread management unit, the Thread-dispatcher, tailored to the processor's characteristics, and modified the memory access paths, the register file access paths, and other parts of the pipeline to support multithreaded execution. By switching between the instructions of different threads at appropriate times, HT-ARM9TM reduces the cycles that Proto3-ARM9TM wastes on hazards.
HT-ARM9TM has been implemented and verified on a Xilinx Virtex-5 XC5VLX110-FF676 FPGA. The complete system uses 9397 LUTs and 32 Block RAMs, and its maximum operating frequency, including the AMBA bus system, is 31 MHz. Compared with the Proto3-ARM9TM processor core, the HT-ARM9TM core raises the maximum operating frequency from 45.2 MHz to 45.7 MHz and the IPC on the same test program from 0.7 to 0.856, for an overall performance gain of 23.64%.
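The reported 23.64% gain is consistent with modeling core performance as maximum clock frequency multiplied by IPC. The abstract does not state this formula, so the short check below is a sketch under that assumption; the function and variable names are illustrative and do not come from the thesis.

```python
# Assumption: relative performance is modeled as f_max * IPC
# (millions of instructions per second). The abstract gives only the
# frequencies, the IPC values, and the final percentage, not this formula.

proto3_mhz, proto3_ipc = 45.2, 0.7   # Proto3-ARM9TM core, from the abstract
ht_mhz, ht_ipc = 45.7, 0.856         # HT-ARM9TM core, from the abstract


def throughput_mips(freq_mhz, ipc):
    """Millions of instructions per second under the assumed model."""
    return freq_mhz * ipc


baseline = throughput_mips(proto3_mhz, proto3_ipc)
improved = throughput_mips(ht_mhz, ht_ipc)
gain_pct = (improved / baseline - 1) * 100

print(f"overall gain: {gain_pct:.2f}%")  # prints "overall gain: 23.64%"
```

Under this model nearly all of the gain comes from the IPC improvement (about 22%), with the clock frequency contributing only about 1%.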