
Graduate Student: Chi-wen Chang (張齊文)
Thesis Title: Performance Evaluation of the EXTOLL Interface Using High Performance Linpack
Advisor: Chang-Hong Lin (林昌鴻)
Committee Members: Wei-Mei Chen (陳維美), Chin-Hsien Wu (吳晉賢), Mon-Chau Shie (許孟超)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2013
Graduation Academic Year: 101
Language: English
Number of Pages: 63
Chinese Keywords: high-performance computing cluster (高效能運算叢集)
Foreign-Language Keywords: HyperTransport, EXTOLL, HPL, High Performance Linpack

In recent years, high-performance computing has become an indispensable resource for engineering and scientific computation. In this field, clusters are the solution most commonly used to build high-performance supercomputers. In the Top 500 [1] ranking, a growing number of supercomputers are clusters assembled from individual multi-core processor systems joined by an interconnection network. When computing power is the main concern, more cluster nodes should, in theory, deliver better performance. As the number of nodes grows, however, the communication loss grows as well, so adding nodes does not bring a proportional gain in performance.
This thesis focuses on a new architecture in which a high-speed network card called EXTOLL is attached through the HyperTransport bus, and uses High Performance Linpack (HPL) [2] to evaluate the performance of the whole cluster. In a traditional computer architecture, data travels along the path Processor – Northbridge – Southbridge – Network Interface Card before being sent over the transmission medium to a remote node. Connecting the high-speed network card directly to the processor through the HyperTransport bus can greatly reduce transmission latency and thus achieve high-speed transfers. The device is still in development and its price/performance ratio has not yet reached a level the market can accept, but by evaluating its performance in this study we assess its potential for future development.
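To make the scaling argument above concrete, here is a minimal, purely illustrative model (the symbols W and c(p), and the assumption that the computational work parallelizes perfectly with c(1) = 0, are introduced here for exposition and are not quantities taken from the thesis). Let W be the execution time on one node and c(p) the communication overhead incurred when p nodes cooperate:

\[
T(p) = \frac{W}{p} + c(p), \qquad
S(p) = \frac{T(1)}{T(p)} = \frac{p}{1 + p\,c(p)/W}, \qquad
E(p) = \frac{S(p)}{p} = \frac{1}{1 + p\,c(p)/W}.
\]

Because c(p) grows with the node count (more messages and more contention on the interconnect), the efficiency E(p) falls below 1, which is exactly why doubling the number of nodes does not double the measured performance.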


In recent years, high-performance computing has become an indispensable resource for engineering and scientific applications. Clusters are the solution most commonly used to construct high-performance supercomputers. In the Top 500 [1] ranking, more and more supercomputers are built as clusters to perform high-performance computing. These clusters consist of individual multi-core processor systems connected to one another by interconnection networks. To obtain more computing power, the number of nodes in a cluster can be increased; theoretically, the more nodes a cluster has, the more computing power it offers. As the number of nodes increases, however, the communication loss increases as well, so adding more nodes does not yield a proportional growth in performance.
In this thesis, we propose a new architecture in which a high-speed network device called EXTOLL connects directly to the processor through the HyperTransport bus. The High Performance Linpack (HPL) [2] benchmark is used to evaluate the overall cluster performance. In a traditional computer architecture, the data propagation path is Processor – Northbridge – Southbridge – Network Interface Card, after which data is transmitted to remote nodes over the network medium. Connecting the network card directly to the processor through the HyperTransport bus greatly reduces transmission latency and thus enables high-speed transmission. The EXTOLL card, however, is still in the development stage, and its price/performance ratio is not yet acceptable to the market; this study evaluates its performance to show the development potential of the EXTOLL card in the future.
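As a small illustration of how an HPL result such as the ones reported in this thesis is usually expressed, the C sketch below converts a run's problem size N and wall-clock time into GFLOP/s using the standard operation count 2/3*N^3 + 2*N^2 that HPL assumes for LU factorization with partial pivoting; the problem size, run time, and peak value in the code are placeholder assumptions, not measurements from this work.

/* Sketch with placeholder numbers: converting an HPL run into GFLOP/s.
 * HPL credits a run with 2/3*N^3 + 2*N^2 floating-point operations
 * (LU factorization plus the triangular solve it times). */
#include <stdio.h>

int main(void)
{
    double n      = 40000.0;   /* assumed HPL problem size N                    */
    double t_wall = 900.0;     /* assumed wall-clock time of the run, seconds   */
    double rpeak  = 150.0;     /* assumed theoretical peak of the cluster, GF/s */

    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    double rmax  = flops / t_wall / 1e9;   /* achieved performance, GFLOP/s */

    printf("Rmax = %.2f GFLOP/s (%.1f%% of assumed Rpeak)\n",
           rmax, 100.0 * rmax / rpeak);
    return 0;
}

The ratio Rmax/Rpeak is the efficiency figure usually quoted alongside Top 500 entries, with Rpeak derived from the core count, clock frequency, and floating-point operations per cycle of the machines under test.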

Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Outline of the Thesis
2 Related Work
2.1 Cluster Performance Analysis
2.2 HyperTransport
2.2.1 HyperTransport Packet Format
2.2.2 HyperTransport Device Configurations
2.2.3 Direct Peripheral-to-CPU Interconnect: HTX
2.3 EXTOLL
2.4 High Performance Linpack - HPL
2.5 Message Passing Interface - MPI
3 Cluster Setup
3.1 Hardware Settings
3.1.1 System Platform Settings
3.1.2 EXTOLL Card Settings
3.2 Software Settings
3.2.1 OS Installation
3.2.2 Network Settings
3.2.3 NFS Settings
3.2.4 SSH Settings
3.2.5 EXTOLL Software Settings
3.3 Benchmark Settings
4 Research Methods
4.1 HPL Parameters Tuning
4.1.1 Block Size: NB
4.1.2 Process Grids: PxQ
4.1.3 Matrix Size: N
4.2 Results
5 Conclusions and Future Work
References
Appendix A - EXTOLL Script File
Appendix B - HPL Configuration File

[1] TOP500 – The TOP500 Supercomputer Sites. http://www.top500.org
[2] A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary, “HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers”, http://www.netlib.org/benchmark/hpl/.
[3] S. Gaissaryan, A. Avetisyan, O. Samovarov, and D. Grushin, “Comparative Analysis of High-Performance Clusters' Communication Environments Using HPL Test”, in Proceedings of the Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, IEEE, 2004.
[4] EXTOLL. http://www.extoll.de/
[5] The InfiniBand Trade Association. http://www.infinibandta.org/index.php
[6] Myrinet. http://www.myricom.com/scs/myrinet/overview/
[7] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, “LogP: Towards a Realistic Model of Parallel Computation”, in Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 262-273, 1993.
[8] R. P. Martin, A. M. Vahdat, D. E. Culler, and T. E. Anderson, “Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture”, ACM SIGARCH Computer Architecture News, vol. 25, no. 2, ACM, 1997.
[9] HyperTransport Consortium. http://www.hypertransport.org/
[10] HyperTransport Consortium, “HyperTransport I/O Technology Overview: An Optimized, Low-latency Board-level Architecture”, HyperTransport Consortium White Paper (June 2004).
[11] H. Litz, H. Froning, M. Nuessle, and U. Bruning, “A HyperTransport Network Interface Controller for Ultra-low Latency Message Transfers”, HyperTransport Consortium White Paper (2008).
[12] EXTOLL Technology Overview. http://www.extoll.de/images/pdf/extoll_technology_overview.pdf
[13] Chen Shao-Hu, Zhang Yun-Quan, Zhang Xian-Yi, and Cheng Hao, “Performance Testing and Analysis of BLAS Libraries on Multi-Core CPUs”, Journal of Software, vol. 21, pp. 214-223, 2010.
[14] W. Gropp, E. Lusk, and T. Sterling, Beowulf Cluster Computing with Linux. MIT Press, 2003.
[15] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, Vol. 1, MIT Press, 1999.
[16] M. Sindi, “Evaluating MPI Implementations Using HPL on an Infiniband Nehalem Linux Cluster”, in Proceedings of the Seventh International Conference on Information Technology: New Generations (ITNG), IEEE, 2010.
[17] AIC Inc. Octans Specification. http://www.aicipc.com.tw/ProductDetail.aspx?ref=Octans
[18] Open MPI. http://www.open-mpi.org/
[19] Zhang Wenli, Fan Jianping, and Chen Mingyu, “Efficient determination of block size NB for parallel Linpack test”, Parallel and Distributed Computing and Systems. ACTA Press, 2004.
[20] D. Dunlop, S. Varrette, and P. Bouvry, “On the use of a genetic algorithm in high performance computer benchmark tuning”, in Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2008), IEEE, 2008.
[21] Tau Leng, R. Ali, Jenwei Hsieh, V. Mashayekhi, and R. Rooholamini, “Performance impact of process mapping on small-scale SMP clusters - a case study using High Performance Linpack”, 2002.
[22] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 4th ed., Morgan Kaufmann, 2009.
[23] J. J. Dongarra, P. Luszczek, and A. Petitet, “The LINPACK Benchmark: Past, Present and Future”, Concurrency and Computation: Practice and Experience, vol. 15, no. 9, pp. 803-820, 2003.
[24] M. Sindi, “HowTo - High Performance Linpack (HPL)”, 2009.

Full-text release date: 2018/07/16 (campus network)
Full-text release date: not authorized for public release (off-campus network)
Full-text release date: not authorized for public release (National Central Library: Taiwan theses and dissertations system)