
Graduate Student: Wei-chun Hsu (徐偉郡)
Thesis Title: Sparse Matrix-Vector Multiplication: A Low Communication Cost Data Mapping-Based Architecture (以低傳輸量的資料配置方法為基礎之稀疏矩陣向量乘法架構)
Advisor: Shanq-jang Ruan (阮聖彰)
Committee: An-yeu Wu (吳安宇), Chung-an Shen (沈中安), Jurgen Gotze
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2015
Graduation Academic Year: 103
Language: English
Number of Pages: 43
Keywords: Sparse Matrix-Vector Multiplication, Network-on-Chip, Data Mapping Method for NoC, FPGA-based Architecture
    The performance of sparse matrix-vector multiplication (SMVM) on a parallel system depends strongly on how the data are distributed among the system's components. The data mapping method used gives rise to two costs, arithmetic and communication; the communication cost of an algorithm often dominates the arithmetic cost, and the gap between the two tends to grow. Finding a mapping method that reduces the communication cost is therefore of high importance, while the load distribution among the processing units must not be sacrificed either. This thesis proposes a data mapping method for SMVM on a Network-on-Chip (NoC) that achieves a balanced working load while reducing the communication cost, and introduces an FPGA-based architecture designed to fit the proposed mapping method. Experimental results show that the communication cost of the proposed design is approximately 40% lower than that of the previous work.
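
    For context, SMVM computes y = Ax where A is sparse, so only the nonzero entries are stored and multiplied. A minimal CPU-side reference kernel using the common compressed sparse row (CSR) layout is sketched below; the CSR format, the function name smvm_csr, and all variable names are illustrative assumptions made here, since this record does not reproduce the thesis's actual storage format or NoC implementation.

    /* Minimal reference SMVM kernel, y = A*x, with A in CSR form.
     * Illustrative sketch only: the thesis targets an NoC/FPGA
     * architecture, not a sequential CPU loop like this one. */
    #include <stddef.h>

    void smvm_csr(size_t n_rows,
                  const size_t *row_ptr,  /* length n_rows+1: row boundaries */
                  const size_t *col_idx,  /* column index of each nonzero */
                  const double *val,      /* value of each nonzero */
                  const double *x,        /* dense input vector */
                  double *y)              /* dense output vector */
    {
        for (size_t i = 0; i < n_rows; ++i) {
            double acc = 0.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                acc += val[k] * x[col_idx[k]];
            y[i] = acc;
        }
    }

    On the NoC architecture described in the thesis, these inner products are instead distributed across processing elements, which is where the data mapping method and its communication cost come into play.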

    1 Introduction
    2 Data Mapping Methods for NoC
      2.1 Previous Method (Method I)
      2.2 Proposed Method (Method II)
    3 Analysis of Communication Cost
      3.1 Communication Cost for Method I
      3.2 Communication Cost for Method II
    4 Analysis of Load Distribution
    5 The Data Mapping-based Architecture
      5.1 Top View of Proposed Architecture
      5.2 Router
      5.3 Processing Element
      5.4 The Size of NoC Architecture
      5.5 Packet Format
    6 Experimental Result
    7 Conclusion
    8 References
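
    Chapters 3 and 4 analyze the communication cost and load distribution of the two mapping methods. That analysis is not reproduced in this record; purely to illustrate the kind of quantity such an analysis compares, the sketch below counts how many remote x-elements each processing element must receive under a naive row-block mapping. This mapping, the function remote_refs, and the toy matrix are hypothetical stand-ins, not the thesis's Method II.

    /* Hypothetical communication-volume count for a naive row-block
     * mapping: PE p owns a contiguous block of rows and the matching
     * slice of x, and must receive every x[j] it references but does
     * not own. Nonzeros per PE would measure the arithmetic load. */
    #include <stdio.h>
    #include <stdlib.h>

    /* Count distinct remote x-accesses for a PE owning rows [lo, hi). */
    static size_t remote_refs(const size_t *row_ptr, const size_t *col_idx,
                              size_t lo, size_t hi, size_t n_cols)
    {
        char *seen = calloc(n_cols, 1);  /* columns already counted */
        size_t count = 0;
        for (size_t i = lo; i < hi; ++i)
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                size_t j = col_idx[k];
                if ((j < lo || j >= hi) && !seen[j]) {  /* remote, first use */
                    seen[j] = 1;
                    ++count;
                }
            }
        free(seen);
        return count;
    }

    int main(void)
    {
        /* Toy 4x4 matrix in CSR: nonzeros at (0,0),(0,3),(1,1),(2,0),(2,2),(3,3). */
        const size_t row_ptr[] = {0, 2, 3, 5, 6};
        const size_t col_idx[] = {0, 3, 1, 0, 2, 3};
        const size_t n = 4, P = 2, rows_per_pe = n / P;

        size_t total = 0;
        for (size_t p = 0; p < P; ++p) {
            size_t lo = p * rows_per_pe, hi = lo + rows_per_pe;
            size_t c = remote_refs(row_ptr, col_idx, lo, hi, n);
            printf("PE %zu: %zu remote x-elements\n", p, c);
            total += c;
        }
        printf("total communication volume: %zu\n", total);
        return 0;
    }

    A mapping method like the one the thesis proposes would aim to lower this total while keeping the per-PE nonzero counts balanced.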

