簡易檢索 / 詳目顯示

研究生: 柯俊宇
Jun-yu Ke
論文名稱: 在多重電腦模式平行前置演算法的改良
Improved Parallel Prefix on the Multicomputer
指導教授: 林彥君
Yen-Chun Lin
口試委員: 林伯慎
Bor-shen Lin
洪麗玲
Li-ling Hong
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 中文
論文頁數: 50
中文關鍵詞: 多重電腦平行演算法前置問題
外文關鍵詞: multicomputer, parallel algorithms, prefix problems
相關次數: 點閱:225下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

前置運算是對n個數值x1, x2,…, xn及具結合性的二元運算♁,求n個前置值x1 ♁ x2 ♁…♁ xi,1 ≤ i ≤ n。由於前置運算的使用相當廣泛,所以目前已發展出許多解決前置運算問題的演算法,稱為平行前置演算法。本篇論文針對多重電腦(multicomputer)模式改進一個既有的平行前置演算法。多重電腦使用分散式記憶體(distributed memory)架構,且利用訊息傳遞(message passing)的方式交換資料。雖然目前已有平行前置演算法能提供使用者選擇參數的彈性,可視軟硬體設備等級來選擇適當的參數;但是,在某些情況下能提供給使用者的選擇會很少。因此,我們改進該演算法以改善此缺陷,而有較多的參數可以選擇,達到更少的執行時間。論文中也推導出一些新演算法的性質,包括它的計算步數與通訊步數,並且提出使用新演算法應有的先決條件。


Give n values x1, x2,…, xn and an associative binary operation ♁, the prefix computation is to compute x1 ♁ x2 ♁…♁ xi, 1 ≤ i ≤ n. Because prefix computation has many applications, a large number of algorithms, called parallel prefix algorithms, have been developed to solve the prefix problem. Although some previous parallel prefix algorithms provide the flexibility of choosing parameter values to achieve less running time, they may have only few viable parameter values in some cases. Hence, we have improved a previous algorithm for the message-passing multicomputer to solve the drawback. It can thus be faster than previous algorithms when appropriate parameter values are used. We have also derived the number of computation steps and that of communication steps of the new algorithm. The precondition of this algorithm is also derived.

摘要I AbstractII 誌謝III 目錄IV 第1章 簡介1 1.1. 前置運算1 1.2. 相關研究2 1.3. 論文組織3 第2章 已有的平行前置演算法5 2.1. 演算法EK及演算法L5 2.2. 演算法A(n, p, k)及B(n, p, k)6 2.3. 演算法E(n, p, k)及F(n, p, k)10 2.4. 演算法PLL11 第3章 平行前置演算法Q12 3.1. 演算法Q(n, p, k, r)12 3.2. 演算法Q(n, p, k, r)的性質18 3.3. 演算法Q(n, p, k, r)執行時間的分析27 第4章 演算法Q的先決條件31 第5章 平行前置演算法的比較36 5.1. 演算法Q(n, p, k, r)與A(n, p, k)、B(n, p, k)的比較36 5.2. 演算法Q(n, p, k, r)與EK、PLL的比較40 第6章 結論與未來研究方向43 參考資料45

[1]S.G. Akl, Parallel Computation: Models and Methods, Prentice-Hall, Upper Saddle River, NJ, 1997.
[2]S. Aluru, N. Futamura, K. Mehrotra, Parallel biological sequence comparison using prefix computations, Journal of Parallel and Distributed Computing 63 (3) (2003) 264-272.
[3]A. Bilgory, D.D. Gajski, A heuristic for suffix solutions, IEEE Transactions on Computers C-35 (1) (1986) 34-42.
[4]G.E. Blelloch, Scans as primitive operations, IEEE Transactions on Computers 38 (11) (1989) 1526-1538.
[5]R.P. Brent, H.T. Kung, A regular layout for parallel adders, IEEE Transactions on Computers C-31 (3) (1982) 260-264.
[6]D.A. Carlson, B. Sugla, Limited width parallel prefix circuits, Journal of Supercomputing 4 (2) (1990) 107-129.
[7]L. Cinque, G. Bongiovanni, Parallel prefix computation on a pyramid computer, Pattern Recognition Letters 16 (1) (1995) 19-22.
[8]R. Cole, U. Vishkin, Faster optimal parallel prefix sums and list ranking, Information and Computation 81 (3) (1989) 334-352.
[9]A. Datta, Multiple addition and prefix sum on a linear array with a reconfigurable pipelined bus system, Journal of Supercomputing 29 (3) (2004) 303-317.
[10]G. Dimitrakopoulos, D. Nikolos, High-Speed Parallel-Prefix VLSI Ling Adders, IEEE Transactions on Computers 54 (2) (2005) 225-231.
[11]C. Efstathiou, H.T. Vergos, D. Nikolos, Fast parallel-prefix modulo 2n + 1 adders, IEEE Transactions on Computers 53 (9) (2004) 1211-1216.
[12]O. Egecioglu, C.K. Koc, Parallel prefix computation with few processors, Computers and Mathematics with Applications 24 (4) (1992) 77-84.
[13]S.C. Eisenstat, O(log* n) algorithms on a Sum-CRCW PRAM, Computing 79 (1) (2007) 93-97.
[14]A. Ferreira, S. Ubeda, Parallel complexity of the medial axis computation, in: Proc. Int. Conf. on Image Processing, Washington, D.C., vol. 2, 1995, pp. 105-108.
[15]F.E. Fich, New bounds for parallel prefix circuits, in: Proc. 15th Symposium on the Theory of Computing, 1983, pp. 100-109.
[16]A.L. Fisher, A.M. Ghuloum, Parallelizing complex scans and reductions, in: Proc. ACM SIGPLAN '94 Conf. on Programming Language Design and Implementation, Orlando, FL, 1994, pp. 135-146.
[17]W. Gropp, E. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, MA, 1994.
[18]T. Han, D.A. Carlson, Fast area-efficient VLSI adders, in: Proc. 8th Computer Arithmetic Symposium, Como, Italy, 1987, pp. 49-56.
[19]D.R. Helman, J. JaJa, Prefix computations on symmetric multiprocessors, Journal of Parallel and Distributed Computing 61 (2001) 265-278.
[20]L.-L. Hung, Y.-C. Lin, Parallel prefix algorithms on the multicomputer, WSEAS Transactions on Computer Research 3 (4) (2008) 229-239.
[21]Inmos, The Transputer Databook, 3rd ed., Inmos, Almondsbury, Bristol, UK, 1992.
[22]P.M. Kogge, H.S. Stone, A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Transactions on Computers C-22 (8) (1973) 783-791.
[23]D.W. Krumme, G. Cybenko, K.N. Venkataraman, Gossiping in minimal time, SIAM Journal on Computing 21 (1) (1992) 111-139.
[24]C.P. Kruskal, T. Madej, L. Rudolph, Parallel prefix on fully connected direct connection machines, in: Proc. Int. Conf. on Parallel Processing, St. Charles, IL, 1986, pp. 278-284.
[25]C.P. Kruskal, L. Rudolph, M. Snir, The power of parallel prefix, IEEE Transactions on Computers C-34 (1985) 965-968.
[26]R.E. Ladner, M.J. Fischer, Parallel prefix computation, Journal of the ACM 27 (4) (1980) 831-838.
[27]S. Lakshmivarahan, S.K. Dhall, Parallel Computing Using the Prefix Problem, Oxford University Press, Oxford, UK, 1994.
[28]S. Lakshmivarahan, C.M. Yang, S.K. Dhall, On a new class of optimal parallel prefix circuits with (size + depth) = 2n – 2 and élog nù ≤ depth ≤ (2 élog nù – 3), in: Proc. Int. Conf. on Parallel Processing, St. Charles, IL, 1987, pp. 58-65.
[29]F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.
[30]R. Lin, K. Nakano, S. Olariu, M.C. Pinotti, J.L. Schwing, A.Y. Zomaya, Scalable hardware-algorithms for binary prefix sums, IEEE Transactions on Parallel and Distributed Systems 11 (8) (2000) 838-850.
[31]Y.-C. Lin, A family of computation-efficient parallel prefix algorithms, WSEAS Transactions on Computers 5 (12) (2006) 3060-3066.
[32]Y.-C. Lin, Optimal parallel prefix circuits with fan-out 2 and corresponding parallel algorithms, Neural, Parallel & Scientific Computations 7 (1) (1999) 33-42.
[33]Y.-C. Lin, J.-N. Chen, Z4: A new depth-size optimal parallel prefix circuit with small depth, Neural, Parallel & Scientific Computations 11 (3) (2003) 221-235.
[34]Y.-C. Lin, J.-W. Hsiao, A new approach to constructing optimal parallel prefix circuits with small depth, Journal of Parallel and Distributed Computing 64 (1) (2004) 97-107.
[35]Y.-C. Lin, Y.-H. Hsu, C.-K. Liu, Constructing H4, a fast depth-size optimal parallel prefix circuit, Journal of Supercomputing 24 (3) (2003) 279-304.
[36]Y.-C. Lin, L.-L. Hung, Fast problem-size-independent parallel prefix circuits, Journal of Parallel and Distributed Computing 69 (4) (2009) 382-388.
[37]Y.-C. Lin, L.-L. Hung, Four Families of Computation-Efficient Parallel Prefix Algorithms for Multicomputers, Technical Report NTUST-CSIE-08-01, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, Feb. 2008.
[38]Y.-C. Lin, L.-L. Hung, Straightforward construction of depth-size optimal, parallel prefix circuits with fan-out 2, ACM Transactions on Design Automation of Electronic Systems 14 (1) (2009) Article 15.
[39]Y.-C. Lin, C.M. Lin, Efficient parallel prefix algorithms on multicomputers, Journal of Information Science and Engineering 16 (1) (2000) 41-64.
[40]Y.-C. Lin, C.-K. Liu, Constructing optimal parallel prefix circuits, in: Proc. National Computer Symposium, Taipei, Taiwan, vol. 3, 1999, pp. C-313-320.
[41]Y.-C. Lin, C.-K. Liu, Finding optimal parallel prefix circuits with fan-out 2 in constant time, Information Processing Letters 70 (4) (1999) 191-195.
[42]Y.-C. Lin, C.-C. Shih, A new class of depth-size optimal parallel prefix circuits, Journal of Supercomputing 14 (1) (1999) 39-52.
[43]Y.-C. Lin, C.-C. Shih, Optimal parallel prefix circuits with fan-out at most 4, in: Proc. 2nd IASTED Int. Conf. on Parallel and Distributed Computing and Networks, Brisbane, Australia, 1998, pp. 312-317.
[44]Y.-C. Lin, C.-Y. Su, Faster optimal parallel prefix circuits: New algorithmic construction, Journal of Parallel and Distributed Computing 65 (12) (2005) 1585-1595.
[45]Y.-C. Lin, C.-S. Yeh, Efficient parallel prefix algorithms on multiport message-passing systems, Information Processing Letters 71 (2) (1999) 91-95.
[46]Y.-C. Lin, C.-S. Yeh, Optimal parallel prefix on the postal model, Journal of Information Science and Engineering 19 (1) (2003) 75-83.
[47]J. Liu, S. Zhou, H. Zhu, C.-K. Cheng, An algorithmic approach for generic parallel adders, in: Proc. Int. Conf. on Computer-Aided Design, San Jose, CA, 2003, pp. 734-740.
[48]R. Manohar, J.A. Tierno, Asynchronous parallel prefix computation, IEEE Transactions on Computers 47 (11) (1998) 1244-1252.
[49]J.H. Park, H.K. Dai, Reconfigurable hardware solution to parallel prefix computation, Journal of Supercomputing 43 (1) (2008) 43-58.
[50]E.E. Santos, Optimal and efficient algorithms for summing and prefix summing on parallel machines, Journal of Parallel and Distributed Computing 62 (2002) 517-543.
[51]M. Sheeran, I. Parberry, A New Approach to the Design of Optimal Parallel Prefix Circuits, Technical Report 2006:1, Department of Computer Science and Engineering, Chalmers University of Technology, Goteborg, Sweden, 2006.
[52]M. Snir, Depth-size trade-offs for parallel prefix computation, Journal of Algorithms 7 (1986) 185-201.
[53]M. Snir, P. Hochschild, D.D. Frye, K.J. Gildea, The communication software and parallel environment of the IBM SP2, IBM Systems Journal 34 (2) (1995) 205-221.
[54]Thinking Machines, Connection Machine Parallel Instruction Set (PARIS), Thinking Machines, Cambridge, MA, 1986.
[55]H. Wang, A. Nicolau, K.S. Siu, The strict time lower bound and optimal schedules for parallel prefix with resource constraints, IEEE Transactions on Computers 45 (11) (1996) 1257-1271.
[56]F. Zhou, P. Kornerup, Computing moments by prefix sums, Journal of VLSI Signal Processing Systems 25 (1) (2000) 5-17.
[57]H. Zhu, C.-K. Cheng, R. Graham, Constructing zero-deficiency parallel prefix circuits of minimum depth, ACM Transactions on Design Automation of Electronic Systems 11 (2) (2006) 387-409.
[58]R. Zimmermann, Binary Adder Architectures for Cell-Based VLSI and Their Synthesis, Ph.D. thesis, Swiss Federal Institute of Technology (ETH), Zurich, 1997.
[59]R. Zimmermann, Non-heuristic optimization and synthesis of parallel-prefix adders, in: Proc. Int. Workshop on Logic and Architecture Synthesis, Grenoble, France, 1996, pp. 123-132.

QR CODE