簡易檢索 / 詳目顯示

研究生: 鄭福輝
Fu-hui Zheng
論文名稱: 序列比對的嵌入式系統之研究
An Embedded System for Sequence Alignment
指導教授: 陳省隆
Hsing-Lung Chen
口試委員: 莊博任
Po-Jen Chuang
陳郁堂
Yie-Tarng Chen
吳乾彌
Chen-Mie Wu
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 中文
論文頁數: 54
中文關鍵詞: 序列比對Needleman-Wunsch分而擊破心跳式演算法虛擬處理單元FPGA
外文關鍵詞: Sequence alignment, Needleman-Wunsch, divide and conquer, systolic algorithm, virtual PE, FPGA
相關次數: 點閱:150下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 序列比對是生物資訊研究中最基本且相當重要的研究工具,它可以比較及分析出兩條或多條序列之間的相似程度,藉此推測是否源自共同的祖先。
    在序列比對當中,通常輸入的兩個字串的長度都相當長,因此所需的處理時間相當久。有很多演算法被研究出來以降低處理時間,先前心跳式演算法雖然只用線性個處理單元就可達到線性時間複雜度,但是每個處理單元需要線性記憶體空間儲存中間結果,整體所需記憶體空間太大,並不適合實作於FPGA。因此設計一個既可實作於FPGA上又可加速處理速度的演算法,將是重要課題。
    本論文旨在提出一個既可實作且有效率的心跳式演算法以解決序列比對問題。首先利用Needleman-Wunsch的遞回函式,定義前尋程序與後尋程序規則,經過前尋程序與後尋程序運算後的特性,使用“分而擊破”法找出一條最佳排列路徑。整體只需線性記憶體空間,且只增加少許時間複雜度。並運用虛擬處理單元之概念,其實體處理單元的數目可固定,因此本心跳式演算法可實作於FPGA上,以加速處理速度。


    Sequence alignment is the most basic and important research tool for the study of biological information. It can compare and analyze the degree of similarity between two or more sequences to speculate whether they derive from the common ancestor or not.
    In the sequence alignment the lengths of the two input strings are fairly long, so the processing time requires a very long time. Many algorithms have been proposed to reduce the processing time. Previous systolic algorithms employing linear processing units can achieve linear time complexity. However, each processing unit needs linear memory space to store intermediate results. It required too much memory space, resulting in bing not suitable for implementation in FPGA. Therefore, designing an algorithm, which not only can be implemented in FPGA but also speed up processing time, it will be an important issue.
    This paper aims to propose an algorithm to implement efficient systolic algorithm to solve the issue of sequennce alignment. We first uses the Needleman-Wunsch recursive function to define the rules of Forward process and Backward process. By the characteristics of the result of computing Forward process and Backward process, we employ the "divide and conquer" method to find a path with the best algorithm. It needs linear memory space with little increase in time complexity. And employing the concept of virtual processing units, the number of physical processing units can be fixed. Therefore, the proposed systolic algorithms can be implemented in the FPGA to speed up the processing.

    第一章、 緒論 1 1.1 簡介 1 1.2 研究背景 2 1.3 研究目的 3 1.4 章節介紹 3 第二章、 相關研究 4 2.1 Edit Distance 4 2.2 Sequence Alignment 4 2.3 動態規劃演算法 5 2.4 最長共同子序列 6 2.4.1 循序演算法(Sequential Algorithm) 6 2.4.2 平行演算法(Parallel Algorithm) 6 2.4.3 心跳式陣列硬體架構 7 2.5 基本Needleman-Wunsch動態規劃演算法概念[54][55] 7 第三章、 求解最佳排列路徑與分析 12 3.1 心跳式陣列演算法 13 3.2 分而擊破法 (Divide and Conquer) 16 第四章、 處理器硬體架構 27 4.1 RS-232溝通模組 27 4.2 心跳式陣列處理器模組 28 4.2.1 虛擬處理單元 28 4.3 控制器模組 31 4.3.1 控制單元 31 4.3.1.1 Ctr init程序 34 4.3.1.2 Physical PE operation程序 35 4.3.1.3 Load Bstr程序 35 4.3.1.4 Derive score程序 35 4.3.1.5 Forward Process程序 36 4.3.1.6 Backward Process程序 37 4.3.1.7 Special Process程序 38 4.3.1.8 Upload match location程序 39 4.4 時間分析 39 第五章、 驗證與結果分析 42 5.1 實作環境 42 5.2 時間評估 43 第六章、 結論 45 附錄A 46 參考文獻 50

    [1]A. Apostolico,and Z. Galil (eds.), Pattern Matching Algorithms. Oxford
    University Press, 1997.
    [2]M. Crochemore and W. Rytter, Text algorithms. Oxford University Press, 1994.
    [3]G. A. Stephen, String Searching Algorithm. World Scientific, 1994.
    [4]D. Gusfield, Algorithms on Strings, Tree and Sequences. Cambridge University Press, Cambridge, 1997.
    [5]E. W. Myers, An Overview of Sequence Comparison Algorithms in Molecular Biology. Technical Report TR 91-29, Dept. of CS, The University of Arizona, Tucson, Dec. 1991.
    [6]W. R. Pearson and D. J. Lipman, “Improved Tools for Biological Sequence Comparison,” in Proc. Natl. Acad. Sci. USA, Vol. 85, April 1988, pp. 2444-2448.
    [7]S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic Local Alignment Search Tool,” J. Mol. Biol., vol. 215, 1990, pp. 403-410.
    [8]X. Tang, R. Tian, and D. F. Wong, “Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation,” IEEE Trans. Computer-Aided Design, vol. 20, no. 12, Dec. 2001, pp. 1406-1413.
    [9]X. Tang and M. D. F. Wong, “On Handling Arbitrary Rectilinear Shape Constraint,” in Proc. ASP-DAC., 2004, pp. 38-41.
    [10]M. Vlachos, D. Gunopulos, and G. Kollios, “Robust Similarity Measures for Mobile Object Trajectories,” in Proc. DEXA., 2002, pp. 721-728.
    [11]H. Fashandi and A. M. Eftekhari Moghaddam, “A New Rotation Invariant Similarity Measure for Trajectories,” in Proc. 2005 IEEE International Symposium on Computational Intelligence in Robotics and Automation, June 27-30. 2005, pp. 631-634.
    [12]W. J. Masek and M. S. Paterson, “A faster algorithm for computing string edit distance,” Journal of Computer and System Sciences, 20, 1980, pp. 18-31.
    [13]S. Wu and U. Manber, “Fast Text Searching: Allowing Errors,” Communications of the ACM, vol. 35, no. 10, October 1992, pp. 83-91.
    [14]A. Amir, D. Keselman, G. M. Landau, M. Lewenstein, N. Lewenstein, and M. Rodeh, “Text Indexing and Dictionary Matching with One Error,” Journal of Algorithms, vol. 37, no. 2, November 2000, pp. 309-325.
    [15]V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions and reversals,” Soviet Physics Doklady, vol. 10, no. 8, 1966, pp. 707-710.
    [16]D. Sankoff and J. B. Kruskal(eds.), Time Warps, String Edits, and Macromolecules. The Theory and Practice of Sequence Comparison, 1983.
    [17]A. S. Deshpande, D. S. Richards, and W. R. Pearson, “A Platform for Biological Sequence Comparison on Parallel Computers,” Computer Applications in the Biosciences, vol. 7, no. 2, 1991, pp. 237-247.
    [18]J. Setubal and J. Meidanis, “Sequence comparison and dataBase search,” In Introduction To Computational Molecular Biology, 1997, pp. 47-103.
    [19]T. F. Smith and M. S. Waterman, “Comparison of biosequences,” Advances in Applied Mathematics, vol. 2, 1981, pp. 482-489.
    [20]X. Huang and W. Miller, “A time-efficient, linear-space local similarity algorithm,” Advances in Applied Mathematics, vol. 12, no. 3, September 1991, pp. 337-357.
    [21]S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: A new generation of protein dataBase search programs,” Nucleic Acids Research, vol. 25, no. 17, 1997, pp. 3389-3402.
    [22]S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 48, no. 3, March 1970, pp. 443–453.
    [23]O. Gotoh, “An improved algorithm for matching biological sequences,” Journal of molecular Biology, vol. 162, no. 3, December 1982, pp. 705-708.
    [24]J. D. Thompson, D. G. Higgins, and T. J. Gibson, “Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, vol. 22, no. 22, 1994, pp. 4673-4680.
    [25]K. M. Caho, W. R. Pearson, and W. Miller, “Aligning Two Sequences within a Specified Diagonal Band,” Computer applications in the biosciences, vol. 8, no. 5, 1992, pp. 481-487.
    [26]K. M. Chao, “Dynamic-programming strategies for analyzing biomolecular sequences,” Department of Computer Science and Information Engineering, National Taiwan University, February 2003.
    [27]W. R. Pearson and W. Miller, “Dynamic Programming Algorithms for Biological Sequence Comparison,” Methods in Enzymology, vol. 210, 1992, pp. 575-601.
    [28]D. S. Hirschberg ”Algorithms for the longest common subsequence problem,” Journal of the ACM, vol. 24, no. 4, October 1977, pp. 664-675.
    [29]L. Bergroth, H. Hakonen, and T. Raita, “A Survey of Longest Common Subsequences Algorithms,” in Proc. Seventh International Symposium on String Processing and Information Retrieval, 2000, pp. 39-48.
    [30]A. V. Aho, D. Hirschberg, and J. D. Ullman, “Bounds on the Complexity of the Longest Common Subsequence Problem,” Journal of the ACM, vol. 23, no. 1, January 1976, pp. 1-12.
    [31]M. Du and W. Hsu, “New Algorithms for the Longest Common Subsequence Problem,” Journal of Computer and System Siences, vol. 29, 1984, pp. 133-152.
    [32]J. W. Hunt and T. G. Szymanski, “A Fast Algorithm for Computing Longest Common Subsequences,” Communication of the ACM, vol. 20, no. 5, May 1977, pp. 350-353.
    [33]Y. Kayambayashi, N .Nakatsu, and S. Yajima, “A longest common subsequence algorithm suitable for similar text strings,” Acta Informatica, vol. 18, no. 2, 1982, pp. 171-179.
    [34]W. J. Masek and M. S. Paterson, “A Faster Algorithm Computing String Edit Distances,” Journal of Computer and System Science, vol. 20, 1980, pp. 18-31.
    [35]A. Apostolico, S. Browne, and C. Guerra, “Fast Linear-Space Computations of Longest Common Subsequences,” Theoretical Computer Science, vol. 92, no. 1, January 1992, pp. 3-17.
    [36]A. Apostolico and C. Guerra, “A Fast Linear Space Algorithm Computing Longest Common Subsequences,” in Proc. of the 23rd Allerton Conf., Monticello, IL, 1985, pp. 76-84.
    [37]D. S. Hirschberg, “A Linear Space Algorithm for Computing Maximal Common Subsequences,” Communications of the ACM, vol. 18, no. 6, June 1975, pp. 341-343.
    [38]S. K. Kumar and C. P. Rangan, “A Linear-Space Algorithms for the LCS problem,” Acta Informatica, vol. 24, no. 3, June 1987, pp. 353-362.
    [39]A. Apostolico, M. J. Attalah, L. L. Larmore, and S. Mcfaddin, “Efficient Parallel Algorithms for String Editing and Related Problems,” SIMA Journal on Computing, vol. 19, no. 5, October 1990, pp. 968-988.
    [40]M. Lu and H. Lin, “Parallel Algorithms for the Longest Common Subsequence Problem,” IEEE Trans. Parall. Distr. System, vol. 5, no. 8, Aug. 1994, pp. 835-848.
    [41]T. Garcia, J. F. Myoupo, and D. Seme, “A Work-Optimal CGM Algorithm for the Longest Increasing Subsequence Problem,” in Proc. of the thirteenth annual ACM symposium on Parallel algorithms and architectures, 2001, pp. 330-331.
    [42]T. Garcia, J. F. Myoupo, and D. Seme, “A coarse-grained multicomputer algorithm for the longest common subsequence problem,” in Proc. Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003, pp. 349-356.
    [43]Y. Robert and M. Tchuente, “A systolic array for the longest common subsequence problem,” Information Processing Letters, vol. 21, no. 4, Oct. 1985, pp. 191-198.
    [44]J. H. Chang, O. H. Ibarra, and M. A. Palis, “Parallel parsing on a one-way array of finite-state machines,” IEEE Trans. Comput., vol. C-36, no. 1, Jan. 1987, pp. 64-75.
    [45]Y. C. Lin, “New systolic arrays for the longest subsequence problem,” Parallel Computing, vol. 20, no. 9, September 1994, pp. 1323-1334.
    [46]G. Luce and J. F. Myoupo, “An Efficient Linear Systolic Algorithm for Recovering Longest Common Subsequences,” IEEE First International Conference on Algorithms and Architectures for Parallel Processing, vol. 1, April 1995, pp. 20 – 29.
    [47]C. W. Wang, “An lmplementable and Efficient Systolic Algorithm for the Longest Common Subsequence Problem,” National Taiwan University of Science and Technology, 2006.
    [48]T. Oliver, B. Schmidt, and D. Maskell, “Hyper Customized Processors for Bio-Sequence DataBase Scanning on FPGAs,” ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005, pp. 229-237.
    [49]S. Dydel and P. Bala, “Large scale protein sequence alignment using FPGA reprogrammable logic devices,” LNCS Proceedings of FPL., 2004, pp. 23–32.
    [50]C. W. Yu, K. H. Kwong, K. H. Lee, and P. H. W. Leong, “A Smith-Waterman Systolic Cell,” LNCS Workshop on Field Programmable Logic and Applications, 2003, pp. 375-384.
    [51]B. West, R. D. Chamberlain, R. S. Indeck, and Q. Zhang, “An FPGA-Based Search Engine for Unstructured DataBase,” In Proc. of 2nd Workshop on Application Specific Processors, December 2003, pp. 25-32.
    [52]M. C. Herbordt, J. Model, Y. Gu, B. Sukhwani, and T. VanCourt, “Single Pass, BLAST-Like, Approximate String Matching on FPGAs,” 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, April 2006, pp. 217-226.
    [53]T. V. Court and M. C. Herbordt, “Families of FPGA-Based Accelerators for Approximate String Matching,” MicroProcessors and Microsystems, vol. 31, no. 2, March 2007, pp. 135-145.
    [54]C. L. Lu, “Gene Sequence Alignment Algorithm of Dynamic Programming,” Journal of Science Development of National Science Council, December 2005, pp. 36-41.
    [55]R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, “Biological sequence analysis,” Probabilistic Models of Proteins and Nucleic Acids, April 1998.

    QR CODE