研究生: |
楊馨豪 Hsin-Hao Yang |
---|---|
論文名稱: |
應用A*演算法於多重序列排比 Multiple Sequence Alignment using A-Star Algorithm |
指導教授: |
鮑興國
Hsing-Kuo Pao |
口試委員: |
呂永和
none 孫敏德 none 李育杰 Yuh-Jye Lee |
學位類別: |
碩士 Master |
系所名稱: |
電資學院 - 資訊工程系 Department of Computer Science and Information Engineering |
論文出版年: | 2013 |
畢業學年度: | 101 |
語文別: | 中文 |
論文頁數: | 42 |
中文關鍵詞: | 動態規劃法 、A*演算法 、多重序列排比 |
外文關鍵詞: | Dynamic programming, A* algorithm, Multiple sequence alignment |
相關次數: | 點閱:235 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
多重序列排比廣泛的應用在去氧核醣核酸與蛋白質序列的分析,正確的序列幫助我們辨識序列間共同的保留區域與相依關係,可作為預測蛋白質三級結構之基礎。在現今多種排比演算法中,動態規劃演算法( dynamic programming )以求取最佳解為目標,因此普遍較能獲取良好的排比結果,然而使用在多序列排比時其計算過程所需之時間與記憶體空間始終為發展瓶頸。為此,我們提出切分的概念,先將未排比的序列以貪婪式策略找出數組切分位置,並將各序列依各切點切分而得數組序列片段,這些片段將分別經由MSA演算法進行多重序列排比,以及導入A*演算法進行片段序列排比結果之修正與整合,目的在於利用所設計的估計函式使A*演算法在搜尋過程中捨棄過差的排比,進而趨向更合適的排比結果。
A*演算法適合使用於解決最短路徑搜尋問題,經由設計合適的啟發式函式( heuristic function ),將搜尋結果導往最佳解。我們加入此演算法其目的除了幫助減低動態規劃過程所需之記憶體空間,與多數演算法僅能顧及全域性序列排比與區域性序列排比兩者之一相比,我們希望利用序列切分演算法保持區域性序列的特性;A*演算法藉由估計函式評估先前的片段序列排比結果是否合適,並同時調整緊接在後的片段序列,利用此修正整合與動態規劃排比演算法的方式達到全域性之序列排比,特別是針對長序列且相似度低的排比問題。
Multiple sequence alignment is a wide range of applications in DNA and protein sequence analysis. Correct alignment result can help us to find conserved regions in each sequence in addition to observe the sequences similarity. Even as a basis for prediction of protein structure. Dynamic programming method purpose obtain the optimal solution and get more better alignment results generally in a variety of algorithms today. However, the execute time and required memory space become bottleneck in dynamic programming procedure of multiple sequence alignment. For this reason, we propose a concept of segmentation. We find out some cut position of input sequences by greedy method, and according to each cut point in each sequence, dividing them and get fragment sets. These fragments are aligned and combined by MSA algorithm, and the A* algorithm applied in adjustment and integration of each fragment aligned result. The purpose is to use the estimated function in A* search process to abandon unreasonable alignment, so we can obtain whole multiple sequence alignment finally.
A * algorithm is suitable to solve the shortest path search problem. To find the optimal solution, through the design of appropriate heuristic function. Our proposed algorithm implement sequence cut, revise and combine method. It can be applied to both, global and local alignment problems compared with other alignment algorithm, and reduced memory space during dynamic programming procedure.
[1]S.B. Needleman and C.D. Wunsch. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 48, 443-453, 1970.
[2]J.D. Thompson, et al. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One6, e18093, 2011.
[3]M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt. A model of evolutionary change in proteins. Pp. 345–352, 1978.
[4]S. Henikoff, J.G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919, 1992.
[5]S.F. Altschul. Gap Costs for Multiple Sequence Alignment. J. Theor. Biol. 138 297-309, 1989.
[6]S.F. Altschul, W. Gish, W. Miller, E. Myers, D. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3): 403–410, 1990.
[7]D.J. Lipman, W. Pearson. Rapid and sensitive protein similarity searches. Science 227 (4693): 1435–41, 1985.
[8]T.F Smith and M.S. Waterman. Identification of Common Molecular Subsequences. Journal of Molecular Biology 147: 195–197, 1981.
[9]L. Wang and T. Jiang. On the complexity of multiple sequence alignment. J Comput Biol 1 (4): 337–48, 1994.
[10]H. Carrillo and D.J. Lipman. The multiple sequence alignment problem in biology. SIAM J.Appl. Math. 48, 1073-1082, 1988.
[11]S.F. Altschul, R.J. Carroll, D.J. Lipman. Weights for Data Related by a Tree.J. Molec. Biol. 207. 647-653, 1989.
[12]J. Stoye. Multiple sequence alignment with the divide-and-conquer method. Gene, 211(2), pp. GC45–GC56, 1998.
[13]S. K. Gupta, J. Kececioglu, and A.A. Schaffer. Improving the Practical Sapce and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment. J. Computational Biology, Fall;2(3): 459—472, 1995.
[14]P. Hogeweg and B. Hesper. The alignment of sets of sequences and the construction of phylogenetic trees : an integrated method. J. Mol. Evol. 20, 175-186, 1984.
[15]J.D. Thompson. et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680, 1994.
[16]N. Saitou and M. Nei. The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees. Mol. Biol. Evol. 4, 406-425, 1987.
[17]C. Notredame, et al. T-coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217, 2000.
[18]I.M. Wallace, O. O’Sullivan, D.G. Higgins, and C. Notredame. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res., 34, 1692–1699, 2006.
[19]A.R. Subramanian, M. Kaufmann, B. Morgenstern. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3: 6, 2008.
[20]B. Morgenstern, K. French, A. Dress, and T. Werner. DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14: 290–294, 1998
[21]P.E. Hart, N.J. Nilsson, B. Raphael. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics SSC44 (2): 100–107, 1968.
[22]G. Blackshields, et al. Analysis and comparison of benchmarks for multiple
sequence alignment. In Silico Biol., 6, 321–339, 2006.
[23]J.D. Thompson, F. Plewniak, and O. Poch. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Nucleic Acids Res., 15, 87–88, 1999.
[24]J. Blazewicz, et al. Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark. Int. J. Appl. Math. Comput. Sci., 19, 675–678, 2009.