簡易檢索 / 詳目顯示

研究生: 楊馨豪
Hsin-Hao Yang
論文名稱: 應用A*演算法於多重序列排比
Multiple Sequence Alignment using A-Star Algorithm
指導教授: 鮑興國
Hsing-Kuo Pao
口試委員: 呂永和
none
孫敏德
none
李育杰
Yuh-Jye Lee
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2013
畢業學年度: 101
語文別: 中文
論文頁數: 42
中文關鍵詞: 動態規劃法A*演算法多重序列排比
外文關鍵詞: Dynamic programming, A* algorithm, Multiple sequence alignment
相關次數: 點閱:235下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

多重序列排比廣泛的應用在去氧核醣核酸與蛋白質序列的分析,正確的序列幫助我們辨識序列間共同的保留區域與相依關係,可作為預測蛋白質三級結構之基礎。在現今多種排比演算法中,動態規劃演算法( dynamic programming )以求取最佳解為目標,因此普遍較能獲取良好的排比結果,然而使用在多序列排比時其計算過程所需之時間與記憶體空間始終為發展瓶頸。為此,我們提出切分的概念,先將未排比的序列以貪婪式策略找出數組切分位置,並將各序列依各切點切分而得數組序列片段,這些片段將分別經由MSA演算法進行多重序列排比,以及導入A*演算法進行片段序列排比結果之修正與整合,目的在於利用所設計的估計函式使A*演算法在搜尋過程中捨棄過差的排比,進而趨向更合適的排比結果。
A*演算法適合使用於解決最短路徑搜尋問題,經由設計合適的啟發式函式( heuristic function ),將搜尋結果導往最佳解。我們加入此演算法其目的除了幫助減低動態規劃過程所需之記憶體空間,與多數演算法僅能顧及全域性序列排比與區域性序列排比兩者之一相比,我們希望利用序列切分演算法保持區域性序列的特性;A*演算法藉由估計函式評估先前的片段序列排比結果是否合適,並同時調整緊接在後的片段序列,利用此修正整合與動態規劃排比演算法的方式達到全域性之序列排比,特別是針對長序列且相似度低的排比問題。


Multiple sequence alignment is a wide range of applications in DNA and protein sequence analysis. Correct alignment result can help us to find conserved regions in each sequence in addition to observe the sequences similarity. Even as a basis for prediction of protein structure. Dynamic programming method purpose obtain the optimal solution and get more better alignment results generally in a variety of algorithms today. However, the execute time and required memory space become bottleneck in dynamic programming procedure of multiple sequence alignment. For this reason, we propose a concept of segmentation. We find out some cut position of input sequences by greedy method, and according to each cut point in each sequence, dividing them and get fragment sets. These fragments are aligned and combined by MSA algorithm, and the A* algorithm applied in adjustment and integration of each fragment aligned result. The purpose is to use the estimated function in A* search process to abandon unreasonable alignment, so we can obtain whole multiple sequence alignment finally.
A * algorithm is suitable to solve the shortest path search problem. To find the optimal solution, through the design of appropriate heuristic function. Our proposed algorithm implement sequence cut, revise and combine method. It can be applied to both, global and local alignment problems compared with other alignment algorithm, and reduced memory space during dynamic programming procedure.

摘要i Abstractii 誌謝iii 目錄iv 圖目錄vi 表目錄vii 第1章緒論1 1.1研究動機1 1.2研究目的2 1.3論文架構3 第2章相關研究及文獻探討4 2.1序列排比演算法4 2.1.1胺基酸之相似度計分系統4 2.1.2雙序列排比 ( Pairwise Alignment )6 2.1.3多重序列排比 ( Multiple Sequence Alignment )7 2.2其他排比演算法10 2.3A*演算法簡介11 第3章研究方法13 3.1序列切分演算法14 3.2多重序列排比19 3.3應用A*演算法於多重序列排比20 第4章實驗結果與分析23 4.1序列組介紹23 4.2實驗環境與參數設定24 4.3排比結果之評比24 4.4實驗結果與分析27 4.4.1序列切分結果27 4.4.2多重序列排比結果31 4.4.3計分矩陣與參數的影響35 第5章結論與未來工作38 5.1結論38 5.2未來工作39 參考資料40

[1]S.B. Needleman and C.D. Wunsch. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 48, 443-453, 1970.
[2]J.D. Thompson, et al. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One6, e18093, 2011.
[3]M.O. Dayhoff, R.M. Schwartz, and B.C. Orcutt. A model of evolutionary change in proteins. Pp. 345–352, 1978.
[4]S. Henikoff, J.G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919, 1992.
[5]S.F. Altschul. Gap Costs for Multiple Sequence Alignment. J. Theor. Biol. 138 297-309, 1989.
[6]S.F. Altschul, W. Gish, W. Miller, E. Myers, D. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3): 403–410, 1990.
[7]D.J. Lipman, W. Pearson. Rapid and sensitive protein similarity searches. Science 227 (4693): 1435–41, 1985.
[8]T.F Smith and M.S. Waterman. Identification of Common Molecular Subsequences. Journal of Molecular Biology 147: 195–197, 1981.
[9]L. Wang and T. Jiang. On the complexity of multiple sequence alignment. J Comput Biol 1 (4): 337–48, 1994.
[10]H. Carrillo and D.J. Lipman. The multiple sequence alignment problem in biology. SIAM J.Appl. Math. 48, 1073-1082, 1988.
[11]S.F. Altschul, R.J. Carroll, D.J. Lipman. Weights for Data Related by a Tree.J. Molec. Biol. 207. 647-653, 1989.
[12]J. Stoye. Multiple sequence alignment with the divide-and-conquer method. Gene, 211(2), pp. GC45–GC56, 1998.
[13]S. K. Gupta, J. Kececioglu, and A.A. Schaffer. Improving the Practical Sapce and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment. J. Computational Biology, Fall;2(3): 459—472, 1995.
[14]P. Hogeweg and B. Hesper. The alignment of sets of sequences and the construction of phylogenetic trees : an integrated method. J. Mol. Evol. 20, 175-186, 1984.
[15]J.D. Thompson. et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680, 1994.
[16]N. Saitou and M. Nei. The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees. Mol. Biol. Evol. 4, 406-425, 1987.
[17]C. Notredame, et al. T-coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217, 2000.
[18]I.M. Wallace, O. O’Sullivan, D.G. Higgins, and C. Notredame. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res., 34, 1692–1699, 2006.
[19]A.R. Subramanian, M. Kaufmann, B. Morgenstern. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3: 6, 2008.
[20]B. Morgenstern, K. French, A. Dress, and T. Werner. DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14: 290–294, 1998
[21]P.E. Hart, N.J. Nilsson, B. Raphael. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics SSC44 (2): 100–107, 1968.
[22]G. Blackshields, et al. Analysis and comparison of benchmarks for multiple
sequence alignment. In Silico Biol., 6, 321–339, 2006.
[23]J.D. Thompson, F. Plewniak, and O. Poch. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Nucleic Acids Res., 15, 87–88, 1999.
[24]J. Blazewicz, et al. Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark. Int. J. Appl. Math. Comput. Sci., 19, 675–678, 2009.

無法下載圖示 全文公開日期 2016/01/31 (校內網路)
全文公開日期 本全文未授權公開 (校外網路)
全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
QR CODE