簡易檢索 / 詳目顯示

研究生: 林珈瑋
Chia-wei Lin
論文名稱: 在兩條序列中找出包含限制字串的最長共同子序列
Finding the longest common subsequence with a constrained string between two sequences
指導教授: 徐俊傑
Chiun-chieh Hsu
口試委員: 黃世禎
Sun-jen Huang
蕭培墉
Pei-yung Hsiao
學位類別: 碩士
Master
系所名稱: 管理學院 - 資訊管理系
Department of Information Management
論文出版年: 2006
畢業學年度: 94
語文別: 中文
論文頁數: 45
中文關鍵詞: 動態規劃A*演算法AOE網路圖限制字串最長共同子序列
外文關鍵詞: dynamic programming, A* algorithm, AOE network graph, constrained string, longest common subsequence
相關次數: 點閱:143下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

序列比對問題是在兩或多條序列中比較相似度的問題,它在生物計算學和數學領域都是非常重要的課題,到目前為止,最長共同子序列問題仍然是一個NP-Complete的問題,但是我們可以透過加入限制條件,找出更符合我們需要的資訊,在這篇論文中,我們研究一個加入限制的序列比對問題-加入一個限制字串,找出包含此限制字串個數最多的最長共同子序列,為了解決這個問題,我們提出了兩個方法,第一個是A*演算法,但是A*演算法在最差的情況下,需要相當多的時間,所以A*演算法在最差的情況下是不可行的方法,第二個方法則透過了在AOE(Activity on Edge)網路圖中找尋關鍵路徑的方法來解決,而也降低了計算所需要的時間複雜度,但後者只能在序列中的限制子字串沒有重疊的情況之下使用。


Sequence alignment problem is to compare the similarity of two or multiple sequences. It is very important in computational biology and mathematical fields. Up to now, the longest common subsequence problem is a NP-Complete problem. In this thesis, we study this problem with additional constraints. We add one new element into the problem: a constrained string. In addition to find the longest common subsequence, we would like to find the longest common subsequence with most constrained strings.
In order to solve this problem, we propose two methods. One is A* algorithm, but in worst case, the time complexity is too large. The other is implemented by applying finding critical path in AOE network graph. The method can obtain a smaller time complexity, but it just works under the condition of no overlapping.

論文摘要 Ⅰ ABSTRACT Ⅱ 誌謝 Ⅲ 目錄 Ⅳ 圖索引 Ⅴ 表索引 Ⅶ 第一章、緒論 - 1 - 1.1 研究背景與研究動機 - 1 - 1.2 研究架構 - 2 - 第二章、相關文獻探討 - 4 - 2.1 傳統的最長共同子序列問題的研究 - 4 - 2.2 編輯距離的相關研究 - 5 - 2.3 加入限制條件的最長共同子序列 - 7 - 2.3.1 在題目中加入一個限制序列 - 7 - 2.3.2 考慮序列中的限制字串位置範圍與順序資訊 - 8 - 2.4 以A*演算法解決最長共同子序列的研究 - 8 - 第三章、以A*演算法來找出兩條序列中包含最多次限制子字串的最長共同子序列 - 10 - 3.1 A*演算法的介紹 - 10 - 3.1.1 演算法中所用到的定義 - 12 - 3.1.2 A*演算法步驟 - 13 - 3.1.3 本方法的例子說明 - 14 - 3.2 本章演算法分析 - 20 - 第四章、以AOE演算法找出包含限制子字串的最長共同子序列 - 23 - 4.1 以動態規劃方法求出最長共同子序列 - 23 - 4.2 計算區塊間的距離 - 25 - 4.3 使用AOE演算法找出關鍵路徑 - 29 - 4.4 以AOE演算法找出包含限制子字串最多次的最長共同子序列 - 33 - 4.5 找出包含限制字串的最長共同子序列演算法 - 35 - 4.6 演算法分析 - 39 - 第五章、結論 - 42 - 參考文獻 - 43 -

[1] Apostolico, A., “Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings,” Information Processing Letters, Vol. 23, pp. 63–69, 1986.
[2] Apostolico, A. and Guerra, C., “The longest common subsequence problem revisited,” Algorithmica, No. 2, pp. 315-336, 1987.
[3] Baker, B. S. and Giancarlo, R., “Sparse dynamic programming for longest common subsequence from fragments,” Journal of Algorithms, Vol. 42, pp. 231-254, 2002.
[4] Boyer, Robert S. and Moore, J Strother, “A fast string searching algorithm,” Journal of the ACM, Vol. 20, No. 10, pp. 762-772, 1977.
[5] Chung Yun-Sheng, “Constrained alignment with range and ordering semantic information,” Master Thesis, Department of Computer Science, National Tsing-Hua University, 2004.
[6] Chin, Francis, Y.L., Santis, Alfredo De, Ferrara, Anna Lisa, Ho, N.L., Kim, S.K., “A simple algorithm for the constrained sequence problems,” Information Processing Letters, Vol. 90, pp. 175-179, 2004.
[7] Guo, J. Y. and Hwang, F. K., “An almost-linear time and linear space algorithm for the longest common subsequence problem,” Information Processing Letters, Vol. 94, pp. 131-135, 2005.
[8] Hirschberg, D. S., “A linear space algorithm for computing maximal common subsequences,” Communications of the ACM, Vol. 18, No. 6, pp. 341-343, 1975.
[9] Hirschberg, D.S., “Algorithms for the longest common subsequence problem,” Journal ACM, Vol. 24, pp. 664-675, 1977.
[10] Hunt, J. W. and Szymanski, T. G., “A fast algorithm for computing longest common subsequences,” Communications of the ACM, Vol. 20, No. 5, pp.350-353, 1977.
[11] Hsu, Jerone Jia-Rong, “The application of the A* algorithm to the sequence alignment problem,” Master Thesis, Department of Computer Science and Engineering, National Chi-Nan University, 2004.
[12] Knuth, D., Morris, J. and Pratt, V., “Fast Pattern Matching in Strings,” SIAM Journal on Computing, Vol. 6, pp. 323-350, 1977.
[13] Maxime, Crochemore, Costas, S. Iliopoulos, Yoan, J. Pinzon, James, F. Reid, “A fast and practical bit-vector algorithm for the longest common subsequence problem,”Information Processing Letters, Vol. 80, pp. 279–285, 2001.
[14] Masek, W.J. and Paterson, M.S., “A faster algorithm computing string edit distances,” Journal Computer System Science, Vol. 20, pp. 18–31, 1980.
[15] Needleman, S. B. and Wunsch, C. D., “A general method applicable to the search for similarities in the amino acid sequence of two proteins,”Journal of Molecular Biology, Vol. 48, pp.443-453, 1970.
[16] Pan, Yu-Mei, “Application of the A* algorithm to solve the longest common subsequence from fragments problem,” Master Thesis, Department of Computer Science and Engineering, National Chi-Nan University, 2005.
[17] Peng, Chao-Li, “An approach for solving the constrained longest common subsequence problem,” Master Thesis, Department of Computer Science and Engineering, National Sun Yat-Sen University, 2003.
[18] Pevzner, P.A. and Waterman, M. S., “Generalized sequence alignment and duality,” Advance Applied Mathematics, Vol. 14, pp. 139-171, 1993.
[19] Smith, T. F. and Waterman, M. S., “Comparison of biosequences,” Advances in applied mathematics, Vol. 2, pp. 482-489, 1981.
[20] Tsai, Yin-Te, “The constrained longest common subsequence problem,” Information Processing Letters, Vol. 88, pp. 173-176, 2003.
[21] Wagner, R. A. and Fisher, M. J., “The String-to-string correction problem,” Journal of the ACM, Vol. 21, No. 1, pp. 168-173, 1975.
[22] Wu, S., Manber, G., Myers, G., and Miller, W., “An O(NP) sequence comparison algorithm,” Information Processing Letters, Vol. 35, pp. 317-323, 1990.
[23] Lee, R. C. T., Chang, R. C., Tseng, S. S. and Tsai, Y. T., “Introduction to the Design and Analysis of Algorithms,” Flag Corporation, Second Edition, ISBN:957-717-777-8, 1991.

QR CODE