簡易檢索 / 詳目顯示

研究生: 林宏哲
Hung-Che Lin
論文名稱: 變動長度編碼後最大共同子序列問題的快速演算法
An Efficient Algorithm for Solving the Run-Length-Encoded Longest Common Subsequence Problem
指導教授: 王有禮
Yue-Li Wang
口試委員: 徐俊傑
Chun-Chieh Hsu
劉文卿
W.T. Liou
學位類別: 碩士
Master
系所名稱: 管理學院 - 資訊管理系
Department of Information Management
論文出版年: 2005
畢業學年度: 93
語文別: 中文
論文頁數: 37
中文關鍵詞: 最大共同子序列生物資訊演算法變動長度編碼
外文關鍵詞: algorithms, bioinformatics., longest common subsequence, run-length-encode
相關次數: 點閱:277下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

找出兩字串的最大共同子序列問題被廣泛的應用在許多領域,有許多的研究皆是在探討如何降低其計算上的複雜度。然而,隨著生物資訊的發達,使得此問題不僅要考慮計算上的複雜度,另一個更重要的議題是要考慮實際使用的空間,而這類的研究是先對欲比對的兩字串做壓縮,再找出其最大共同子序列。
因此,本論文提出一個新的演算法,找出兩字串的最大共同子序列,然而,與一般最大共同子序列問題不同的是這兩字串中有一個字串已經事先使用變動長度編碼壓縮過了。而本論文提出的方法可在時間及空間雜度皆在O(Min{mN,nM})的時間內求出最佳解,其中M和N是原始字串的長度、m和n是壓縮後字串的長度。


Finding the longest common subsequence from two strings is a well-known problem that was generally applied to diverse domains. Many related researches focus on reducing the computing time complexity. However, with the bio-informatics’ flourishing, it is not only essential to take the time complexity into account but also the space complexity. Finding the longest common subsequence from two compressed strings is also an important topic.
Therefore, we propose a new algorithm in this thesis for solving the problem on finding the longest common subsequence from two strings. In this thesis, one of the strings is compressed by run-length-encoding. Furthermore, both the time complexity and the space complexity of our proposed algorithm can be solved in O(Min{mN,nM}), where M and N are the number of characters in two strings, respectively, and m and n are the run lengths of two compressed strings, respectively.

中文摘要 I Abstract II 誌 謝 III 第1章 緒論 1 1.1 研究背景 1 1.2 研究動機 2 1.3 研究架構 2 第2章 文獻探討 3 2.1 最大共同子序列 3 2.2 傳統的最大共同子序列演算法 3 2.3 RLE編碼的最大共同子序列演算法 6 2.3.1 變動長度編碼(Run-length encoding 簡稱 RLE) 6 2.3.2 Freschi-Bogliolo’s Algorithm 6 第3章 變動長度編碼後最大共同子序列問題的快速演算法 11 3.1 演算法 11 3.1.1 剩餘的計算 17 3.1.2 剩餘調整的快速演算法 23 3.2 分析 31 第4章 結論及未來研究方向 32 4.1 結論 32 4.2 未來研究方向 34 參考文獻 35

[1] A. Apostolico, G. Landau, and S. Skiena, Matching for Run-length Encoded Strings, Journal of Complexity 15 (1) (1999) 4–16.
[2] O. Arbell, G.M. Landau, and J.S.B. Mitchell, Edit Distance of Run-length- encoded Strings, Information Processing Letters 83 (2002) 307– 314.
[3] L. Bergroth, H. Hakonen, and T. Raita, A Survey of Longest Common Subsequence Algorithms, SPIRE, 2000, pp.39-48.
[4] H. Bunke, and J. Csirik, An Improved Algorithm for Computing the Edit Distance of Run Length Coded Strings, Information Processing Letters 54 (1995) 93–96.
[5] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, 1990.
[6] M. Crochemore, G.M. Landau, and M. Ziv-Ukelson, A Subquadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices, Proceedings of ACM-SIAM Symposium on Discrete Algorithms, 2002, pp. 679–688.
[7] D. Eppstein, Z. Galil, R. Giancarlo, and G. Italiano, Sparse Dynamic programming I: Linear cost functions, Journal of the ACM 39 (1992) 519–545.
[8] V. Freschi, and A. Bogliolo, Longest Common Subsequence between Run-length-encoded Strings: A New Algorithm with Improved Parallelism, Information Processing Letters 90 (2004) 167– 173.
[9] D. S. Hirschberg, Algorithms for the Longest Common Subsequence Problem, Journal of the ACM, 24, 1977, pp.664-675
[10] J.W. Hunt, and T.G. Szymanski, A Fast Algorithm for Computing Longest Common Subsequences, Communications of the ACM 20 (1977) 350– 353.
[11] W. J. Masek, and M. S. Paterson, A Faster Algorithm Computing String Edit Distances, Journal of Computer and System Sciences 20, 1980, pp.18-31
[12] K. Sayoood, and E. Fow (Eds.), Introduction to Data Compression, second edition, Morgan Kaufmann, Los Altos, CA, 2000.
[13] T. Smith, and M. Waterman, Indentification of Common Molecular Subsequences, Journal of Molecular Biology, 147, 1981, pp.195-197
[14] R. A. Wanger, and M. J. Fischer, The String-to-string Correction Problem, Journal of the ACM, 21, 1974, pp.168-173
[15] J. Ziv, and A. Lempel, Compression of Individual Sequences Via Variable-rate Coding, IEEE Transactions on Information Theory II-24 (1978) 530–536.

QR CODE