簡易檢索 / 詳目顯示

研究生: 莊旺昇
Wang-Sheng Juang
論文名稱: 多重序列排比使用動態規劃及粒子群最佳化
Multiple Sequence Alignment Using Dynamic Programming and Particle Swarm Optimization
指導教授: 蘇順豐
Shun-Feng Su
口試委員: 王文俊
none
黃有評
none
陶金旺
none
莊鎮嘉
none
王偉彥
none
鍾聖倫
none
鄭錦聰
none
李仁鐘
none
學位類別: 博士
Doctor
系所名稱: 電資學院 - 電機工程系
Department of Electrical Engineering
論文出版年: 2008
畢業學年度: 96
語文別: 英文
論文頁數: 103
中文關鍵詞: 動態規劃多重序列排比粒子群最佳化
外文關鍵詞: dynamic programming, multiple sequence alignment, particle swarm optimization
相關次數: 點閱:366下載:11
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 動態規劃(Dynamic Programming)被廣泛的使用在多重序列排比的問題上。無論如何,當排比序列的數量在兩列以上時,多維度動態規劃可能會遭受到大量儲存及計算複雜度之苦。因此,往往在做多重序列排比時會利用漸進配對式動態規劃(Progressive Pairwise Dynamic Programming)。但是,這樣的做法也會遭遇到局部最佳化的問題。在本文中,關於解多重序列排比的問題我們建議一種混合演算法,這種演算法結合配對式動態規劃及粒子群最佳化(Particle Swarm Optimization)的技巧以克服上述的缺點。在這個演算法中,配對式動態規劃係以漸進的方式排比序列,而粒子群最佳化的技巧則是被用來避免排比的結果陷入局部最佳化。在我們的研究中,從兩個方面去考慮配對式動態規劃。首先,考慮計算序列所有可能的配對分數以形成多重序列排比時的排比次序。另一個方法是透過考慮隨機配對次序(random pair order)以省略計算序列所有可能的配對分數。在我們的研究裏,許多蛋白質的資料組被用來當作例子以證實我們的方法優於最常被廣泛使用的多重序列排比方法ClustalW。實驗結果證明了我們所建議的方法之優異性能(performance),換言之,建議的方法的確是大有可為的。從我們的實驗可以發現,在漸進式動態規劃中使用隨機選取配對次序會比使用配對分數次序產生較好的成果。最後的結論是配對分數次序的使用將引導序列匹配(sequence match)走向局部最佳結果,因此,使用隨機次序(random order)可以產生較佳的匹配局面。


    Dynamic Programming (DP) is wildly used in Multiple Sequence Alignment (MSA) problems. However, when the number of the considered sequences is more than two, multiple dimensional DP may suffer from large storage and computational complexities. Often, progressive pairwise DP is employed for MSA. However, such an approach may suffer from local optimal problems. In this dissertation, we propose a hybrid algorithm combining the pairwise DP and the particle swarm optimization (PSO) techniques to overcome the above drawbacks while solving MSA. In the algorithm, pairwise DP is used to align sequences progressively and PSO is employed to avoid the result of alignment being trapped into local optima. In our study, the pairwise dynamic programming is considered from two aspects. The first approach is to calculate all the possible pairwise alignment scores to form a MSA aligned order. The other approach is to omit the process of calculating all the possible pairwise alignment scores by considering a random pair order. In our study, several data sets of proteins are used as examples to demonstrate that our approach is superior to the most widely used multiple sequence alignment approach ClustalW. Experimental results show excellent performance of proposed algorithms. In other words, the proposed approaches are indeed promising. From our experiments, it can be found that the random selected pair order in progressive DP can result in better performance than that of using the pair scores’ order. It can be concluded that the use of pair scores’ order will lead the sequence match to a local optimum. Thus, the use of random order may yield better matching situations.

    Contents List of Figures List of Tables Appendix Chapter 1 Introduction 1.1 Overview and Motivation 1.2 Definition of Sequence Alignment 1.3 Substitution Matrices 1.4 Definition of Multiple Sequence Alignment 1.5 Summary Chapter 2 Improving the Results of Multiple Sequence Alignment with Particle Swarm Optimization 2.1 Particle Swarm Optimization 2.1.1 The Standard Particle Swarm Optimizer 2.1.2 A Modified Particle Swarm Optimizer 2.2 Improving the Results of MSA with PSO 2.3 Experiments and Discussion 2.4 Summary Chapter 3 Studies in Dynamic Programming 3.1 Dynamic Programming 3.2 Pairwise Dynamic Programming 3.3 Summary Chapter 4 Multiple Sequence Alignment Using Dynamic Programming and Particle Swarm Optimization 4.1 Introduction to Combine PSO with DP 4.2 MDPPSO Algorithm 4.3 Experiments and Discussion 4.4 Summary Chapter 5 Incorporating Random Pairwise Dynamic Programming with Particle Swarm Optimization in Solving Multiple Sequence Alignment 5.1 Introduction 5.2 The Proposed Algorithm 5.3 Experiments and Discussion 5.4 Summary Chapter 6 Conclusions and Future Work 6.1 Conclusions 6.2 Contributions 6.3 Future Research References

    References
    [1] K. Karadimitriou and D. H. Kraft, “Genetic algorithms and the multiple sequence alignment problem in biology,” Proceedings of the Second Annual Molecular Biology and Biotechnology Conference, 1996.
    [2] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequences of two proteins,” Journal of Molecular Biology, Vol. 48, No. 4, pp. 443-453, 1970.
    [3] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, Vol. 147, No. 1, pp. 195–197, 1981.
    [4] S. C. Chen, A. K. C. Wong, and D. K. Y. Chiu, “A survey of multiple sequence comparison methods,” Bulletin of Mathematical Biology, Vol. 54, pp. 563-598, 1992.
    [5] M. A. McClure, T. K. Vasi, and W. M. Fitch, “Comparative analysis of multiple protein sequence alignment methods,” Molecular Biology and Evolution, Vol. 11, No. 4, pp. 571-592, 1994.
    [6] J. D. Thompson, F. Plewniak, and O. Poch, “A comprehensive comparison of multiple sequence alignment programs,” Nucleic Acids Research, Vol. 27, No. 13, pp. 2682-2690, 1999.
    [7] C. Notredame, "Recent progresses in multiple sequence alignment: a survey," Pharmacogenomics, 3 (1), pp. 131-144, 2002.
    [8] C., Notredame, "Recent evolutions of multiple sequence alignment algorithms," PLOS Computational Biology Vol. 3, Issue 8, pp. 1405-1408, 2007.
    [9] I. M. Wallace, G. Blackshields, and D. G.. Higgins, "Multiple sequence alignments," Current Opinion in Structural Biology Vol. 15, pp. 261–266, 2005.
    [10] D. F. Feng and R. F. Doolittle, “Progressive sequence alignment as a prerequisite to correct phylogenetic trees,” Journal of Molecular Evolution, Vol. 25, pp. 351-360, 1987.
    [11] J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, Vol. 22, No. 22, pp. 4673-4680, 1994.
    [12] H. Carrillo and D. J. Lipman, “The multiple sequence alignment problem in biology,” SIAM J. Appl. Math, Vol. 48, pp. 1073-1082, 1988.
    [13] D. J. Lipman, S. F. Altschul, and J. D. Kececioglu, “A tool for multiple sequence alignment,” Proceedings of National Academy of Science USA, Vol. 86, pp. 4412-4415, 1989.
    [14] J. Stoye, V. Moulton, and A. W. Dress, “DCA: An efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment,” Computer Applications in the Biosciences, Vol. 13, No. 6, pp. 625-626, 1997.
    [15] J. Stoye, “Multiple sequence alignment with the divide-and-conquer method,” Gene, Vol. 211, No. 2, pp. GC45-GC56, 1998.
    [16] E. W. Myers and W. Miller, “Multiple sequence alignment using simulated annealing,” Computer Applications in the Biosciences, Vol. 4, No. 1, pp.11-17, 1988.
    [17] M. Isokawa, M. Wayama, and T. Shimizu, “Multiple sequence alignment using a genetic algorithm,” Genome Informatics, Vol. 7, pp. 176-177, 1996.
    [18] C. Notredame and D. G. Higgins, “SAGA: Sequence alignment by genetic algorithm,” Nucleic Acids Research, Vol. 24, No. 8, pp. 1515-1524, 1996.
    [19] M. F. Omar, R. A. Salam, N. A. Rashid, and R. Abdullah, “Multiple sequence alignment using genetic algorithm and simulated annealing,” Proceeding of the 2004 IEEE 12th Conference on Signal Processing and Communications Application, pp. 28–30, 2004.
    [20] B.-H. Yang, An Approach to Multiple Protein Sequence Alignment Using A Genetic Algorithm, Master Thesis, National Central University, Department of Computer Science and Information Engineering, 2001.
    [21] C. Zhang, and A. K. C. Wong, “Toward efficient multiple molecular sequence alignment: A system of genetic algorithm and dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics-part B, Vol. 27, No. 6, pp. 918-932, 1997.
    [22] K. Chellapilla and G. B. Fogel, “Multiple sequence alignment using evolutionary programming,” Proceedings of the 1999 IEEE Congress on Evolutionary Computation, pp. 445–452, 1999.
    [23] K.-H. Liu, Multiple Sequence Alignment: An Evolutionary Programming Based Algorithm with Local Search, Master Thesis, National Taiwan University of Science and Technology, Department of Electrical Engineering, 2003.
    [24] R. Thomsen, G. B. Fogel, and T. Krink, “A Clustal alignment improver using evolutionary algorithms,” Proceedings of the 2002 IEEE Congress on Evolutionary Computation, Vol. 1, pp. 121-126, 2002.
    [25] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” SCIENCE, Vol. 278, No. 24, 1997.
    [26] R. L. Tatusov, et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, Vol. 4, 41, 2003.
    [27] J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, Vol. 22, No. 22, pp. 4673-4680, 1994.
    [28] R. C. Edgar, “MUSCLE: Multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, Vol. 31, No. 32, No. 5 pp. 1792-1797, 2004.
    [29] K. Katoh, K. Misawa, K. Kuma, and T. Miyata, “MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform,” Nucleic Acids Research, Vol. 30, No. 14, pp. 3059-3066, 2002.
    [30] K. Katoh, K. Kuma, H. Toh, and T. Miyata, “MAFFT version 5: Improvement in accuracy of multiple sequence alignment,” Nucleic Acids Research, Vol. 33, No. 2, pp. 511-518, 2005.
    [31] C. Notredame, D. G. Higgins, and J. Heringa, “T-Coffee: A novel method for fast and accurate multiple sequence alignment,” Journal of Molecular Biology, Vol. 302, No. 1, pp. 205–217, 2000.
    [32] O. Poirot, E. O'Toole, and C. Notredame, “Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments,” Nucleic Acids Research, Vol. 31, No. 13, pp. 3503-3506, 2003.
    [33] A. R. Subramanian, J. Weyer-Menkhoff, M. Kaufmann, and B. Morgenstern, “DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment,” BMC Bioinformatics, 6:66, 2005.
    [34] X. Huang, “On global sequence alignment,” CABIOS, Vol. 10, No. 3, pp. 227-235, 1994.
    [35] D. P. Clark, and L. D. Russell, Molecular Biology Made Simple and Fun, 2nd edition, chapter 7, 2000.
    [36] Dayhoff, M.O., Schwartz, R.M., and Orcutt, B. C., “A Model of Evolutionary Change in Proteins,” Atlas of protein sequence and structure, Vol. 5, pp. 345-352, 1978.
    [37] R.M. Schwartz and M. O. Dayhoff, Matrices for detecting distant relationships,” Atlas of protein sequence and structure, Vol. 5, pp. 353-358, 1978.
    [38] S. Henikoff and J. G., Henikoff, “Amino acid substitution matrices from protein blocks,” Proc. Natl. Acad. Sci. USA, Vol. 89, 10915-10919, 1992.
    [39] G. H. Gonnet, M. A. Cohen, and S. A. Benner, “Optimal scoring matrices for estimating distances between aligned sequences,” Science, Vol 256, No. 5062, pp. 1443-1445, 1992.
    [40] R. Eberhart, and J. Kennedy, “A new optimizer using particle swarm theory,” Proceedings of the Sixth IEEE International Symposium on Micro Machine and Human Science, pp. 39- 43, 1995.
    [41] J. Kennedy and R. Eberhart, “Particle swarm optimization,” IEEE International Conference on Neural Networks, pp. 1942-1948, 1995.
    [42] M. A. Abido, “Optimal design of power-system stabilizers using particle swarm optimization,” IEEE Transaction on Energy Conversion, Vol. 17, No. 3, pp. 406-413, 2002.
    [43] A. I. S. Kumar, K. Dhanushkodi, J. J. Kumar, and C. K. C. Paul, “Particle swarm optimization solution to emission and economic dispatch problem,” TENCON 2003 Conference on Convergent Technologies for Asia-Pacific Region, Vol. 1, pp. 435-439, 2003.
    [44] J. Robinson, and Y. Rahmat-Samii, “Particle swarm optimization in electromagnetics,” IEEE Transactions on Antennas and Propagation, Vol. 52, No. 2, pp. 397-407, 2004.
    [45] S. Baskar, R. T. Zheng, A. Alphones, and P. N. Suganthan, “Particle swarm optimization for the design of low-dispersion fiber Bragg gratings,” IEEE Photonics Technology Letters, Vol.17, No.3, pp. 615-617, 2005.
    [46] D. W. Van der Merwe and A. P. Engelbrecht, “Data clustering using particle swarm optimization,” The 2003 IEEE Congress on Evolutionary Computation, Vol. 1, pp. 215-220, 2003.
    [47] W. Wang, Y. Lu, J. S. Fu, and Y. Z. Xiong, “Particle swarm optimization and finite-element based approach for microwave filter design,” IEEE Transactions on Magnetics, Vol. 41, No. 5, pp. 1800-1803, 2005.
    [48] W. H. Slade, H. W. Ressom, M. T. Musavi, and R. L. Miller, “Inversion of ocean color observations using particle swarm optimization,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 9, pp. 1915-1923, 2004.
    [49] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” IEEE International Conference on Evolutionary Computation, pp. 69-73, 1998.
    [50] Z.-L. Gaing, “Discrete particle swarm optimization algorithm for unit commitment,” IEEE Power Engineering Society General Meeting, Vol. 1, pp. 13-17, 2003.
    [51] H.-Y. Lv; W.-G. Zhou, and C.-G. Zhou, “A discrete particle swarm optimization algorithm for phylogenetic tree reconstruction,” IEEE International Conference on Machine Learning and Cybernetics, Vol. 4, pp. 2650 – 2654; 2004.
    [52] C.-L. Huang and C.-H., Tung, “Using mutation to improve discrete particle swarm optimization for single machine total weighted tardiness problem,” IEEE World Automation Congress, pp. 1-6, 2006.
    [53] S. Chandrasekaran, S. G. Ponnambalam, R. K. Suresh, and N. Vijayakumar, “A hybrid discrete particle swarm optimization algorithm to solve flow shop scheduling problems,” IEEE Conference on Cybernetics and Intelligent Systems, pp. 1-6, 2006.
    [54] P. F. Rodriguez, L. F. Nino, and O. M. Alonso, “Multiple sequence alignment using swarm intelligence,” International Journal of Computational Intelligence Research, Vol. 3, No. 2, pp. 123-130, 2007.
    [55] O. Gotoh, “An improved algorithm for matching biological sequences,” Journal of Molecular Biology, Vol. 162, No. 3, pp. 705–708, 1982.
    [56] D. S. Hirschberg, “A linear space algorithm for computing longest common subsequences,” Communications Association Computing Machinery, Vol. 18, No. 6, pp.341–343, 1975.
    [57] S. Karlin and S. F. Altschul, “Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes,” Proceedings of the National Academy of Science USA, Vol. 87, No. 6, pp. 2264–2268, 1990.
    [58] S. F. Altschul, “A protein alignment scoring system sensitive at all evolutionary distances,” Journal of Molecular Evolution, Vol. 36, No. 3, pp. 290-300, 1993.
    [59] W.-S. Juang and S.-F. Su, “Multiple sequence alignment using modified dynamic programming and particle swarm optimization,” Journal of the Chinese Institute of Engineers, Vol. 31, No. 4, 2008.
    [60] R. Kolodny, P. Koehl and M. Levitt, “Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures,” Journal of Molecular Biology, Vol. 346, Issue 4, pp. 1173-1188, 2005.

    QR CODE