Student: Chien-he Lee (李建和)
Thesis title: Apply Bayesian Network on Microarray Gene Expression Data (運用貝氏網路於微陣列基因表現資料集)
Advisor: Hsing-Kuo Pao (鮑興國)
Oral defense committee: 李育杰, 項天瑞, 吳怡樂, 楊傳凱
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, 2005–2006)
Language: English
Number of pages: 36
Keywords: Bayesian Networks, Gene Regulation, Microarray

    We study the gene regulatory network of yeast using Bayesian networks. Since the draft human genome sequence was announced, research on gene regulatory mechanisms has been one of the important tasks for biologists, but it is still at an early stage. With the invention of the DNA microarray chip, we can measure the expression levels of thousands of genes simultaneously and assemble them into a gene expression dataset. Computer science can therefore help identify the possible regulatory mechanisms of genes.

    We use Bayesian networks to model gene regulatory networks. After preprocessing the yeast microarray gene expression data (handling missing values, normalization, and discretization), we use statistical methods to quantify the dependencies between genes. Pairs of genes that exhibit high dependency are likely to have a regulatory relationship. Based on these dependencies, we build Bayesian networks as candidate models of the whole gene regulatory network.
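The preprocessing and dependency steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: missing values are filled with the gene's mean, each profile is z-score normalized, values are discretized into three levels (under-expressed, unchanged, over-expressed) with a hypothetical threshold of 0.5, and the "statistical method" for dependency is assumed to be empirical mutual information. The gene names and values in the usage example are placeholders, not data from the thesis.

```python
import math
from collections import Counter

# Assumption: each gene maps to a list of expression values across arrays,
# with None marking a missing measurement.

def fill_missing(row):
    """Replace missing values (None) with the mean of the observed values.

    Assumes at least one value in the row is observed.
    """
    observed = [v for v in row if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]


def normalize(row):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    n = len(row)
    mean = sum(row) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in row) / n)
    if std == 0:
        return [0.0] * n  # a constant profile carries no information
    return [(v - mean) / std for v in row]


def discretize(row, threshold=0.5):
    """Three levels: -1 (under-expressed), 0 (unchanged), 1 (over-expressed).

    The threshold of 0.5 standard deviations is a hypothetical choice.
    """
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in row]


def preprocess(data):
    """Apply missing-value handling, normalization and discretization per gene."""
    return {gene: discretize(normalize(fill_missing(row))) for gene, row in data.items()}


def mutual_information(x, y):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def candidate_pairs(discrete, min_mi):
    """Gene pairs whose mutual information reaches min_mi, strongest first."""
    genes = sorted(discrete)
    pairs = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = mutual_information(discrete[g], discrete[h])
            if mi >= min_mi:
                pairs.append((g, h, mi))
    return sorted(pairs, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    raw = {
        "GENE_A": [0.2, 1.5, None, 1.4, -0.3, -1.6],
        "GENE_B": [0.1, 1.7, -1.2, 1.3, -0.2, -1.5],
        "GENE_C": [1.1, -0.4, 0.3, -1.0, 0.9, 0.2],
    }
    discrete = preprocess(raw)
    for g, h, mi in candidate_pairs(discrete, min_mi=0.5):
        print(f"{g} - {h}: {mi:.3f} bits")
```

A pairwise measure like this only yields undirected candidate relationships; a structure-search step such as greedy hill-climbing, tabu search, or K2 is still needed to orient edges and assemble a Bayesian network.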

    In the experiments, we use the yeast Saccharomyces cerevisiae dataset. From its 6177 genes, we select the 53 genes in the MAPK signaling pathway and the 89 genes in the cell cycle pathway recorded in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database, creating two datasets that we study separately; the KEGG pathways serve as the standard for comparison. We also compare our method with the K2 and hill-climber algorithms in WEKA. The experiments show that our method recovers more regulatory relations while also raising the precision of the learned graphs.
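The comparison against KEGG can be illustrated by scoring a learned edge set against the reference pathway edges. This sketch assumes that edges are pairs of gene names and, by default, ignores edge direction; the abstract does not say whether the thesis counts orientation or how it treats genes missing from the learned graph. The function name `edge_metrics` and the gene names are hypothetical.

```python
def edge_metrics(predicted, reference, directed=False):
    """Score learned edges against reference (e.g. KEGG pathway) edges.

    Returns (number of correct edges, precision, recall). With directed=False,
    an edge matches the reference regardless of its orientation.
    """
    def canonical(edge):
        a, b = edge
        return (a, b) if directed else tuple(sorted((a, b)))

    pred = {canonical(e) for e in predicted}
    ref = {canonical(e) for e in reference}
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return correct, precision, recall


if __name__ == "__main__":
    learned = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_B"), ("GENE_A", "GENE_D")]
    kegg = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_E")]
    print(edge_metrics(learned, kegg))                 # 2 correct, precision = recall = 2/3
    print(edge_metrics(learned, kegg, directed=True))  # 1 correct, precision = recall = 1/3
```

Precision rewards graphs whose edges are mostly confirmed by the reference pathway, while the count of correct edges tracks how many regulatory relations are recovered; the abstract claims improvement on both.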

    Contents
    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Previous Work
      1.2 Biological Background
        1.2.1 Basic Molecular Biology
        1.2.2 Microarray Technology
      1.3 Outline of this Thesis
    2 Bayesian Networks
    3 Proposed Learning Algorithms
      3.1 Preprocessing
        3.1.1 Normalization
        3.1.2 Discretization
      3.2 Search Algorithms
        3.2.1 Scoring Function
        3.2.2 Greedy Hill-Climbing
        3.2.3 Choosing Parent Sets
        3.2.4 Tabu Search
        3.2.5 K2 Algorithm
    4 Experiments and Discussion
      4.1 KEGG Database
      4.2 Yeast Dataset
      4.3 Experimental Result
      4.4 Discussion
    5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    Bibliography
    Introduction myself

    List of Figures
    1.1 Central Dogma of Molecular Biology
    1.2 Principle of cDNA Microarray
    2.1 A Bayesian network example
    3.1 Proposed Learning Algorithm
    4.1 MAPK signaling pathway graph
    4.2 Cell cycle pathway graph
    4.3 Original Image of Microarray
    4.4 Diagram with hidden node
    4.5 Diagram with lost nodes

    List of Tables
    2.1 Conditional Probability Table (CPT)
    4.1 Information of Yeast Dataset
    4.2 Experimental Results


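    Full-text release date: 2007/01/26 (on-campus network)
    Full-text release date: not authorized for public release (off-campus network)
    Full-text release date: not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)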