
Graduate student: 洪堃斌
Kun-Bin Hung
Thesis title: 透過GenNav及KEGG資料庫設計疾病相關資訊路徑的快速搜尋機制
Quick Query Mechanism of Disease Related Signaling Pathway via GenNav and KEGG Databases
Advisors: 蘇順豐
Shun-Feng Su
蔡孟勳
Meng-Shiun Tsai
Oral defense committee: 鄭錦聰
Jin-Tsong Jeng
莊鎮嘉
Chen-Chia Chuang
王乃堅
Nai-Jian Wang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 149
Keywords (Chinese): 多執行緒 (multithreading), Gene Navigator (GenNav), Kyoto Encyclopedia of Genes and Genomes (KEGG), SQL Server Enterprise Manager, 多工搜尋引擎 (meta search engine), C#.NET, ASP.NET
Keywords (English): Meta Search Engine, SQL Server Enterprise Manager, Gene Navigator (GenNav), Kyoto Encyclopedia of Genes and Genomes (KEGG), ASP.NET, multiple thread, C#.NET
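  • This study has three parts. First, because current systems are inconvenient to operate and still cannot be combined with other databases, we propose a search mechanism that lets users find the data they need more easily than existing designs allow. Drawing on the rich information in the Gene Navigator (GenNav) database, we build an interface that is friendlier than the current query mechanism. Given the current trend toward integrating various databases, this query mechanism is highly extensible for researchers. Second, the quick query mechanism for disease signaling pathways is based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. To reduce the excessive complexity of KEGG, we divide the database into three categories: disease type, growth factor, and signaling pathway. Researchers can use this mechanism to find relationships between different diseases. Based on unexpected findings, they can design new experiments to verify the query results and thereby extend the complete metabolic pathway. To make the query program run automatically, we design the query flow in ASP.NET and apply a coordinate concept to the system design. In addition, the query data are constructed with SQL Server Enterprise Manager. Third, the meta search engine is implemented in C#.NET. To remedy the duplicated results that come from different search engines, we design a query system that covers the Google, Vivisimo, Baidu, Yahoo, and GAIS websites. The platform's initial settings include the choice of search engines, the sort order, the number of retries after a timeout, the number of threads, and the number of pages per search. If this system is combined with the first two mechanisms, the resulting structure can provide both structured databases and unstructured web resources. In addition, the most commonly used molecular biology databases are listed in the appendix to give newcomers detailed information. Information digitization and Internet database search are developing rapidly, and information and analysis tools for biological, pharmaceutical, and medical research have accumulated in abundance. Finally, we sincerely hope that this integrated query mechanism will be widely applied in medicine, pharmaceuticals, and advanced technology in the future.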


    This study consists of three parts. First, drawing on the abundant information in the Gene Navigator (GenNav) database, we construct an interface that is friendlier than the current query mechanism. Given the current trend toward integrating various databases, this query mechanism is highly extensible for researchers. Second, the quick query mechanism for disease signaling pathways is based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. To reduce the complexity of KEGG, we divide the database into three categories: disease category, growth factor, and signaling pathway. Researchers can find connections between different diseases through this query mechanism. Based on unexpected discoveries, they can design new experiments to verify the query results and so extend the current metabolic pathway. We design the query process and apply coordinate concepts in ASP.NET so that the query procedure executes automatically. Furthermore, the query data are constructed with SQL Server Enterprise Manager. Third, the Meta Search Engine is implemented in C#.NET. To remove the duplicated results returned by different search engines, we design a query system that combines the Google, Vivisimo, Baidu, Yahoo, and GAIS websites. The initial settings of the platform include the selection of search engines, the sort order, the number of retries after a timeout, the number of threads, and the number of pages per search. If this system is combined with the first two mechanisms, the resulting structure can offer researchers structured databases and unstructured network resources simultaneously. Moreover, the most commonly used Molecular Biology Database Collection entries are listed in the appendix of this study to give newcomers detailed information. Finally, we sincerely hope that this integrated query mechanism will be widely used in medical science, pharmacy, and advanced technology in the future.
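    The abstract does not show how the Meta Search Engine merges results, so the following is only a minimal sketch of the idea, written in Python rather than the thesis's C#.NET. It assumes that duplicates are detected by comparing normalized URLs, which the abstract does not state. The names `Hit`, `normalize`, `fetch_with_retry`, and `meta_search` are hypothetical, and each engine is modeled as a plain function rather than a real HTTP client.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Hit:
    engine: str
    title: str
    url: str


# Hypothetical engine interface: (query, page number) -> list of hits.
Fetcher = Callable[[str, int], List[Hit]]


def normalize(url: str) -> str:
    """Build a comparison key so the same page reported by two engines matches.

    Assumption: scheme, a leading "www.", and a trailing slash are ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def fetch_with_retry(fetch: Fetcher, query: str, page: int, retries: int) -> List[Hit]:
    """Call one engine, retrying up to `retries` times after a timeout."""
    for attempt in range(retries + 1):
        try:
            return fetch(query, page)
        except TimeoutError:
            if attempt == retries:
                break
    return []  # give up on this page after the last retry


def meta_search(
    query: str,
    engines: Dict[str, Fetcher],
    selected: List[str],
    pages: int = 1,
    retries: int = 2,
    threads: int = 4,
    sort_key: Optional[Callable[[Hit], str]] = None,
) -> List[Hit]:
    """Query the selected engines in parallel and drop duplicate URLs."""
    jobs = [(name, page) for name in selected for page in range(1, pages + 1)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps the job order, so the merge below is deterministic.
        batches = list(
            pool.map(lambda job: fetch_with_retry(engines[job[0]], query, job[1], retries), jobs)
        )
    seen = set()
    merged: List[Hit] = []
    for batch in batches:
        for hit in batch:
            key = normalize(hit.url)
            if key not in seen:  # keep only the first engine's copy
                seen.add(key)
                merged.append(hit)
    if sort_key is not None:
        merged.sort(key=sort_key)
    return merged
```

    The five parameters of `meta_search` mirror the five initial settings listed in the abstract (engine selection, sort order, retries after a timeout, thread count, and pages per search). A real implementation would replace the engine functions with HTTP requests and result-page parsers.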

    Table of Contents
    摘要 (Abstract in Chinese)
    Abstract
    誌謝 (Acknowledgement in Chinese)
    Acknowledgement
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Motivation and Related work
        1.3 Thesis Organization
    Chapter 2 Background and Literature Review
        2.1 The Relevant Bioinformatics Sources And Domain Knowledge
        2.2 Overview of Ontology
        2.3 The methods of categorizing information
        2.4 Introduction of Gene Nr4A3
    Chapter 3 Compare With Characteristics and Operational Methods of Various Query Databases
        3.1 Gene Ontology Consortium (GO)
        3.2 GenNav (Gene Navigator)
        3.3 KEGG
        3.4 Cytoscape
    Chapter 4 Implementation and Analysis
        4.1 Implementation
        4.2 Results and Analysis
    Chapter 5 Conclusions and Future Work
        5.1 Discussions and Conclusions
        5.2 Future work
    References
    Appendix
    作者簡介 (About the Author)

