
Graduate student: 譚國斌 (Kuo-Pin Tan)
Thesis title: 透過粒子群最佳化演算法提升決策樹對癌症辨識力及探討對生技產業影響
A decision tree model empowered by particle swarm optimization algorithm for cancer identification and discuss the impact to biotechnology industry
Advisor: 王孔政 (Kung-Jeng Wang)
Oral defense committee: 楊朝龍 (Chao-Lung Yang), 何秀青 (Hsiu-Ching Ho)
Degree: Master
Department: School of Management - School of Management International (MBA)
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Pages: 22
Chinese keywords: 粒子群最佳化演算法 (particle swarm optimization algorithm), 決策樹 (decision tree)
English keywords: particle swarm optimization algorithm, C4.5 decision tree

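When analyzing DNA microarray data, selecting from thousands of genes the few informative ones that lead to cancer is an important problem. Many researchers analyze gene expression data with a variety of computational methods.
To select genes efficiently from thousands of candidates, our study develops a novel method that incorporates the particle swarm optimization algorithm and serves as the classifier. The study also compares the classification performance of the proposed method (PSODT) with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and runs analysis experiments on 11 cancer gene expression datasets.
According to the statistical analysis, the proposed method outperforms the other commonly used classifiers on all test datasets. In addition, housekeeping genes with different expression patterns and tissue-specific genes were identified. These genes provide high discriminative power for cancer identification.
Recent trend studies show that the global molecular diagnostics market will grow from US$4.8 billion in 2011 to US$8.1 billion within the next five years. In addition, biochips (microarrays) account for a 10% market share, a segment expected to grow from US$0.4 billion in 2011 to US$0.86 billion in 2017. Based on these industry trends, the PSODT developed in this study can accurately select candidate genes during biochip design, substantially reducing manufacturing cost and strengthening competitive advantage in future market share.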


When analyzing DNA microarray data, an important problem is how to select, from among thousands of genes, a small number of informative genes that might cause cancer to occur. Scientists have used many computational intelligence methods to analyze gene expression data.
To improve the efficiency of selecting, from thousands of candidate genes, those that contribute to classifying cancers, our study proposes a method that combines particle swarm optimization with a decision tree as the classifier. We compare the performance of the proposed method (PSODT) with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conduct experiments on 10 gene expression cancer datasets.
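The idea of letting a particle swarm search over gene subsets while a tree-based classifier scores each subset can be illustrated with a minimal sketch. This is not the thesis's implementation. It makes several assumptions: a one-split decision stump stands in for the C4.5 tree, training accuracy stands in for cross-validated accuracy, the data are synthetic, and all names and parameter values (`toy_data`, `stump_accuracy`, `fitness`, `binary_pso`, `penalty`, `w`, `c1`, `c2`) are hypothetical.

```python
import math
import random


def toy_data(n=20, m=6, informative=2):
    """Synthetic stand-in for expression data: only one gene separates the classes."""
    y = [i % 2 for i in range(n)]
    X = [[(i * (j + 3)) % 7 for j in range(m)] for i in range(n)]
    for i in range(n):
        X[i][informative] = i + 100 * y[i]  # class 1 is shifted far above class 0
    return X, y


def stump_accuracy(X, y, genes):
    """Training accuracy of the best single-threshold split over the given genes."""
    n = len(y)
    best = max(sum(y), n - sum(y)) / n  # majority-class baseline (no split)
    for g in genes:
        values = sorted({row[g] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            hits = sum((row[g] > t) == (label == 1) for row, label in zip(X, y))
            best = max(best, hits / n, 1 - hits / n)  # either split orientation
    return best


def fitness(X, y, mask, penalty=0.01):
    """Accuracy on the selected genes minus a small cost per selected gene."""
    genes = [d for d, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    return stump_accuracy(X, y, genes) - penalty * len(genes)


def binary_pso(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over genes."""
    rng = random.Random(seed)
    m = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_particles)]
    vel = [[0.0] * m for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(m):
                # Velocity: inertia plus pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))
                # A sigmoid turns the velocity into the probability of bit = 1.
                prob = 1 / (1 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `binary_pso(*toy_data())` returns a 0/1 mask over the six synthetic genes together with its fitness. On real microarray data one would presumably replace the stump with a full decision tree and score each mask by cross-validation, at a much higher cost per fitness evaluation.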
According to the statistical analysis, PSODT outperforms other popular classifiers on all test datasets and is comparable to SVM on certain specific datasets. In addition, PSODT identifies housekeeping genes with various expression patterns as well as tissue-specific genes. These genes provide high discriminative power for cancer classification.
A recent market trend study found that the global Molecular Diagnostics (MDx) market exceeded $4.8 billion in 2011 and is poised to reach $8.1 billion over the next five years. In addition, microarrays hold a market share of about 10%, a segment expected to grow from $0.4 billion in 2011 to $0.86 billion in 2017. Given these trends in the biotechnology industry, PSODT could enable efficient gene selection during microarray design, thereby lowering production costs and strengthening competitive advantage in market share.
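The growth figures quoted above can be turned into implied compound annual growth rates with a short calculation. The sketch below assumes a five-year horizon for the MDx market, as the text states, and a six-year horizon (2011 to 2017) for the microarray segment. The function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures in billions of US dollars, taken from the paragraph above.
mdx_rate = cagr(4.8, 8.1, 5)          # assumes a five-year horizon
microarray_rate = cagr(0.4, 0.86, 6)  # 2011 to 2017
print(f"MDx: {mdx_rate:.1%}, microarray: {microarray_rate:.1%}")
```

Under these assumptions the implied rates are roughly 11% per year for the MDx market and roughly 13.6% per year for the microarray segment. A different horizon would change the rates accordingly.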

CHINESE ABSTRACT
ABSTRACT
CONTENT
TABLE LIST
FIGURE LIST
CHAPTER 1 INTRODUCTION
CHAPTER 2 RELATED WORK
  2.1 PARTICLE SWARM OPTIMIZATION ALGORITHM
  2.2 C4.5 DECISION TREE
  2.3 MICROARRAY
  2.4 MOLECULAR DIAGNOSTICS MARKET (MDX)
CHAPTER 3 METHOD
  3.1 PSO-BASED ALGORITHM
  3.2 COMBINE PSO WITH DECISION TREE
CHAPTER 4 RESULTS AND DISCUSSION
  4.1 SETTING UP THE EXPERIMENTAL PARAMETER
  4.2 THE PROCESS OF THE RESULTING CANCER CLASSIFIER
  4.3 MULTIPLE COMPARISON RESULTS WITH OTHER CLASSIFICATION ALGORITHMS
  4.4 HOW THE PSODT AFFECT THE PROCESS OF PRODUCING MICROARRAY
CHAPTER 5 CONCLUSIONS
REFERENCES


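Full-text release date: 2019/07/31 (campus network)
Full-text release date: not authorized for public release (off-campus network)
Full-text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)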