Graduate student: | 何明璇 Ming-Hsuan Ho |
---|---|
Thesis title: | 模糊分類關聯規則應用於微陣列資料分類之研究 Classification of microarray data using fuzzy classification association rules |
Advisor: | 呂永和 Yung-ho Leu |
Oral defense committee: | 楊維寧 Wei-Ning Yang, 陳雲岫 Yun-Shiow Chen |
Degree: | Master |
Department: | School of Management - Department of Information Management |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | Chinese |
Number of pages: | 61 |
Chinese keywords: | 微陣列資料, 分類關聯規則, 模糊分類關聯規則 |
English keywords: | Microarray, Classification association rules, Fuzzy classification association rules |
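In recent years, microarray technology has become an important tool for studying gene expression. Whereas earlier methods could examine the expression levels of only one or a few genes at a time, current microarray technology can analyze the expression of thousands of genes simultaneously, which has driven rapid growth in bioinformatics. Microarray technology is now widely used to analyze relationships among gene expression levels and to support disease diagnosis.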
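This study uses fuzzy association rules to analyze the relationship between gene expression levels and sample classes. The left-hand side of a fuzzy association rule is a combination of several genes and their expression ranges; the right-hand side is the corresponding sample class. The method uses clustering to partition the universe of discourse of each gene's expression level into several ranges and defines a membership function for each range. The original microarray data are then fuzzified against these ranges. Finally, each gene paired with one of its ranges is treated as an item, support is computed from membership degrees, and the Apriori algorithm generates the association rules. To classify a sample, its membership degree in each rule is used to determine the class it belongs to. Applied to microarray data, this method not only predicts the disease class but also reveals the causal relationship between gene expression levels and disease classes, providing more information for disease diagnosis.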
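The fuzzification and support steps described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the abstract does not state the shape of the membership functions or how item memberships are combined, so the sketch assumes triangular functions peaked at cluster centers (taken as given, for example from a clustering step) and the minimum operator for itemsets. The names `triangular_memberships`, `fuzzify`, and `fuzzy_support`, the `(gene, range index)` item encoding, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, int]  # (gene name, range index); hypothetical item encoding


def triangular_memberships(x: float, centers: List[float]) -> List[float]:
    """Membership of x in each range, using triangles peaked at the centers.

    Assumes `centers` is strictly increasing. Values beyond the outermost
    centers get full membership in the nearest range.
    """
    n = len(centers)
    if n == 1:
        return [1.0]
    if x <= centers[0]:
        return [1.0] + [0.0] * (n - 1)
    if x >= centers[-1]:
        return [0.0] * (n - 1) + [1.0]
    mu = [0.0] * n
    for i in range(n - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)
            mu[i] = 1.0 - w      # membership falls off from the left center
            mu[i + 1] = w        # and rises toward the right center
            break
    return mu


def fuzzify(sample: Dict[str, float],
            centers: Dict[str, List[float]]) -> Dict[Item, float]:
    """Map raw expression values to membership degrees per (gene, range) item."""
    fuzzy: Dict[Item, float] = {}
    for gene, value in sample.items():
        for idx, mu in enumerate(triangular_memberships(value, centers[gene])):
            fuzzy[(gene, idx)] = mu
    return fuzzy


def fuzzy_support(itemset: List[Item],
                  fuzzy_samples: List[Dict[Item, float]]) -> float:
    """Average over samples of the minimum membership across the itemset.

    Using min to combine items is an assumption of this sketch.
    """
    total = sum(min(fs[item] for item in itemset) for fs in fuzzy_samples)
    return total / len(fuzzy_samples)


# Toy data: two genes, two samples (values are illustrative only).
centers = {"g1": [0.0, 1.0, 2.0], "g2": [10.0, 20.0]}
samples = [{"g1": 0.5, "g2": 10.0}, {"g1": 1.0, "g2": 15.0}]
fuzzy_samples = [fuzzify(s, centers) for s in samples]
support = fuzzy_support([("g1", 1), ("g2", 0)], fuzzy_samples)
print(support)
```

A level-wise Apriori search would call a function like `fuzzy_support` in place of the usual transaction count, keeping only the itemsets whose support meets a chosen minimum threshold.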
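Using the proposed method, this study analyzes three microarray data sets. The experimental results show that, without any prior gene selection, the method already achieves reasonable accuracy, and its classification accuracy is clearly better than that of decision trees and support vector machines.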
With the advent of microarray technology, researchers can now measure the expression of thousands of genes simultaneously in one experiment. This powerful technology helps lay the foundation for bioinformatics and is widely used in disease diagnosis.
In this thesis, we use fuzzy classification association rules to study the relationship between gene expression and disease in microarray data. In the proposed method, we first divide the universe of discourse of each gene's expression level into several intervals and define a membership function for each interval. Then, we fuzzify the original microarray data against the gene intervals. Finally, we use the Apriori algorithm to derive a set of classification association rules for each class of the microarray data. When classifying a test sample, we calculate the membership degree of the sample against all the derived rules. The sample is assigned to the class for which it has the largest membership degree. Compared with existing classification methods for microarray data, which attain high prediction accuracy but offer little interpretability, the proposed method attains comparable prediction accuracy with significantly better interpretability. It can therefore help the study of cancers and improve the efficiency of disease diagnosis.
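The classification step can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the thesis: a sample's membership in a rule is taken as the minimum membership over the rule's antecedent items, and a class is scored by its best-matching rule, since the abstract does not say how degrees are aggregated across rules. The names `rule_membership` and `classify`, the rule encoding, and the class labels `"ALL"` and `"AML"` are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, int]         # (gene name, interval index); hypothetical encoding
Rule = Tuple[List[Item], str]  # (non-empty antecedent items, class label)


def rule_membership(antecedent: List[Item],
                    fuzzy_sample: Dict[Item, float]) -> float:
    """Degree to which a fuzzified sample matches a rule antecedent.

    Uses the minimum over items (an assumption); missing items count as 0.
    """
    return min(fuzzy_sample.get(item, 0.0) for item in antecedent)


def classify(fuzzy_sample: Dict[Item, float], rules: List[Rule]) -> str:
    """Return the class whose best-matching rule has the largest degree."""
    best: Dict[str, float] = {}
    for antecedent, label in rules:
        degree = rule_membership(antecedent, fuzzy_sample)
        best[label] = max(best.get(label, 0.0), degree)
    return max(best, key=lambda label: best[label])


# Toy rules and one fuzzified test sample (illustrative values only).
rules: List[Rule] = [
    ([("g1", 0), ("g2", 1)], "ALL"),
    ([("g1", 1)], "AML"),
]
fuzzy_sample = {
    ("g1", 0): 0.2, ("g1", 1): 0.8,
    ("g2", 0): 0.4, ("g2", 1): 0.6,
}
predicted = classify(fuzzy_sample, rules)
print(predicted)
```

Because each prediction traces back to a specific rule over named genes and expression intervals, the matched rule itself serves as the explanation, which is the interpretability benefit the abstract claims.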
In this research, we use three well-known microarray data sets to compare the performance of the proposed method with the decision tree induction method. The experimental results show that the proposed method significantly outperforms the decision tree induction method in prediction accuracy.