Graduate student: |
陳嘉元 Chia-Yuan Chen |
---|---|
Thesis title: |
基於灰關聯分析與最小平方法以預測遺失值之改進方法 An Improved Missing Value Estimation Method based on Grey Relational Analysis and Least Squares Problem |
Advisor: |
楊鍵樵 Chen-Chau Yang |
Oral defense committee: |
呂永和 Yung-Ho Leu, 朱雨其 Yu-Chi Chu |
Degree: |
Master |
Department: |
College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar) |
Language: | Chinese |
Number of pages: | 63 |
Keywords: | Missing Value, Microarray, Gene Expression Data, Local Least Squares, Weighted Grey Relational Analysis |
Missing values arise in large databases for a variety of reasons. If they are not handled properly, the quality of subsequent data analysis suffers and wrong decisions may follow, so handling missing values well is an important issue. Gene expression data obtained from microarray experiments often have a large number of missing expression values. In that situation, filling in the missing values using only the few complete genes that have no missing entries gives unsatisfactory results. This study therefore builds on the strengths of the local least squares method for missing value problems and integrates the weighted grey relational analysis previously proposed by our laboratory, yielding an improved method for handling missing values when the missing rate is high. By changing the rule for selecting training data, the method can estimate missing values appropriately at high missing rates, which improves the accuracy of subsequent research. We implemented the method on several gene expression data sets to verify its feasibility.
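The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm. It ranks candidate genes by a textbook grey relational grade and then estimates the missing entries with a local least squares fit. The function names, the distinguishing coefficient `zeta = 0.5`, the equal default weights, and the neighbor count `k` are all assumptions made for illustration. For simplicity the candidate pool is restricted to fully observed rows, which is the very limitation the thesis sets out to relax through its modified training-data selection rule.

```python
import numpy as np

def grey_relational_grades(target, candidates, zeta=0.5, weights=None):
    """Grey relational grade of each candidate row with respect to target.

    zeta is the distinguishing coefficient (0.5 is a common default);
    weights are per-column weights, assumed equal when not given.
    """
    delta = np.abs(candidates - target)              # absolute differences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:                                   # all candidates identical to target
        return np.ones(len(candidates))
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    if weights is None:
        weights = np.full(target.shape[0], 1.0 / target.shape[0])
    return coeff @ weights                           # weighted mean per row

def impute_row(data, row, k=3):
    """Fill the NaN entries of data[row] from its k most related complete rows."""
    target = data[row]
    missing = np.isnan(target)
    observed = ~missing
    complete = ~np.isnan(data).any(axis=1)           # simplification: complete rows only
    pool = data[complete]
    if len(pool) == 0:
        raise ValueError("no complete rows available as training data")
    grades = grey_relational_grades(target[observed], pool[:, observed])
    nearest = pool[np.argsort(grades)[::-1][:k]]     # highest grade first
    A = nearest[:, observed]                         # neighbors on observed columns
    B = nearest[:, missing]                          # neighbors on missing columns
    # Express the target as a linear combination of its neighbors
    x, *_ = np.linalg.lstsq(A.T, target[observed], rcond=None)
    filled = target.copy()
    filled[missing] = B.T @ x                        # same combination on missing columns
    return filled
```

Calling `impute_row(data, i)` on a NumPy array `data` whose rows are genes and whose columns are experiments returns a copy of row `i` with its `NaN` entries replaced by estimates. The thesis's own weighting scheme and selection rule are not reproduced here.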