
Graduate Student: 陳嘉元 (Chia-Yuan Chen)
Thesis Title: An Improved Missing Value Estimation Method based on Grey Relational Analysis and Least Squares Problem
Advisor: 楊鍵樵 (Chen-Chau Yang)
Oral Examination Committee: 呂永和 (Yung-Ho Leu), 朱雨其 (Yu-Chi Chu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of Publication: 2007
Academic Year of Graduation: 95 (ROC calendar, i.e. 2006-2007)
Language: Chinese
Number of Pages: 63
Keywords: Missing Value, Microarray, Gene Expression Data, Local Least Squares, Weighted Grey Relational Analysis
Large databases often contain missing values for a variety of reasons. If these missing values are not handled properly, the quality of subsequent data analysis suffers and wrong decisions may result, so handling missing values appropriately is an important issue. Gene expression data obtained from a series of microarray experiments frequently have many missing expression values. In that situation, estimating the missing entries from only the few complete genes (genes with no missing values) does not give satisfactory results. This research therefore builds on the strengths of the Local Least Squares method for missing value estimation and integrates the Weighted Grey Relational Analysis previously proposed by our laboratory. The result is an improved estimation method for data with high missing rates. By changing the rule used to select training data, the method can estimate missing values appropriately even when the missing rate is high, which in turn improves the accuracy of further research. We implemented the method on several gene expression data sets to verify its feasibility.
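The abstract describes combining grey relational analysis with local least squares (LLS) imputation. Below is a minimal pure-Python sketch of that idea under stated assumptions:

- Grey relational grades (distinguishing coefficient ζ = 0.5, uniform weights unless weights are supplied) rank the candidate genes.
- The top k candidates serve as neighbours.
- An LLS fit in the style of Kim, Golub and Park (2005) estimates the missing entries.

Several pieces are hypothetical illustrations and do not reproduce the thesis's actual training-data selection rule or implementation. These are the neighbour-eligibility rule, the per-column weights, the tiny ridge term, and the function names `grey_relational_grades`, `least_squares` and `impute_gene`.

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # one gene's expression profile; None marks a missing value


def grey_relational_grades(ref: Sequence[float], cands: Sequence[Sequence[float]],
                           weights: Optional[Sequence[float]] = None,
                           zeta: float = 0.5) -> List[float]:
    """Grey relational grade of each candidate sequence against the reference `ref`.

    `weights` are hypothetical per-column weights summing to 1; the default (uniform)
    gives the ordinary, unweighted grade. `zeta` is the distinguishing coefficient.
    """
    w = list(weights) if weights is not None else [1.0 / len(ref)] * len(ref)
    deltas = [[abs(r - c) for r, c in zip(ref, cand)] for cand in cands]
    d_min = min(min(row) for row in deltas)  # global min/max over all candidates
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:  # every candidate matches the reference exactly
        return [1.0] * len(cands)
    return [sum(wk * (d_min + zeta * d_max) / (d + zeta * d_max) for wk, d in zip(w, row))
            for row in deltas]


def least_squares(a: List[List[float]], b: List[float], ridge: float = 1e-8) -> List[float]:
    """Minimise ||a x - b|| via the normal equations (a^T a + ridge*I) x = a^T b.

    The tiny ridge term is an assumption added for numerical stability.
    """
    m, k = len(a), len(a[0])
    aug = [[sum(a[r][i] * a[r][j] for r in range(m)) + (ridge if i == j else 0.0)
            for j in range(k)] + [sum(a[r][i] * b[r] for r in range(m))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        x[i] = (aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))) / aug[i][i]
    return x


def impute_gene(data: List[Row], g: int, k: int) -> Row:
    """Fill the missing entries of gene `g` from its k most grey-related neighbours."""
    target = data[g]
    known = [j for j, v in enumerate(target) if v is not None]
    miss = [j for j, v in enumerate(target) if v is None]
    if not miss:
        return list(target)
    # Hypothetical training-data rule: a neighbour only needs values on the columns
    # used for this gene, so partially observed genes are still eligible.
    cols = known + miss
    pool = [i for i, row in enumerate(data)
            if i != g and all(row[j] is not None for j in cols)]
    if not known or not pool:
        raise ValueError("not enough observed data to impute this gene")
    ref = [target[j] for j in known]
    grades = grey_relational_grades(ref, [[data[i][j] for j in known] for i in pool])
    nbrs = [i for _, i in sorted(zip(grades, pool), key=lambda p: -p[0])][:k]
    # LLS step: fit the target's known values as a linear combination of the
    # neighbours' values on the same columns, then apply it to the missing columns.
    a = [[data[i][j] for i in nbrs] for j in known]  # |known| x k
    x = least_squares(a, ref)
    filled = list(target)  # copy, so the input matrix is left untouched
    for j in miss:
        filled[j] = sum(coef * data[i][j] for coef, i in zip(x, nbrs))
    return filled


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0, None],  # gene 0: last value missing
        [0.5, 1.0, 1.5, 2.0],   # gene 1: exactly half of gene 0
        [3.0, 1.0, 0.0, 5.0],
        [1.0, 1.0, 1.0, None],  # also missing column 3, so not eligible
    ]
    print(impute_gene(data, 0, k=2))
```

In the toy data, gene 1 is exactly half of gene 0, so it receives the higher grey relational grade. The estimate for gene 0's missing entry comes out very close to 4.0.

The normal equations keep the sketch dependency-free. The original LLS formulation instead uses a pseudo-inverse, which a practical version would likely use through an SVD-based solver because it copes better with rank-deficient neighbour sets.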

Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Direction
  1.3 Research Background
  1.4 Thesis Structure
Chapter 2 Related Literature
  2.1 Missing Values
    2.1.1 Definition of Missing Values
    2.1.2 Types of Missing Values
  2.2 Methods for Handling Missing Values
    2.2.1 Ignoring Missing Values
    2.2.2 Mean Substitution
    2.2.3 Decision Trees
    2.2.4 Clustering and Regression Techniques
    2.2.5 Genetic Clustering Techniques
    2.2.6 K-Nearest Neighbor Imputation
    2.2.7 Singular Value Decomposition Imputation
  2.3 Methods Applied in This Research
    2.3.1 Weighted Grey Relational Analysis
    2.3.2 Local Least Squares Imputation
Chapter 3 Research Method
  3.1 Research Framework
    3.1.1 Preprocessing of Missing Values
    3.1.2 Rules for Selecting Training Data
    3.1.3 Missing Value Computation Module
    3.1.4 Determining the Number of Similar Neighbors K
  3.2 Example
    3.2.1 Stage One
    3.2.2 Stage Two
  3.3 Evaluation Metrics
    3.3.1 Root Mean Square Error
    3.3.2 Closeness Rate Method
Chapter 4 Experimental Design and Discussion
  4.1 Experimental Environment
  4.2 Experimental Method
  4.3 Experimental Data
  4.4 Experiment 1
  4.5 Experiment 2
  4.6 Experimental Conclusions
Chapter 5 Conclusions and Future Research Directions
  5.1 Conclusions
  5.2 Future Research Directions
References


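Full-text release date: 2008/07/24 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)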