Graduate student | 沈永勝 Yung-Sheng Shen |
---|---|
Thesis title | Integrating Automatic Clustering and Weighted Grey Relational Technique for Missing Value Processing in Large Databases (整合自動分群與加權式灰關聯技術以處理大型資料庫遺失值問題之處理) |
Advisor | 楊鍵樵 Chen-Chau Yang |
Oral examination committee | 李漢銘 Hahn-Ming Lee; 朱雨其 Yu-Chi Chu |
Degree | Master |
Department | College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering |
Publication year | 2005 |
Graduation academic year | 93 (ROC calendar, i.e., 2004-2005) |
Language | Chinese |
Pages | 69 |
Chinese keywords | weighted grey relational, automatic clustering, missing values, data mining |
English keywords | Data Mining, weighted grey relational, Missing Values, Automatic Clustering |
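Data mining is currently a very active research field. It studies how to extract useful knowledge from large databases so that enterprises can use that knowledge when making decisions. Missing values in a database degrade the quality of data analysis, so handling them properly is an important issue. To date, the usual practice when processing categorical data with missing values has been to ignore the records that contain them, which is usually unwise. This thesis builds on the previously demonstrated strengths of grey relational analysis for handling missing values and proposes a new missing value processing method. The method integrates an automatic clustering algorithm with weighted grey relational analysis to compute appropriate values for the missing entries. It is intended to meet the data-preprocessing needs of knowledge discovery in databases and to improve the accuracy of any later use of the data. We implement the method on several large databases to verify its feasibility.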
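The method builds on grey relational analysis, and its standard form is Deng's formulation, shown below. The weights $w_k$ and the distinguishing coefficient $\zeta$ are generic symbols here, because the abstract does not say how the thesis chooses them. Let $x_0$ be a record with a missing value and $x_i$ a candidate record. The two are compared over the attributes $k$ that are observed in $x_0$:

$$
\Delta_i(k) = \lvert x_0(k) - x_i(k) \rvert, \qquad
\Delta_{\min} = \min_i \min_k \Delta_i(k), \qquad
\Delta_{\max} = \max_i \max_k \Delta_i(k)
$$

$$
\gamma\bigl(x_0(k), x_i(k)\bigr) = \frac{\Delta_{\min} + \zeta\,\Delta_{\max}}{\Delta_i(k) + \zeta\,\Delta_{\max}}, \qquad \zeta \in (0, 1]
$$

$$
\Gamma(x_0, x_i) = \sum_k w_k\,\gamma\bigl(x_0(k), x_i(k)\bigr), \qquad \sum_k w_k = 1
$$

For categorical attributes, a common choice (assumed here) is $\Delta_i(k) = 0$ when the values match and $1$ otherwise. With $\Delta_{\min} = 0$, $\Delta_{\max} = 1$ and the usual $\zeta = 0.5$, this gives:

- a matching attribute scores $\gamma = 0.5 / 0.5 = 1$;
- a mismatching attribute scores $\gamma = 0.5 / 1.5 = 1/3$.

The candidate with the largest weighted grade $\Gamma$ is the most closely related record. It is a natural donor for the missing value.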
Data mining has become a very popular research area. It is the process of extracting useful knowledge from huge databases to support enterprise decision making. Missing values in a database degrade the quality of data analysis results, so handling them properly is an important topic. So far, when dealing with categorical data that contains missing values, the conventional approach has been to ignore the records with missing values, but this is usually not a wise choice. In this thesis, we build on the advantages, shown in earlier work, of using grey relational analysis to handle missing values. We propose a new approach that integrates an automatic clustering algorithm with weighted grey relational analysis to compute suitable values for the missing entries. With this method we aim to meet the data-preprocessing needs of KDD (Knowledge Discovery in Databases) and to improve the accuracy of subsequent analyses. We also implement the method and use several large databases to verify its feasibility.
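The abstract describes the cluster-then-impute idea only at a high level. Below is a minimal Python sketch of it. It rests on several assumptions that are not taken from the thesis:

- Records are categorical, and `None` marks a missing value.
- The difference between two categorical values is 0 or 1 (equal or not).
- A simple threshold-based leader clustering stands in for the thesis's automatic clustering algorithm. It is not that algorithm. It only shares the property of not fixing the number of clusters in advance.
- Attribute weights are supplied by the caller and renormalized over the observed attributes.
- Each gap is filled from the single donor with the highest weighted grey relational grade.

The names `leader_clusters`, `impute_record`, and `impute` are hypothetical.

```python
from typing import List, Optional, Sequence

Record = List[Optional[str]]  # one categorical record; None marks a missing value


def mismatch(a: Optional[str], b: Optional[str]) -> Optional[int]:
    """Assumed categorical difference: 0 if equal, 1 if different, None if either is missing."""
    if a is None or b is None:
        return None
    return 0 if a == b else 1


def leader_clusters(records: Sequence[Record], threshold: float) -> List[List[int]]:
    """Stand-in automatic clustering (leader algorithm), not the thesis's algorithm.
    The number of clusters is not fixed in advance: a record joins the first cluster
    whose leader lies within `threshold` (mean mismatch over co-observed attributes),
    otherwise it starts a new cluster."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for idx, rec in enumerate(records):
        for c, lead in enumerate(leaders):
            diffs = [d for d in map(mismatch, rec, records[lead]) if d is not None]
            if diffs and sum(diffs) / len(diffs) <= threshold:
                clusters[c].append(idx)
                break
        else:  # no existing leader was close enough
            leaders.append(idx)
            clusters.append([idx])
    return clusters


def impute_record(target: Record, donors: Sequence[Record],
                  weights: Sequence[float], zeta: float = 0.5) -> Record:
    """Fill each missing attribute of `target` from the donor with the highest
    weighted grey relational grade (k = 1 nearest neighbour, an assumption)."""
    observed = [k for k, v in enumerate(target) if v is not None]
    w_sum = sum(weights[k] for k in observed)
    result = list(target)
    if w_sum <= 0:  # nothing to compare on
        return result
    for m in (k for k, v in enumerate(target) if v is None):
        # Usable donors know attribute m and every attribute observed in the target.
        pool = [d for d in donors
                if d[m] is not None and all(d[k] is not None for k in observed)]
        if not pool:
            continue
        deltas = [[mismatch(target[k], d[k]) for k in observed] for d in pool]
        d_min = min(min(row) for row in deltas)
        d_max = max(max(row) for row in deltas)
        scale = zeta * d_max
        grades = []
        for row in deltas:
            # Deng's coefficient (d_min + zeta*d_max) / (delta + zeta*d_max),
            # taken as 1 when every donor matches the target exactly (d_max == 0).
            coeffs = [1.0 if d_max == 0 else (d_min + scale) / (delta + scale) for delta in row]
            grades.append(sum(weights[k] * g for k, g in zip(observed, coeffs)) / w_sum)
        best = grades.index(max(grades))  # the first donor wins ties
        result[m] = pool[best][m]
    return result


def impute(records: Sequence[Record], weights: Sequence[float],
           threshold: float = 0.5, zeta: float = 0.5) -> List[Record]:
    """Cluster first, then impute each incomplete record from donors in its own cluster."""
    out = [list(r) for r in records]
    for members in leader_clusters(records, threshold):
        for i in members:
            if None not in records[i]:
                continue
            same = [records[j] for j in members if j != i]
            filled = impute_record(records[i], same, weights, zeta)
            if None in filled:
                # Fallback (assumption): search the whole table for any remaining gaps.
                rest = [r for j, r in enumerate(records) if j != i]
                glob = impute_record(records[i], rest, weights, zeta)
                filled = [f if f is not None else g for f, g in zip(filled, glob)]
            out[i] = filled
    return out
```

Take the target `["red", "small", "round", None]` with the donors `["red", "large", "oval", "melon"]` and `["green", "small", "round", "lime"]`:

- With weights `[0.6, 0.2, 0.2, 0.0]`, `impute_record` fills the gap with `"melon"`. That donor matches on the heavily weighted first attribute.
- With uniform weights, `"lime"` wins instead, because it matches on two of the three observed attributes.

This is what the weighting is for: it lets the important attributes dominate the grade. Clustering first limits the grade computation to records in the same cluster, so each donor search is smaller.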