
Author: 沈永勝
Yung-Sheng Shen
Thesis Title: 整合自動分群與加權式灰關聯技術以處理大型資料庫遺失值問題之處理
Integrating Automatic Clustering and Weighted Grey Relational Technique for Missing Value Processing in Large Databases
Advisor: 楊鍵樵
Chen-Chau Yang
Committee Members: 李漢銘
Hahn-Ming Lee
朱雨其
Yu-Chi Chu
Degree: Master
Department: College of Electrical Engineering and Computer Science (電資學院) - Department of Electronic and Computer Engineering (電子工程系)
Publication Year: 2005
Graduation Academic Year: 93 (ROC calendar; 2004-2005)
Language: Chinese
Pages: 69
Chinese Keywords: 加權式灰關聯, 自動分群, 遺失值, 資料探勘
English Keywords: Data Mining, Weighted Grey Relational, Missing Values, Automatic Clustering
  • Data mining has become a very popular research area. It studies how to extract useful knowledge from very large databases in order to support enterprise decision making. Missing values in a database degrade the quality of data analysis, so handling them properly is an important issue. To date, the conventional practice for categorical data with missing values is still to discard the affected records, which is usually unwise. Building on the advantages of earlier work that applied grey relational analysis to the missing value problem, this thesis proposes a new method for handling missing values. The method integrates an automatic clustering algorithm with weighted grey relational analysis to compute suitable values for the missing entries. It is intended to meet the data preprocessing needs of knowledge discovery in databases (KDD) and to improve the correctness of subsequent use of the data. We implement the method and apply it to several large databases to verify its feasibility.
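The abstract names the ingredients of the method but gives no formulas, so the following is only a minimal sketch of grey-relational imputation under stated assumptions, not the thesis's implementation. It assumes Deng's standard grey relational coefficient with distinguishing coefficient `zeta = 0.5`, attribute weights supplied by the caller (equal weights by default), attributes already scaled to comparable ranges, and a candidate set `complete_rows` that stands in for the records of the matched cluster. The function names `grey_relational_grades` and `impute_missing` are hypothetical.

```python
import numpy as np


def grey_relational_grades(reference, candidates, weights=None, zeta=0.5):
    """Weighted grey relational grade of each candidate row against `reference`."""
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n_attrs = candidates.shape[1]
    if weights is None:
        # Assumption: equal attribute weights when none are given.
        weights = np.full(n_attrs, 1.0 / n_attrs)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    delta = np.abs(candidates - reference)  # absolute differences per attribute
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:
        # Every candidate equals the reference on all compared attributes.
        return np.ones(len(candidates))
    # Grey relational coefficient for every (candidate, attribute) pair.
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    # The grade is the weighted sum of coefficients over the attributes.
    return coeff @ weights


def impute_missing(record, complete_rows, weights=None, zeta=0.5, top_k=1):
    """Fill NaN entries of `record` from the `top_k` most related complete rows."""
    record = np.asarray(record, dtype=float)
    complete_rows = np.asarray(complete_rows, dtype=float)
    missing = np.isnan(record)
    if not missing.any():
        return record.copy()
    observed = ~missing
    w = None if weights is None else np.asarray(weights, dtype=float)[observed]
    # Compare only on the attributes that the incomplete record actually has.
    grades = grey_relational_grades(
        record[observed], complete_rows[:, observed], w, zeta
    )
    nearest = np.argsort(grades)[::-1][:top_k]  # highest grades first
    g = grades[nearest]
    filled = record.copy()
    # Grade-weighted average of the donors' values for the missing attributes.
    donors = complete_rows[nearest][:, missing]
    filled[missing] = (g[:, None] * donors).sum(axis=0) / g.sum()
    return filled


# Illustrative data only (three complete rows, one row with a missing value).
rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
print(impute_missing([5.0, 3.4, np.nan, 0.2], rows))
```

With `top_k=1` the missing entry is copied from the single most related row; a larger `top_k` blends several donors in proportion to their grades. In the approach the abstract describes, clustering would first narrow `complete_rows` to one cluster, which is what keeps the comparison affordable on large databases.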

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Direction
      1.3 Chapter Overview
    Chapter 2 Related Work
      2.1 Missing Values
        2.1.1 Definition, Classification, and Types of Missing Values
      2.2 Methods for Handling Missing Values
        2.2.1 Traditional Approaches
        2.2.2 Statistical Methods
        2.2.3 Recent Methods
      2.3 The Two Main Methods Applied in This Study
        2.3.1 Clustering Algorithms
        2.3.2 Grey System Theory
          2.3.2.1 Grey Relational Analysis
    Chapter 3 Research Method
      3.1 Research Framework
      3.2 Automatic Clustering Algorithm
      3.3 Finding the Matching Cluster
      3.4 Weighted Grey Relational Analysis
      3.5 Example
      3.6 Evaluation Metrics
    Chapter 4 Experimental Results
      4.1 Experimental Environment
      4.2 Experimental Data
        4.2.1 Data Format of the Iris Dataset
        4.2.2 Data Format of the Liver-disorders Dataset
        4.2.3 Data Format of the Auto-Mpg Dataset
        4.2.4 Data Format of the Environmental Protection Administration Dataset
      4.3 Experimental Method
        4.3.1 Experiment 1
        4.3.2 Experiment 2
        4.3.3 Experiment 3
        4.3.4 Experiment 4
      4.4 Experimental Conclusions
    Chapter 5 Conclusions and Future Research Directions
      5.1 Conclusions
      5.2 Future Research Directions
    References


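    Full-text release date: 2006/07/04 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)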