簡易檢索 / 詳目顯示

研究生: 錢佳萱
Chia-Hsuan Chien
論文名稱: 應用於類別屬性資料之基因演算法為基礎之多目標模糊分群法
Genetic Algorithm-Based Multiobjective Fuzzy Clustering Method on Categorical Data
指導教授: 郭人介
Ren-Jieh Kuo
楊朝龍
Chao-Lung Yang
口試委員: 歐陽超
Chao Ou-Yang
黃奎隆
Kwei-Long Huang
學位類別: 碩士
Master
系所名稱: 管理學院 - 工業管理系
Department of Industrial Management
論文出版年: 2013
畢業學年度: 101
語文別: 英文
論文頁數: 69
中文關鍵詞: 類別屬性資料多目標基因演算法模糊分群
外文關鍵詞: Categorical attributes, Multiobjective, Genetic algorithm, Fuzzy clustering.
相關次數: 點閱:336下載:7
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 類別屬性資料之資料分群,因相似度的計算有別於數值型的資料,因此在演算法的設計上針對相似度計算作調整。常見的方式除了將類別屬性資料轉碼成數量化資料後再計算相似度外,亦有以資料發生頻率為基礎之計算方式,如相對於K-means演算法之K-mode方法。基於前人研究的基礎,本研究針對類別屬性資料之分群問題,提出一結合模糊基因演算法及多目標最佳化之資料分群演算法,並提升分群結果的品質。其演算法的主要特色在於以模糊分數作為基因演算法之染色體成份,在分群過程中,同時最佳化群心與各個資料點的緊密度(π)及群心與群心的分離度(sep)。此外,本研究利用模糊分數提出了一項新技術,有效率地尋找柏拉圖前緣線最佳解。
    本研究的實驗使用三個公開的UCI類別屬性資料,分別針對所提出之NSGA-FMC演算法、基因模糊K-modes分群演算法以及MOGA(π,sep)演算法進行比較。評比指標使用Adjusted Rand Index (ARI)、π、sep以及程式運行時間(Running time)來進行統計分析。實驗結果發現本研究所提的方法在ARI、π、sep三項指標的共同評比下,其分群品質較其他方法優異。


    For the categorical data which can be nominal or ordinal, due to the different dissimilarity measurement, a variety of clustering algorithms which accommodates the dissimilarity measurement of categorical data has been proposed. For example, K-modes algorithm, an extension of the K-means algorithm for clustering categorical data, uses modes instead of mean as centroid of a cluster to perform data clustering on categorical data. In this research, based on K-modes method, a data clustering algorithm which combines fuzzy genetic algorithm and multiobjective optimization was proposed to improve the clustering quality on categorical data. The proposed method uses fuzzy membership value as chromosome. The multiple objective functions: fuzzy compactness within a cluster (π) and separation among clusters (sep) are used to optimize the clustering quality. In addition, a more efficient solution selection technique which selects a solution from non-dominated pareto front is integrated in the algorithm. A series of experiments by using three UCI categorical datasets were conducted to compare the clustering results of the proposed Non-dominated Sorting Genetic Algorithm-Fuzzy Membership Chromosome (NSGA-FMC), Genetic Algorithm fuzzy K-modes (GA-FKM) and Multiobjective Genetic Algorithm-Based Fuzzy Clustering of Categorical Attributes (MOGA(π, sep)). Adjusted Rand Index (ARI), π, sep, and computation time (running time) were used as performance indexes for comparison. The experimental result showed the proposed method can obtain better clustering quality in terms of ARI, π, and sep simultaneously.

    摘 要 I ABSTRACT II 誌 謝 III CONTENTS IV LIST OF TABLES VII LIST OF FIGURES VIII CHAPTER 1 INTRODUCTION 1 CHAPTER 2 LITERATURE REVIEW 5 2.1 Categorical Clustering 5 2.2 Fuzzy K-modes Clustering 7 2.3 Genetic Algorithm (GA) 10 2.4 Multi-Objective Optimization 11 2.5 Relabeling Technique 16 2.6 Adjusted Rand Index 18 CHAPTER 3 RESEARCH METHODOLOGY 19 3.1 NSGA-FMC Algorithm 21 3.1.1 Initialization Process 23 3.1.2 Selection 23 3.1.3 Crossover 25 3.1.4 Mutation 25 3.1.5 Elitism Non-dominated Sorting 26 3.2 Selecting a Solution from the Non-dominated Set (Pareto Optimal Front) 29 3.2.1 Selecting a Solution from the Non-dominated Set in MOGA(π, sep) 30 3.2.2 Selecting a Solution from the Non-dominated Set in NSGA-FMC 32 CHAPTER 4 EXPERIMENTAL RESULTS 34 4.1 Datasets 34 4.2 Results 35 4.2.1 Soybean Dataset 35 4.2.2 Zoo Dataset 37 4.2.3 Congressional Votes dataset (Votes) 39 CHAPTER 5 DISCUSSION AND CONCLUSIONS 43 REFERENCES 45 Appendix 50 A. Soybean Experimental Results of GA-FKM 50 B. Zoo Experimental Results of GA-FKM 50 C. Congressional Votes Experimental Results of GA-FKM 51 D. Soybean Experimental Results of MOGA(π, sep) 51 E. Zoo Experimental Results of MOGA(π, sep) 52 F. Congressional Votes Experimental Results of MOGA(π, sep) 52 G. Soybean Experimental Results of NSGA-FMC 53 H. Zoo experimental Results of NSGA-FMC 53 I. Congressional Votes Experimental Results of NSGA-FMC 54 J. Main Code of NSGA-FMC (MATLAB) 54

    Ahmad, A. & Dey, L., "Algorithm for Fuzzy Clustering of Mixed Data with Numeric and Categorical Attributes Distributed Computing and Internet Technology." vol. 3816, G. Chakraborty, Ed., ed: Springer Berlin / Heidelberg, 2005, pp. 561-572.
    Andritsos, P., Tsaparas, P., Miller, R. J., & Sevcik, K. C., "LIMBO: A Scalable Algorithm to Cluster Categorical Data University of Toronto," CSRG Technical Report, no. 467, 2003.
    Bandyopadhyay, S. & Pal, S. K., Classification and Learning Using Genetic Algorithms: Springer, 2007.
    Barbara, D., Li, Y., & Couto, J., "COOLCAT: an entropy-based algorithm for categorical clustering," presented at the Proceedings of the eleventh international conference on Information and knowledge management, 2002.
    Bensaid, A. M., Hall, L. O., Bezdek, J. C., Clarke, L. P., Silbiger, M. L., Arrington, J. A., et al., "Validity-guided (re)clustering with applications to image segmentation," Fuzzy Systems, IEEE Transactions on, vol. 4, no. 2, pp. 112-123, 1996.
    C.M., F. & P.J., F., "Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization," presented at the Proceedings of the 5th International Conference on Genetic Algorithms, San Francisco, CA, USA, 1993.
    Chang, D., Zhao, Y., Zheng, C., & Zhang, X., "A genetic clustering algorithm using a message-based similarity measure," Expert Systems with Applications, vol. 39, no. 2, pp. 2194-2202, 2012.
    Chatzis, S. P., "A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional," Expert Systems with Applications, vol. 38, no. 7, pp. 8684-8689, 2011.
    Corne, D. W., Jerram, N. R., Knowles, J. D., & Oates, M. J., "PESA-II: Region-based selection in evolutionary multiobjective optimization," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’2001, 2001.
    Corne, D. W., Knowles, J. D., & Oates, M. J., "The Pareto envelope-based selection algorithm for multiobjective optimization," in Parallel Problem Solving from Nature PPSN VI, 2000, pp. 839-848.
    Davis, L., Handbook of genetic algorithms, 1991.
    Deb, K., Multi-objective Optimization using Evolutionary Algorithms, 2001.
    Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T., "A fast and elitist multiobjective genetic algorithm: NSGA-II," Evolutionary Computation, IEEE Transactions on, vol. 6, no. 2, pp. 182-197, 2002.
    Fanizzi, N., D'Amato, C., & Esposito, F., "Fuzzy clustering for categorical spaces an application to semantic knowledge bases," in 18th International Symposium on Methodologies for Intelligent Systems, ISMIS 2009, September 14, 2009 - September 17, 2009, Prague, Czech republic, 2009, pp. 161-170.
    Gan, G., Wu, J., & Yang, Z., "A genetic fuzzy -Modes algorithm for clustering categorical data," Expert Systems with Applications, vol. 36, no. 2, Part 1, pp. 1615-1620, 2009.
    Ganti, V., Gehrke, J., & Ramakrishnan, R., "CACTUS—clustering categorical data using summaries," presented at the Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, 1999.
    Gibson, D., Kleinberg, J., & Raghavan, P., "Clustering Categorical Data: An Approach Based on Dynamical Systems," The VLDB Journal vol. 8, no. 3-4, pp. 222-226, 1998.
    Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, 1989.
    Guha, S., Rastogi, R., & Shim, K., "ROCK: a robust clustering algorithm for categorical attributes," in Data Engineering, 1999. Proceedings., 15th International Conference on, 1999, pp. 512-521.
    Horn, J. & Nafpliotis, N., "Multiobjective Optimization Using the Niched Pareto Genetic Algorithm," IlliGAL report, no. 93005, 1993.
    Huang, Z., "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining," presented at the Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.
    Jain, A. K., "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010.
    Jain, A. K., Murty, M. N., & Flynn, P. J., "Data Clustering: A Review," ACM Computng Surveys, vol. 31, no. 3, pp. 265-323, 1999.
    Kim, D.-W., Lee, K., Lee, D., & Lee, K. H., "A k-populations algorithm for clustering categorical data," Pattern Recognition, vol. 38, no. 7, pp. 1131-1134, 2005.
    Kim, D.-W., Lee, K. H., & Lee, D., "Fuzzy clustering of categorical data using fuzzy centroids," Pattern Recognition Letters, vol. 25, no. 11, pp. 1263-1271, 2004.
    Knowles, J. & Corne, D., "The pareto archived evolution strategy: A new baseline algorithm for pareto multiobjective optimisation," in Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, 1999.
    Krishna, K. & Narasimha Murty, M., "Genetic K-means algorithm," Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 29, no. 3, pp. 433-439, 1999.
    Maulik, U., Bandyopadhyay, S., & Mukhopadhyay, A., Multiobjective Genetic Algorithms for Clustering Application in Data Mining and Bioinformatics, 2011.
    Michalewicz, Z., Genetic algorithms+ data structures= evolution programs: springer, 1998.
    Mukhopadhyay, A., Maulik, U., & Bandyopadhyay, S., "Multiobjective Genetic Fuzzy Clustering of Categorical Attributes," in Information Technology, (ICIT 2007). 10th International Conference on, 2007, pp. 74-79.
    Mukhopadhyay, A., Maulik, U., & Bandyopadhyay, S., "Multiobjective Genetic Algorithm-Based Fuzzy Clustering of Categorical Attributes," IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 991-1005, 2009.
    Saha, I. & Maulik, U., "Improved Fuzzy Clustering Techniques for Categorical Data," AIP Conference Proceedings, vol. 1089, no. 1, pp. 82-93, 2009.
    Saha, I. & Mukhopadhyay, A., "Genetic Algorithm and Simulated Annealing based Approaches to Categorical Data Clustering," in Industrial and Information Systems, 2008. ICIIS 2008. IEEE Region 10 and the Third international Conference on, ed, 2008, pp. 1-6.
    Srinivas, N. & Deb, K., "Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms," Evolutionary Computation,, vol. 2, no. 3, pp. 221-248, 1994.
    Yang, C.-L., Kuo, R.-J., & Chien, C.-H., "A Relabeling Algorithm for Clustering Comparison," presented at the ICMLDA 2013 : International Conference on Machine Learning and Data Analysis, Tokyo, Japan, 2013.
    Yee, Y. K. & Ruzzo, W. L., "Details of the Adjusted Rand index and Clustering algorithms Supplement to the paper An empirical study on Principal Component Analysis for clustering gene expression data (to appear in Bioinformatics)," Bioinformatics, vol. 17, no. 9, pp. 763-774, 2001.
    Yip, K. Y., Cheung, D. W., & Ng, M. K., "A Highly usable Projected Clustering Algorithm for Gene Expression Profiles," presented at the Proc. 3rd ACM SIGKDD Workshop Data Mining Bioinformatics, 2003.
    Zahid, N., Limouri, M., & Essaid, A., "A new cluster-validity for fuzzy clustering," Pattern Recognition, vol. 32, no. 7, pp. 1089-1097, 1999.
    Zhao, L., Tsujimura, Y., & Gen, M., "Genetic algorithm for fuzzy clustering," in Evolutionary Computation, 1996., Proceedings of IEEE International Conference on, 1996, pp. 716-719.
    Zhexue, H. & Ng, M. K., "A fuzzy k-modes algorithm for clustering categorical data," Fuzzy Systems, IEEE Transactions on, vol. 7, no. 4, pp. 446-452, 1999.

    QR CODE