簡易檢索 / 詳目顯示

研究生: Neysa Ignacia Han
Neysa - Ignacia Han
論文名稱: Automatic Clustering using Improved Artificial Bee Colony Algorithm
Automatic Clustering using Improved Artificial Bee Colony Algorithm
指導教授: 郭人介
Ren-Jieh Kuo
口試委員: Chao-Lung Yang
Chao-Lung Yang
Chieh-Yuan Tsai
Chieh-Yuan Tsai
學位類別: 碩士
Master
系所名稱: 管理學院 - 工業管理系
Department of Industrial Management
論文出版年: 2013
畢業學年度: 101
語文別: 英文
論文頁數: 126
中文關鍵詞: Cluster analysisAutomatic clusteringArtificial bee colony
外文關鍵詞: Cluster analysis, Automatic clustering, Artificial bee colony
相關次數: 點閱:231下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

Cluster analysis is an important technique in data mining. Many clustering algorithms have been proposed, but most of them need predetermined number of clusters. Unfortunately, unavailable information regarding number of clusters is commonly happened in real-world problems. Thus, this study intends to overcome this problem by proposing an algorithm for automatic clustering. The proposed algorithm is developed based on a population-based heuristic method, automatic clustering using improved artificial bee colony algorithm (ACIBC). It overcomes two main issues in automatic clustering, namely determining number of clusters and cluster centroids. In the automatic clustering using improved artificial bee colony, the exploration is conducted by solution which is comprised of two sections. Furthermore, sigmoid function is employed to handle infeasible solution. In addition, K-means algorithm is applied to adjust the cluster centroids. Method validation using four benchmark data sets reveals that overall ACIBC outperformed other two previous methods, namely DCPG and DCPSO, and also automatic clustering using original bee colony in terms of number of clusters, fitness value, accuracy and consistency.


Cluster analysis is an important technique in data mining. Many clustering algorithms have been proposed, but most of them need predetermined number of clusters. Unfortunately, unavailable information regarding number of clusters is commonly happened in real-world problems. Thus, this study intends to overcome this problem by proposing an algorithm for automatic clustering. The proposed algorithm is developed based on a population-based heuristic method, automatic clustering using improved artificial bee colony algorithm (ACIBC). It overcomes two main issues in automatic clustering, namely determining number of clusters and cluster centroids. In the automatic clustering using improved artificial bee colony, the exploration is conducted by solution which is comprised of two sections. Furthermore, sigmoid function is employed to handle infeasible solution. In addition, K-means algorithm is applied to adjust the cluster centroids. Method validation using four benchmark data sets reveals that overall ACIBC outperformed other two previous methods, namely DCPG and DCPSO, and also automatic clustering using original bee colony in terms of number of clusters, fitness value, accuracy and consistency.

ABSTRACTi ACKNOWLEDGEMENTSii CONTENTSiii LIST OF FIGURESvi LIST OF TABLESvii Chapter 1 INTRODUCTION1 1.1Background1 1.2Research Objectives2 1.3Research Scope and Assumptions2 1.4Research Framework2 Chapter 2 LITERATURE REVIEW4 2.1Cluster Analysis4 2.1.1Similarity Measures6 2.1.2K-means Algorithm8 2.1.3Clustering Validation10 2.1.4Artificial Neural Networks for Clustering11 2.1.5Automatic Clustering13 2.2Artificial Bee Colony26 2.2.1Foraging Behavior of Honey Bees27 2.2.2The Algorithm30 2.2.3Applications of Artificial Bee Colony33 Chapter 3 METHODOLOGY36 3.1Solution Representation37 3.2Data Preprocessing38 3.3Solution Improvement38 3.4Avoiding Infeasible Solution40 3.5Improvement in ABC Algorithm41 3.6Fitness Function42 3.7Proposed Algorithm43 Chapter 4 EXPERIMENTAL RESULT47 4.1Data Sets47 4.2Parameter Setting51 4.2.1Number of Food Sources51 4.2.2Limit51 4.2.3Maximum cycle number (MCN)52 4.2.4Rate52 4.2.5Kmax55 4.2.6Mean and Standard Deviation55 4.2.7Parameter Setting for Comparison Algorithms56 4.3Experimental Results and Analysis57 4.3.1Number of clusters, VI value, and accuracy57 4.3.2Statistical Test61 4.3.3Computational Time63 4.3.4Convergence Analysis63 Chapter 5 CONCLUSIONS AND FUTURE STUDY68 5.1Conclusions68 5.2Contributions69 5.3Future Study69 Appendix I PSEUDOCODE75 Appendix II COMPUTATIONAL RESULTS83 A2.1Result of VI value83 A2.2Result of number of clusters86 A2.3Result of accuracy89 A2.4Result of VI value for different rate values92 A2.5Result of computational time95 A2.6Result of VI value through iteration98 Appendix III STATISTICAL TEST OUTPUT106 A3.1Difference of VI value106 A3.1.1Difference of VI value of glass data set106 A3.1.2Difference of VI value of aggregation data set107 A3.1.3Difference of VI value of R15 data set108 A3.1.4Difference of VI value of D31 data set109 A3.2Number of clusters110 A3.2.1Difference number of clusters for glass data set110 A3.2.2Difference number of clusters for aggregation data set111 A3.2.3Difference number of clusters for R15 data set112 A3.2.4Difference number of clusters for D31 data set113 A3.3Accuracy114 A3.3.1Difference accuracy for glass data set114 A3.3.2Difference accuracy for aggregation data set115 A3.3.3Difference accuracy for R15 data set116 A3.3.4Difference accuracy for D31 data set117 A3.4Difference of VI value for different rates118 A3.4.1Difference of VI value for different rate of glass data set118 A3.4.2Difference of VI value for different rate of aggregation data set119 A3.4.3Difference of VI value for different rate of R15 data set120 A3.4.4Difference of VI value for different rate of D31 data set121

[1]A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Comput. Surv., vol. 31, No. 3, pp. 264-323, 1999.

[2]A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, No. 1, pp. 4-37, 2000.

[3]H. M. Abbas and M. M. Fahmy, "Neural networks for maximum likelihood clustering," Signal Process., vol. 36, No. 1, pp. 111-126, 1994.

[4]G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proceedings of the IEEE, vol. 67, No. 5, pp. 773-785, 1979.

[5]C. Carpineto and G. Romano, "A lattice conceptual clustering system and its application to browsing retrieval," Mach. Learn., vol. 24, No. 2, pp. 95-122, 1996.

[6]L. Junlin and F. Hongguang, "Molecular dynamics-like data clustering approach," Pattern Recognition, vol. 44, No. 8, pp. 1721-1737, 8// 2011.

[7]G. Hamerly and C. Elkan, "Learning the k in k-means," in Advances in neural information processing systems 16: proceedings of the 2003 conference, 2004, p. 281.

[8]R. J. Kuo, Y. J. Syu, Z.-Y. Chen, and F. C. Tien, "Integration of particle swarm optimization and genetic algorithm for dynamic clustering," Information Sciences, vol. 195, No. 0, pp. 124-140, 7/15/ 2012.

[9]C.-H. Chou, M.-C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Anal. Appl., vol. 7, No. 2, pp. 205-220, 2004.

[10]M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, No. 3, pp. 487-501, // 2004.

[11]D.-X. Chang, X.-D. Zhang, C.-W. Zheng, and D.-M. Zhang, "A robust dynamic niching genetic algorithm with niche migration for automatic clustering problem," Pattern Recognition, vol. 43, No. 4, pp. 1346-1360, 4// 2010.

[12]Y. Liu, X. Wu, and Y. Shen, "Automatic clustering using genetic algorithms," Applied Mathematics and Computation, vol. 218, No. 4, pp. 1267-1279, 10/15/ 2011.

[13]H. He and Y. Tan, "A two-stage genetic algorithm for automatic clustering," Neurocomputing, vol. 81, No. 0, pp. 49-59, 4/1/ 2012.
[14]M. H. Omran, A. Salman, and A. Engelbrecht, "Dynamic clustering using particle swarm optimization with application in image segmentation," Pattern Analysis and Applications, vol. 8, No. 4, pp. 332-344, 2006/02/01 2006.

[15]S. Das, A. Abraham, and A. Konar, "Automatic clustering using an improved differential evolution algorithm," Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 38, No. 1, pp. 218-237, 2008.

[16]S.-M. Pan and K.-S. Cheng, "Evolution-based tabu search approach to automatic clustering," Trans. Sys. Man Cyber Part C, vol. 37, No. 5, pp. 827-838, 2007.

[17]D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, No. 1, pp. 108-132, 8/1/ 2009.

[18]C. Y. Lee and E. K. Antonsson, "Dynamic partitional clustering using evolution strategies," in Industrial Electronics Society, 2000. IECON 2000. 26th Annual Confjerence of the IEEE, 2000, pp. 2716-2721 vol.4.

[19]S. Das, A. Abraham, and A. Konar, "Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm," Pattern Recogn. Lett., vol. 29, No. 5, pp. 688-699, 2008.

[20]A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, No. 8, pp. 651-666, 6/1/ 2010.

[21]X. Rui and D. Wunsch, II, "Survey of clustering algorithms," Neural Networks, IEEE Transactions on, vol. 16, No. 3, pp. 645-678, 2005.

[22]P. Brucker, "On the complexity of clustering problems," in Optimization and Operations Research. vol. 157, R. Henn, B. Korte, and W. Oettli, Eds., ed: Springer Berlin Heidelberg, 1978, pp. 45-54.

[23]H. Frigui and R. Krishnapuram, "A robust competitive clustering algorithm with applications in computer vision," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, No. 5, pp. 450-465, 1999.

[24]B. Everitt, S. Landau, and M. Leese, Cluster Analysis: Wiley, 2009.

[25]J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.

[26]M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," J. Intell. Inf. Syst., vol. 17, No. 2-3, pp. 107-145, 2001.

[27]M. Halkidi and M. Vazirgiannis, "Clustering validity assessment: Finding the optimal partitioning of a data set," presented at the Proceedings of the 2001 IEEE International Conference on Data Mining, 2001.
[28]D. L. Davies and D. W. Bouldin, "A cluster separation measure," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-1, No. 2, pp. 224-227, 1979.

[29]R. H. Turi, M. U. S. o. C. Science, and S. Engineering, Clustering-based Colour Image Segmentation: Monash University, 2001.

[30]J. A. Hertz, A. S. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computacion: Addison-Wesley Publishing Company, 1991.

[31]A. K. Jain, M. Jianchang, and K. M. Mohiuddin, "Artificial neural networks: a tutorial," Computer, vol. 29, No. 3, pp. 31-44, 1996.

[32]T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, No. 9, pp. 1464-1480, 1990.

[33]M.-C. Su and H.-T. Chang, "Fast self-organizing feature map algorithm," Trans. Neur. Netw., vol. 11, No. 3, pp. 721-733, 2000.

[34]H. Yin, "ViSOM - a novel method for multivariate data projection and structure visualization," Trans. Neur. Netw., vol. 13, No. 1, pp. 237-243, 2002.

[35]H. Ressom, D. Wang, and P. Natarajan, "Adaptive double self-organizing maps for clustering gene expression profiles," Neural Netw., vol. 16, No. 5-6, pp. 633-640, 2003.

[36]S. Wu and T. W. S. Chow, "PRSOM: a new visualization method by hybridizing multidimensional scaling and self-organizing map," Trans. Neur. Netw., vol. 16, No. 6, pp. 1362-1380, 2005.

[37]S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," in Neural networks and natural intelligence, ed: Massachusetts Institute of Technology, 1988, pp. 213-250.

[38]G. A. Carpenter and S. Grossberg, "ART 2: self-organization of stable category recognition codes for analog input patterns," Applied Optics, vol. 26, No. 23, pp. 4919-4930, 1987.

[39]G. A. Carpenter and S. Grossberg, "ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures," Neural Networks, vol. 3, No. 2, pp. 129-152, 1990.

[40]G. A. Carpenter, S. Grossberg, and D. B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Netw., vol. 4, No. 6, pp. 759-771, 1991.

[41]R. Abdule-Wahab, N. Monmarché, M. Slimane, M. Fahdil, and H. Saleh, "A scatter search algorithm for the automatic clustering problem," in Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. vol. 4065, P. Perner, Ed., ed: Springer Berlin Heidelberg, 2006, pp. 350-364.

[42]R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Micro Machine and Human Science, 1995. MHS '95., Proceedings of the Sixth International Symposium on, 1995, pp. 39-43.

[43]C.-Y. Chen, Feng, H.-M. and Ye,F., "Automatic particle swarm optimization clustering algorithm," International Journal of Electrical Engineering, vol. 13, No. 4, pp. 379-387, 2006.

[44]Ouadfel S., Batouche M., Taleb-Ahmed A., "A modified particle swarm optimization algorithm for automatic image clustering," King Saud University, College of Computer and Information Science, 2010.

[45]Z. Yan, Z. Chun-Guang, W. Sheng-Sheng, and H. Lan, "A dynamic clustering based on genetic algorithm," in Machine Learning and Cybernetics, 2003 International Conference on, 2003, pp. 222-224.

[46]J. Olesen, J. Cordero H, and Y. Zeng, "Auto-clustering using particle swarm optimization and bacterial foraging," in Agents and Data Mining Interaction. vol. 5680, L. Cao, V. Gorodetsky, J. Liu, G. Weiss, and P. Yu, Eds., ed: Springer Berlin Heidelberg, 2009, pp. 69-83.

[47]K. M. Passino, "Biomimicry of bacterial foraging for distributed optimization and control," Control Systems, IEEE, vol. 22, No. 3, pp. 52-67, 2002.

[48]D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Technical report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, 2005.

[49]D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing, vol. 8, No. 1, pp. 687-697, 1// 2008.

[50]D. Karaboga and B. Basturk, "Artificial bee colony (ABC) optimization algorithm for solving constrained optimization problems," presented at the Proceedings of the 12th international Fuzzy Systems Association world congress on Foundations of Fuzzy Logic and Soft Computing, Cancun, Mexico, 2007.

[51]D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, No. 3, pp. 459-471, 2007.

[52]B. Akay and D. Karaboga, "Artificial bee colony algorithm for large-scale problems and engineering design optimization," Journal of Intelligent Manufacturing, vol. 23, No. 4, pp. 1001-1014, 2012.

[53]J. A. Pacurib, G. M. M. Seno, and J. P. T. Yusiong, "Solving sudoku puzzles using improved artificial bee colony algorithm," in Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference on, 2009, pp. 885-888.
[54]H. A. A. Bahamish, R. Abdullah, and R. A. Salam, "Protein tertiary structure prediction using artificial bee colony algorithm," in Modelling & Simulation, 2009. AMS '09. Third Asia International Conference on, 2009, pp. 258-263.

[55]B. Akay and D. Karaboga, "Solving integer programming problems by using artificial bee colony algorithm," in AI*IA 2009: Emergent Perspectives in Artificial Intelligence. vol. 5883, R. Serra and R. Cucchiara, Eds., ed: Springer Berlin Heidelberg, 2009, pp. 355-364.

[56]B. Akay and D. Karaboga, "Parameter tuning for the artificial bee colony algorithm," in Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems. vol. 5796, N. Nguyen, R. Kowalczyk, and S.-M. Chen, Eds., ed: Springer Berlin Heidelberg, 2009, pp. 608-619.

[57]D. Karaboga, B. Akay, and C. Ozturk, "Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks," presented at the Proceedings of the 4th international conference on Modeling Decisions for Artificial Intelligence, Kitakyushu, Japan, 2007.

[58]C. Zhang, D. Ouyang, and J. Ning, "An artificial bee colony approach for clustering," Expert Systems with Applications, vol. 37, No. 7, pp. 4761-4767, 7// 2010.

[59]D. Zhang, X. Liu, and Z. Guan, "A dynamic clustering algorithm based on PSO and its application in fuzzy identification," in Intelligent Information Hiding and Multimedia Signal Processing, 2006. IIH-MSP '06. International Conference on, 2006, pp. 232-235.

QR CODE