
Graduate student: 鄭俞榮
Yu-Rong Cheng
Thesis title: 萬用演算法為基礎之可能性模糊k-眾數演算法於類別型資料分群之研究
Metaheuristic-Based Possibilistic Fuzzy k-modes Algorithms for Categorical Data Clustering
Advisor: 郭人介
Ren-Jieh Kuo
Oral defense committee: 歐陽超
Chao Ou-Yang
王孔政
Kung-Jeng Wang
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 87
Keywords (Chinese): 分群分析、類別型資料、萬用演算法、基因演算法、粒子群演算法、正弦餘弦演算法、模糊分群、模糊k眾數
Keywords (English): Clustering analysis, Categorical data, Metaheuristic, Genetic algorithm, Particle swarm optimization algorithm, Sine cosine algorithm, Fuzzy clustering, Fuzzy k-modes algorithm
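  • Abstract (translated from the Chinese): In an era in which smart devices and technology applications are widespread, large volumes of data are recorded and collected conveniently and quickly. How to process and analyze these data and extract valuable information from them has therefore become a highly regarded issue. In data mining, cluster analysis is a very important technique, but a suitable method must be chosen for each type of data.
    This study addresses the clustering of categorical data. It combines the possibilistic concept with the fuzzy k-modes (FKM) algorithm and proposes a possibilistic fuzzy k-modes (PFKM) algorithm that reduces the interference of outliers in a dataset and thereby improves the clustering result. Three metaheuristics, namely the genetic algorithm, particle swarm optimization, and the sine cosine algorithm, are then applied to optimize clustering performance, which yields three clustering algorithms: the genetic-algorithm-based PFKM (GA-PFKM), the particle-swarm-optimization-based PFKM (PSO-PFKM), and the sine-cosine-algorithm-based PFKM (SCA-PFKM).
    The experiments use eight categorical datasets. Each proposed algorithm is applied to cluster them and is compared with the fuzzy k-modes (FKM) algorithm on two evaluation indexes, sum-of-squared error (SSE) and accuracy. The experimental results confirm that PSO-PFKM and SCA-PFKM outperform the other algorithms on most of the datasets. In addition, a case study further analyzes the clustering result for the Breast cancer dataset. It shows that the higher the category values of the three features normal nucleoli, bare nuclei, and clump thickness, the higher the risk of breast cancer.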


    Smart devices and technology applications are now widely used in many fields, and an enormous amount of information is recorded and collected rapidly. Analyzing these data and extracting valuable information from them has therefore become a crucial issue. Clustering analysis plays an important role in addressing it. However, different types of data call for different approaches, so an appropriate method must be chosen for the data at hand.
    This study focuses on categorical data. A possibilistic fuzzy k-modes (PFKM) algorithm is proposed by combining the possibilistic concept with the fuzzy k-modes (FKM) algorithm in order to alleviate the effect of outliers and improve the clustering result. In addition, this study applies three metaheuristics, namely the genetic algorithm (GA), particle swarm optimization (PSO), and the sine cosine algorithm (SCA), to enhance clustering performance. Three clustering algorithms are thus proposed: GA-PFKM, PSO-PFKM, and SCA-PFKM.
    The proposed algorithms are used to cluster eight categorical datasets. Their performance is compared with that of the classical FKM algorithm on two indexes, sum-of-squared error (SSE) and accuracy. The experimental results indicate that the PSO-PFKM and SCA-PFKM algorithms outperform the other algorithms on most of the datasets. Furthermore, this study analyzes the clustering result for the breast cancer dataset in more detail. The analysis reveals that people with higher values of normal nucleoli, bare nuclei, and clump thickness have a higher risk of breast cancer.
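    For orientation, one iteration of the classical fuzzy k-modes baseline that the abstract compares against can be sketched as follows. This is only a minimal sketch of the commonly published form of FKM (simple matching dissimilarity and a fuzziness exponent alpha), not the thesis's PFKM: the possibilistic term and the metaheuristic search are not reproduced here because this record does not state them. The function names, the toy data, and the value of alpha are illustrative assumptions.

```python
from collections import defaultdict


def mismatch(x, z):
    """Simple matching dissimilarity: count of attributes where x and z differ."""
    return sum(a != b for a, b in zip(x, z))


def update_memberships(data, modes, alpha=2.0):
    """Return w[l][i], the fuzzy membership of object i in cluster l."""
    k = len(modes)
    w = [[0.0] * len(data) for _ in range(k)]
    exponent = 1.0 / (alpha - 1.0)
    for i, x in enumerate(data):
        d = [mismatch(x, z) for z in modes]
        exact = [l for l in range(k) if d[l] == 0]
        if exact:
            # The object coincides with a mode: full membership in the
            # first such cluster, zero membership elsewhere.
            w[exact[0]][i] = 1.0
            continue
        for l in range(k):
            w[l][i] = 1.0 / sum((d[l] / d[h]) ** exponent for h in range(k))
    return w


def update_modes(data, w, alpha=2.0):
    """Per cluster and attribute, pick the category with the largest
    sum of w**alpha over the objects that take that category."""
    n_attrs = len(data[0])
    modes = []
    for row in w:
        mode = []
        for j in range(n_attrs):
            score = defaultdict(float)
            for x, w_li in zip(data, row):
                score[x[j]] += w_li ** alpha
            mode.append(max(score, key=score.get))
        modes.append(tuple(mode))
    return modes


# Toy categorical data (hypothetical): four objects, two attributes.
data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
modes = [("a", "x"), ("b", "z")]
w = update_memberships(data, modes, alpha=2.0)
new_modes = update_modes(data, w, alpha=2.0)
```

    On the toy data, the second object differs from the first mode in one attribute and from the second mode in two, so it receives partial membership in both clusters, while the memberships of every object still sum to one. In practice the two update steps would be alternated until the modes stop changing.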

    Abstract (Chinese)
    ABSTRACT
    Acknowledgements
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 Research Background
        1.2 Research Objectives
        1.3 Research Scope and Constraints
        1.4 Research Organization
    CHAPTER 2 LITERATURE REVIEW
        2.1 Categorical Clustering Analysis
        2.2 Fuzzy k-modes Algorithm
        2.3 Possibilistic Fuzzy c-means Algorithm
        2.4 Metaheuristic
            2.4.1 Genetic Algorithm
            2.4.2 Particle Swarm Optimization Algorithm
            2.4.3 Sine Cosine Algorithm
    CHAPTER 3 RESEARCH METHODOLOGY
        3.1 Methodology Framework
        3.2 Data preprocessing
        3.3 Objective Function
        3.4 Possibilistic Fuzzy k-modes Algorithm (PFKM)
        3.5 Metaheuristic-based PFKM Algorithm
            3.5.1 GA-PFKM Algorithm
            3.5.2 PSO-PFKM Algorithm
            3.5.3 SCA-PFKM Algorithm
    CHAPTER 4 EXPERIMENTAL RESULTS
        4.1 Datasets Collection
        4.2 Parameters Setting
        4.3 Performance Measurement
        4.4 Experimental Result and Statistical Hypothesis Testing
            4.4.1 Sum of Square Error (SSE)
            4.4.2 Accuracy (AC)
            4.4.3 Computational Time and Convergence History
    CHAPTER 5 CASE STUDY
        5.1 Background
        5.2 Research Framework
        5.3 Data Collection
        5.4 Clustering and Analysis
        5.5 Analysis Results
    CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH
        6.1 Conclusions
        6.2 Contributions
        6.3 Future Research
    REFERENCES
    APPENDIX


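    Full-text release date: 2024/06/19 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)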