
Graduate student: Chia-Jung Fan (范嘉容)
Thesis title: Integration of sparse binary dimensional reduction method and genetic algorithm-based fuzzy K-modes algorithm for product clustering
Original title: 結合稀疏二進制降維方法與遺傳演算法為基礎之模糊K-modes演算法於產品分群之研究
Advisor: Ren-Jieh Kuo (郭人介)
Oral defense committee: Shih-Che Lo (羅士哲), Chia-Yu Hsu (許嘉裕)
Degree: Master
Department: College of Management - Department of Industrial Management
Publication year: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Pages: 76
Keywords: Clustering analysis, Sparse binary data, Binary dimension reduction, Fuzzy clustering, Fuzzy K-modes algorithm, Genetic algorithm
    With the spread of technology in today's society, large amounts of data can be collected and recorded quickly and accurately, so daily life is full of data of many kinds. How to process and analyze these data properly and extract valuable information from them has become a prominent issue. In the field of data mining, cluster analysis is a very important technique. Different data types call for different analysis methods, but there are clearly fewer clustering methods for binary data than for categorical and numerical data.
    This study addresses the clustering of high-dimensional sparse binary data. It proposes combining a sparse binary dimensionality reduction method (BDR) with the fuzzy K-modes algorithm (FKM) to improve the clustering results, and then applies a genetic algorithm (GA)-based fuzzy K-modes together with BDR to further optimize clustering performance.
    The experiments use nine high-dimensional sparse binary benchmark data sets. Four evaluation indicators (accuracy, recall, precision, and F1-score) and two fitness evaluation methods are used to compare the proposed method, BDR-GAFKM, with existing clustering methods. The experimental results indicate that BDR-GAFKM performs better than the other algorithms on all four indicators.
    In addition, a case study is carried out. Its results show that the fitness value of BDR-GAFKM is better than those of FKM, BDR-FKM, and GAFKM, which supports the conclusion that the proposed method clusters better than the other methods considered in this study.
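
    For readers unfamiliar with FKM, the following is a minimal sketch of the standard fuzzy K-modes update rules applied to binary vectors, using the Hamming distance as the dissimilarity. It is not the thesis's BDR-GAFKM method: it contains no dimensionality reduction and no genetic algorithm. All function names, the fuzziness exponent `alpha`, and the toy data are assumptions made for illustration.

```python
import random

def hamming(a, b):
    """Simple matching dissimilarity; for binary vectors this is the Hamming distance."""
    return sum(x != y for x, y in zip(a, b))

def update_memberships(data, modes, alpha):
    """Recompute fuzzy memberships of every object given the current modes."""
    power = 1.0 / (alpha - 1.0)
    memberships = []
    for x in data:
        dists = [hamming(x, z) for z in modes]
        if 0 in dists:
            # The object coincides with a mode: full membership in the first such cluster.
            hit = dists.index(0)
            row = [1.0 if l == hit else 0.0 for l in range(len(modes))]
        else:
            row = [1.0 / sum((d_l / d_h) ** power for d_h in dists) for d_l in dists]
        memberships.append(row)
    return memberships

def update_modes(data, memberships, k, alpha):
    """For each cluster and feature, pick the bit with the larger membership-weighted count."""
    n_features = len(data[0])
    modes = []
    for l in range(k):
        mode = []
        for j in range(n_features):
            weight = {0: 0.0, 1: 0.0}
            for x, w in zip(data, memberships):
                weight[x[j]] += w[l] ** alpha
            mode.append(1 if weight[1] > weight[0] else 0)  # ties resolve to 0 (assumption)
        modes.append(mode)
    return modes

def fuzzy_k_modes(data, k, alpha=1.5, max_iter=50, seed=0):
    """Alternate membership and mode updates until the modes stop changing."""
    rng = random.Random(seed)
    modes = [list(x) for x in rng.sample(data, k)]  # random objects as initial modes
    for _ in range(max_iter):
        memberships = update_memberships(data, modes, alpha)
        new_modes = update_modes(data, memberships, k, alpha)
        if new_modes == modes:
            break
        modes = new_modes
    return modes, update_memberships(data, modes, alpha)

# Hypothetical toy data: six objects described by six binary attributes.
toy = [
    [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1],
]
final_modes, final_memberships = fuzzy_k_modes(toy, k=2)
```

    Each row of `final_memberships` sums to 1 and gives the degree to which one object belongs to each cluster. Because the starting modes are drawn at random, the result depends on the initialization, which is the kind of sensitivity a metaheuristic such as a GA is typically used to reduce.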

    ABSTRACT (CHINESE)
    ABSTRACT
    ACKNOWLEDGMENTS
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
        1.1 BACKGROUND AND MOTIVATION
        1.2 RESEARCH OBJECTIVES
        1.3 RESEARCH SCOPE AND CONSTRAINTS
        1.4 THESIS ORGANIZATION
    CHAPTER 2 LITERATURE REVIEW
        2.1 CLUSTER ANALYSIS FOR CATEGORICAL DATA
            2.1.1 K-modes algorithm
            2.1.2 Fuzzy K-modes algorithm
        2.2 DIMENSIONALITY REDUCTION
        2.3 GENETIC ALGORITHM
    CHAPTER 3 METHODOLOGY
        3.1 METHODOLOGY FRAMEWORK
        3.2 DATA PREPROCESSING
        3.3 OBJECTIVE FUNCTION
        3.4 BDR-GAFKM ALGORITHM
    CHAPTER 4 EXPERIMENTAL RESULTS
        4.1 DATA COLLECTION
        4.2 PARAMETERS SETTING
        4.3 PERFORMANCE MEASUREMENT
        4.4 EXPERIMENTAL RESULTS
            4.4.1 Fitness value
            4.4.2 Accuracy
            4.4.3 Recall
            4.4.4 Precision
            4.4.5 F1-score
            4.4.6 Summary comparison
        4.5 STATISTICAL TESTING
    CHAPTER 5 CASE STUDY
        5.1 BACKGROUND
        5.2 DATA COLLECTION
        5.3 DATA PREPROCESSING
        5.4 NUMBER OF CLUSTERS
        5.5 PRODUCT CLUSTERING
        5.6 RESULT ANALYSIS
    CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH
        6.1 CONCLUSIONS
        6.2 CONTRIBUTIONS
        6.3 FUTURE RESEARCH
    REFERENCE


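    Full text release date: 2025/09/19 (campus network)
    Full text release date: 2032/09/19 (off-campus network)
    Full text release date: 2042/09/19 (National Central Library: Taiwan Thesis and Dissertation System)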