Integration of sparse binary dimensional reduction method and genetic algorithm-based fuzzy K-modes algorithm for product clustering


Graduate student:	范嘉容 Chia-Jung Fan
Thesis title:	Integration of sparse binary dimensional reduction method and genetic algorithm-based fuzzy K-modes algorithm for product clustering
Advisor:	郭人介 Ren-Jieh Kuo
Oral defense committee:	羅士哲 Shih-Che Lo, 許嘉裕 Chia-Yu Hsu
Degree:	Master
Department:	School of Management - Department of Industrial Management
Publication year:	2022
Graduation academic year:	110
Language:	English
Pages:	76
Keywords:	Clustering analysis, Sparse binary data, Binary dimension reduction, Fuzzy clustering, Fuzzy K-modes algorithm, Genetic algorithm

With the spread of technology throughout modern society, large amounts of data can be collected and recorded quickly and accurately, so that daily life is filled with data of every kind. How to process and analyze these data properly, and how to extract valuable information from them, has become a prominent issue. In the field of data mining, cluster analysis is a very important technique. Different data types call for different methods, but there are clearly fewer clustering methods for binary data than for categorical or numerical data.
This study addresses the clustering of high-dimensional sparse binary data. It proposes combining a sparse binary dimensionality reduction method (BDR) with the fuzzy K-modes algorithm (FKM) to improve the clustering results, and then applies a genetic algorithm (GA)-based fuzzy K-modes algorithm together with BDR to further optimize clustering performance.
The experiments use nine high-dimensional sparse binary data sets as benchmarks. The proposed method, BDR-GAFKM, is compared with existing clustering methods using four evaluation indicators (accuracy, recall, precision, and F1-score) and two fitness evaluation methods. The experimental results indicate that BDR-GAFKM performs better than the other algorithms on each of the four indicators.
In addition to the benchmark data sets, a case study is carried out. Its results show that the fitness value of BDR-GAFKM is better than those of FKM, BDR-FKM, and GAFKM, which supports the conclusion that the proposed method clusters more effectively than the other methods in this study.
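As a rough illustration of the fuzzy K-modes step named in the abstract, the following Python sketch applies the standard fuzzy K-modes updates (Hamming dissimilarity and a fuzziness exponent `alpha`) to binary data. It is not the thesis's BDR-GAFKM implementation: the BDR and GA stages are omitted, and the function names, the value of `alpha`, and the idea of reporting the objective as a cost are assumptions made for illustration.

```python
import numpy as np

def hamming(X, modes):
    # Number of mismatched attributes between each object and each mode: (n, k).
    return (X[:, None, :] != modes[None, :, :]).sum(axis=2)

def update_memberships(X, modes, alpha):
    d = hamming(X, modes).astype(float)
    W = np.zeros(d.shape)
    exact = d == 0
    has_exact = exact.any(axis=1)
    # An object identical to a mode belongs fully to the first such mode.
    first = exact.argmax(axis=1)
    W[has_exact, first[has_exact]] = 1.0
    # All other objects get normalized inverse-distance memberships.
    rest = ~has_exact
    inv = d[rest] ** (-1.0 / (alpha - 1.0))
    W[rest] = inv / inv.sum(axis=1, keepdims=True)
    return W

def update_modes(X, W, alpha):
    Wa = W ** alpha
    ones = Wa.T @ X          # membership mass voting for value 1, shape (k, m)
    zeros = Wa.T @ (1 - X)   # membership mass voting for value 0
    return (ones > zeros).astype(X.dtype)  # ties resolve to 0

def fuzzy_k_modes(X, init_modes, alpha=1.5, max_iter=50):
    modes = init_modes.copy()
    for _ in range(max_iter):
        W = update_memberships(X, modes, alpha)
        new_modes = update_modes(X, W, alpha)
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    W = update_memberships(X, modes, alpha)
    # Fuzzy K-modes objective: membership-weighted sum of dissimilarities.
    cost = float((W ** alpha * hamming(X, modes)).sum())
    return W, modes, cost
```

A call such as `fuzzy_k_modes(X, X[[0, 3]])` on a small 0/1 integer matrix `X` returns the membership matrix, the cluster modes, and the objective value. A GA-based variant would presumably search over candidate modes or memberships using such an objective as its fitness, but that detail is not stated in this record.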

ABSTRACT (IN CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1    BACKGROUND AND MOTIVATION
2    RESEARCH OBJECTIVES
3    RESEARCH SCOPE AND CONSTRAINTS
4    THESIS ORGANIZATION
CHAPTER 2 LITERATURE REVIEW
1    CLUSTER ANALYSIS FOR CATEGORICAL DATA
1.1 K-modes algorithm
1.2 Fuzzy K-modes algorithm
2    DIMENSIONALITY REDUCTION
3    GENETIC ALGORITHM
CHAPTER 3 METHODOLOGY
1    METHODOLOGY FRAMEWORK
2    DATA PREPROCESSING
3    OBJECTIVE FUNCTION
4    BDR-GAFKM ALGORITHM
CHAPTER 4 EXPERIMENTAL RESULTS
1    DATA COLLECTION
2    PARAMETERS SETTING
3    PERFORMANCE MEASUREMENT
4    EXPERIMENTAL RESULTS
4.1 Fitness value
4.2 Accuracy
4.3 Recall
4.4 Precision
4.5 F1-score
4.6 Summary comparison
5    STATISTICAL TESTING
CHAPTER 5 CASE STUDY
1    BACKGROUND
2    DATA COLLECTION
3    DATA PREPROCESSING
4    NUMBER OF CLUSTERS
5    PRODUCT CLUSTERING
6    RESULT ANALYSIS
CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH
1    CONCLUSIONS
2    CONTRIBUTIONS
3    FUTURE RESEARCH
REFERENCE


                                


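Full-text release date: 2025/09/19 (campus network)
Full-text release date: 2032/09/19 (off-campus network)
Full-text release date: 2042/09/19 (National Central Library: Taiwan theses and dissertations system)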
