Automatic Time Series Clustering Using Autoencoder and Particle Swarm Optimization Algorithm-Based K-means Algorithm


Graduate student:	Hsin-Yu Chiu (邱心漁)
Thesis title:	Automatic Time Series Clustering Using Autoencoder and Particle Swarm Optimization Algorithm-Based K-means Algorithm
Advisors:	Ren-Jieh Kuo (郭人介), Shih-Hsien Tseng (曾世賢)
Oral defense committee:	林希偉, 王孔政, Ren-Jieh Kuo (郭人介), Shih-Hsien Tseng (曾世賢)
Degree:	Master
Department:	School of Management - Department of Industrial Management
Year of publication:	2023
Academic year of graduation:	112
Language:	English
Number of pages:	73
Chinese keywords:	自編碼器、自動分群、粒子群優化演算法、K-means演算法
English keywords:	Autoencoder, Automatic clustering, Particle swarm optimization, K-means algorithm

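In recent years, time series clustering has become one of the popular research topics in data mining, and it has been widely applied in fields such as finance, health, and the environment. Time series clustering can uncover hidden patterns and trends in large amounts of data, providing deeper insight and analysis. In clustering problems, the most important step is deciding the number of clusters; in real-world datasets, however, a suitable number of clusters is often not provided. To address this problem, this study proposes an automatic time series clustering method based on the K-means algorithm combined with an autoencoder and the particle swarm optimization algorithm. The method consists of three steps: feature extraction, optimization of the initial centers, and data clustering. First, an autoencoder extracts the important features of the raw time series data. An autoencoder is a deep learning model that maps high-dimensional data to a low-dimensional space while preserving the important features of the original data. Next, the particle swarm optimization algorithm optimizes the initial centers of the K-means algorithm to improve the accuracy and efficiency of automatic time series clustering. Particle swarm optimization is a heuristic optimization method that imitates the foraging behavior of a flock of birds, optimizing an objective function by continually searching for the best solution. Finally, the time series are clustered with the K-means algorithm, a classic clustering algorithm that groups data by computing the distances between data points and cluster centers.
The method was tested on multiple datasets and a real-world dataset, and the results show that it produces better clustering results than other heuristic algorithms. In addition, this study compares the proposed method with the results obtained before feature extraction; the results show that using the autoencoder not only gives better clustering results but also substantially reduces computation time. In summary, the proposed automatic time series clustering method, which combines an autoencoder and particle swarm optimization with the K-means algorithm, not only offers good stability and clustering performance but also helps uncover hidden patterns and trends in time series data.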
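The feature-extraction step can be illustrated with a minimal sketch. It assumes a linear autoencoder with a single hidden layer trained by batch gradient descent, which is far simpler than the deep model the thesis describes. The function names (`train_linear_autoencoder`, `encode`, `reconstruction_mse`) and all hyperparameters are hypothetical choices made for illustration and are not taken from the thesis.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def reconstruction_mse(data, enc, dec):
    """Mean squared reconstruction error per series."""
    total = 0.0
    for x in data:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_linear_autoencoder(data, code_dim, epochs=1000, lr=0.02, seed=0):
    """Fit encoder/decoder weight matrices by batch gradient descent.

    Assumption: a linear model with no bias terms, standing in for the
    deep autoencoder used in the thesis.
    """
    rng = random.Random(seed)
    in_dim = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(in_dim)]
    for _ in range(epochs):
        g_enc = [[0.0] * in_dim for _ in range(code_dim)]
        g_dec = [[0.0] * code_dim for _ in range(in_dim)]
        for x in data:
            z = matvec(enc, x)                      # low-dimensional code
            err = [a - b for a, b in zip(matvec(dec, z), x)]
            # Backpropagate half the squared error through the decoder.
            dz = [sum(dec[i][j] * err[i] for i in range(in_dim))
                  for j in range(code_dim)]
            for i in range(in_dim):
                for j in range(code_dim):
                    g_dec[i][j] += err[i] * z[j]
                    g_enc[j][i] += dz[j] * x[i]
        step = lr / len(data)
        for i in range(in_dim):
            for j in range(code_dim):
                dec[i][j] -= step * g_dec[i][j]
                enc[j][i] -= step * g_enc[j][i]
    return enc, dec


def encode(enc, x):
    """Map one series to its low-dimensional feature vector."""
    return matvec(enc, x)
```

In a pipeline like the one described, each series would be passed through `encode` and the resulting feature vectors, rather than the raw series, would be handed to the clustering step.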

In recent years, time series clustering has become one of the most active topics in data mining, with applications in many fields, such as finance, health care, and the environment. Time series clustering can help discover hidden patterns and trends in large amounts of data, providing deeper insight and knowledge. In clustering problems, the first step is to decide the number of clusters. However, in real-world data, the appropriate number of clusters is often not provided. To overcome this problem, this study proposes an automatic time series clustering approach that combines an autoencoder and the particle swarm optimization (PSO) algorithm with the K-means algorithm. The proposed method first employs an autoencoder to extract the features of the original time series data. Then, the PSO algorithm optimizes the initial centers of the K-means algorithm and determines a suitable number of clusters, improving the accuracy and efficiency of automatic time series clustering. Finally, time series clustering is performed with the K-means algorithm.
The method was evaluated on multiple datasets and a real-world dataset. The results show that the proposed method achieves better clustering performance than other heuristic algorithms. In addition, this study compares the proposed method with the results obtained before feature extraction. The comparison shows that using the autoencoder not only yields better clustering results but also greatly reduces computing time. In conclusion, the automatic time series clustering method proposed in this study, which uses an autoencoder and a PSO-based K-means algorithm, not only has good stability and clustering results but also helps discover hidden patterns and trends in time series data.
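The idea of letting PSO pick the initial centers that K-means then refines can be sketched as follows. This is a simplified illustration under stated assumptions: the number of clusters `k` is fixed, and the fitness is the within-cluster sum of squared errors. The thesis's automatic selection of the cluster number, its solution representation, and its fitness function are not reproduced here, and the names `pso_initial_centers`, `kmeans`, and `sse`, along with every parameter value, are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def pso_initial_centers(points, k, n_particles=15, iters=40,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k starting centers with a basic global-best PSO.

    Each particle is a flat vector holding k candidate centers;
    a lower SSE means a fitter particle.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    size = k * dim

    def decode(vector):
        return [vector[i * dim:(i + 1) * dim] for i in range(k)]

    # Start every particle from k randomly chosen data points.
    pos = [[x for p in rng.sample(points, k) for x in p] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [sse(points, decode(p)) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = sse(points, decode(pos[i]))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest)
```

A typical call would be `kmeans(features, pso_initial_centers(features, k=3))`, where `features` is a list of equal-length numeric vectors such as the low-dimensional codes produced by a feature extractor.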

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
CHAPTER 1. INTRODUCTION
1 Research Background and Motivation
2 Research Objectives
3 Research Limitations
4 Thesis Organization
CHAPTER 2. LITERATURE REVIEW
1 Time Series Clustering Methods
1.1 Raw data-based method
1.2 Feature-based method
1.3 Model-based method
2 Feature Extraction of Time Series Data
2.1 Feature extraction
2.2 Autoencoder
3 Automatic Clustering
3.1 Traditional method
3.2 Merge-split-based method
3.3 Evolutionary computation-based method
4 PSO and K-means Algorithm Clustering Methods
4.1 Particle swarm optimization (PSO) algorithm
4.2 K-means algorithm
5 Cluster Validity Index Measurement
CHAPTER 3. METHODOLOGY
1 Autoencoder Model
2 Automatic Clustering using PSO combines K-means Algorithm
2.1 The maximum number of clusters
2.2 Solution representation
2.3 Fitness evaluation
2.4 Solution update
2.5 K-means algorithm
2.6 Performance evaluation
3 Pseudocode for the Proposed Method
CHAPTER 4. EXPERIMENTAL RESULTS
1 Dataset Description
2 Parameters Setting
3 Experimental Results
4 Statistical Test
5 Complexity Analysis
5.1 Time complexity
5.2 Computational time
CHAPTER 5. CASE STUDY
1 Dataset Description
2 Case Study Parameters Setting
3 Case Study Results
4 Statistical Test
5 Discussion
CHAPTER 6. CONCLUSIONS AND FUTURE RESEARCH
1 Conclusions
2 Contributions
3 Future Research
REFERENCES
                                


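Full-text public release date: 2029/03/08 (off-campus network)
Full-text public release date: 2029/03/08 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)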
