Graduate student: | 詹雨時 Yu-Shih Chan |
---|---|
Thesis title: | 用於分群集成的全域投票系統 A Global Voting System for Clustering Ensemble |
Advisor: | 戴碧如 Bi-Ru Dai |
Oral defense committee: | 戴志華 Chih-Hua Tai, 帥宏翰 Hong-Han Shuai, 陳怡伶 Yi-Ling Chen |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 106 |
Language: | English |
Number of pages: | 41 |
Chinese keywords: | 分群、共識分群、分群集成、片段 |
English keywords: | Clustering, Consensus clustering, Clustering ensemble, Fragment |
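Clustering is the process of partitioning an entire dataset into a clustering result according to the similarity between data points. Many clustering methods have been proposed, but because of their individual properties, different methods usually produce different clustering results. Clustering ensemble was proposed to combine different clustering results into a better and more stable one. However, some clustering ensemble methods require users to specify parameters in advance, and the clustering results are usually heavily affected by these parameter settings. In this thesis, we propose a clustering ensemble method called the Global Voting Clustering Ensemble algorithm, which requires no manually set parameters, including the number of clusters. The method consists of three stages. In the creating stage, fragments are extracted to build appropriate candidate clusters. In the voting stage, a voting process is designed to let the candidate clusters evolve into final candidate clusters. Finally, the assigning stage ensures that each data point is assigned to only one candidate cluster. Experimental results on a variety of real-world datasets confirm that the proposed method achieves better accuracy.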
Clustering is the process of partitioning a dataset into clusters based on the similarity between data points. Many clustering methods have been proposed, but different methods usually generate different clustering results because of their differing properties. Clustering ensemble has been proposed to combine different clustering results into a better and more robust one. However, some clustering ensemble methods require users to specify parameters, and their results are generally severely affected by the parameter settings. In this work, a novel clustering ensemble method called Global Voting Clustering Ensemble (GVCE) is proposed, which requires no parameters, including the number of clusters. GVCE has three stages. In the creating stage, fragments are extracted to create appropriate candidate clusters. In the voting stage, a voting process allows the candidate clusters to evolve into final candidate clusters containing all data points. Finally, an assigning stage ensures that each data point is assigned to only one cluster. Experimental results on a variety of real-world datasets show that the proposed method outperforms state-of-the-art clustering ensemble methods.
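The abstract does not define a fragment. In the fragment-based ensemble literature the term usually means a maximal group of data points that every base clustering keeps together, and the sketch below assumes that definition. The function name `extract_fragments` and the toy label vectors are hypothetical, and the code illustrates only the fragment idea, not the GVCE creating, voting, or assigning procedures themselves.

```python
from collections import defaultdict


def extract_fragments(base_clusterings):
    """Group point indices that share the same label in every base clustering.

    base_clusterings: list of label lists, one per base clustering,
    all of the same length (one label per data point).
    Assumed definition: a fragment is a maximal set of points that
    no base clustering separates.
    """
    n_points = len(base_clusterings[0])
    groups = defaultdict(list)
    for i in range(n_points):
        # The tuple of labels a point receives across all base clusterings
        # acts as its signature; equal signatures mean "never separated".
        signature = tuple(labels[i] for labels in base_clusterings)
        groups[signature].append(i)
    return list(groups.values())


# Toy example: three base clusterings of six data points (hypothetical data).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
fragments = extract_fragments(base_clusterings)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

Points 0 and 1 are never separated, so they form one fragment, as do points 4 and 5; points 2 and 3 are each split from their neighbours by the second clustering. Working on fragments rather than individual points shrinks the number of objects a later consensus step has to handle.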