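Author: 詹雨時 (Yu-Shih Chan)
Thesis title: 用於分群集成的全域投票系統 (A Global Voting System for Clustering Ensemble)
Advisor: 戴碧如 (Bi-Ru Dai)
Oral defense committee: 戴志華 (Chih-Hua Tai), 帥宏翰 (Hong-Han Shuai), 陳怡伶 (Yi-Ling Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Graduation academic year: 106 (ROC calendar; 2017-2018)
Language: English
Number of pages: 41
Keywords (Chinese): 分群, 共識分群, 分群集成, 片段
Keywords (English): Clustering, Consensus clustering, Clustering ensemble, Fragment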

Clustering is the process of partitioning a dataset into clusters based on the similarity between data points. Many clustering methods have been proposed, but because their underlying properties differ, different methods usually produce different results on the same data. Clustering ensemble was proposed to combine multiple clustering results into a single better and more robust result. However, some clustering ensemble methods require users to specify parameters in advance, and their results are often severely affected by those settings. This work proposes a novel clustering ensemble method, Global Voting Clustering Ensemble (GVCE), which requires no manually set parameters, not even the number of clusters. GVCE has three stages. In the creating stage, fragments are extracted and used to build appropriate candidate clusters. In the voting stage, a voting process lets the candidate clusters evolve into final candidate clusters that together contain all data points. Finally, the assigning stage ensures that each data point is assigned to exactly one cluster. Experimental results on a variety of real-world datasets show that the proposed method achieves better accuracy than state-of-the-art clustering ensemble methods.
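The abstract does not define a fragment. As a rough illustration only, the sketch below assumes the definition common in the fragment-based ensemble literature: a fragment is a maximal set of data points that share a cluster in every base clustering. The function name `extract_fragments` and the toy partitions are hypothetical and are not taken from the thesis. The candidate-cluster construction, voting, and assigning steps of GVCE are not reproduced here.

```python
from collections import defaultdict


def extract_fragments(partitions):
    """Group points whose labels agree in every base partition.

    partitions: list of label lists, one per base clustering,
                all labelling the same points in the same order.
    Returns a list of fragments, each a list of point indices.
    NOTE: assumed definition of "fragment", not the thesis's own code.
    """
    if not partitions:
        return []
    n_points = len(partitions[0])
    if any(len(p) != n_points for p in partitions):
        raise ValueError("all partitions must label the same number of points")

    groups = defaultdict(list)
    for i in range(n_points):
        # The tuple of labels a point receives across all base clusterings
        # identifies its fragment: equal tuples mean "always co-clustered".
        signature = tuple(p[i] for p in partitions)
        groups[signature].append(i)
    return sorted(groups.values())


# Three hypothetical base clusterings of six points.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
fragments = extract_fragments(base_partitions)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

Points 0 and 1 are never separated by any base clustering, so they form one fragment, as do points 4 and 5. Points 2 and 3 are each split from their neighbours at least once and become singleton fragments. Working on fragments rather than individual points reduces the number of objects that later stages have to handle.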

Advisor's Recommendation Letter
Thesis Oral Defense Committee Approval Certificate
Abstract
Abstract (in Chinese)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
2. Related Works
3. Proposed Method
  3.1 Problem Formulation and Proposed Framework
  3.2 Creating Stage
  3.3 Voting Stage
  3.4 Assigning Stage
4. Experiments
  4.1 Experimental Setup and Datasets
  4.2 Comparison against other clustering ensemble methods
5. Conclusion and Future Work
6. Reference

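Full-text release date: 2023/08/23 (campus network)
Full-text release date: not authorized for public release (off-campus network)
Full-text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)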