Basic Search / Detailed Display

Author: 莊峻豪
Chun-hao Chuang
Thesis Title: 以資料密度為基礎的多階段分群演算法
A Density-based Multistage Clustering Algorithm
Advisor: 徐俊傑
Chiun-chieh Hsu
Committee: 賴源正
Yuan-cheng Lai
Nai-wei Lo
Degree: 碩士
Department: 管理學院 - 資訊管理系
Department of Information Management
Thesis Publication Year: 2010
Graduation Academic Year: 98
Language: 中文
Pages: 54
Keywords (in Chinese): 密度基礎減法分群模糊分群階層式分群
Keywords (in other languages): Density-based, Subtractive clustering, Fuzzy clustering, Hierarchical clustering
Reference times: Clicks: 256Downloads: 4
School Collection Retrieve National Library Collection Retrieve Error Report
  •   近年來隨著電子商務的興起,許多企業紛紛投入電子化,企業們所蒐集到的資料越來越多,對於管理者來說,若能從這些資料中發掘出有用的資訊,將會有莫大的助益。因此,資料探勘已經成為相當重要而且熱門的研究領域之一。



      With the increase of the e-commerce in recent years, large amounts of enterprises start to invest in computerization and thus generate a tremendous amount of data. For the managers, it will be a great benefit for the enterprise if useful information can be extracted from these raw data. Therefore, data mining has become one of important and popular research domains.

      Clustering algorithms can recognize and partition data according to their attributes’ characteristics without defining any categorization information in advance. Therefore, clustering algorithms play an important role in data mining, where the goal is to maximize the homogeneity of objects within the clusters while also maximize the heterogeneity between clusters. Fuzzy C-Means is one of the most popular clustering algorithms, where the number of clusters should be given in advance. Even though we assign the number of clusters, the result of clustering may fall into the local optimum. In addition, because the initial cluster centers are determined by random, the result of each execution may be different. Furthermore, if the data contains noise, it will induce more significant impact on the results, so it is important to suitably select the initial cluster centers. Although the influence of randomly selecting centers can be reduced if the subtractive method is adopted, it is still difficult to deal with the non-spherical shape clusters. If we use the FCM algorithm with the hierarchical algorithm, it can deal with the non-spherical shape clusters, but it cannot easily handle noise.

      Hence, this paper proposes a density-based multistage clustering algorithm, combining with subtractive clustering, fuzzy clustering and hierarchical clustering methods for solving the problems mentioned above. There are three stages in the approach. The first stage is to suitably select the initial cluster centers; the second stage is to modify the distribution of data points; and the third stage is to merge clusters until appropriate number of clusters is achieved. The experimental results show that our proposed method can improve the performance of clustering.

    中文摘要 I 英文摘要 II 目錄 IV 圖索引 VI 表索引 VIII 第一章 緒論 1 1.1 研究背景 1 1.2 研究動機 1 1.3 研究目的 3 1.4 研究範圍與限制 3 1.5 論文架構 3 第二章 文獻探討 4 2.1 分群技術 4 2.2 分割式分群演算法 7 2.3 階層式分群演算法 8 2.4 以密度為基礎的分群演算法 10 2.5 以網格為基礎的分群演算法 12 2.6 模糊分群演算法 13 2.7 混合式分群演算法 15 2.8 群集有效性分析 16 第三章 研究方法 19 3.1 問題描述 19 3.2 DMC演算法 20 3.2.1初始群心挑選階段 23 3.2.2 群集修正階段 25 3.2.3 群集合併階段 26 3.3 完整範例 27 3.4 參數設定說明 32 第四章 實驗分析與結果 33 4.1 實驗資料與環境 33 4.2 FCM、SFCM與DMC之比較 35 4.3 HFCM與DMC之比較 44 4.4 實驗總結 49 第五章 結論與未來研究方向 51 參考文獻 52

    [1] Bezdek , J. C., "Cluster Validity with Fuzzy Sets", Journal of Cybernetics, vol. 3, pp. 58-73 (1973).
    [2] Bezdek, J. C., "Numerical taxonomy with fuzzy sets", Journal of Mathematical Biology, vol. 1, pp. 57-71 (1974).
    [3] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York (1981).
    [4] Bezdek, J. C., Ehrlich, R., and Full, W., "FCM: The fuzzy c-means clustering algorithm", Computers & Geosciences, vol. 10, pp. 191-203 (1984).
    [5] Chiu, S. L., "Fuzzy model identification based on cluster estimation", Journal of Intelligent Fuzzy Systems, vol. 2, pp. 267-278 (1994).
    [6] Dave, R. N., "Validating fuzzy partitions obtained through c-shells clustering", Pattern Recognition Letters, vol. 17, pp. 613-623 (1996).
    [7] Dragut, A. B. and Nichitiu, C. M., "A Monotonic On-Line Linear Algorithm for Hierarchical Agglomerative Classification", Information Technology and Management, vol. 5, pp. 111-141 (2004).
    [8] Dubes, R. C., "How many clusters are best? - An experiment", Pattern Recognition, vol. 20, pp. 645-663 (1987).
    [9] Dunn, J. C., "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics, vol. 3, pp. 32 - 57 (1973).
    [10] Fukuyama, Y. and Sugeno, M., "A new method of choosing the number of clusters for the fuzzy c-means method", in Proceedings of Fifth Fuzzy Systems Symposium, pp. 247-250 (1989).
    [11] Gan, G., Ma, C., and Wu, J., Data Clustering: Theory, Algorithms, and Applications, SIAM (2007).
    [12] Gao, X. and Xie, W., "Advances in theory and applications of fuzzy clustering", Chinese Science Bulletin, vol. 45, pp. 961-970 (2000).
    [13] GuldemIr, H. and Sengur, A., "Comparison of clustering algorithms for analog modulation classification", Expert Systems with Applications, vol. 30, pp. 642-649 (2006).
    [14] Hammouda, K. and Karay, F., "A Comparative Study of Data Clustering Techniques", Course Project (2000).
    [15] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco (2001).
    [16] Kuik, S.P. and Salim, N., "Optimized Subtractive Clustering for Cluster-Based Compound Selection", in Proceedings of the 1st International Conference on Natural Resources Engineering & Technology, Putrajaya, Malaysia, pp. 492-499 (2006).
    [17] Li, Y. and Chung, S., "Parallel bisecting k-means with prediction clustering algorithm", The Journal of Supercomputing, vol. 39, pp. 19-37 (2007).
    [18] Liu, W. Y., Xiao, C. J., Wang, B. W., Shi, Y., and Fang, S. F., "Study on combining subtractive clustering with fuzzy c-means clustering", in Proceedings of the Second International Conference on Machine Learning and Cybernetics, vol.5, pp. 2659-2662 (2003).
    [19] Pakhira, M. K., Bandyopadhyay, S., and Maulik, U., "Validity index for crisp and fuzzy clusters", Pattern Recognition, vol. 37, pp. 487-501 (2004).
    [20] Pakhira, M. K., Bandyopadhyay, S., and Maulik, U., "A study of some fuzzy cluster validity indices, genetic clustering and application to pixel classification", Fuzzy Sets and Systems, vol. 155, pp. 191-214 (2005).
    [21] Pal, N. R. and Bezdek, J. C., "On cluster validity for the fuzzy c-means model", Fuzzy Systems, IEEE Transactions on, vol. 3, pp. 370-379 (1995).
    [22] Wong, C. C., Chen, C. C., and Su, M. C., "A novel algorithm for data clustering", Pattern Recognition, vol. 34, pp. 425-442 (2001).
    [23] Xie, X. L. and Beni, G., "A validity measure for fuzzy clustering", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 13, pp. 841-847 (1991).
    [24] Yager, R. R. and Filev, D. P., "Approximate clustering via the mountain method", Systems, Man and Cybernetics, IEEE Transactions on, vol. 24, pp. 1279-1284 (1994).
    [25] Yu, J., Cheng, Q., and Huang, H., "Analysis of the weighting exponent in the FCM", IEEE Transactions on Systems, Man and Cybernetics -Part B: Cybernetics, vol. 34, no. 1, pp. 634-639 (2004).
    [26] Zadeh, L. A., "Fuzzy sets", Information and Control, vol. 8, pp. 338-353 (1965).
    [27] 陳同孝、陳雨霖、劉明山、許文綬、林志強、邱永興,「結合K-means及階層式分群法之二階段分群演算法」,電腦學刊,第十七卷,第一期,第65-76頁 (2006)。