
Graduate Student: Hua-Min Chiou (邱華敏)
Thesis Title: Candidate Group Search (CGS) for K-Harmonic Means Data Clustering (候選群集搜尋法(CGS)結合K調和均值法(KHM)之研究)
Advisors: Cheng-Huang Hung (洪政煌), Wei-Ning Yang (楊維寧)
Committee Members: Yun-Shiow Chen (陳雲岫), Chiun-Chieh Hsu (徐俊傑), Wen-Chih Chen (陳文智)
Degree: Doctorate
Department: Department of Information Management, College of Management
Year of Publication: 2014
Academic Year of Graduation: 102 (2013–2014)
Language: Chinese
Pages: 51
Keywords (Chinese): cluster analysis, K-means, K-harmonic means, candidate group search
Keywords (English): Cluster, K-means, KHM, CGS
  • Abstract (translated from Chinese)
    Cluster analysis is a very common data mining and data analysis technique, and K-means is one of the most widely used clustering methods. Although K-means is easy to use and computationally fast, in many situations it suffers from two common drawbacks: it is highly sensitive to initial values, and its clustering results tend to fall into local optima. To remedy these drawbacks, many studies in recent years have actively explored new clustering methods. K-harmonic means (KHM) is one of them, and it effectively overcomes the first drawback of K-means. As for the tendency to converge to local optima, the improvements reported by several recently proposed clustering methods, such as tabu search (TabuKHM), simulated annealing (SAKHM), and variable neighborhood search (VNSKHM), show that there is still room for refinement. This thesis therefore proposes a new algorithm, called candidate group search (CGS). Like VNS and related methods, CGS is based on KHM: it replaces cluster centers one at a time and repeatedly applies the KHM iterative procedure to obtain new cluster centers, so as to avoid falling into local optima. Experiments on real data sets, including three small data sets (Iris, Glass, and Wine) and two large data sets (Drilling and Image Segmentation), show that CGS improves the objective function value more than previous algorithms such as TabuKHM, SAKHM, and VNSKHM, while also requiring less computation time, especially on large or high-dimensional data sets. In addition, this study examines replacing randomly selected initial centers (random initial center, RIC) with optimized initial centers (better initial center, BIC). Validated on the same data sets, the BIC approach outperforms RIC in both convergence speed and objective function improvement; the variability of the objective function value over the whole solution process is markedly reduced and stability is greatly increased.
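The abstract contrasts randomly selected initial centers (RIC) with optimized initial centers (BIC), but this record does not spell out how BIC constructs its centers. As a stand-in illustration only, the sketch below uses deterministic maximin (farthest-point) seeding, a common way to build better-than-random initial centers; the thesis's actual BIC procedure may differ.

```python
import numpy as np

def maximin_init(X, K):
    # Deterministic farthest-point ("maximin") seeding: start from the
    # point nearest the data mean, then repeatedly add the data point
    # farthest from all centers chosen so far.  This is a stand-in for
    # the thesis's BIC procedure, which is not described in this record.
    centers = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centers) < K:
        # distance from every point to its nearest chosen center
        d = np.min(np.linalg.norm(
            X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        centers.append(X[np.argmax(d)])
    return np.array(centers)
```

Any seeding of this kind is deterministic, so repeated runs start from the same centers, which is one way the variability reported for BIC versus RIC could be reduced.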


    Abstract
    Clustering is a very popular data analysis and data mining technique, and K-means is one of the most popular clustering methods. Although K-means is easy to implement and fast in most situations, it suffers from two major drawbacks: sensitivity to initialization and convergence to local optima. To address these drawbacks, many studies in recent years have explored new clustering methods. K-harmonic means (KHM) clustering is one of them and was proposed to overcome the first drawback, sensitivity to initialization. As for convergence to local optima, the improvements recently reported by several new clustering methods, such as TabuKHM, SAKHM, and VNSKHM, show that there is still room for improvement. We therefore propose a new algorithm, candidate group search (CGS), combined with K-harmonic means to solve the clustering problem. Like VNS, CGS is based on KHM: it replaces cluster centers one at a time and repeats the KHM iterative procedure to obtain new cluster centers, aiming to avoid convergence to local optima. Experiments on real data sets, including three small data sets (Iris, Glass, and Wine) and two large data sets (Drilling and Image Segmentation), show that CGS achieves better clustering performance with less computation time than previous methods such as TabuKHM, SAKHM, and VNSKHM, especially on large data sets. In addition, the study explores replacing random initial centers (RIC) with better initial centers (BIC). Verified on the same data sets, BIC outperforms RIC in both convergence rate and objective function value, and the variability of the objective function value over the whole solution process decreases markedly while stability increases.
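The center-replacement idea described above can be sketched in code. The KHM objective and center-update rule below follow Zhang's formulation (references [19], [20] in the list below); the `cgs_khm` loop is only a minimal illustration of the candidate-swap idea, assuming the candidate centers are drawn from the data points, and is not the authors' exact CGS procedure.

```python
import numpy as np

def khm_objective(X, C, p=3.5):
    # KHM(X, C) = sum_i K / sum_k ||x_i - c_k||^(-p)
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), 1e-12)
    return float(np.sum(C.shape[0] / np.sum(d ** -p, axis=1)))

def khm_step(X, C, p=3.5):
    # One KHM center update: point i contributes to center k with weight
    # d_ik^(-p-2) / (sum_j d_ij^(-p))^2  (Zhang's update rule).
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), 1e-12)
    w = d ** (-p - 2) / (np.sum(d ** -p, axis=1) ** 2)[:, None]   # n x K
    return (w.T @ X) / np.sum(w, axis=0)[:, None]

def cgs_khm(X, C0, candidates, p=3.5, inner_iters=10):
    # Candidate-swap loop: refine the start with KHM, then try replacing
    # each center in turn with each candidate point, re-run KHM, and keep
    # a swap only when the KHM objective improves.
    best = C0.astype(float).copy()
    for _ in range(inner_iters):
        best = khm_step(X, best, p)
    best_obj = khm_objective(X, best, p)
    for k in range(best.shape[0]):
        for cand in candidates:
            trial = best.copy()
            trial[k] = cand
            for _ in range(inner_iters):
                trial = khm_step(X, trial, p)
            obj = khm_objective(X, trial, p)
            if obj < best_obj:
                best, best_obj = trial, obj
    return best, best_obj
```

Because the loop starts from the plain-KHM solution and accepts only improving swaps, its final objective value can never be worse than KHM alone from the same initialization, which matches the escape-from-local-optima motivation described in the abstract.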

    Table of Contents
    Chinese Abstract
    English Abstract
    Acknowledgements
    Contents
    List of Figures and Tables
    Chapter 1  Introduction
      1.1 Research Motivation and Objectives
      1.2 Research Innovation and Contributions
    Chapter 2  Literature Review
      2.1 Cluster Analysis
      2.2 K-Means (KM)
      2.3 K-Harmonic Means (KHM)
      2.4 Simulated Annealing (SAKHM)
      2.5 Tabu Search (TabuKHM)
      2.6 Variable Neighborhood Search (VNSKHM)
    Chapter 3  Research Method
      3.1 Research Hypotheses
      3.2 Definitions of Terms and Parameter Properties
      3.3 Candidate Group Search
      3.4 Differences between CGS Candidate Groups and VNS Random Selection for Center Replacement
    Chapter 4  Empirical Results and Analysis
      4.1 Computing Equipment
      4.2 Description of the Empirical Data
      4.3 Empirical Analysis
      4.4 Summary
    Chapter 5  Extended Research
      5.1 Combining CGS with Optimized Initial Values
      5.2 The CGS_BIC Algorithm with Optimized Initial Values
      5.3 Empirical Results of CGS_BIC versus CGS_RIC
      5.4 Summary
    Chapter 6  Conclusions and Future Research
      6.1 Conclusions
      6.2 Future Research Directions
    References

    References
    [1] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Chinese translation (資料探勘) by 施雅月 and 賴錦慧, Chapters 5–9.
    [2] Michael J. A. Berry and Gordon S. Linoff, Mastering Data Mining: The Art and Science of Customer Relationship Management, Chinese translation (資料採礦理論與實務-顧客關係管理的技巧與科學) by 吳旭志 and 賴淑貞, pp. 44–46, 111–147.
    [3] Ralph Kimball, Michael J. A. Berry, and Gordon S. Linoff, The Data Warehouse Lifecycle Toolkit, Chinese translation (資料採礦-顧客關係管理暨電子行銷應用) by 彭文正, 數博網資訊股份有限公司, 2001.
    [4] 張云濤 and 龔玲 (eds.), 資料探勘與技術 (Data Mining, AI, Algorithm), pp. 63–74.
    [5] M. S. Aldenderfer and R. K. Blashfield. Cluster Analysis. Sage Publications, Los Angeles, 1985.
    [6] B. S. Everitt, S. Landau, and M. Leese. Cluster Analysis. Arnold Publishers, London, fourth edition, May 2001.
    [7] E.W. Forgy, Cluster analysis of multivariate data: efficiency vs. interpretability of classifications, Biometrics. 21 (1965) 768–769.
    [8] M.R. Anderberg (1973) Cluster Analysis for Application. Academic Press, New York.
    [9] L. Kaufmann, P. Rousseeuw, Finding Groups in Data, Wiley Series in Probability and Statistics, New York, 2005.
    [10] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach, Taylor & Francis group, FL, 2005.
    [11] D. Aloise, P. Hansen, Clustering, in: D.R. Sheir (Ed.), Handbook of Discrete and Combinatorial Mathematics, CRC Press, 2009.
    [12] P. Hansen, B. Jaumard, Cluster analysis and mathematical programming, Mathematical programming 79 (1997) 191–215.
    [13] D. Aloise, A. Deshpande, P. Hansen, P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning 75 (2009) 245–248.
    [14] P. Hansen, N. Mladenovic, J-means: a new local search heuristic for minimum sum of squares clustering, Pattern Recognition 34 (2001) 405–413.
    [15] N. Belacel, P. Hansen, N. Mladenović, Fuzzy J-means: a new heuristic for fuzzy clustering, Pattern Recognition 35 (2002) 2193–2200.
    [16] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, vol. 1, University of California Press, Berkeley, CA, 1967.
    [17] G. Ball and D. Hall. A Clustering Technique for summarizing Multivariate Data. Behavior Science, 12:153-155, March 1967.
    [18] I. S. Dhillon, Y. Guan, and J. Kogan, Iterative clustering of high dimensional text data augmented by local search, in Proc. of the 2002 IEEE Intl. Conf. on Data Mining, pp. 131–138, IEEE Computer Society, 2002.
    [19] B. Zhang, M. Hsu, U. Dayal, K-Harmonic Means – A Data Clustering Algorithm, Technical Report HPL-1999-124, Hewlett-Packard Laboratories, 1999.
    [20] B. Zhang, Generalized K-Harmonic Means – Boosting in Unsupervised Learning, Technical Report HPL-2000-137, Hewlett-Packard Laboratories, 2000.
    [21] Z. Güngör, A. Ünler, K-harmonic means data clustering with tabu-search method, Applied Mathematical Modelling 32 (2008) 1115–1125.
    [22] Z. Güngör, A. Ünler, K-harmonic means data clustering with simulated annealing heuristic, Applied Mathematics and Computation 184 (2007) 199–209.
    [23] Hua Jiang, Shenghe Yi, Jing Li, Fengqin Yang, Xin Hu, Ant clustering algorithm with K-harmonic means clustering, Expert Systems with Applications 37 (2010) 8679–8684.
    [24] N. Mladenović, P. Hansen, Variable neighborhood search, Computers & Operations Research 24 (1997) 1097–1100.
    [25] P. Hansen, N. Mladenovic, Variable neighborhood search: principles and applications, European Journal of Operations Research 130 (2001) 449–467.
    [26] P. Hansen, N. Mladenovic, J.A. Moreno Pérez, Variable neighborhood search: methods and applications, 4OR: A Quarterly Journal of Operations Research 6 (2008) 319–360.
    [27] J. Brimberg, P. Hansen, N. Mladenovic, Attraction Probabilities in Variable Neighborhood search 4OR 8 (2010) 181–194.
    [28] A. Alguwaizani, P. Hansen, N. Mladenović, E. Ngai, Variable neighborhood search for harmonic means clustering, Applied Mathematical Modelling 35 (2011) 2688–2694.
    [29] C.H. Hung, H.M. Chiou, W.N. Yang, Candidate group search for K-harmonic means data clustering, Applied Mathematical Modelling 37 (2013) 10123–10128.
    [30] B. Everitt, Cluster Analysis, 1st ed., Heinemann Educational Books, London, 1974.
    [31] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach, CRC Press, FL, 2005.
    [32] R. Xu and D. Wunsch, Clustering, IEEE Press, 2009.
    [33] R.M. Cormack, A review of classification, Journal of the Royal Statistical Society. Series A (General) 134 (1971), no. 3, 321–367.
    [34] A. D. Gordon, A review of hierarchical classification, Journal of the Royal Statistical Society. Series A (General) (1987) 119–137.
    [35] F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition, Wiley, 1999.
    [36] G. Reinelt, TSP-LIB – A traveling salesman library, ORSA Journal Computing 3 (1991) 376–384.
    [37] C.L. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, University of California, 1998. <http://archive.ics.uci.edu/ml/datasets.html>.
