
Author: Hsi-Chen Tsai (蔡錫震)
Thesis Title: A Support Vector Clustering Type Algorithm via Minimum Enclosing Balls (利用最小包含球的支撐向量分群型態之演算法)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Tien-Ruey Hsiang (項天瑞), Hsing-Kuo Pao (鮑興國), Chun-Nan Hsu (許鈞南), Yuan-Chin Chang (張源俊)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2006
Graduation Academic Year: 94 (ROC calendar)
Language: English
Number of Pages: 41
Keywords (Chinese, translated): support vector clustering algorithm, cluster labeling, minimum enclosing ball
Keywords (English): cluster labeling, minimum enclosing ball, support vector clustering
Abstract (translated from Chinese):

    Support vector machines have become one of the most popular machine learning algorithms in recent years, and support vector clustering (SVC) was developed with them as its inspiration. SVC is an unsupervised learning algorithm: it first maps the points to be clustered into a higher-dimensional space and then finds a minimum enclosing ball that contains them. When this ball, formed in the high-dimensional space, is mapped back to the original space, it becomes several contours, each enclosing a different number of data points; the points within the same contour naturally form one cluster. This is the main clustering idea of SVC.
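    As a rough illustration of this idea, the following Python sketch computes the feature-space distance from a point to the center of the minimum enclosing ball, assuming a Gaussian kernel and dual coefficients beta already obtained from a one-class SVM / SVDD solver; the function and variable names are illustrative, not the thesis's implementation. A point lies inside one of the contours when this distance does not exceed the ball radius.

```python
import numpy as np

def gaussian_kernel(x, y, q=1.0):
    # K(x, y) = exp(-q * ||x - y||^2), the kernel commonly used by SVC
    return np.exp(-q * np.sum((x - y) ** 2))

def radius_sq(x, X_sv, beta, q=1.0):
    """Squared feature-space distance from Phi(x) to the ball center
    a = sum_j beta_j * Phi(x_j):

        R^2(x) = K(x, x) - 2 * sum_j beta_j K(x_j, x)
                 + sum_{i,j} beta_i beta_j K(x_i, x_j)

    With the Gaussian kernel, K(x, x) = 1.
    """
    k_x = np.array([gaussian_kernel(x_j, x, q) for x_j in X_sv])
    K_ss = np.array([[gaussian_kernel(x_i, x_j, q) for x_j in X_sv] for x_i in X_sv])
    return 1.0 - 2.0 * beta @ k_x + beta @ K_ss @ beta

def inside_ball(x, X_sv, beta, R_sq, q=1.0, tol=1e-8):
    # The contours in input space are exactly the points whose
    # feature-space distance to the center is at most the ball radius.
    return radius_sq(x, X_sv, beta, q) <= R_sq + tol
```

    Here R_sq would be the squared radius evaluated at any support vector lying on the ball boundary.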

    The main drawback of SVC is the high time complexity of its cluster labeling step, and many methods have been proposed to address it. This thesis combines the properties of support vectors, minimum enclosing balls, and the k-means algorithm, and arranges these ideas appropriately to reduce the time complexity. For the experiments, we construct datasets distributed in 2- or 3-dimensional space so that the cluster structure of each dataset can be observed directly, and we compare the proposed method with existing methods; in both running time and clustering accuracy, the proposed method shows superior and satisfactory performance.
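    For context, here is a minimal sketch of the baseline labeling rule whose cost the thesis aims to reduce, not of the proposed method itself: in the original SVC algorithm, two points are assigned to the same cluster when every sampled point on the line segment between them stays inside the feature-space ball, which requires checking O(n^2) pairs, and clusters are then the connected components of the resulting adjacency graph. The helper `inside_ball` and the sampling count are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def label_clusters(X, inside_ball, n_samples=10):
    """Complete-graph cluster labeling: the O(n^2) bottleneck of SVC.

    X           : (n, d) array of data points
    inside_ball : callable p -> bool, True if p lies inside the
                  feature-space ball (e.g. the radius test sketched above)
    """
    n = len(X)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            # Sample points on the segment between x_i and x_j and connect
            # the pair only if the whole segment stays inside the ball.
            ts = np.linspace(0.0, 1.0, n_samples)
            adj[i, j] = adj[j, i] = all(
                inside_ball((1.0 - t) * X[i] + t * X[j]) for t in ts
            )
    # Each connected component of the adjacency graph is one cluster.
    _, labels = connected_components(csr_matrix(adj), directed=False)
    return labels
```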


Abstract (English):

    Clustering analysis, categorized as unsupervised learning in machine learning, means creating groups of objects based on specific features in such a way that objects in the same cluster are similar and those in different clusters are dissimilar. Support vector clustering (SVC) is an unsupervised, kernel-based clustering algorithm. SVC can naturally separate a dataset of any shape into different clusters, and it separates the dataset into appropriate clusters by tuning two parameters. If the number of data points is n, the time complexity of labeling the data points is O(n^2).
    This becomes the bottleneck of SVC and is the major reason why SVC usually takes more time than other clustering algorithms.
    Focusing on this drawback, this thesis proposes a novel cluster labeling algorithm that combines the concept of minimum enclosing balls (MEBs), the properties of support vectors, and k-means to improve efficiency. In later sections we test the proposed method on synthetic datasets in R^2 and R^3 so that the clustering results can be visualized, and we demonstrate its efficiency by comparing it with other cluster labeling algorithms. In addition, we find a flaw in the original SVC mathematical programming model; another goal of this thesis is to discuss this flaw and resolve it.
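    For reference, the mathematical programming model underlying SVC is the soft minimum enclosing ball (support vector domain description) problem; the standard formulation is reproduced below, while the specific flaw discussed in the thesis is not restated here:

\[
\begin{aligned}
\min_{R,\, a,\, \xi} \quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \|\Phi(x_i) - a\|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n,
\end{aligned}
\]

    whose dual, with kernel \(K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)\), is

\[
\max_{\beta} \; \sum_{i=1}^{n} \beta_i K(x_i, x_i) - \sum_{i,j=1}^{n} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \beta_i \le C, \;\; \sum_{i=1}^{n} \beta_i = 1.
\]

    Clusters in input space are the connected components of \(\{x : \|\Phi(x) - a\|^2 \le R^2\}\).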

Table of Contents:

    1 Introduction
      1.1 Clustering Analysis
      1.2 Related Clustering Algorithms
        1.2.1 Hierarchical algorithms
        1.2.2 Partitional algorithms
      1.3 A Novel Clustering Algorithm: Support Vector Clustering
      1.4 Organization of Thesis
    2 Related Work
      2.1 Support Vector Machines for Classification
        2.1.1 Support Vector Classification
        2.1.2 Idea of Kernel Functions
        2.1.3 One-Class SVM
      2.2 Support Vector Clustering
        2.2.1 Introduction of SVC
        2.2.2 Strategies of Cluster Labeling
    3 SVC via Minimum Enclosing Balls
      3.1 Motivation
      3.2 Cluster Labeling via MEBs
    4 Experiments and Results
      4.1 Numerical Results
    5 Conclusion and Further Work
    A SVC Type Algorithm via MEBs Implementation for Matlab Code

