
Graduate student: 陳勃年 (Bo-Nian Chen)
Thesis title: 以資料間距與標準差為基礎的非監督式聚類演算法
An unsupervised clustering algorithm based on its data distribution and standard deviation
Advisor: 楊英魁 (Ying-Kuei Yang)
Oral defense committee: 黎碧煌 (Bih-Hwang Lee), 李建南 (Chien-Nan Lee)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication year: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Pages: 69
Chinese keywords: 非監督式聚類演算法, 資料間距, 標準差, 擴散
English keywords: unsupervised clustering algorithm, gap, standard deviation, spread

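This thesis extends the gap-based unsupervised clustering algorithm and proposes a new clustering procedure that avoids a weakness of circular partitioning, namely that data points not belonging to the current cluster are cut into it. After an initial cluster center is chosen under a suitable assumption, reference points are collected from nearest to farthest, and the shortest of the line segments added at each step is computed. When this shortest distance exceeds the previous step's shortest distance by more than a given ratio, the reference data set is complete. The distances between all members of the reference data set are then computed, a suitable reference gap is selected using the concept of standard deviation, and the spreading radius is derived from it. Finally, starting from the initial reference point, the cluster spreads outward step by step with twice the spreading radius: data points inside this radius become the next spreading points, are assigned to the same cluster, and are removed from the original data set. The process repeats until no data points remain in the original data set, at which point every point has been classified and the algorithm terminates. The algorithm relies on the idea that small circles can be combined into arbitrary shapes, and because it captures neighboring points by spreading, it is better suited to elongated clusters. To test the proposed method, six data samples with different characteristics were simulated, varying in the number of data points, the number of clusters, the density of the data distribution, and the elongation of the clusters. The simulation results are compared with those of the data-distribution-based unsupervised clustering partition method, and the experimental results are listed side by side.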
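The spreading step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `spread_cluster` and `cluster_all` are hypothetical, the spreading radius is passed in as a fixed value rather than derived per cluster, and the seed of each new cluster is simply the lowest unassigned index, standing in for the thesis's own rule for choosing an initial cluster center.

```python
import math
from collections import deque


def spread_cluster(points, unassigned, seed, spread_radius):
    """Grow one cluster from `seed` by repeatedly absorbing unassigned
    points that lie within 2 * spread_radius of a point already absorbed.
    Absorbed indices are removed from `unassigned` in place."""
    reach = 2.0 * spread_radius
    unassigned.discard(seed)
    cluster = [seed]
    frontier = deque([seed])  # points that still have to spread
    while frontier:
        current = frontier.popleft()
        near = [i for i in unassigned
                if math.dist(points[current], points[i]) <= reach]
        for i in near:
            unassigned.discard(i)   # remove classified points from the data set
            cluster.append(i)
            frontier.append(i)      # each new member spreads in its turn
    return sorted(cluster)


def cluster_all(points, spread_radius):
    """Repeat the spreading until no unassigned point remains.
    Assumption: one fixed radius and a placeholder seed choice."""
    unassigned = set(range(len(points)))
    clusters = []
    while unassigned:
        seed = min(unassigned)  # placeholder for the initial-center rule
        clusters.append(spread_cluster(points, unassigned, seed, spread_radius))
    return clusters


# An elongated chain of points plus a separate pair far away.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
clusters = cluster_all(points, spread_radius=0.6)
```

Because every absorbed point becomes a new spreading point, the chain of four points ends up in one cluster even though its two ends are farther apart than a single circle of radius 1.2 would cover, which is the motivation for using small circles instead of one large one.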


In this thesis, we propose a new way of clustering elongated (expandable) clusters that builds on “An unsupervised clustering approach based on its data distribution” and prevents a resulting cluster from including data points that belong to other clusters. After an initial cluster center is chosen in an appropriate way, the proposed algorithm adds data points to the reference subset from nearest to farthest and, at each step, calculates the minimum length of the newly added line segments. If the ratio of the current minimum length to the previous minimum length exceeds a certain threshold, the algorithm stops adding data points and the reference subset is complete. The algorithm then calculates all the distances among the data points in the reference subset and, using the concept of standard deviation, picks an appropriate length as the “spreading radius” of the cluster. It then spreads the cluster outward from the initial cluster center using a clustering radius twice as long as the spreading radius; any data point within this clustering radius is included in the new, larger cluster. This process repeats until all the data points have been assigned to clusters. Finally, the proposed approach is tested on six sets of sample data with various data distributions, numbers of clusters, and levels of expansion. The experimental results show that the proposed approach works well not only on simple, grouped data distributions but also on those with a high degree of expansion.
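The stopping rule for the reference subset and the radius estimate can be sketched as follows. This is a hedged illustration, not the thesis's method: the names `build_reference_subset` and `estimate_spread_radius` and the default `ratio_threshold=2.0` are hypothetical, and the abstract does not say how the standard deviation is applied, so the radius rule here (mean pairwise distance minus one population standard deviation, never smaller than the smallest pairwise distance) is only a placeholder.

```python
import math
import statistics
from itertools import combinations


def build_reference_subset(points, center, ratio_threshold=2.0):
    """Add points from nearest to farthest from `center`. For each candidate,
    the shortest newly added segment is its distance to the closest point
    already in the subset. Stop when that length exceeds the previous step's
    length by more than `ratio_threshold` (an assumed value)."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], center))
    subset = [order[0]]
    previous_min = None
    for idx in order[1:]:
        new_min = min(math.dist(points[idx], points[j]) for j in subset)
        if (previous_min is not None and previous_min > 0
                and new_min / previous_min > ratio_threshold):
            break  # the gap jumped: the reference subset is complete
        subset.append(idx)
        previous_min = new_min
    return subset


def estimate_spread_radius(points, subset):
    """Placeholder rule (an assumption, not taken from the thesis):
    mean pairwise distance minus one standard deviation, clamped from
    below by the smallest pairwise distance in the subset."""
    distances = [math.dist(points[a], points[b])
                 for a, b in combinations(subset, 2)]
    if not distances:
        return 0.0
    mean = statistics.fmean(distances)
    std = statistics.pstdev(distances)
    return max(min(distances), mean - std)


points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
subset = build_reference_subset(points, center=(0, 0))
radius = estimate_spread_radius(points, subset)
```

In this toy data set the gap jumps from 1 to 7 when the fifth point is considered, so the subset stops at the first four points. Any other use of the standard deviation could replace the placeholder in `estimate_spread_radius` without changing the stopping rule.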

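Abstract (Chinese)
Abstract (English)
Table of Contents
Acknowledgments
List of Figures and Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Clustering and Clustering Methods
2.1 Recognition Ability and Clustering Rules
2.2 Unsupervised Clustering Algorithms
2.2.1 K-nearest neighbors method (K-NN algorithm)
2.2.2 Fuzzy c-means algorithm (FCM algorithm)
2.2.3 Gap-based unsupervised clustering partition method
Chapter 3 Research Method
3.1 Algorithm Concept
3.2 Distance-Ratio Threshold and Reference Data Set
3.3 Computing the Spreading Radius dspread
3.4 Spreading Method
3.5 Description of the Algorithm Steps
3.6 Chapter Summary
Chapter 4 Analysis of Simulation Results
4.1 Test Method
4.2 Simulation Results
Chapter 5 Conclusion
References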


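Full-text release date: 2020/07/01 (campus network)
Full-text release date: 2025/07/01 (off-campus network)
Full-text release date: 2025/07/01 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)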