簡易檢索 / 詳目顯示

研究生: 萬建明
Jian-ming Wann
論文名稱: 以鏈結式及熵值為基礎的分群演算法
A Cluster Algorithm Base on Linkage and Entropy
指導教授: 楊英魁
Ying-Kuei Yang
口試委員: 黎碧煌
Bih-Hwang Lee
孫宗瀛
none
李建南
none
學位類別: 碩士
Master
系所名稱: 電資學院 - 電機工程系
Department of Electrical Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 中文
論文頁數: 109
中文關鍵詞: 非監督式鏈結式分群演算法群聚分析資料歸屬
外文關鍵詞: unsupervised, linkage algorithm, cluster analysis, entropy, data belong
相關次數: 點閱:265下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

  在這篇論文中,我們提出一個非監督式的分群演算法來處理群聚分群的問題,將彼此相鄰的資料點進行資料鏈結的處理,即關係度較高的資料彼此連結在一起。這個方法不需要經由人為指定群聚個數,便可將空間內的所有資料點進行資料分類,透過這種方式可以處理一些較為不規則且難以處理的資料分布,使其不會侷限於某種資料型態的處理。再引入群聚分析的概念做為評估各個群聚之適當性的判斷,進而對不良之群聚做出更進一步的處理,使能得到較佳的分群結果。論文中以累積消息量的資訊作為群聚分析的條件,藉由這個資訊找出群聚中合適的切割位置並進行群聚切割處理。除此之外,我們提出利用群聚之外圍點與資料點間的距離做為資料歸屬的判斷,不同於傳統將資料點歸屬於鄰近的群聚重心,這可以改善部分資料不良歸屬的狀況。我們將這種歸屬方式應用在新加入的資料點上,讓系統不會因為新資料的加入而對空間內所有的資料點重新做分群處理的動作,這能夠減少系統運算處理的負擔。


  In this thesis, we propose an unsupervised algorithm to solve some of difficult problems existing in the research area of data clustering. The proposed algorithm firstly performs the process of linking neighboring data together so that those data having related each other in higher degree will be linked together as a group. This approach does not require the prior knowledge of pre-setting the number of resultant clusters. Most importantly, the proposed approach is capable of handling the data set that has irregular data distribution and is generally difficult to be clustered by contemporary approaches. Further, the concept of clustering analysis is applied to evaluate how well of the clustering performance so that a good clustering result can be reached. In the thesis, the accumulated information amount of entropy is proposed to search the best positions from where a given data set is divided into several clusters. In addition, instead of using the distance between a given data and the cluster center of gravity, this thesis proposes to use the distance between a given data and the outer boundary of a cluster to decide which cluster the data best belongs to. By doing this way, when a new data is given, there is no need of re-clustering the whole data set while at the same time a better data belonging can be obtained.

目錄  ................................................ II 圖目錄 ................................................ IV 摘要  ................................................ XIV 第一章 緒論 ..........................................  1 1.1 研究動機 ........................................  1 1.2 研究背景 ........................................  2 1.3 論文架構 ........................................  9 第二章 分群 ......................................... 11 2.1 分群方式 ........................................ 11 2.2 群聚分析 ........................................ 13 2.2.1 區別率 ....................................... 13 2.2.2 關聯度 ....................................... 14 2.2.3 凝聚率 ....................................... 16 2.2.4 密度率 ....................................... 17 2.3 群聚分析後的處理方式 ............................ 17 2.4 分群方法 ........................................ 18 2.4.1 K-mean分群法 ................................. 19 2.4.2 K-medoids分群法 .............................. 21 2.4.3 K-NN分群法 ................................... 22 2.4.4 Senior method分群法 ........................... 25 2.4.5 Single-Link分群法 ............................ 30 2.4.6 DBSCAN分群法 ................................. 34 2.5 本章結論 ........................................ 36 第三章 鏈結式分群演算法 ............................. 39 3.1 演算法構想 ...................................... 39 3.2 鏈結式分群法 .................................... 44 3.3 群聚分析法則 .................................... 46 3.4 資料歸屬 ........................................ 51 3.5 處理技術 ........................................ 55 3.5.1 連接元件 ..................................... 56 3.5.2 圖像旋轉 ..................................... 61 3.5.3 影像測邊 - Sobel ............................. 62 3.5.4 自動門檻值決定法 - Otsu ...................... 66 3.5.5 細線化 ....................................... 72 3.6 本章結論 ....................................... 75 第四章 實驗結果 ..................................... 76 4.1 其它的分群演算法 ................................ 76 4.2 鏈結式分群演算法 ................................ 87 4.3 本章結論 ....................................... 103 第五章 結論與未來展望 .............................. 104 5.1 結論 ........................................... 104 5.2 未來展望 ....................................... 105 參考文獻 ............................................ 107

[1] Han, J. and Kamber, M., “Data mining: Concepts and Techniques”, San
Francisco:Morgan Kaufmann Publisher, 2001.
[2] James M.Q., “Some Methods for classification and Analysis of
Multivariate Observations”, in Proc. 5-th Berkeley Symposium on
Mathematical of California Press, 1:281-297, 1967
[3] 鍾國亮, 影像處理與電腦視覺, 東華書局, 2006.
[4] Kaufman, L. and P.J. Rousseeuw, “Finding Groups in Data: An Introduction
to Cluster Analysis”, John Wiley & Sons, 1990.
[5] Ng R. and Han J. “ Efficient and Effective Clustering Method for Spatial
Data Mining”, in Proc. Int. Conf. Very Large Databases (VLDB’94),
Sept., pp.144-155, 1994.
[6] Guha S., Rastogi R. and Shim K. “CURE: An efficient clustering algorithm
for large databases”, in Proc. ACM-SIGMOD Int. Conf. Management of Data
(SIGMOD’98), June, pp. 73-84, 1998.
[7] Guha S., Rastogi R., and Shim K. “ROCK: A Robust Clustering Algorithm
For Categorical Attribute”, in Proc. Int. Conf. Data Engineering (ICDE’
99), Mar., pp. 512-521, 1999.
[8] Zhang T., Ramakrishnan R., and Livny M. “BIRCH: An Efficient Data
Clustering Method for Very Large Databases”, in Proc. ACM-SIGMOD Int.
Conf. Management of Data (SIGMOD’96), pp. 103-114, 1996.
[9] Karypis G., Han E.H., and Kumar V. “CHAMELEON: Hierarchical Clustering
Using Dynamic Modeling”, IEEE Computer, Vol. 32, No. 8, pp. 68-75, 1999.
[10] Martin, E., Kriegel, H., Sander, J., and Xu, X., “A Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases with
Noise”, in Proc. Of KDD 96: pp. 226-231, 1996.
[11] Xu X., Ester M., Kriegel H.P. and Sander J. “A distribution– based
Clustering Algorithm for Mining in Large Spatial Databases", in Proc.
Int. Conf. Data Engineering (ICDE'98). 1998.
[12] Agrawal R., Gehrke J., Gunopulos D., and Raghavan P. “Automatic Subspace
Clustering of High Dimensional Data for Data Mining Applications”, in
Proc. Int. Conf. Management of Data, pp. 94-105, 1998.
[13] Wang, W., Yang, J. and Muntz, R., “STING: A Statistical Information Grid
Approach to Spatial Data Mining”, in Proc. the 23th VLDB Conference,
pp.186-195, 1997.
[14] Sheikholeslami G., Chatterjee S., and Zhang A. “WaveCluster: A multi-
resolution clustering approach for very large spatial databases”, in
Proc. Int. Conf. Very Large Databases (VLDB’98), Aug., pp. 428-439, 1998.
[15] 黃信嘉, “差分動態分群演算法“, 大同大學資訊工程研究所碩士論文, 2008.
[16] Fisher, Douglas H., “Improving inference through conceptual clustering”
in Proc. AAAI Conference, pp. 461-465, 1987.
[17] 楊家豪, “應用模糊分群技術於需求導向之物流配送問題之研究“, 朝陽科技大學資
訊管理系碩士論文, 2006.
[18] Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P., “Automatic
SubspaceClustering of High Dimensional Data for Data Mining
Applications”, in Proc. the ACM SIGMOD Conference on Management of Data,
pp. 94-104, 1998.
[19] 陳榮昌, 林育臣, “群聚演算法及群聚參數的分析與探討”, 朝陽學報第八期, PP.
327-354., 2003.
[20] 陳連進, “以關聯度為基礎的基因表現叢集驗證之方法”, 國立成功大學資訊工程研
究所碩士論文, 2002.
[21] Estivill-Castro V. and Lee I., “AMOEBA: Hierarchical Clustering Based on
Spatial Proximity Using Delaunay Diagram,” in Proc. 9th Int. Spatial
Data Handling (SDH2000), Aug., pp. 10-12., 2000
[22] Estivill-Castro V. and Lee I., “AUTOCLUST: Automatic Clustering via
Boundary Extraction for Massive Point Data Sets,” in Proc. 5th Int.
Conf. Geo-Computation, Aug., pp. 23-25., 2000
[23] Xu X., Ester M., Kriegel H.P., and Sander J., “A distribution– based
Clustering Algorithm for Mining in Large Spatial Databases,” in Proc.
14th Int. Conf. Data Engineering (ICDE'98)., 1998.
[24] Keller J.M., Gray M.R., and Givens J.A., “A fuzzy k-nearest neighbors
algorithm”, IEEE Transactions on System, Man, and Cybernetics, Vol. SMC-
15, pp. 580-585, 1985.
[25] Maria H., Michalis V., “A density-based cluster validity approach using
multi-representatives”, Pattern Recognition Letters. 29, pp. 773-786,
2008.
[26] 林彥廷, “以資料間距為基礎的非監督式聚類演算法”, 國立台灣科技大學電機工程
研究所碩士論文, 2000.
[27] Sibson, R., “SLINK: an optimally efficient algorithm for the single-link
cluster method”, The Computer Journal 16(1): 30–34, 1973.
[28] Otsu N., “A Threshold Selection Method from Gray-Level Histogram”, IEEE
Trans. Systems, Man and Cybernetics. Vol. 9, pp.62-66, 1979.
[29] Kapur J.N., P.K. Sahoo, and A.K.C. Wong, “A new method for gray-level
picture thresholding using the entropy of the histogram”, Computer
Vision Graphics and Image Process. 29(3), pp. 273–285, 1985.
[30] Canny J., “A Computational Approach to Edge Detection”, IEEE
Transactions on Pattern Analysis and Machine intelligence, Vol. PAMI-8,
NO.6, 1986.
[31] Dae-Won K., KiYoung L., Doheon L., and Kwang H. L., “A kernel-based
subtractive clustering method”, Pattern Recognition Letters. 26, pp.
879–891, 2005.
[32] Gleb V. N. , Dongquan L. , Olga S., “Automatic clustering and boundary
detection algorithm based on adaptive influence function”, Pattern
Recognition. 41, pp. 2757–2776, 2008.

無法下載圖示 全文公開日期 2014/07/13 (校內網路)
全文公開日期 本全文未授權公開 (校外網路)
全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
QR CODE