Graduate student: | 王韋棟 Wei-tung Wang |
---|---|
Thesis title: | 依據資料而修正之基於密度與雜訊辨別聚類分群法 Adaptive Density-based Spatial Clustering of Applications with Noise (DBSCAN) According to Data |
Advisor: | 吳怡樂 Yi-Leh Wu |
Oral examination committee: | 陳建中, 唐政元, 閻立剛 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2014 |
Academic year of graduation: | 102 (ROC calendar) |
Language: | English |
Pages: | 67 |
Chinese keywords: | data mining, DBSCAN, clustering algorithm |
English keywords: | Data mining, Clustering, DBSCAN |
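Clustering is a technique that divides data into different groups according to certain characteristics of the data. DBSCAN is a density-based clustering algorithm. It requires two user-defined parameters. These parameters strongly affect the clustering result, but they are often hard to choose when the user has not studied the data beforehand.
DBSCAN also has difficulty finding the correct clusters in data whose density varies. We modified the original DBSCAN so that it uses different parameters for data of different densities. The modified algorithm determines, from the data distribution itself, the parameters that correspond to each density level in the data. This improves the clustering results of DBSCAN on data with large density variations.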
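To make the role of the two parameters concrete, here is a minimal sketch of classic DBSCAN as introduced by Ester et al. (KDD-96). It is not the modified algorithm proposed in this thesis. The function names `dbscan` and `region_query` are illustrative assumptions. The sketch uses a brute-force O(n²) neighbour search and requires Python 3.8+ for `math.dist`.

```python
import math
from collections import deque

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classic DBSCAN with brute-force neighbour search.

    Returns one label per point: 0, 1, 2, ... for clusters, NOISE (-1) for noise.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]
    print(dbscan(data, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

On the same toy data, shrinking `eps` to 0.5 turns every point into noise. This illustrates how sensitive the result is to the parameter choice.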
Clustering is the task of grouping data objects into several groups. DBSCAN is a density-based clustering method. However, it requires two parameters, eps and MinPts, which are hard to determine. DBSCAN also has difficulty finding clusters when the density varies within the dataset. In this paper, we modify the original DBSCAN so that it can determine appropriate eps values according to the data distribution and can find clusters when the density varies across the dataset.
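One widely used way to read eps off the data distribution is the sorted k-distance curve suggested in the original DBSCAN paper. Each point's distance to its k-th nearest neighbour is computed and the values are sorted. The sketch below extends that idea with a hypothetical `jump_ratio` rule: the sorted curve is split wherever a value jumps by at least that factor, and each segment is treated as one density level. This rule and the names `k_distances` and `eps_candidates` are assumptions for illustration, not the procedure used in the thesis.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (the point itself excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def eps_candidates(points, k, jump_ratio=2.0):
    """Hypothetical heuristic: one candidate eps per density level.

    The sorted k-distance curve is cut wherever a value is at least jump_ratio
    times its predecessor; the largest k-distance in each segment is returned.
    Note that the last candidate may reflect noise points rather than a cluster.
    """
    kd = sorted(k_distances(points, k))
    candidates = []
    for prev, cur in zip(kd, kd[1:]):
        if prev > 0 and cur / prev >= jump_ratio:
            candidates.append(prev)  # upper edge of the denser level
    candidates.append(kd[-1])
    return candidates

if __name__ == "__main__":
    dense = [(0, 0), (0, 1), (1, 0), (1, 1)]        # spacing 1
    sparse = [(20, 20), (20, 25), (25, 20), (25, 25)]  # spacing 5
    print(eps_candidates(dense + sparse, k=2))  # [1.0, 5.0]
```

The factor-of-two threshold is arbitrary. On real data the curve is rarely this clean, so a rule of this kind would need tuning or a more robust knee detector.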
The main idea is to run DBSCAN with different eps and MinPts values. We also modify the calculation of MinPts so that DBSCAN produces better clustering results. We conducted several experiments to evaluate the performance. The results suggest that the proposed DBSCAN can automatically determine appropriate eps and MinPts values and can detect clusters with different density levels.
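To illustrate what running DBSCAN with several (eps, MinPts) pairs can look like, here is a hypothetical multi-pass scheme. The pairs are processed in order of increasing eps. Each pass clusters only the points that the denser passes left as noise, and cluster ids are offset so they never collide. This scheme, the function name `multi_level_dbscan` and the parameter values in the usage are assumptions. The thesis's own way of choosing the pairs and computing MinPts is not reproduced here. A compact copy of classic DBSCAN is included so the block runs on its own.

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Compact classic DBSCAN; returns one label per point, -1 for noise."""
    labels = [None] * len(points)
    cid = 0

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cid
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid          # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)       # core point: keep expanding
        cid += 1
    return labels

def multi_level_dbscan(points, levels):
    """Hypothetical multi-density scheme.

    levels: list of (eps, min_pts) pairs sorted by increasing eps. Each pass
    runs DBSCAN only on points still unclustered after the denser passes.
    """
    final = [-1] * len(points)
    remaining = list(range(len(points)))
    next_id = 0
    for eps, min_pts in levels:
        if not remaining:
            break
        sub_labels = dbscan([points[i] for i in remaining], eps, min_pts)
        found = [lab for lab in sub_labels if lab != -1]
        for idx, lab in zip(remaining, sub_labels):
            if lab != -1:
                final[idx] = next_id + lab  # shift ids past earlier levels
        if found:
            next_id += max(found) + 1
        remaining = [idx for idx, lab in zip(remaining, sub_labels) if lab == -1]
    return final

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),           # dense square
            (20, 20), (20, 25), (25, 20), (25, 25),   # sparse square
            (100, 100)]                               # outlier
    print(dbscan(data, 1.5, 3))                           # [0, 0, 0, 0, -1, -1, -1, -1, -1]
    print(multi_level_dbscan(data, [(1.5, 3), (7.5, 3)]))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A single pass with the dense level's eps leaves the sparse square as noise. The second, looser pass recovers it while the outlier stays noise. Processing the denser levels first keeps a large eps from absorbing points that already belong to a tight cluster.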