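Adaptive Density-based Spatial Clustering of Applications with Noise (DBSCAN) According to Data (依據資料而修正之基於密度與雜訊辨別聚類分群法) | National Taiwan University of Science and Technology Theses and Dissertations System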

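Graduate student:	Wei-tung Wang (王韋棟)
Thesis title:	Adaptive Density-based Spatial Clustering of Applications with Noise (DBSCAN) According to Data (依據資料而修正之基於密度與雜訊辨別聚類分群法)
Advisor:	Yi-Leh Wu (吳怡樂)
Committee members:	陳建中, 唐政元, 閻立剛
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar, 2013-2014)
Language:	English
Pages:	67
Keywords (Chinese):	資料探勘 (data mining), DBSCAN, 分群演算法 (clustering algorithm)
Keywords (English):	Data mining, Clustering, DBSCAN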

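Clustering is a technique that divides data into groups according to certain characteristics of the data. DBSCAN is a density-based clustering algorithm. It requires two user-defined parameters, which are often hard to choose when the user has not studied the data beforehand, yet they strongly affect the clustering result.
DBSCAN also struggles to find the correct clusters in data whose density varies. We modified the original DBSCAN. The main idea is to let DBSCAN use different parameters for regions of different density, so that it determines the parameters for each density level on its own from the data distribution. This also improves the clustering results on data with large variations in density.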
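To make the parameter problem concrete, the sketch below computes the sorted k-dist values, the heuristic for choosing eps suggested in the original DBSCAN paper (Ester et al., 1996). It is only an illustration, not the thesis's procedure: the toy points, the helper name `k_dist`, and the choice `k=2` are assumptions made for this example.

```python
from math import dist

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    result = []
    for p in points:
        distances = sorted(dist(p, q) for q in points)
        result.append(distances[k])  # distances[0] is 0.0, the point itself
    return result

# Toy data, made up for illustration: a dense 2x2 grid (spacing 1),
# a sparse 2x2 grid (spacing 3), and a single outlier.
dense = [(0, 0), (0, 1), (1, 0), (1, 1)]
sparse = [(10, 10), (10, 13), (13, 10), (13, 13)]
points = dense + sparse + [(30, 30)]

sorted_k_dist = sorted(k_dist(points, k=2))
print([round(d, 2) for d in sorted_k_dist])
# prints [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 26.25]
```

The sorted values form one plateau per density level (1.0 for the dense grid, 3.0 for the sparse grid) followed by a jump to the outlier. An eps near 1 misses the sparse group, while an eps near 3 is looser than the dense group needs, which is why a single global eps is hard to pick for data of this kind.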

Clustering is the task of grouping data objects into several groups. DBSCAN is a density-based clustering method. However, it requires two parameters, eps and MinPts, that are hard to choose. DBSCAN also has difficulty finding clusters when the density varies within the dataset. In this paper, we modify the original DBSCAN so that it can determine appropriate eps values from the data distribution and can cluster datasets whose density varies.
The main idea is to run DBSCAN with different eps and MinPts values. We also modify the calculation of MinPts so that DBSCAN produces better clustering results. We ran several experiments to evaluate performance. The results suggest that the proposed DBSCAN can automatically determine appropriate eps and MinPts values and can detect clusters at different density levels.
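The idea of running DBSCAN with different eps and MinPts values can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the function `multi_level_dbscan`, the rule of removing clustered points before the next pass, the toy data, and the hand-picked `(eps, min_pts)` levels are all hypothetical, whereas the thesis derives its parameter values from the data by a procedure this record does not describe.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN over `points`; returns one label per point, -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may become a border point later
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(reach)
        cluster += 1
    return labels

def multi_level_dbscan(points, levels):
    """Run DBSCAN once per (eps, min_pts) level, densest level first.

    Assumption of this sketch: points clustered at one level are removed
    before the next, sparser level is processed.
    """
    final = [NOISE] * len(points)
    remaining = list(range(len(points)))
    next_id = 0
    for eps, min_pts in levels:
        sub_labels = dbscan([points[i] for i in remaining], eps, min_pts)
        for idx, lab in zip(remaining, sub_labels):
            if lab != NOISE:
                final[idx] = next_id + lab
        next_id += max(sub_labels, default=-1) + 1
        remaining = [i for i, lab in zip(remaining, sub_labels) if lab == NOISE]
    return final

# Toy data, made up for illustration: two dense groups 2 units apart,
# one sparse group (spacing 3), and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (3, 0), (3, 1), (4, 0), (4, 1),
          (20, 20), (20, 23), (23, 20), (23, 23),
          (40, 40)]

print(dbscan(points, 1.5, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1, -1, -1, -1]
print(dbscan(points, 3.5, 3))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(multi_level_dbscan(points, [(1.5, 3), (3.5, 3)]))
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, -1]
```

On this toy data a small eps loses the sparse group to noise, a large eps merges the two dense groups, and the two-level run recovers all three groups while still leaving the outlier as noise.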

ABSTRACT (CHINESE)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
CHAPTER 2. DBSCAN
CHAPTER 3. MODIFIED DBSCAN
3.1 DETERMINING EPS
3.2 EVALUATING MINPTS
3.3 ADDITIONAL EPS AND MINPTS
CHAPTER 4. EXPERIMENTS AND RESULTS
4.1 DATASET AND SETUP
4.2 PERFORMANCE AND EVALUATION
4.3 COMPARISONS
4.4 OPTIMAL K
4.5 DISCUSSION OF EXPERIMENTS
CHAPTER 5. CONCLUSIONS AND FUTURE WORK
REFERENCES

                                

