研究生: |
丁志豪 Chih-Hao Ding |
---|---|
論文名稱: |
平行密度分群演算法之研究 A Study of Parallel DBSCAN Algorithms |
指導教授: |
陳維美
Wei-Mei Chen |
口試委員: |
陳維美
Wei-Mei Chen 吳晉賢 Chin-Hsien Wu 陳永耀 Yung-Yao Chen 阮聖彰 Shanq-Jang Ruan |
學位類別: |
碩士 Master |
系所名稱: |
電資學院 - 電子工程系 Department of Electronic and Computer Engineering |
論文出版年: | 2022 |
畢業學年度: | 110 |
語文別: | 中文 |
論文頁數: | 46 |
中文關鍵詞: | 分群演算法 、平行 、分群 、GPU 、cell 、DBSCAN |
外文關鍵詞: | DBSCAN, GPU, Algorithm, Parallelism, Clstering, cell |
相關次數: | 點閱:196 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
分群演算法(Clustering Algorithms) 可將多維資料歸納於若干個子集合,常
見的方法有K-MEANS、DBSCAN、EM-Clustering、OPTICS、Agglomerative Clustering,
其中DBSCAN 以密度作為分群標準,對任意形狀之資料分布皆能有效處
理,且分群結構不受資料集中的雜訊點(noise) 影響。本論文提出一個利用cell 減
少ϵ-neighbor 計算成本的方法,提升循序運算的效率,並提出其平行的版本,妥
善的利用GPU 的演算能力。
Clustering Algorithms are used to collect similar entities of multidimensional data
into several subsets. Conventional methods are K-MEANS, DBSCAN, EM-Clustering,
OPTICS, and Agglomerative Clustering. Among them, DBSCAN uses density as its
grouping strategy. DBSCAN is not affected by the shape of data and noise points. In
this thesis, a sequential clustering algorithm is devised to enhance the computaitional efficiency
through the characteristics of the cells. The resulting clusters are identical to
those calculated by DBSCAN. Furthermore, a parallel version is proposed to compete
with state-of-the-art methods. According to the experimental results, the sequential approach
can achieve up to 3x speedup and the parallel approach can achieve up to 30x
speedup.
參考文獻
[1] T. M. John Cheng, Max Grossman, “Professional CUDA C Programming,” Wrox
Press Ltd., 2014.
[2] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A K-Means Clustering Algorithm,”
Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28,
no. 1, pp. 100–108, 1979.
[3] Q. Zhang and I. Couloigner, “A new and efficient k-medoid algorithm for spatial
clustering,” in Computational Science and Its Applications – ICCSA 2005 (O. Gervasi,
M. L. Gavrilova, V. Kumar, A. Laganà, H. P. Lee, Y. Mun, D. Taniar, and
C. J. K. Tan, eds.), (Berlin, Heidelberg), pp. 181–189, Springer Berlin Heidelberg,
2005.
[4] L. LIU and M. T. ÖZSU, “CLARA (Clustering LARge Applications),” in Springer
US, pp. 330–330, 2009.
[5] V. Estivill-Castro and I. Lee, “Autoclust: Automatic clustering via boundary extraction
for mining massive point-data sets,” in In Proceedings of the 5th International
Conference on Geocomputation, pp. 23–25, 2000.
[6] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: an efficient data clustering
method for very large databases,” SIGMOD Rec., vol. 25, no. 2, p. 103–114, 1996.
[7] G. Karypis, H. Eui-Hong, and V. Kumar, “Chameleon: hierarchical clustering using
dynamic modeling,” Computer, vol. 32, no. 8, pp. 68–75, 1999.
[8] S. Guha, R. Rastogi, and K. Shim, “ROCK: A robust clustering algorithm for categorical
attributes,” Information Systems, vol. 25, no. 5, pp. 345–366, 2000.
[9] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering
clusters in large spatial databases with noise,” in Proceedings of the Second
International Conference on Knowledge Discovery and Data Mining, KDD’96,
p. 226–231, AAAI Press, 1996.
[10] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: ordering points
to identify the clustering structure,” SIGMOD Rec., vol. 28, no. 2, p. 49–60, 1999.
[11] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic subspace clustering
of high dimensional data for data mining applications,” SIGMOD Rec., vol. 27,
no. 2, p. 94–105, 1998.
[12] A. Hinneburg and D. A. Keim, “An efficient approach to clustering in large multimedia
databases with noise,” in KDD, 1998.
[13] M. Kryszkiewicz and P. Lasek, “TI-DBSCAN: Clustering with DBSCAN by Means
of the Triangle Inequality,” pp. 60–69, 06 2010.
[14] K. Mahesh Kumar and A. Rama Mohan Reddy, “A fast DBSCAN clustering algorithm
by accelerating neighbor searching using Groups method,” Pattern Recognition,
vol. 58, pp. 39–48, 2016.
[15] P. Viswanath and R. Pinkesh, “l-DBSCAN : A Fast Hybrid Density Based Clustering
Method,” in 18th International Conference on Pattern Recognition (ICPR’06),
vol. 1, pp. 912–915.
[16] P. Viswanath and V. Suresh Babu, “Rough-DBSCAN: A fast hybrid density based
clustering method for large data sets,” Pattern Recognition Letters, vol. 30, no. 16,
pp. 1477–1488, 2009.
[17] X. Xu, J. Jäger, and H.-P. Kriegel, “A Fast Parallel Clustering Algorithm for Large
Spatial Databases,” Data Mining and Knowledge Discovery, vol. 3, no. 3, pp. 263–
290, 1999.
[18] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: an efficient
and robust access method for points and rectangles,” SIGMOD Rec., vol. 19, no. 2,
p. 322–331, 1990.
[19] D. Arlia and M. Coppola, “Experiments in Parallel Clustering with DBSCAN,”
in Euro-Par 2001 Parallel Processing (R. Sakellariou, J. Gurd, L. Freeman, and
J. Keane, eds.), pp. 326–331, Springer Berlin Heidelberg.
[20] Y. He, H. Tan, W. Luo, S. Feng, and J. Fan, “MR-DBSCAN: a scalable MapReducebased
DBSCAN algorithm for heavily skewed data,” Frontiers of Computer Science,
vol. 8, 02 2014.
[21] I. Cordova and T.-S. Moh, “Dbscan on resilient distributed datasets,” in 2015 International
Conference on High Performance Computing Simulation (HPCS), pp. 531–
540, 2015.
[22] M. M. A. Patwary, D. Palsetia, A. Agrawal, W. Liao, F. Manne, and A. Choudhary,
“A new scalable parallel DBSCAN algorithm using the disjoint-set data structure,”
in SC ’12: Proceedings of the International Conference on High Performance Computing,
Networking, Storage and Analysis, pp. 1–11.
[23] A. Lulli, M. Dell’Amico, P. Michiardi, and L. Ricci, “NG-DBSCAN: scalable
density-based clustering for arbitrary data,” Proc. VLDB Endow., vol. 10, no. 3,
p. 157–168, 2016.
[24] H. Song and J.-G. Lee, “RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm
Based on Random Partitioning,” in Proceedings of the 2018 International Conference
on Management of Data, SIGMOD ’18, (New York, NY, USA), p. 1173–1187,
Association for Computing Machinery, 2018.
[25] K. Yang, Y. Gao, R. Ma, L. Chen, S. Wu, and G. Chen, “DBSCAN-MS: Distributed
Density-Based Clustering in Metric Spaces,” in 2019 IEEE 35th International Conference
on Data Engineering (ICDE), pp. 1346–1357, 2019.
[26] C. Böhm, R. Noll, C. Plant, and B. Wackersreuther, “Density-based clustering using
graphics processors,” in Proceedings of the 18th ACM Conference on Information
and Knowledge Management, CIKM ’09, (New York, NY, USA), p. 661–670, Association
for Computing Machinery, 2009.
[27] M. Gowanlock, C. M. Rude, D. M. Blair, J. D. Li, and V. Pankratius, “A Hybrid Approach
for Optimizing Parallel Clustering Throughput using the GPU,” IEEE Transactions
on Parallel and Distributed Systems, vol. 30, no. 4, pp. 766–777, 2019.
[28] M. Gowanlock, D. M. Blair, and V. Pankratius, “Optimizing Parallel Clustering
Throughput in Shared Memory,” IEEE Transactions on Parallel and Distributed
Systems, vol. 28, no. 9, pp. 2595–2607, 2017.
[29] W.-K. Loh and H. Yu, “Fast density-based clustering through dataset partition using
graphics processing units,” Information Sciences, vol. 308, pp. 94–112, 2015.
[30] Y. Wang, Y. Gu, and J. Shun, “Theoretically-efficient and practical parallel dbscan,”
in Proceedings of the 2020 ACM SIGMOD International Conference on Management
of Data, SIGMOD ’20, (New York, NY, USA), p. 2555–2571, Association for
Computing Machinery, 2020.
[31] K.-S. Chang, Y.-W. Peng, and W.-M. Chen, “Density-based clustering algorithm for
GPGPU computing,” in 2017 International Conference on Applied System Innovation
(ICASI), pp. 774–777, 2017.