簡易檢索 / 詳目顯示

研究生: 張凱翔
Kai-Shiang Chang
論文名稱: 密度分群演算法於GPGPU運算之研究
Density-based clustering algorithm for GPGPU computing
指導教授: 陳維美
Wei-Mei Chen
口試委員: 吳晉賢
Chin-Hsien Wu
呂政修
Jenq-Shiou Leu
林昌鴻
Chang Hong Lin
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2016
畢業學年度: 104
語文別: 中文
論文頁數: 59
中文關鍵詞: 通用圖形處理器密度分群法平行計算
外文關鍵詞: DBSCAN, GPGPU, parallel computing
相關次數: 點閱:375下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 分群是常見的一種資料探勘的技術,可以將多分量資料中擁有分量相似的資料區分在不同的組別形成不同的子集,而DBSCAN是其中一個廣受用的分群演算法,他可以將群集分成任意形狀與有效找出資料中雜訊的優點,但隨著資料量越來越大,演算法執行時間越來越長,因此許多研究致力於加速DBSCAN的計算。近年來,由於GPU其高度平行與高頻寬的記憶體的優點與GPGPU計算的發展,能加速演算法的計算。本文提出一個方法,將各別分量數值近似的資料點分類,建立索引加速找出核心點,更在資料點搜尋索引時設定搜尋限制條件,並提出部分連接核心點的方式來平行的分群,達到減少資料點之間相互計算距離的工作量。除此之外我們有效分配資料在on-chip的記憶體上加速記憶體的讀寫速度並提高演算法的執行效率,最後在實驗結果顯示相較於常見的GPU-based DBSCAN演算法,效率皆有顯著提升。


    Clustering is a common data mining technique that groups multi-dimensional data points which have similar components to form different subsets. DBSCAN is a famous algorithm in clustering technique that has advantages of finding clusters with arbitrary shapes and noise of data set. However, with data volumes growing and the execution time of algorithms becoming longer, numerous methods have been developed to accelerate algorithm execution times of DBSCAN. Moreover, graphics processing unit (GPU), which feature massively parallel and high-bandwidth memory structures, and general-purpose GPU (GPGPU), can speed the execution of the algorithm. This study proposed an algorithm by using GPU that exploit the data points with similar components to establish indices, thereby facilitating the identification of core points fast. In addition, this algorithm specified a search criteria for indices search, clustered data through the parallel partial connection to the core point to reduce the amount of distance computation between the data points, and allocated the data to on-chip memory to improve the execution speed of the algorithm. The experimental results indicated that the proposed algorithm was more efficient at execution than the other GPU-based DBSCAN.

    一、介紹 二、GPU架構 三、研究方法 四、實驗模擬 五、結論 六、參考文獻

    [1] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.” ACM SIGMOD international conference on Management of data, pp. 94-105, 1998.
    [2] G. Andrade, G. Ramos, and D. Madeira, R. Sachetto, R. Ferreira, L. Rocha. “G-dbscan: A gpu accelerated algorithm for density-based clustering.” Procedia Computer Science, vol. 18, pp. 369-378, 2013.
    [3] M. Ankerst, M.M. Breunig, H.P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure.” ACM SIGMOD international conference on Management of data, pp. 49-60. 1999.
    [4] D. Arlia, and M. Coppola. “Experiments in parallel clustering with DBSCAN.” European Conference on Parallel Processing, pp. 326-331, Aug. 2001.
    [5] N. Beckmann, H.P Kriegel, R. Schneider, and B. Seeger. “The R*-tree: an efficient and robust access method for points and rectangles.” ACM SIGMOD Record. Vol. 19. No. 2, pp. 322-331, 1990.
    [6] J.L. Bentley, “Multidimensional binary search trees used for associative searching.” Communications of the ACM, vol.18 no. 9, pp. 509-517, Sept. 1975.
    [7] C. Böhm, R. Noll, C. Plant, and B. Wackersreuther. “Density-based clustering using graphics processors.” ACM conference on Information and knowledge management, pp. 661-670, 2009.
    [8] B. Borah, and D.K. Bhattacharyya. “An improved sampling-based DBSCAN for large spatial databases.” IEEE Intelligent Sensing and Information Processing. Proceedings of International Conference on, pp. 92-96, 2004.
    [9] S. Brecheisen, H.P. Kriegel, and M. Pfeifle. “Parallel density-based clustering of complex objects.” Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 179-188, 2006.
    [10] M. Ester, H.P. Kriegel, J. Sander, and X. Xu. “Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” Proceedings of International Conference on Knowledge Discovery and Data Mining, pp. 226-231, Aug. 1996.
    [11] V. Estivill-Castro, and I. Lee. “AMOEBA: Hierarchical clustering based on spatial proximity using Delaunay diagram.” Proceedings of the 9th International Symposium on Spatial Data Handling, pp. 10-12, Aug. 2000.
    [12] V. Estivill-Castro, and I. Lee. “AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets.” Proceedings of the 5th International Conference on Geocomputation. 2000.
    [13] U.M. Fayyad, G.P. Shapiro, P. Smyth, and R. Uthurusamy. “Advances in Knowledge Discovery and Data Mining.” MIT Press, 1996.
    [14] R.C. Gonzalez, and R.E. Woods. “Digital Image Processing.” Nueva Jersey, 2008.
    [15] S. Guha, R. Rastogi, and K. Shim. “CURE: an efficient clustering algorithm for large databases.” ACM SIGMOD Record. Vol. 27. No. 2. ACM, 1998.
    [16] S. Guha, R. Rastogi, and K. Shim. “ROCK: A robust clustering algorithm for categorical attributes.” IEEE Proceedings of the 15th International Conference on Data Engineering, pp. 512-521, Mar. 1999.
    [17] R.H. Güting. “An introduction to spatial database systems.” The International Journal on Very Large Data Bases, vol. 3, no. 4, pp. 357-399,Oct. 1994.
    [18] A. Guttman. “R-trees: a dynamic index structure for spatial searching.” ACM SIGMOD international conference on Management of data, pp. 47-57, 1984.
    [19] G. Karypis, E.H. Han, and V. Kumar. “Chameleon: Hierarchical clustering using dynamic modeling.” Computer, vol. 32, no. 8, pp. 68-75, Aug. 1999.
    [20] L. Kaufman, and P.J. Rousseeuw. “Partitioning around medoids (program pam).” Finding groups in data: an introduction to cluster analysis, pp. 68-125, 1990.
    [21] L. Kaufman, and P.J. Rousseeuw. “Finding groups in data: an introduction to cluster analysis.” John Wiley & Sons, vol. 344, 2009.
    [22] M. Kryszkiewicz, and P. Lasek. “TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle Inequality.” International Conference on Rough Sets and Current Trends in Computing, pp. 60-69, Jun. 2010.
    [23] W.K. Loh, and H. Yu. “Fast density-based clustering through dataset partition using graphics processing units.” Information Sciences, vol. 308, pp. 94-112, Jul. 2015.
    [24] J. MacQueen, “Some methods for classification and analysis of multivariate observations.” Proceedings of the 55th Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14, pp. 281-297, 1967.
    [25] S.C. Madeira, and A.L. Oliveira, “Biclustering algorithms for biological data analysis: a survey.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 1, pp. 24-45, Jan. 2004.
    [26] K. Mahesh Kumar, and A. Rama Mohan Reddy. “A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method.” Pattern Recognition, vol. 58, pp. 39-48, Oct. 2016.
    [27] R.T. Ng, and J. Han. “Efficient and effective clustering method for spatial data mining.” Proceedings of the 20th International Conference on Very Large Data Bases, pp. 144-155, 1994.
    [28] Nvidia, C. U. D. A. "C programming guide v7. 5." NVIDIA Corporation, Sept. 2015.
    [29] Nvidia. “Nvidia GP100 Pascal Whitepaper.” NVIDIA Corporation, 2016.
    [30] D. Palsetia, A. Agrawal, W.K. Liao, F. Manne, A. Choudhary. “A new scalable parallel DBSCAN algorithm using the disjoint-set data structure.” IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11, Nov. 2012.
    [31] Z. Pawlak. “Rough set theory and its applications to data analysis.” Cybernetics and Systems, vol. 29, no. 7, pp. 661-688, 1998.
    [32] J.T. Robinson. “The KDB-tree: a search structure for large multidimensional dynamic indexes.” ACM SIGMOD Proceedings of the international conference on Management of data, pp. 10-18, 1981.
    [33] H. Spath. “Cluster Analysis Algorithms for Data Reduction and Classification.” Ellis Horwood,1980.
    [34] H. Tan, W. Luo, H. Mao, D. Ma, S. Feng, and J. Fan. “MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce.” IEEE International Conference on Parallel and Distributed Systems, pp. 473-480, Dec. 2011.
    [35] S. Theodoridis, and K. Koutroumbas. “Pattern Recognition, second ed.”, Academic Press, 2003.
    [36] P. Viswanath, and V.S. Babu. “Rough-DBSCAN: a fast hybrid density based clustering method for large data sets.” Pattern Recognition Letters, vol. 30, no. 16, pp. 1477-1488, Dec. 2009.
    [37] P. Viswanath, and R. Pinkesh. “l-DBSCAN : A fast hybrid density based clustering method.” International Conference on Pattern Recognition, vol. 1, pp. 912-915, 2006.
    [38] W. Wang, J. Yang, and R. Muntz. “STING: A Statistical Information grid Approach to Spatial Data Mining.” International Conference on Very Large Data Bases, pp. 186-195, 1997.
    [39] X. Xu, M. Ester, H.P. Kriegel, and J. Sander. “A distribution-based Clustering Algorithm for Mining in Large Spatial Databases.” International Conference on Data Engineering, pp. 324-331, Feb. 1998.
    [40] X. Xu, J. Jäger, and H.P. Kriegel. “A fast parallel clustering algorithm for large spatial databases.” Data Mining and Knowledge Discovery, vol. 3, no. 3, pp. 263-290, Sept. 1999.
    [41] T. Zhang, R. Ramakrishnan, and M. Livny. “BIRCH: an efficient data clustering method for very large databases.” ACM SIGMOD international conference on Management of data, pp. 103-144, 1996.

    無法下載圖示 全文公開日期 2021/08/19 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE