簡易檢索 / 詳目顯示

研究生: 里冠樺
KUAN-HUA LI
論文名稱: 空間索引架構用於平行密度分群演算法之研究
A Study of Spatial Indexing Structures on Parallel Density-based Clustering Algorithms
指導教授: 陳維美
Wei-Mei Chen
口試委員: 林淵翔
Yuan-Hsiang Lin
林昌鴻
Chang-Hong Lin
呂政修
Jenq-Shiou Leu
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2023
畢業學年度: 111
語文別: 英文
論文頁數: 57
中文關鍵詞: 圖形處理器平行演算法分群
外文關鍵詞: GPU, Parallel, Algorithm, Clustering
相關次數: 點閱:242下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 分群演算法 (Clustering Algorithms) 用於將多維資料歸納至各個群聚之中,DBSCAN 是近年來被大幅研究與發展的基於密度的分群演算法,其特性是可以有效分析任意形狀的群聚,以及不容易被雜訊點 (noise) 給影響。本論文提出在 GPU 架構中可以有效進行空間索引的資料結構,應用在平行 DBSCAN 演算法來提升演算的效率。實驗結果顯示,此架構就算在高維空間中也可以有效提升演算法的執行效率。


    Clustering Algorithms are used to assign multi-dimensional data to each cluster. DBSCAN
    is a density-based algorithm that has been vastly researched and developed. It can identify clusters of arbitrary shape, and it is not easy to be affected by noise. This thesis proposes a spatial indexing structure that can effectively perform spatial query applied to parallel DBSCAN algorithm to increase the computational efficiency. The experimental results show that this structure can effectively improve the algorithm’s efficiency, even in high-dimensional space.

    Contents Abstract in Chinese . . . . .. . . . . . . . . . . . . . . . . . . . i Abstract in English . . . . . . . . . . . . . . . . . . . . . . . . . ii Acknowledgements . . . . . . .. . . . . . . . . . . . . . . . . . . . iii Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Introduction . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1 DBSCAN algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Variants of DBSCAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 GPGPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.4 R-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3 Researched methods . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . 13 3.1 Cell-based approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.1.1 Parallel assigning data points to each cell . . . . . . . . . . . . . 13 3.1.2 Parallel identifying core cells . . . . . . . . . . . . . . . . . . . . 14 3.1.3 Parallel merging of core cells . . . . . . . . . . . . . . . . . . . 15 3.1.4 Parallel linking of border points and finding noise points . . . . . 17 3.2 Tree-based approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2.1 Parallel construction of complete R-trees . . . . . . . . . . . . . 20 3.2.2 Parallel search for core points . . . . . . . . . . . . . . . . . . . 22 3.2.3 Parallel merging of core points . . . . . . . . . . . . . . . . . . . 24 3.2.4 Parallel linking of border points and finding noise points . . . . . 27 3.2.5 Flowchart of tree-based algorithm . . . . . . . . . . . . . . . . . 28 4 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.1 Experimental environment . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.2 The clustering performance of cell-based approach . . . . . . . . . . . . 31 4.3 Real datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4 Synthetic datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.5 Space complexity of structures . . . . . . . . . . . . . . . . . . . . . . . 42 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    [1] J. Cheng, M. Grossman, and T. McKercher, Professional CUDA c programming. John Wiley & Sons,
    2014.
    [2] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the
    royal statistical society. series c (applied statistics), vol. 28, no. 1, pp. 100–108, 1979.
    [3] L. Rdusseeun and P. Kaufman, “Clustering by means of medoids,” in Proceedings of the statistical
    data analysis based on the L1 norm conference, neuchatel, switzerland, vol. 31, 1987.
    [4] L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis. John
    Wiley & Sons, 2009.
    [5] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: an efficient data clustering method for very large
    databases,” ACM sigmod record, vol. 25, no. 2, pp. 103–114, 1996.
    [6] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,”
    ACM Sigmod record, vol. 27, no. 2, pp. 73–84, 1998.
    [7] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic subspace clustering of high dimensional data for data mining applications,” in Proceedings of the 1998 ACM SIGMOD international
    conference on Management of data, pp. 94–105, 1998.
    [8] A. Hinneburg, D. A. Keim, et al., An efficient approach to clustering in large multimedia databases
    with noise, vol. 98. Bibliothek der Universität Konstanz, 1998.
    [9] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based algorithm for discovering clusters
    in large spatial databases with noise.,” in kdd, vol. 96, pp. 226–231, 1996.
    [10] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the
    clustering structure,” ACM Sigmod record, vol. 28, no. 2, pp. 49–60, 1999.
    [11] M. Kryszkiewicz and P. Lasek, “TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle
    Inequality,” in Rough Sets and Current Trends in Computing: 7th International Conference, RSCTC
    2010, Warsaw, Poland, June 28-30, 2010. Proceedings 7, pp. 60–69, Springer, 2010.
    [12] P. Viswanath and R. Pinkesh, “l-dbscan: A fast hybrid density based clustering method,” in 18th
    International Conference on Pattern Recognition (ICPR’06), vol. 1, pp. 912–915, IEEE, 2006.
    [13] P. Viswanath and V. S. Babu, “Rough-DBSCAN: A fast hybrid density based clustering method for
    large data sets,” Pattern Recognition Letters, vol. 30, no. 16, pp. 1477–1488, 2009.
    [14] K. M. Kumar and A. R. M. Reddy, “A fast DBSCAN clustering algorithm by accelerating neighbor
    searching using Groups method,” Pattern Recognition, vol. 58, pp. 39–48, 2016.
    45
    [15] X. Xu, J. Jäger, and H.-P. Kriegel, “A fast parallel clustering algorithm for large spatial databases,”
    High Performance Data Mining: Scaling Algorithms, Applications and Systems, pp. 263–290, 2002.
    [16] “Experiments in parallel clustering with DBSCAN, author=Arlia, Domenica and Coppola, Massimo,”
    in Euro-Par 2001 Parallel Processing: 7th International Euro-Par Conference Manchester, UK, August 28–31, 2001 Proceedings 7, pp. 326–331, Springer, 2001.
    [17] Y. He, H. Tan, W. Luo, S. Feng, and J. Fan, “MR-DBSCAN: a scalable MapReduce-based DBSCAN
    algorithm for heavily skewed data,” Frontiers of Computer Science, vol. 8, pp. 83–99, 2014.
    [18] “DBSCAN on resilient distributed datasets, author=Cordova, Irving and Moh, Teng-Sheng,” in 2015
    International Conference on High Performance Computing & Simulation (HPCS), pp. 531–540, IEEE,
    2015.
    [19] M. M. A. Patwary, D. Palsetia, A. Agrawal, W.-k. Liao, F. Manne, and A. Choudhary, “A new scalable
    parallel DBSCAN algorithm using the disjoint-set data structure,” in SC’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11,
    IEEE, 2012.
    [20] A. Lulli, M. Dell’Amico, P. Michiardi, and L. Ricci, “NG-DBSCAN: scalable density-based clustering
    for arbitrary data,” Proceedings of the VLDB Endowment, vol. 10, no. 3, pp. 157–168, 2016.
    [21] M. Gowanlock, C. M. Rude, D. M. Blair, J. D. Li, and V. Pankratius, “A hybrid approach for optimizing parallel clustering throughput using the GPU,” IEEE Transactions on Parallel and Distributed
    Systems, vol. 30, no. 4, pp. 766–777, 2018.
    [22] M. Gowanlock, D. M. Blair, and V. Pankratius, “Optimizing parallel clustering throughput in shared
    memory,” IEEE Transactions on parallel and distributed systems, vol. 28, no. 9, pp. 2595–2607, 2017.
    [23] W.-K. Loh and H. Yu, “Fast density-based clustering through dataset partition using graphics processing units,” Information Sciences, vol. 308, pp. 94–112, 2015.
    [24] C. Böhm, R. Noll, C. Plant, and B. Wackersreuther, “Density-based clustering using graphics processors,” in Proceedings of the 18th ACM conference on Information and knowledge management,
    pp. 661–670, 2009.
    [25] Y. Wang, Y. Gu, and J. Shun, “Theoretically-efficient and practical parallel DBSCAN,” in Proceedings
    of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 2555–2571, 2020.
    [26] T. Boonchoo, X. Ao, Y. Liu, W. Zhao, F. Zhuang, and Q. He, “Grid-based DBSCAN: Indexing and
    inference,” Pattern Recognition, vol. 90, pp. 271–284, 2019.
    [27] Y. Chen, L. Zhou, N. Bouguila, C. Wang, Y. Chen, and J. Du, “BLOCK-DBSCAN: Fast clustering for
    large scale data,” Pattern Recognition, vol. 109, p. 107624, 2021.
    [28] K.-S. Chang, Y.-W. Peng, and W.-M. Chen, “Density-based clustering algorithm for GPGPU computing,” in 2017 International Conference on Applied System Innovation (ICASI), pp. 774–777, IEEE,
    2017.
    46
    [29] C.-H. DING, “A Study of Parallel DBSCAN Algorithms,” Master’s thesis, National Taiwan University
    of Science and Technology, 2022.
    [30] “R-trees: A dynamic index structure for spatial searching, author=Guttman, Antonin,” in Proceedings
    of the 1984 ACM SIGMOD international conference on Management of data, pp. 47–57, 1984.
    [31] N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using packed R-trees,” in
    Proceedings of the 1985 ACM SIGMOD international conference on Management of data, pp. 17–31,
    1985.
    [32] I. Kamel and C. Faloutsos, “Hilbert R-tree: An improved R-tree using fractals,” tech. rep., 1993.
    [33] S. T. Leutenegger, M. A. Lopez, and J. Edgington, “STR: A simple and efficient algorithm for Rtree packing,” in Proceedings 13th international conference on data engineering, pp. 497–506, IEEE,
    1997.
    [34] L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering.,” J. Open Source
    Softw., vol. 2, no. 11, p. 205, 2017.
    [35] V. Mathur, J. Mehta, and S. Singh, “HCA-DBSCAN: HyperCube accelerated density based spatial
    clustering for applications with noise,” arXiv preprint arXiv:1912.00323, 2019.
    [36] D. Brown, A. Japa, and Y. Shi, “A fast density-grid based clustering method,” in 2019 IEEE 9th Annual
    Computing and Communication Workshop and Conference (CCWC), pp. 0048–0054, IEEE, 2019.
    [37] J. T. Robinson, “The KDB-tree: a search structure for large multidimensional dynamic indexes,” in
    Proceedings of the 1981 ACM SIGMOD international conference on Management of data, pp. 10–18,
    1981.
    [38] D. Dua and C. Graff, “UCI Machine Learning Repository,” 2017

    無法下載圖示 全文公開日期 2025/08/15 (校內網路)
    全文公開日期 2025/08/15 (校外網路)
    全文公開日期 2025/08/15 (國家圖書館:臺灣博碩士論文系統)
    QR CODE