簡易檢索 / 詳目顯示

研究生: 張婉儀
Wan-Yi Chang
論文名稱: 平行範圍搜尋問題與多維索引結構之研究
A Study of Parallel Range Query Based On Multi-dimensional Index
指導教授: 陳維美
Wei-Mei Chen
口試委員: 吳晋賢
Chin-Hsien Wu
林昌鴻
Chang Hong Lin
林淵翔
Yuan-Hsiang Lin
陳維美
Wei-Mei Chen
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2022
畢業學年度: 111
語文別: 中文
論文頁數: 52
中文關鍵詞: Range Query樹狀結構平行演算法
外文關鍵詞: Range Query, Bulk-loading
相關次數: 點閱:164下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Range Query主要用於資料庫中搜尋出一群特定範圍的資料,本篇論文使用Hilbert R-tree作為演算法基礎,並應用於GPU以提升其效能。

    過去已廣泛研究許多高維度的索引結構,R-tree是一個有效索引空間資料的樹狀資料結構,因此許多研究以R-tree作為資料結構以降低運算上的時間複雜度,而其中在高維度時,可能會產生curse of dimensionality的問題,因此如何在高維度的資料中提升效能並進行Range Query是一個重要的課題。

    本論文選用Compelete R-tree為核心架構,並移植至GPU快速進行建樹、利用平行運算以強化搜尋效率。最後發現此查詢方法在高維度時,相較於其他演算法也達成良好的查詢成效,在資料點的關聯度較高的Correlated資料類型以及Gaussian類型中,更具有良好的效能。


    Range Query is mainly used to search out a group of data in a specific range in the database. This paper uses the Hilbert R-tree as the basis of the algorithm and applies it to the GPU to improve its performance. Among many multi-dimensional index structures introduced in the literature, an R-tree is a structure that effectively indexes spatial data. Therefore, several studies use an R-tree as a data structure to reduce computational consumption. However, R-trees may cause the curse of dimensionality, so how to improve performance for solving Range Query in high-dimensional data is an important topic. This paper selects the Complete R-tree as the basic architecture and uses the GPU to build trees in parallel and enhance search efficiency. Finally, this method achieves good query results compared with other algorithms in high dimensions and performs better for the datasets with the Correlated and Gaussian distribution.

    1 緒論 1.1 研究背景 1.2 研究動機 1.3 論文架構 2 文獻探討 2.1 GPGPU 2.1.1 CUDA 2.1.2 軟硬體間的對應 2.1.3 記憶體類型 2.2 R-tree 2.2.1 R-tree 架構 2.2.2 Static R-tree 2.3 相關研究 2.3.1 分散式演算法(distributed algorithms) 2.3.2 GPU 平行演算法(parallel algorithms) 3 研究方法 3.1 基本定義 3.1.1 問題範例 3.2 演算法概述 3.3 平行建構Tree 3.4 Range Query 3.5 遍歷方法 3.5.1 Depth-First Search 4 實驗結果與分析 4.1 實驗環境及測試資料集 4.2 Range Query 4.3 查詢時間分析 4.3.1 資料分量對平均查詢時間的影響 4.3.2 資料量對平均查詢時間的影響 4.3.3 選擇比率分析 4.3.4 建樹與遍歷的時間比例 4.3.5 平行度分析 4.4 真實數據集分析 5 結論 參考文獻

    [1] Professional CUDA C Programming, 1st ed. GBR: Wrox Press Ltd., 2014.
    [2] R. Bayer and E. McCreight, “Organization and maintenance of large ordered indices,” in Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, ser. SIGFIDET ’70. New York, NY, USA: Association for Computing Machinery, 1970, p. 107–141. [Online]. Available: https://doi.org/10.1145/1734663.1734671
    [3] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The r*-tree: An efficient and robust access method for points and rectangles,” in Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’90. New York, NY, USA: Association for Computing Machinery, 1990, p. 322–331. [Online]. Available: https://doi.org/10.1145/93597.98741
    [4] J. L. Bentley, “Decomposable searching problems,” Information Processing Letters, vol. 8, no. 5, pp. 244–251, 1979. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0020019079901170
    [5] S. Börzsönyi, D. Kossmann, and K. Stocker, “The skyline operator,” in Proceedings of the 17th International Conference on Data Engineering. USA: IEEE Computer Society, 2001, p. 421–430.
    [6] S. Brakatsoulas, D. Pfoser, and Y. Theodoridis, “Revisiting r-tree construction principles,” in Advances in Databases and Information Systems, Y. Manolopoulos and P. Návrat, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 149–162.
    [7] U. S. C. Bureau, “Tiger/line shapefiles and tiger/line files,” https://www.census.gov/geo/mapsdata/data/tiger-line.html, 2006, accessed: 2019-11-05.
    [8] W.-M. Chen and G.-H. Chen, “Divide-and-conquer recurrences associated with generalized heaps, optimal merge, and related structures,” Theoretical computer science, vol. 292, no. 3, pp. 667–677, 2003.
    [9] R. A. Finkel and J. L. Bentley, “Quad trees: A data structure for retrieval on composite keys.” Acta Inf., vol. 4, pp. 1–9, 1974. [Online]. Available: http://dblp.uni-trier.de/db/journals/acta/acta4.html#FinkelB74
    [10] J. Fix, A. J. Wilkes, and K. Skadron, “Accelerating braided b+ tree searches on a gpu with cuda,” 2011.
    [11] V. Gaede and O. Günther, “Multidimensional access methods,” ACM Comput. Surv., vol. 30, no. 2, p.170–231, jun 1998. [Online]. Available: https://doi.org/10.1145/280277.280279
    [12] Y. J. García R, M. A. López, and S. T. Leutenegger, “A greedy algorithm for bulk loading r-trees,” in Proceedings of the 6th ACM International Symposium on Advances in Geographic Information Systems, ser. GIS ’98. New York, NY, USA: Association for Computing Machinery, 1998, p. 163–164. [Online]. Available: https://doi.org/10.1145/288692.288723
    [13] O. Guenther and A. Buchmann, “Research issues in spatial databases,” SIGMOD Rec., vol. 19, no. 4, p. 61–68, dec 1990. [Online]. Available: https://doi.org/10.1145/122058.122065
    [14] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in Proceedings of the 1984 ACM SIGMOD international conference on Management of data, 1984, pp. 47–57.
    [15] D. Hubert, “Ueber die stetige abbildung einer linie auf ein flächenstück,” Mathematische Annalen, vol. 38, pp. 459–460, 1891. [Online]. Available: http://eudml.org/doc/157555
    [16] L. R. Johnson, “An indirect chaining method for addressing on secondary keys,” Commun. ACM, vol. 4, no. 5, p. 218–222, may 1961. [Online]. Available: https://doi.org/10.1145/366532.366540
    [17] I. Kamel and C. Faloutsos, “Hilbert r-tree: An improved r-tree using fractals,” in Proceedings of the 20th International Conference on Very Large Data Bases, ser. VLDB ’94. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1994, p. 500–509.
    [18] G. Kedem, “The quad-cif tree: A data structure for hierarchical on-line algorithms,” in 19th Design Automation Conference, 1982, pp. 352–357.
    [19] J. Kim, W.-K. Jeong, and B. Nam, “Exploiting massive parallelism for indexingmulti-dimensional datasets on the gpu,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 8, pp.2258–2271, 2015.
    [20] J. Kim, S.-G. Kim, and B. Nam, “Parallel multi-dimensional range query processing with r-trees on gpu,” Journal of Parallel and Distributed Computing, vol. 73, no. 8, pp. 1195–1207, 2013. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0743731513000592
    [21] J. Kim and B. Nam, “Co-processing heterogeneous parallel index for multi-dimensional datasets,” Journal of Parallel and Distributed Computing, vol. 113, pp. 195–203, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0743731517302915
    [22] D. Knuth, The Art Of Computer Programming, vol. 3: Sorting And Searching. Addison-Wesley, 1973.
    [23] S. Leutenegger, M. Lopez, and J. Edgington, “Str: a simple and efficient algorithm for r-tree packing,” in Proceedings 13th International Conference on Data Engineering, 1997, pp. 497–506.
    [24] Z.-N. Lewis and Y.-C. Tu, “G-pics: A framework for gpu-based spatial indexing and query processing,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 3, pp. 1243–1257, 2022.
    [25] D. Mountain and A. MacFarlane, “Geographic information retrieval in a mobile environment: evaluating the needs of mobile individuals,” Journal of Information Science, vol. 33, no. 5, pp.515–530, 2007. [Online]. Available: https://doi.org/10.1177/0165551506075333
    [26] J. Nievergelt, H. Hinterberger, and K. C. Sevcik, “The grid file: An adaptable, symmetric multi-key file structure,” in Proceedings of the 3rd Conference of the European Cooperation in Informatics on Trends in Information Processing Systems. Berlin, Heidelberg: Springer-Verlag, 1981, p. 236–251
    [27] S. Nishimura, S. Das, D. Agrawal, and A. E. Abbadi, “Md-hbase: A scalable multi-dimensional data infrastructure for location aware services,” in 2011 IEEE 12th International Conference on Mobile Data Management, vol. 1, 2011, pp. 7–16.
    [28] NVIDIA, “Nvidia-ampere-ga102-gpu-architeture,” 2021.
    [29] Peano, “Sur une courbe, qui remplit toute une aire plane,” Mathematische Annalen, vol. 36, pp.157–160, 1890. [Online]. Available: http://eudml.org/doc/157489
    [30] J. Qi, Y. Tao, Y. Chang, and R. Zhang, “Packing r-trees with space-filling curves: Theoretical optimality, empirical efficiency, and bulk-loading parallelizability,” ACM Trans. Database Syst., vol. 45, no. 3, aug 2020. [Online]. Available: https://doi.org/10.1145/3397506
    [31] J. T. Robinson, “The k-d-b-tree: A search structure for large multidimensional dynamic indexes,” in Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’81. New York, NY, USA: Association for Computing Machinery, 1981, p. 10–18. [Online]. Available: https://doi.org/10.1145/582318.582321
    [32] N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using packed r-trees,” in Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’85. New York, NY, USA: Association for Computing Machinery, 1985, p. 17–31. [Online]. Available: https://doi.org/10.1145/318898.318900
    [33] T. K. Sellis, N. Roussopoulos, and C. Faloutsos, “The r+-tree: A dynamic index for multi-dimensional objects,” in Proceedings of the 13th International Conference on Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1987, p. 507–518.
    [34] M. Stonebraker, T. K. Sellis, and E. N. Hanson, “An analysis of rule indexing implementations in data base systems,” in Expert Database Conf., 1986.
    [35] H.-H. Tsai, “Design and implementation of the ecdf query problem on cpu-gpu architectures,” Master’s thesis, National Taiwan University of Science and Technology, 2021

    無法下載圖示 全文公開日期 2027/12/07 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE