簡易檢索 / 詳目顯示

研究生: 林家興
Jia-Xing Lin
論文名稱: 以 GPUs 加速 Skyline-Join 查詢問題之研究
A Study of Accelerating Skyline-Join Query Using GPUs
指導教授: 陳維美
Wei-Mei Chen
口試委員: 林昌鴻
Chang-Hong Lin
林敬舜
Ching-Shun Lin
林淵翔
Yuan-Hsiang Lin
陳維美
Wei-Mei Chen
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2020
畢業學年度: 108
語文別: 中文
論文頁數: 49
中文關鍵詞: 天際線查詢關聯式連接Skyline-join平行演算法GPGPU
外文關鍵詞: skyline query, relational join, Skyline-join, parallel algorithm, GPGPU
相關次數: 點閱:358下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

為了加速天際線查詢(skyline query) ,已經有許多文獻提出相關演算法,但大部分都是對單一表格進行天際線查詢,在實務情況資料分量可能分散在不同表格,若對多個表格的資料進行天際線查詢,需要先將表格進行關聯式連接(relational join),再利用現有的天際線查詢演算法,這樣可能造成許多不必要運算,於是延伸出skyline-join問題,但隨著資料量的上升,計算量變得龐大,導致相關文獻的演算法的執行時間過長。本論文提出適合Graphics Processing Unit(GPU)平行計算的skyline-join演算法,以join分量的數值分群,並利用k-d樹概念尋找天際線,以減少比較次數,在輸入表格中分析支配關係,進行join運算後資料都是skyline。本論文的循序演算法與其他文獻進行比較,效率有顯著提升,最後應用至GPU平行演算法,實驗結果顯示可大幅降低整體時間。


Based on the goal of accelerating the skyline query, there have been many papers and research that have proposed related algorithms. However, most methods are to perform skyline query on a single table. In practice, the data attributes may be scattered in multiple tables separately. Relational joins are required before performing the existing skyline query algorithm. Such steps may produce many unnecessary operations, so the related issues of skyline-join are extended. the cardinality increases, the amount of calculation becomes relatively large, from numerous literatures, it is found that directly leads to the too long execution time of the algorithms. This paper is dedicated to proposing a skyline-join algorithm suitable for parallel computing by Graphics Processing Unit (GPU). The join attribute is used for grouping, and the k-d tree concept is applied to find the skyline, thereby reducing the number of comparisons. Then, in the input form, the dominance relationship is analyzed. After the join operation, the data is all skylines. Comparing the sequential algorithm used in this paper with the methods in other papers, it can be found that the efficiency is significantly improved. Furthermore, by applying the skyline-join algorithm to the GPUs , it can be estimated from the experimental results that the overall time is greatly reduced.

教授推薦書 i 論文口試委員審定書 ii 中文摘要 iii 英文摘要 iv 目錄 v 圖目錄 viii 表目錄 x 1 緒論 1 1.1 研究背景 1 1.2 研究動機 1 1.3 論文架構 3 2 GPU 架構 4 2.1 NVIDIA GPU 硬體架構 4 2.2 CUDA 6 2.3 軟硬體間的對應 6 2.4 全域記憶體 7 2.5 共享記憶體 8 3 文獻探討 9 3.1 Skyline query 9 3.2 Join 10 3.3 Skylinejoin 10 4 研究方法 14 4.1 基本定義 14 4.1.1 Skyline query 定義 14 4.1.2 Join 定義 14 4.1.3 Skyline-join query 定義 15 4.2 SJQ algorithm 15 4.2.1 分群資料16 4.2.2 尋找local skyline 17 4.2.3 分析local skyline 支配關係 20 4.2.4 Join 20 4.3 SJQGPU algorithm 21 4.3.1 應用於GPU 的資料結構與工作分配 21 4.3.2 平行尋找local skyline 22 4.3.3 記錄支配關係的資料結構 23 4.3.4 平行版本join 運算 24 5 實驗結果與分析 27 5.1 實驗資料與設定 27 5.2 SJQ 實驗分析 28 5.2.1 分群群大小的影響 28 5.2.2 資料量大小的影響 29 5.2.3 資料分量的影響 30 5.2.4 Join rate 的影響 30 5.3 循序演算法與其他文獻比較 31 5.3.1 資料大小的影響 32 5.3.2 資料分量的影響 32 5.3.3 Join rate 的影響 33 5.4 Real dataset 33 6 結論 35 參考文獻 36

[1] A.A.Alwan, H.Ibrahim, N.I.Udzir, and F.Sidi, “Processing skyline queries in incomplete distributed databases,” Journal of Intelligent Information Systems, vol.48, no.2, pp.399–420, 2017.
[2] M.Amiruzzaman and S.Jamonnak, “Multi-dimensional skyline query to find best shopping mall for customers,” in 2020 6th Conference on Data Science and Machine Learning Applications, 2020, pp.71–76.
[3] A.Awasthi, A.Bhattacharya, S.Gupta, and U.K.Singh, “K-dominant skyline join queries: Extending the join paradigm to kdominant skylines,” in 2017 IEEE 33rd International Conference on Data Engineering, 2017, pp.99–102.
[4] M.Bai, J.Xin, G.Wang, R.Zimmermann, and X.Wang, “Skylinejoin query processing in distributed databases,” Frontiers of Computer Science, vol.10, no.2, pp.330–352, 2016.
[5] Z.-D.Bai, H.-K.Hwang, W.-Q.Liang, and T.H.Tsai, “Limit theorems for the number of maxima in random samples from planar regions,” Electronic Journal of Probability, vol.6, pp.1–41, 2001.
[6] W.T.Balke, U.Güntzer, and J.X.Zheng, “Efficient distributed skylining for web information systems,” in International Conference on Extending Database Technology, 2004, pp.256–273.
[7] I.Bartolini, P.Ciaccia, and M.Patella, “SaLSa: computing the skyline without scanning the whole sky,” in Proceedings of the 15th ACM international conference on Information and knowledge management, 2006, pp.405–414.
[8] N.Bell and J.Hoberock, “Thrust: A productivity-oriented library for CUDA,” GPU computing gems Jade edition, vol.2, pp.359–371, 2011.
[9] P.A.Boncz, S.Manegold, and M.L.Kersten, “Database architecture optimized for the new bottleneck: Memory access,” in Proceedings of the 25th International Conference on Very Large Data Bases, vol.99, 1999, pp.54–65.
[10] S.Borzsony, D.Kossmann, and K.Stocker, “The skyline operator,” in Proceedings 17th International Conference on Data Engineering, 2001, pp.421–430.
[11] W.-M.Chen, H.-K.Hwang, and T.-H.Tsai, “Maxima-finding algorithms for multidimensional samples:A two-phase approach,” Computational Geometry, vol.45, no.12, pp.33–53, 2012.
[12] L.Cheng, S.Kotoulas, Q.Liu, and Y.Wang, “Load-balancing distributed outer joins through operator decomposition,” Journal of Parallel and Distributed Computing, vol.132, pp.21–35, 2019.
[13] L.Cheng, S.Kotoulas, T.E.Ward, and G.Theodoropoulos, “Improving the robustness and performance of parallel joins over distributed systems,” Journal of Parallel and Distributed Computing, vol.109, pp.310–323, 2017.
[14] J.Chomicki, P.Godfrey, J.Gryz, and D.Liang, “Skyline with presorting,” in Proceedings of the 19th International Conference on Data Engineering, vol.3, 2003, pp.717–719.
[15] Y.Gulzar, A.Alwan, N.Salleh, and I.Shaikhli, “Skyline query processing for incomplete data in cloud environment,” in Proceedings of the 6th International Conference on Computing & Informatics, 2017,pp.567–576.
[16] B.He, K.Yang, R.Fang, M.Lu, N.Govindaraju, Q.Luo, and P.Sander, “Relational joins on graphics processors,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pp.511–524.
[17] W.Jin, M.Ester, Z.Hu, and J.Han, “The multi-relational skyline operator,” in 2007 IEEE 23th International Conference on Data Engineering, 2007, pp.1276–1280.
[18] W.Jin, M.D.Morse, J.M.Patel, M.Ester, and Z.Hu, “Evaluating skylines in the presence of equijoins,” in 2010 IEEE 26th International Conference on Data Engineering, 2010, pp.249–260.
[19] I.Keles and K.Hose, “Skyline queries over knowledge graphs,” in International Semantic Web Conference, 2019, pp.293–310.
[20] D.Kossmann, F.Ramsak, and S.Rost, “Shooting stars in the sky: An online algorithm for skyline queries,” in Proceedings of the 28th International Conference on Very Large Databases, 2002, pp.275–286.
[21] H.-S.Kung, F.Luccio, and F.P.Preparata, “On finding the maxima of a set of vectors,” Journal of the ACM (JACM), vol.22, no.4, pp.469–476, 1975.
[22] C.K.Lee, B.Zheng, H.Li, and W.-C.Lee, “Approaching the skyline in z order,” in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp.279–290.
[23] X.Lin, X.Jianliang, and H.Haibo, “Authentication of location-based skyline queries,” in Proceedings of the 20th ACM international conference on Information and knowledge management, 2011, pp.1583–1588.
[24] M.Nagendra and K.S.Candan, “Skyline-sensitive joins with LRpruning,” in Proceedings of the 15th international conference on extending database technology, 2012, pp.252–263.
[25] M.Nagendra and K.S.Candan, “Efficient processing of skylinejoin queries over multiple data sources,” ACM Transactions on Database Systems (TODS), vol.40, no.2, pp.1–46, 2015.
[26] NVIDIA, “Turing GPU Architecture Nvidia,” https://www.nvidia.com/content/dam/enzz/Solutions/designvisualization/technologies/turingarchitecture/NVIDIATuringArchitectureWhitepaper.pdf,2018.
[27] J.Rao and K.A.Ross, “Cache conscious indexing for decisionsupport in main memory,” in Proceedings of 25th International Conference on Very Large Data Bases, 1999, pp.78–89.
[28] D.Sun, S.Wu, J.Li, and A.K.Tung, “Skyline-join in distributed databases,” in 2008 IEEE 24th International Conference on Data Engineering Workshop, 2008, pp.176–181.
[29] K.-L.Tan, P.-K.Eng, and B.C.Ooi, “Efficient progressive skyline computation,” in VLDB, vol.1,2001, pp.301–310.
[30] A.Vlachou, C.Doulkeridis, and N.Polyzotis, “Skyline query processing over joins,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011, pp.73–84.
[31] Y.Xu, P.Kostamaa, X.Zhou, and L.Chen, “Handling data skew in parallel joins in sharednothing systems,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pp.1043–1052.
[32] J.Zhang, J.Gu, S.Cheng, B.Li, W.Wang, and D.Meng, “Efficient algorithms of parallel skyline join over data streams,” in International Conference on Algorithms and Architectures for Parallel Processing, 2018, pp.184–199.
[33] J.Zhang, Z.Lin, B.Li, W.Wang, and D.Meng, “Efficient skyline query over multiple relations,” Procedia Computer Science, vol.80, pp.2211–2215, 2016.

無法下載圖示 全文公開日期 2025/08/13 (校內網路)
全文公開日期 本全文未授權公開 (校外網路)
全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
QR CODE