簡易檢索 / 詳目顯示

研究生: 林遠聿
Yuan-Yu Lin
論文名稱: 天際線代表點選擇問題應用於GPU之研究
A Study of Selection for Representative Skyline Points on GPUs
指導教授: 陳維美
Wei-Mei Chen
口試委員: 呂政修
Jenq-Shiou Leu
林敬舜
ChingShun Lin
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2019
畢業學年度: 107
語文別: 中文
論文頁數: 50
中文關鍵詞: 天際線控制關係平行演算法GPGPU
外文關鍵詞: skyline, dominance relation, parallel algorithm, GPGPU
相關次數: 點閱:213下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 天際線主要應用於資料庫的篩選中,可將具有代表性之資料點篩選出來,達到過濾資訊及提供決策的效果。但當資料維度提升時,藉由天際線運算所得到的資料點數會以指數速度遞增,導致資料篩選的效果喪失。因此衍生出了天際線代表點等相關應用,大多希望藉由條件限制選出具特定特徵的天際線代表點,其中以控制點數作為條件限制的方式可以過濾掉過於極端的資料點,保留整體特性較均衡的資料點。天際線的相關應用中,主要運算為控制關係的比較,而且需處理的資料集通常龐大且複雜,因此如何提升運算效率及減少不必要之運算為二大重點。隨著目前處理器的運算效率的提升及GPU(Graphics Processing Unit)計算應用的發展,我們可以藉由GPU能將運算大量平行化的特性來提升運算效率,再搭配上適當的運算機制減少不必要之運算,改善上述問題。本研究中將著重於提升控制關係的計算效率,並以找出各點所控制的資料點數量作為各點強度,找出具代表性的天際線資料點,最後提出一適用於GPU的資料結構及有效的尋訪(traversal)演算法用來加速天際線代表點控制關係的運算。實驗結果顯示,我們所提出的方法確實大幅加速了控制點數運算的時間。


    Skyline query is mainly used for database screening. It can screen out the representative data points to achieve the effect of information filtering and providing decision making. However, when the dimensionality of dataset is increased, the number of data points obtained by skyline computation will be increased exponentially, thus losing the effectiveness of data screening. Therefore, it generates the applications such as representative skyline points. The expectation is to select representative skyline points with specific features via condition restraints. By setting the dominance number as the condition restraint, we can eliminate the overly extreme data points and keep the data points with more balanced characteristics. Among all skyline query related applications, the main computation is about the comparison of dominance relations, and the data sets to be processes are usually enormous and complicated. Therefore, the two key points are how to enhance computation efficiency and how to reduce unnecessary computation. Along with the current enhancement of computation capability of processor and the improvement of GPU (Graphics Processing Unit) hardware, we can enhance the computation efficiency based on GPU’s capability to parallelize a large amount of computation, and reduce unnecessary computation in coordination with proper computation mechanism in order to improve aforementioned problems. This study will be focused on enhancing computation efficiency of dominance relation, and finding out the number of data points dominated by each point as the strength of each point in order to find out the representative skyline points. In the end, a data structure suitable for GPU and effective traversal algorithm will be proposed to accelerate the computation of skyline representative points dominance relation. The experimental results indicate that the method we proposed can indeed greatly accelerate the computation of number of dominance number.

    教授推薦書......................................... i 論文口試委員審定書 ................................. ii 中文摘要.......................................... iii 英文摘要.......................................... iv 目錄............................................. v 圖目錄 .......................................... viii 表目錄 .......................................... x 1 緒論........................................ 1 1.1 研究動機.................................. 1 1.2 研究背景.................................. 2 1.3 論文架構.................................. 2 2 GPU架構..................................... 3 2.1 NVIDIAGPU硬體架構.......................... 3 2.2 CUDA 2.2.1  軟硬體間的對應.......................... 5 
 2.2.2  全域記憶體 ............................ 7 
 2.2.3  共享記憶體 ............................ 8 
 2.2.4  適合GPU的演算法設計 ..................... 9 
 3. 文獻探討..................................... 10 3.1 基本定義.................................. 10 3.2 與天際線相關的研究 ........................... 11 
 3.3 多維資料結構的規劃 ........................... 12 3.3.1 循序版本的建樹方法 ....................... 13 3.3.2 樹狀資料結構的分支單位大小與執行效率 ........... 15 
 4. 研究方法..................................... 17 4.1  適用於GPU的多維資料結構....................... 17 4.1.1 平行版本的建樹方法 ....................... 17 
 4.2 應用於樹狀資料結構上的控制關係比較................. 20 
 4.3  對於internal node而言有效的MBR ................... 23 4.4  分支單位的結合.............................. 24 
 5. 實驗結果與分析................................. 28 5.1 實驗環境.................................. 28 
 5.2 實驗資料與設定.............................. 28 
 5.3 實驗結果分析 ............................... 29 5.3.1 總執行時間 ............................ 30 
 5.3.2 建立資料結構與traversal時間比例 ............... 31 
 5.3.3 總比較次數 ............................ 32 
 5.3.4 Traversal時平均經過的node數目................ 33 
 5.3.5 在 internal node 與 leaf node 進行控制關係比較的比例 . . . . . 33 
 5.3.6 Internal node與比較機制的效果................. 34 
 5.3.7 Leaf node與比較機制的效果................... 35 
 6. 結論........................................ 36 參考文獻....................................... 37

    [1] Z.-D.Bai,H.-K.Hwang,W.-Q.Liang,and T.-H.Tsai,“Limit theorems for the number of maxima in random samples from planar regions,” Electronic Journal of Probability, vol. 6, pp. 1–41, 2001.
    [2] Z.-D. Bai, H.-K. Hwang, and T.-H. Tsai, “Berry-esseen bounds for the number of maxima in planar regions,” Electronic Journal of Probability, vol. 8, 2003.
    [3] N. Bell and J. Hoberock, “Thrust: A productivity-oriented library for cuda,” GPU computing gems Jade edition, vol. 2, pp. 359–371, 2011.
    [4] J.L.Bentley,H.Kung,M.Schkolnick,and C.Thomborson,“On the average number of maxima in a set of vectors and applications,” Journal of the ACM (JACM), vol. 25, pp. 536–543, 1978.
    [5] K. S. Bøgh, S. Chester, and I. Assent, “SkyAlign: a portable, work-efficient skyline algorithm for multicore and gpu architectures,” The VLDB Journal, vol. 25, no. 6, pp. 817–841, 2016.
    [6] S.Borzsony,D.Kossmann,and K.Stocker,“The skyline operator,”in Proceedings 17th International Conference on Data Engineering, 2001, pp. 421–430.
    [7] K. C. K. Lee, B. Zheng, H. Li, and W.-C. Lee, “Approaching the skyline in z order.” in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 279–290.
    [8] C.-Y. Chan, H. V. Jagadish, K.-L. Tan, A. K. H. Tung, and Z. Zhang, “Finding k-dominant skylines in high dimensional space,” in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, 2006, pp. 503–514.
    [9] W.-M. Chen and G.-H. Chen, “Divide-and-conquer recurrences associated with generalized heaps, optimal merge, and related structures,” Theoretical Computer Science, vol. 292, pp. 667–677, 2003.
    [10] W.-M. Chen, H.-K. Hwang, and T.-H. Tsai, “Maxima-finding algorithms for multidimensional samples: a two-phase approach,” Computational Geometry, vol. 45, no. 1–2, pp. 33–53, 2012.
    [11] W.-M.Chen,H.-K.Hwang,and T.-H.Tsai,“Efficient maxima-finding algorithms for random planar samples.” Discrete Mathematics & Theoretical Computer Science, vol. 6, no. 1, pp. 107–122, 2003.
    [12] C. A. C. Coello, G. B. Lamont, and D. A. V. Veldhuizen, Evolutionary algorithms for solving multi- objective problems. Springer, 2007, vol. 5.
    [13] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Inc., 2001, vol. 16.
    [14] M. Ehrgott, Multicriteria optimization. Springer Science & Business Media, 2006.
    [15] M. Goncalves and M.-E. Vidal, “Top-k skyline: A unified approach,” in On the Move to Meaningful
    Internet Systems 2005: OTM 2005 Workshops, vol. 3762, 2005, pp. 790–799.
    [16] A. Guttrnan, “R-trees: A dynamic index structure for spatial searching,” SIGMOD Rec., pp. 47–57, 1984.
    37
    [17] V. Hristidis, N. Koudas, and Y. Papakonstantinou, “Prefer: A system for the efficient execution of multi-parametric ranked queries.” in Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, vol. 30, 2001, pp. 259–270.
    [18] H.Samet,“The quadtree and related datastructures,”ACM Computing Surveys-CSUR,1984.
    [19] J. Kim, W. Jeong, and B. Nam, “Exploiting massive parallelism for indexing multi-dimensional datasets on the gpu,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 8, pp. 2258–2271, 2015.
    [20] D. B. Kirk and W. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann Publishers Inc., 2012.
    [21] V. Koltun and C. H. Papadimitriou, “Approximately dominating representatives,” Theoretical Com- puter Science, vol. 371, pp. 148–154, 2007.
    [22] J.Lee,G.-w.G.-W.You,and S.-W.Hwang,“Personalized top-k skyline queries in high-dimensional space,” Information System, vol. 34, pp. 45–61, 2009.
    [23] J.LeeandS.-w.Hwang,“BSkyTree: scalable skyline computation using a balanced pivot selection,” in Proceedings of the 13th International Conference on Extending Database Technology, 2010, pp. 195–206.
    [24] B. Li, J. Li, K. Tang, and X.Yao, “Many-objective evolutionary algorithms: A survey,” ACM Com- puting Surveys, vol. 48, no. 1, 2015.
    [25] X.Lin,Y.Yuan,Q.Zhang,and Y.Zhang,“Selecting stars: The k most representative skylineoperator,” in 2007 IEEE 23rd International Conference on Data Engineering, 2007, pp. 86 – 95.
    [26] W.Liu and B.Vinter,“Ad-heap: An efficient heap datastructure for asymmetric multicoreprocessors,” in Proceedings of Workshop on General Purpose Processing Using GPUs, 2014, pp. 54 – 63.
    [27] Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis, R-Trees: Theory and Applications. Springer Publishing Company, Incorporated, 2005.
    [28] X.Miao,Y.Gao,G.Chen,and T.Zhang,“k-dominant skyline queries on incomplete data,”Information Sciences, vol. 367, 2016.
    [29] NVIDIA, “CUDA C programming guide,” https://docs.nvidia.com/cuda/archive/9.0/ cuda-c-programming-guide/index.html, 2017.
    [30] NVIDIA, “GTX1080 Pascal Whitepaper - Nvidia,” https://international.download.nvidia.com/ geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf, 2016.
    [31] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive skyline computation in database systems,” ACM Transcations on Database Systems., vol. 30, pp. 41–82, 2005.
    [32] J.T.Robinson,“The K-D-B-tree: A Search Structure for Large Multidimensional Dynamic Indexes,” in Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, 1981, pp. 10–18.
    38

    [33] M. A. Siddique and Y. Morimoto, “Efficient k-dominant skyline computation for high dimensional space with domination power index,” Journal of Computers, vol. 7, 2012.
    [34] X.Tian,D.Zhang,and Y.Tao,“On skylining with flexible dominance relation,”in Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, 2008, pp. 1397–1399.
    [35] A. Vlachou, C. Doulkeridis, and M. Halkidi, “Discovering representative skyline points over distributed data,” in Scientific and Statistical Database Management, 2012.
    [36] M. L. Yiu and N. Mamoulis, “Multi-dimensional top-k dominating queries,” The VLDB Journal - VLDB, vol. 18, pp. 695–718, 2009.

    無法下載圖示 全文公開日期 2024/08/22 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE