簡易檢索 / 詳目顯示

研究生: 吳柏毅
Bo-Yi Wu
論文名稱: 植基於相似度計算與CUDA架構之網路分群
Network Clustering Using Structure Similarity with CUDA Architecture
指導教授: 林昌鴻
Chang-Hong Lin
口試委員: 李佳翰
Chia-Han Lee
陳維美
Wei-Mei Chen
吳晉賢
Chin-Hsien Wu
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2013
畢業學年度: 102
語文別: 英文
論文頁數: 82
中文關鍵詞: 資料探勘重點成員相似程度圖模組化CUDA整合平行技術
外文關鍵詞: Data Mining, Community Structure, Super Node, Similarity Graph, Modularity-based Method, CUDA Parallel Architecture
相關次數: 點閱:290下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 大型資料探勘與分析技術近年來普遍應用於眾多社群網路研究領域。而資料分析最主要的核心目的就是找出一筆資料中彼此關聯度較高的群體,藉由群的概念可以使我們輕易地將單一複雜的問題切割成許多關聯性較高的子問題,進而處理解決之。目前許多方法僅能針對無權重網路有著顯著的效果或者處理結果並不與主觀認知相符。因此本論文結合成員相似度計算與模組最佳化來提供一個實用且快速的資料分析演算法;其憑藉網路中常見的重要特性,如利用重點成員較高的資料連結度與成員之間的相似數值來建構相似程度圖。待相似程度圖產生後,我們利用網絡中模組的特性,將產生的相似程度圖進而拆成許多有意義的群體。最後我們導入NVIDIA的CUDA整合平行技術,將分群演算法中可平行處理的部份利用GPU多執行緒架構的幫忙,進而加快處理速度。根據實驗結果中我們能在無權重網路上獲得較精準的分群結果並同時有效地處理權重網路與較快速的處理時間。


    Over the past decade, large-scale data mining and analysis techniques are commonly used in many research fields of social networks. The central purpose of the data analysis is to identify the community structure, where vertices within a community are strongly connected and vertices between communities have low edge density in the networks. According to the community structure, we can effortlessly split the unitary complex problem into several higher associated issues, and then we can solve it more easily. However, there are many clustering algorithms only can process on the unweighted networks, and some of detected communities do not correspond with the subjective perception. In this thesis, we proposed a method by using the combination of different methods, including the detection of super nodes, similarity-based method and modularity-based clustering. It based on many characteristics in the weighted network, such as using several vertices which have strong influences in the whole network and the similarity measurements between the nodes, in order to construct the similarity graph. After obtaining the similarity graph, we then extract meaningful communities by using the modularity-based method. Finally, we import the integration of CUDA parallel technology to deal with the processes which can be parallel operated. The experimental results show that the proposed algorithm can not only obtain exact communities against other clustering algorithms on the unweighted networks, but also can efficiently deal with weighted networks. Moreover, we can handle global networks within a reasonable time.

    ABSTRACT...............................................I 中文摘要 ...............................................II 致謝 .................................................III List of Contents ......................................IV List of Figures ......................................VII List of Tables .......................................IX CHAPTER 1 INTRODUCTION ...............................1 1.1 Motivation .......................................1 1.2 Contributions ....................................2 1.3 Thesis Organization ..............................2 CHAPTER 2 RELATED WORKS ...........................3 2.1 Community Structure in a Real-World Network ......3 2.2 Related Clustering Algorithms ....................4 2.2.1 Similarity-based Methods .......................4 2.2.2 Modularity-based Methods .......................5 2.2.3 Clustering Methods on Weighted Networks ........7 CHAPTER 3 CUDA INTRODUCTION ..........................9 3.1 Background .......................................9 3.2 The Specification of CUDA device ................10 3.3 CUDA Software Model .............................13 3.3.1 Function Type Qualifiers ......................14 3.3.2 Grid of Thread Blocks .........................15 CHAPTER 4 PROPOSED METHODS ..........................19 4.1 Preprocessing ...................................20 4.2 Similarity Graph Construction ...................23 4.2.1 Cosine Similarity .............................23 4.2.2 Structure Similarity Calculation ..............25 4.2.3 Finding Local Communities .....................28 4.3 Modularity-base Method ..........................32 4.3.1 The CNM Modularity Method .....................32 4.3.2 Similarity-based Modularity ...................37 4.4 Proposed Clustering Algorithm ...................39 4.5 CUDA Implementation .............................42 CHAPTER 5 EXPERIMENTAL RESULTS ......................45 5.1 Developing Platform .............................45 5.2 Experimental Results ............................46 5.2.1 Discussion of Super Node Threshold ............46 5.2.2 Cosine Similarity vs. Jaccard Similarity ......49 5.2.3 Comparison of Different Modularity Algorithm ..50 5.2.4 Experiments with Weighted Networks ............53 5.3 Performance Analysis ............................61 CHAPTER 6 CONCLUSIONS AND FUTURE WORKS ..............64 6.1 Conclusions .....................................64 6.2 Future Work .....................................64 References ..........................................66

    References
    [1] L. C. Freeman, The development of social network analysis. Vancouver: Empirical Press, 2004.

    [2] J. M. Pujol, J. Bejar, and J. Delgado, “Clustering algorithm for determining community structure in large networks,” Physical Review E, Vol. 74, No. 1, 016107, 2006.

    [3] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, Vol. 74, No. 1, pp. 7821-7826, 2002.

    [4] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, Vol. 433, No. 7028, pp. 895-900, 2005.

    [5] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical review E, Vol. 70, No. 6, 066111, 2004.

    [6] M. Ankerst, M. M. Breunig, H.-P. Kriegel and J. Sander, “OPTICS: ordering points to identify the clustering structure,” Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Vol. 28, No. 2, pp. 49-60, 1999.

    [7] Z. Lu, Y. Wen and G. Cao, “Community detection in weighted networks: Algorithms and applications,” In IEEE International Conference on Pervasive Computing and Communications, pp. 179-184, 2013.

    [8] S.-P. Marta, G. Roger, A. A. Moreira and L. A. N. Amaral, “Extracting the hierarchical organization of complex systems,” Proceedings of the National Academy of Sciences, Vol. 104, No. 39, pp. 15224-15229, 2007.

    [9] S. Fortunato and M. Barthelemy, “Resolution limit in community detection.” Proceedings of the National Academy of Sciences, Vol. 104, No. 1, pp. 36-41, 2007.

    [10] M. E. J. Newman. (2013). Network data [Online]. http://www-personal.umich. edu/~mejn /netdata/, 2013

    [11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” In Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, 1996.

    [12] X. Xu, N. Yuruk, Z. Feng, and T. A. Schweiger, “SCAN: a structural clustering algorithm for networks,” In the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824-833, 2007

    [13] D. Bortner and J. Han, “Progressive clustering of networks using structure -connected order of traversal,” In the IEEE 26th International Conference on Data Engineering (ICDE), pp. 653-656, 2010

    [14] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, Vol. 69, 066133, 2004.

    [15] J. Duch and A. Arenas, “Community detection in complex networks using extremal optimization,” Physical review E, Vol. 72, 027104, 2005.

    [16] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, P10008, 2008

    [17] B. Yan and S. Gregory, “Detecting communities in networks by merging cliques,” In IEEE International Conference on Intelligent Computing and Intelligent Systems, Vol. 1, pp. 832-836, 2009

    [18] S. Gregory, “Finding overlapping communities in networks by label propagation,” New Journal of Physics, Vol. 12, 103018, 2010.

    [19] D. Chen, M. Shang, Z. Lv, and Y. Fu, “Detecting overlapping communities of weighted networks via a local algorithm,” Physica A: Statistical Mechanics and its Applications, Vol. 389, No. 19, pp. 4177–4187, 2010.

    [20] U. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E, Vol. 76, 036106, 2007.

    [21] S. Cook, CUDA Programming: A Developer's Guide to Parallel Computing with GPUs. Newnes, 2012.

    [22] NVIDIA. (2013). products [Online]. http://www.nvidia.com/object/product- quadro-2000-us.html

    [23] D. B. Kirk and W.-H. W. Hwu, Programming massively parallel processors: a hands -on approach. Morgan Kaufmann. 2010.

    [24] N. Wilt, The CUDA Handbook: A Comprehensive Guide to GPU Programming. Addison-Wesley Professional. 2013.

    [25] R. Farber, CUDA application design and development. Access Online via Elsevier. 2011.

    [26] D. A. Patterson and J. L. Hennessy, Computer organization and design: the hardware/software interface. Morgan Kaufmann. Revised Fourth Edition, 2011

    [27] G. Williams, Linear algebra with applications. Jones & Bartlett Publishers. 2012.

    [28] P. Mishra, N. Padhy and R. Panigrahi, “The survey of data mining applications and feature scope,” Asian Journal of Computer Science & Information Technology, Vol. 2, No. 3, pp. 1-43, 2012.

    [29] G. Guo, J. Zhang and N. Yorke-Smith, “A Novel Bayesian Similarity Measure for Recommender Systems,” In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pp. 2619-2625, 2013.

    [30] N. Augsten and M. H. Bohlen, “Similarity Joins in Relational Database Systems,” Synthesis Lectures on Data Management, Vol. 5, No. 5, pp. 1-124, 2013.

    [31] M. E. J. Newman and M. Girvan. “Finding and evaluating community structure in networks,” Physical review E, Vol. 69, No. 2, 026113, 2004.

    [32] T. Opsahl. (2008). Datasets [Online]. http://toreopsahl.com/datasets/

    [33] K. Wakita and T. Tsurumi, “Finding community structure in a mega-scale social networking service,” Proceedings of the 16th International Conference on World Wide Web, pp. 153–162, 2007.

    [34] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” Proceedings of the 20th International Conference on Computer and Information Sciences, Vol. 10, No. 2, pp. 199-218, 2005.

    [35] National Basketball Association (NBA). (2013-2014). Scores & schedules [Online]. http://www.nba.com/schedules/national_tv_schedule/

    [36] Major League Baseball (MLB). (2014). schedules [Online]. http://mlb.mlb.com/ mlb/schedule/index.jsp?tcid=mm_mlb_schedule#date=03/22/2014

    無法下載圖示 全文公開日期 2018/12/20 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE