簡易檢索 / 詳目顯示

研究生: 鄭捷軒
Chieh-Hsuan Cheng
論文名稱: 使用Apache Hama枚舉可擴展的最大團
MCE-P:Scalable Maximum Clique Enumeration Using Apache Hama
指導教授: 金台齡
Tai-Lin Chin
口試委員: 吳秀陽
Shiow-Yang Wu
邱舉明
Ge-Ming Chiu
項天瑞
Tien-Ruey Hsiang
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2015
畢業學年度: 104
語文別: 英文
論文頁數: 36
中文關鍵詞: 最大團Apache Hama整體同步並行計算模型MapReduce
外文關鍵詞: maximum clique, Apache Hama, Bulk Synchronous Parallel, MapReduce
相關次數: 點閱:242下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在大圖(large graph)中探勘出最大團(maximum clique)問題被應用在許多領
    域中,例如:社群網路、生物資訊學(bioinformatics)以及計算化學(computational
    chemistry)等領域。近年來,有許多研究是基於MapReduce架構下設計演算法來
    解決大圖中探勘最大團問題,但他們只採用MapReduce分散式並行運算之架構
    將大圖分割(partition)成許多較小之子圖,接著卻使用傳統的循序式最大團探勘
    演算法來並行地探勘出每個子圖中的最大團,並非真正使用純粹的分散式並行運
    算之演算法來解決大圖中探勘最大團之問題。因此,此篇論文提出一個創新且完
    全並行的演算法來解決大圖中探勘最大團問題,此演算法是基於Apache Hama之
    架構下所設計而成的,該架構是一個基於Hadoop框架上的整體同步並行(Bulk
    Synchronous Parallel, BSP)運算技術。在Apache Hama架構上,演算法是以頂點
    (vertex)的運算為觀點而設計的,每個頂點在每一次的超級步(superstep)中皆重複
    地執行相同的程序,其程序中包含:接收鄰居節點傳送之訊息(messages)、對所
    收到之訊息做相對應之處理以及傳送處理結果至該節點之所有鄰居節點。每個超
    級步中,在一個特定的團中之頂點會被收集起來,直到沒有任何頂點可以被加入
    團中為止,最終,大圖中的最大團即可從那些團中被決定出來。實驗結果顯示,
    此篇論文中提出之演算法將比現有的MapReduce演算法更有效率,擴展性更佳。


    The maximum clique mining problem for extremely large graphs has been used in
    many fields, such as social network, bioinformatics and computational chemistry. Recently,
    some studies in the literature solve the problem using conventional MapReduce
    algorithms. Nevertheless, those algorithms just use the parallel architecture of MapReduce
    processing to partition the graph, but still apply sequential algorithms to find the
    maximum clique in a subgraph. The problem of mining the maximum clique in a graph
    is not actually solved in a parallel fashion. This paper proposes an innovative scheme
    to mine the maximum clique in a huge graph by a parallel technique based on Apache
    Hama, which is a general bulk synchronous parallel (BSP) computing engine on top of
    Hadoop. Essentially, every vertex iteratively executes the same procedure, including receiving
    messages from its neighbors, processing the tasks and sending messages to its
    neighbors. The vertices in a particular clique will be collected in each iteration until no
    vertex can be added. The maximum cliques are determined among those cliques at the
    end. Our experimental results demonstrate that our proposed solution is more efficient
    and more scalable than the existing MapReduce algorithms.

    Recommendation Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i Approval Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Abstract in Chinese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Abstract in English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1 Single-node Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Parallel Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Apache Hama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.4 Hadoop MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.1 First Phase{Local Maximum Clique Generation . . . . . . . . . . . . . . 10 4.2 Enhancements on MCE-P . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4.2.1 Decreasing the Number of Cliques, MCE-PA . . . . . . . . . . . 14 4.2.2 Sending Messages to Partial Neighbors, MCE-P+ . . . . . . . . . 14 4.3 Second Phase{Global Maximum Clique Verification . . . . . . . . . . . 15 5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.1 Experimental Environment . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.2 The Schema of BMC+MCR . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.3 Scalability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 5.4 Efficiency Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Letter of Authority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    [1] J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters,” Communications
    of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
    [2] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel:
    A system for large-scale graph processing,” in Proceedings of the 2010 ACM SIGMOD International
    Conference on Management of Data, SIGMOD ’10, (New York, NY, USA), pp. 135–146, ACM, 2010.
    [3] “Apache hama.” https://hama.apache.org/.
    [4] R. Luce and A. Perry, “A method of matrix analysis of group structure,” Psychometrika, vol. 14, no. 2,
    pp. 95–116, 1949.
    [5] P. M. Pardalos and J. Xue, “The maximum clique problem,” Journal of global Optimization, vol. 4,
    no. 3, pp. 301–328, 1994.
    [6] E. J. Gardiner, P. J. Artymiuk, and P. Willett, “Clique-detection algorithms for matching threedimensional
    molecular structures,” Journal of Molecular Graphics and Modelling, vol. 15, no. 4,
    pp. 245–253, 1997.
    [7] F. J. MacWilliams and N. J. A. Sloane, The theory of error correcting codes. Elsevier, 1977.
    [8] W. C. Huffman and V. Pless, Fundamentals of error-correcting codes. Cambridge university press,
    2003.
    [9] X. Dimitropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, G. Riley, et al., “As relationships:
    Inference and validation,” ACM SIGCOMM Computer Communication Review, vol. 37, no. 1,
    pp. 29–40, 2007.
    [10] J. Xiang, C. Guo, and A. Aboulnaga, “Scalable maximum clique computation using mapreduce,” in
    2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 74–85, IEEE, 2013.
    [11] J. Cheng, L. Zhu, Y. Ke, and S. Chu, “Fast algorithms for maximal clique enumeration with limited
    memory,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery
    and Data Mining, KDD ’12, (New York, NY, USA), pp. 1240–1248, ACM, 2012.
    [12] Y. Xie and P. Yu, “Max-clique: A top-down graph-based approach to frequent pattern mining,” in 2010
    IEEE 10th International Conference on Data Mining (ICDM), pp. 1139–1144, Dec 2010.
    [13] N. Du, C. Faloutsos, B. Wang, and L. Akoglu, “Large human communication networks: patterns and
    a utility-driven generator,” in Proceedings of the 15th ACM SIGKDD international conference on
    Knowledge discovery and data mining, pp. 269–278, ACM, 2009.
    [14] J. Cheng, Y. Ke, A. W.-C. Fu, J. X. Yu, and L. Zhu, “Finding maximal cliques in massive networks,”
    ACM Transactions on Database Systems (TODS), vol. 36, no. 4, p. 21, 2011.
    [15] D. Eppstein, M. Loffler, and D. Strash, Listing all maximal cliques in sparse graphs in near-optimal
    time. Springer, 2010.
    [16] D. Eppstein and D. Strash, “Listing all maximal cliques in large sparse real-world graphs,” in Experimental
    Algorithms, pp. 364–375, Springer, 2011.
    [17] M. C. Schmidt, N. F. Samatova, K. Thomas, and B.-H. Park, “A scalable, parallel algorithm for maximal
    clique enumeration,” Journal of Parallel and Distributed Computing, vol. 69, no. 4, pp. 417–428,
    2009.
    [18] E. Tomita, A. Tanaka, and H. Takahashi, “The worst-case time complexity for generating all maximal
    cliques and computational experiments,” Theoretical Computer Science, vol. 363, no. 1, pp. 28–42,
    2006.
    [19] N. Du, B. Wu, L. Xu, B. Wang, and P. Xin, “Parallel algorithm for enumerating maximal cliques in
    complex network,” in Mining Complex Data, pp. 207–221, Springer, 2009.
    [20] M. Sipser, Introduction to the Theory of Computation. Cengage Learning, 2012.
    [21] M. R. Garey and D. S. Johnson, “Computers and intractability: a guide to the theory of npcompleteness.
    1979,” San Francisco, LA: Freeman, 1979.
    [22] R. Carraghan and P. M. Pardalos, “An exact algorithm for the maximum clique problem,” Operations
    Research Letters, vol. 9, no. 6, pp. 375–382, 1990.
    [23] T. Fahle, “Simple and fast: Improving a branch-and-bound algorithm for maximum clique,” in Algorithms—
    ESA 2002, pp. 485–498, Springer, 2002.
    [24] E. Tomita and T. Seki, “An efficient branch-and-bound algorithm for finding a maximum clique,” in
    Discrete mathematics and theoretical computer science, pp. 278–289, Springer, 2003.
    [25] E. Tomita and T. Kameda, “An efficient branch-and-bound algorithm for finding a maximum clique
    with computational experiments,” Journal of Global Optimization, vol. 37, no. 1, pp. 95–111, 2007.
    [26] E. Tomita, Y. Sutani, T. Higashi, S. Takahashi, and M. Wakatsuki, “A simple and faster branch-andbound
    algorithm for finding a maximum clique,” in WALCOM: Algorithms and computation, pp. 191–
    203, Springer, 2010.
    [27] P. San Segundo, D. Rodr guez-Losada, and A. Jim enez, “An exact bit-parallel algorithm for the maximum
    clique problem,” Computers & Operations Research, vol. 38, no. 2, pp. 571–581, 2011.
    [28] J. Dongarra, S. Huss-Lederman, S. Otto, M. Snir, and D. Walker, “Mpi: The complete reference,”
    1996.
    [29] P. M. Pardalos, J. Rappe, and M. G. Resende, “An exact parallel algorithm for the maximum clique
    problem,” in High performance algorithms and software in nonlinear optimization, pp. 279–300,
    Springer, 1998.
    [30] L. Peng, W. Zebing, and G. Ming, “A maximum clique algorithm based on mapreduce,” in 2010 3rd
    International Conference on Advanced Computer Theory and Engineering (ICACTE), vol. 6, pp. V6–
    87, IEEE, 2010.
    [31] N. S. Dasari, D. Ranjan, and Z. Mohammad, “Maximal clique enumeration for large graphs on hadoop
    framework,” in Proceedings of the first workshop on Parallel programming for analytics applications,
    pp. 21–30, ACM, 2014.
    [32] H.-F. Liu, C.-T. Su, and A.-C. Chu, “Fast quasi-biclique mining with giraph,” in 2013 IEEE International
    Congress on Big Data (BigData Congress), pp. 347–354, IEEE, 2013.
    [33] “Apache giraph.” http://giraph.apache.org/.
    [34] L. G. Valiant, “A bridging model for parallel computation,” Commun. ACM, vol. 33, pp. 103–111,
    Aug. 1990.
    [35] “Hadoop.” https://hadoop.apache.org/.

    QR CODE