簡易檢索 / 詳目顯示

研究生: 陳漢昇
Han-Sheng Chen
論文名稱: 在Mapreduce中極大完全二分圖探勘之研究
The algorithm of mining maximal biclique on Mapreduce
指導教授: 陳省隆
Hsing-Lung Chen
呂政修
Jenq-Shiou Leu
口試委員: 陳省隆
Hsing-Lung Chen
呂政修
Jenq-Shiou Leu
陳郁堂
Yie-Tarng Chen
莊博任
Po-Jen Chuang
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2023
畢業學年度: 111
語文別: 中文
論文頁數: 59
中文關鍵詞: 大數據平行運算極大完全二分圖子超立方體
外文關鍵詞: Maximal Bicliques, Subcube
相關次數: 點閱:200下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在圖學之中,挖掘極大完全二分圖(Maximal Bicliques)在二分圖(Bipartite)探勘中是很重要的一環,極大完全二分圖的應用不僅限於蛋白質分析,同時可以分析搜尋引擎的關鍵字搜尋,以及賣場的交易資料關聯性分析(封閉項目集)等等,對於企業或管理者可以獲得更多的金錢利益。
    然而,某些演算法因為需要濾除重複的極大完全二分圖需要使用記憶體來儲存,導致在極大完全二分圖很多時會有記憶體不足的問題,或者有的演算法因為需要做支持度(Support)的運算而去掃描整個數據庫,造成不必要的時間浪費等,這些問題需要我們藉由研發新的演算法來一一克服。
    因此,我們提出了MBC_SP演算法,就是以切割子超立方體搜尋空間的概念做深度優先搜尋(Depth-First-Search)來完成所有的極大完全二分圖探勘,並且藉由減少搜尋空間的方式來減少演算法執行時間。
    MBC_SP是藉由子超立方體的分解來切割成獨立的搜尋空間來搜尋出所有的極大完全二分圖,此演算法不僅適合做平行處理來加快演算法執行效率,同時也可以減少不必要的搜尋空間以及解決重複產生極大完全二分圖的問題,對於效能的加速是一大優勢。


    In graphics, mining maximal bicliques is a very important part in the bipartite graph exploration. Mining maximal bicliques can be employed in protein analysis, key analysis of search engines, and correlation analysis of closed itemsets in store transaction data, etc. It can obtain more monetary benefits for enterprises or managers.
    However, some algorithms need to use memory to store the found maximal bicliques that need to filter out duplicates, resulting in insufficient memory when there are extremely large maximal bicliques. Some algorithms need to scan the entire database for support calculations, resulting in wasting lot of calculation time on scanning the database. These problems need to be overcome one by one by developing new algorithms.
    Therefore, we proposed the MBC_SP algorithm, which uses the concept of cutting the subcube search space to facilitate maximal bicliques exploration, and then reducing the search space, resulting in reducing the execution time significantly.
    MBC_SP divides the subcube into independent search spaces to search for all the maximal bicliques. This algorithm is not only suitable for parallel processing to speed up the execution efficiently, but also reduces unnecessary search space. It can filter out duplicates completely. The experiment results show that the proposed algorithm can achieve performance improvement significantly.

    致謝 II 摘要 III ABSTRACT IV 圖表目錄 VIII Chapter 1 緒論 1 1.1 研究動機 1 1.2 研究貢獻 2 1.3 文獻回顧 3 Chapter 2 基礎概念 8 2.1 問題描述 8 2.2 子超立方體切割 10 2.3 減少搜尋空間 13 Chapter 3 MBC_SP 演算法設計架構 17 3.1 MBC_SP_V1 17 3.2 MBC_SP_V2 25 3.3 MBC_SP_V3 28 3.4 MBC_SP_V4 29 3.5 MBC_SP_MR 33 Chapter 4 實驗與分析 36 4.1 實驗環境 36 4.2 資料集分析 36 4.3 MBC_SP_MR 分析 45 Chapter 5 結論 48 Chapter 6 參考文獻 49

    [1] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, "Measurement and analysis of online social networks," in Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007, pp. 29-42.
    [2] M. E. Newman, D. J. Watts, and S. H. Strogatz, "Random graph models of social networks," Proceedings of the national academy of sciences, vol. 99, no. suppl_1, pp. 2566-2572, 2002.
    [3] L. Royer, M. Reimann, B. Andreopoulos, and M. Schroeder, "Unraveling protein networks with power graph analysis," PLoS computational biology, vol. 4, no. 7, p. e1000108, 2008.
    [4] J. Yi and F. Maghoul, "Query clustering using click-through graph," in Proceedings of the 18th international conference on World wide web, 2009, pp. 1055-1056.
    [5] I. Ostroumov and N. Kuzmenko, "Statistical analysis and flight route extraction from automatic dependent surveillance-broadcast data," in 2022 Integrated Communication, Navigation and Surveillance Conference (ICNS), 2022: IEEE, pp. i-ix.
    [6] S. kour Siledar, B. Deogaonkar, N. Panpatte, and J. Pagare, "Map Reduce Overview and Functionality," in 2021 6th International Conference on Communication and Electronics Systems (ICCES), 2021: IEEE, pp. 1560-1566.
    [7] A. Das and S. Tirthapura, "Shared-Memory Parallel Maximal Biclique Enumeration," in 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), 17-20 Dec. 2019 2019, pp. 34-43, doi: 10.1109/HiPC.2019.00016.
    [8] G. Alexe, S. Alexe, Y. Crama, S. Foldes, P. L. Hammer, and B. Simeone, "Consensus algorithms for the generation of all maximal bicliques," Discrete Applied Mathematics, vol. 145, no. 1, pp. 11-21, 2004.
    [9] J. Li, G. Liu, H. Li, and L. Wong, "Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: A one-to-one correspondence and mining algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 12, pp. 1625-1637, 2007.
    [10] 王浩宇, "在MapReduce中分解式頻繁項目集探勘之研究," 碩士, 電子工程系, 國立臺灣科技大學, 台北市, 2021. [Online]. Available: https://hdl.handle.net/11296/cq987e
    [11] A. P. Mukherjee and S. Tirthapura, "Enumerating maximal bicliques from a large graph using mapreduce," IEEE Transactions on Services Computing, vol. 10, no. 5, pp. 771-784, 2016.
    [12] G. Liu, K. Sim, and J. Li, "Efficient mining of large maximal bicliques," in 8th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2006, Krakow, Poland, September 4-8, 2006. Proceedings 8, 2006: Springer, pp. 437-448.
    [13] J. Leskovec. "Stanford large network dataset collection" [Online] Available : http://snap.stanford.edu/data/index.html.
    [14] "Network X python." Available : https://networkx.org/.

    QR CODE