簡易檢索 / 詳目顯示

研究生: 劉佑玫
Yu-mei Liu
論文名稱: 頻繁子圖型樣探勘演算法之研究
Mining of Frequent Subgraph Patterns
指導教授: 徐俊傑
Chiun-chieh Hsu
口試委員: 洪政煌
Cheng-huang Hung
謝樹明
Shu-ming Hsieh
學位類別: 碩士
Master
系所名稱: 管理學院 - 資訊管理系
Department of Information Management
論文出版年: 2012
畢業學年度: 100
語文別: 中文
論文頁數: 61
中文關鍵詞: 資料探勘圖形探勘子圖型樣探勘
外文關鍵詞: data mining, graph mining, subgraph patterns mining
相關次數: 點閱:159下載:4
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,資料探勘(Data Mining)技術被廣泛運用在許多不同的領域當中,如何提升在大量資料集中探勘頻繁型樣(Frequent Pattern)的效率,成為目前資料探勘領域中非常熱門的議題。而在過去許多研究中,已證實圖形(Graph)具有良好的資料表達能力,能夠用來表達各種不同領域內的複雜資料集。因此,以圖形來表示資料所發展出的資料探勘技術-圖形探勘(Graph Mining),已逐漸成為目前主要的趨勢。
    圖形探勘的目的是要找出在圖形資料集中「頻繁」出現的子圖,即探勘出出現次數不小於使用者定義門檻值的所有子圖。以圖形作為資料探勘的資料,更容易取得資料之間的複雜架構或是關係。圖形探勘的應用非常廣泛,較著名的應用領域像是化學(Chemistry)、生物學(Biology)和電腦網路方面(Computer Network)…等。
    圖形探勘的主要挑戰在於如何解決重覆列舉以及子/圖形同構測試問題(Subgraph/ Graph Isomorphism testing problem),本論文提出演算法MFG(Mining of Frequent subGraph patterns),針對圖形資料集(Graph Dataset)進行高效率探勘。其主要概念為利用全域排序(global order)成長限制解決部分重覆列舉問題,以及採用資料內嵌結構(Embedded Mapping)有技巧的記錄圖形型樣在資料庫中的位置,完全避免子圖形同構的檢查問題,最後結合圖形標記(Graph Signatures)和圖形標準型態(Canonical Form)解決圖形同構測試問題。


    In recent years, data mining has been extensively applied to different domains. How to find frequent patterns efficiently in large data sets is a popular research topic in the data mining community. In addition, the power of using graphs to model complex data sets has been recognized based on the researches in the past. Therefore, using graphs to represent data and developing the data mining technique has turned into a main trend.
    The purpose of graph mining is to find frequent subgraphs from graph data sets. In other words, it is to discover all the structures whose occurrence frequency is no less than a user-specified threshold. Data mining technique using graphs to represent data can find more complicated relations or structures among data. For this reason, graph mining has been widely and extensively used in many domains like chemistry, biology and computer networks, etc.
    The main challenge in graph mining is how to solve the graph/ subgraph isomorphism testing problems. This research proposes the algorithm MFG(Mining of Frequent subGraph Patterns), which combined many previous graph mining techniques to mine all frequent subgraph patterns efficiently. MFG uses global orders of vertices in frequent patterns to reduce part of the duplicate enumeration, and utilizes an effective embedded mapping structure to store the information of subgraph patterns, so it can avoid the subgraph isomorphism checking problem completely. Finally, MFG adopts graph signatures and canonical form to solve the graph isomorphism testing problem.

    摘 要 ABSTRACT 誌 謝 目 錄 圖索引 第一章、緒論 1.1 研究背景 1.2 研究目的 1.3 論文架構 第二章、相關研究 2.1 問題描述 2.2 圖形探勘演算法的分類 2.2.1 依探勘資料集的類型 2.2.2 依子圖的類型 2.2.3 依演算法的種類 2.2.4 依型樣搜尋架構 2.3 圖形探勘演算法的比較與分析 2.3.1 圖形探勘演算法的比較 2.3.2 圖形探勘演算法的分析 第三章 頻繁子圖型樣探勘演算法 3.1 圖形探勘的挑戰 3.2 演算法主要流程 3.3 圖形資料結構 3.4 型樣列舉方法 3.5 型樣成長機制 3.5.1 一般性的成長限制 3.5.2 迴圈性的成長限制 3.6 型樣刪減技術 3.6.1 計算候選子圖的支持度 3.6.2 圖形同構檢查 第四章、實驗設計與分析 4.1 實際資料集 4.1.1 資料集 4.1.2 實驗結果與分析 4.2 人工合成資料集 4.2.1 資料集 4.2.2 實驗結果與分析 第五章、結論與未來展望 5.1 結論 5.2 未來展望 參考文獻

    [1] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules”, Proc.of the 20th Int. Conf. on Very Large Data Bases (VLDB), pp.487-499 (1994)
    [2] C. Borgelt, “On Canonical Forms for Frequent Graph Mining”, Workshop on Mining Graphs, Trees, and Sequences (MGTS), pp.1-12 (2005)
    [3] C. Borgelt and M. R. Berhold, “Mining molecular fragments: finding relevant substructures of molecules”, Proc. of 2002 IEEE Int. Conf. on Data Mining (ICDM), pp.51-58 (2002)
    [4] R. Chittimoori, L. B. Holder, and D. J. Cook, “Applying the SUBDUE substructure discovery system to the chemical toxicity domain”, Proc. of the 12th Int. Florida AI Research Society Conf., pp.90-94 (1999)
    [5] M. Cohen and E. Gudes, “Diagonally subgraphs pattern mining”, Proc. of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp.51-58 (2004)
    [6] D. J. Cook, L. B. Holder, S. Su, R. Maglothin, and I. Jonyer, “Structural mining of molecular biology data”, IEEE Engineering in Medicine and Biology Magazine, vol.20, no.4, pp.67-74 (2001)
    [7] S. Ghazizadeh and S. Chawathe, “SEuS: structure extraction using summaries”, Proc. of the 5th Int. Conf. on Discovery Science, pp.71-85 (2002)
    [8] J. Han, and J. Pei, “Mining frequent patterns by pattern-growth: methodology and implications”, ACM SIGKDD Explorations Newsletter, vol.2, no.2, pp.14-20 (2000)
    [9] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation”, Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of data, pp.1-12 (2000)
    [10] L. Holder, D. Cook, and S. Djoko, “Substructure discovery in the SUBDUE system”, Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pp.169-180 (1994)
    [11] S. M. Hsieh and C. C. Hsu and L. F. Hsu, “Efficient method to perform isomorphism testing of labeled graphs”, Proc. of the 2006 Int. Conf. on Computational Science and Its Applications(ICCSA), pp.422-431 (2006)
    [12] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha, “Mining family specific residue packing patterns from protein structure graphs”, Proc. of the 8th Annual Int. Conf. on Research in Computational Molecular Biology (RECOMB), pp.308-315 (2004)
    [13] J. Huan, W. Wang, and J. Prins, "Efficient mining of frequent subgraph in the presence of isomorphism", Proc. of the 3rd IEEE Int. Conf. on Data Mining (ICDM), pp.549-552 (2003)
    [14] J. Huan, W. Wang, and J. Prins, "Efficient mining of frequent subgraph in the presence of isomorphism", Technical Reports produced by the Department of Computer Science at the University of North Carolina (2003)
    [15] A. Inokuchi, T. Washio, and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data”, Proc. of the 4th European Conf. on Principles of Data Mining and Knowledge Discovery, pp.13-23 (2000)
    [16] A. Inokuchi, T. Washio, H. Motoda, “Complete mining of frequent patterns from graphs: mining graph data”, Machine Learning, vol.50, no.3, pp.321-354 (2003)
    [17] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment”, Journal of the ACM (JACM), vol.46, no.5, pp.604-632 (1999)
    [18] J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins, “The web as a graph: measurements, models and methods”, Proc. of the Int. Conf. on Combinatorics and Computing, pp.1-17 (1999)
    [19] M. Kuramochi and G. Karypis, “Frequent subgraph discovery”, Proc, of the 2001 IEEE Int. Conf. on Data Mining, pp.313-320 (2001)
    [20] M. Kuramochi and G. Karypis, “An efficient algorithm for discovering frequent subgraphs”, Technical Reports 02-026 produced by the University of Minesota (2002)
    [21] M. Kuramochi and G. Karypis, “An efficient algorithm for discovering frequent subgraphs”, IEEE Trans. on Knowledge and Data Engineering, vol.16, no.9, pp.1038-1051 (2004)
    [22] M. Kuramochi and G. Karypis, “Finding frequent patterns in a large sparse graph”, SIAM Int. Conf. on Data Mining (2004)
    [23] M. Kuramochi and G. Karypis, “GREW-a scalable frequent subgraph discovery algorithm”, The 4th IEEE Int. Conf. on Data Mining (ICDM), pp.439-442 (2004)
    [24] T. Matsuda, H. Motoda, T. Yoshida, and T. Washio, “Mining patterns from structured data by beam-wise graph-based induction”, Proc. of the 5th Int. Conf. on Discovery Science, pp.422-429 (2002)
    [25] S. Nijssen and J. N. Kok, “A quickstart in frequent structure mining can make a difference”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.647-652 (2004)
    [26] S. Nijssen, and J. N. Kok, “Frequent graph mining and its application to molecular databases,” Proc. of the IEEE Int Conf on Systems Man and Cybernetics, pp.4571- 4577 (2004)
    [27] C. R. Palmer, P. B. Gibbons, and C. Faloutsos, “ANF: a fast and scalable tool for data mining in massive graphs”, Proc. of the 8th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.81-90 (2002)
    [28] J.S. Park, M.S. Chen, and P.S. Yu, “An effective hash based algorithm for mining association rules”, ACM SIGMOD Record, vol.24, no.2, pp.175-186 (1995)
    [29] X. Pennec and N. Ayache, “A geometric algorithm to find small but highly similar 3D substructures in proteins”, Bioinformatics, vol.14, pp.516–522 (1998)
    [30] C. Wang, W. Wang, J. Pei, Y. Zhu and B. Shi, “Scalable mining of large disk-based graph database”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.316-325 (2004)
    [31] T. Washio and H. Motoda, “State of the art of graph-based data mining”, ACM SIGKDD Explorations Newsletter, vol.5, no.1, pp.59-68 (2003)
    [32] X. Yan and J. Han, “gSpan: graph-based substructure pattern mining”, Proc. of the 2002 Int. Conf. on Data Mining (ICDM), pp.721-724 (2002)
    [33] X. Yan and J. Han, “gSpan: graph-based substructure pattern mining”, Technical Reports UIUCDCS-R-2002-2296 produced by the University of Illinois at Urbana-Champaign (2002)
    [34] M. J. Zaki and K. Gouda, “Fast vertical mining using diffsets”, Technical Report 01-1 produced by the Department of Computer Science (2003)
    [35] 余翠芬,「漸進式頻繁子圖形樣探勘」,碩士論文,國立台灣科技大學,台北(2005)。

    無法下載圖示 全文公開日期 2017/06/26 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE