簡易檢索 / 詳目顯示

研究生: 余翠芬
Tsui-Fen Yu
論文名稱: 漸進式頻繁子圖型樣探勘
Incremental Mining of Frequent Subgraph Patterns
指導教授: 徐俊傑
Chun-Chieh Hsu
口試委員: 陳永昇
Yeong-Sheng Chen
黃世禎
Sun-Jen Huang
學位類別: 碩士
Master
系所名稱: 管理學院 - 資訊管理系
Department of Information Management
論文出版年: 2005
畢業學年度: 93
語文別: 中文
論文頁數: 79
中文關鍵詞: 資料探勘漸進式探勘子圖型樣探勘
外文關鍵詞: incremental mining, subgraph patterns mining, data mining
相關次數: 點閱:254下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

如何有效率的在大量資料集中找出頻繁型樣(Frequent Pattern),是目前資料探勘(Data Mining)技術中非常活躍的議題,近年來亦被廣泛的運用在許多不同的領域當中。
在過去的研究中,頻繁型樣的探勘大多集中在項目集(Itemset)、路徑型樣(Path Pattern)等探勘,但隨著資料探勘技術應用領域的增加,面對新領域中複雜的資料集(Dataset),傳統的頻繁型樣探勘技術已不敷使用,因此發展有效率且可以適用各種不同型樣的資料探勘技術已成為目前主要的趨勢。在過去許多研究中證實,圖形(Graph)具有非常好的資料表達能力,能夠用來表達複雜的資料集,因此以圖形來表示資料所發展出的資料探勘技術,不僅能適用於複雜的資料集,亦可使用在過去傳統的資料集中。
但是就我們所知,目前發展出的頻繁子圖型樣探勘演算法,都是針對靜態資料集所設計,一但資料集有所變動,或是使用者所指定的門檻值不同,就必須重新執行資料探勘,面對子圖型樣的高複雜度而言,此舉無疑是加重了資料探勘的負擔。
本論文提出演算法IMFG(Incremental Mining of Frequent subGraph patterns),運用不同的資料結構來儲存圖形資料集中的資訊,改善過去演算法只能處理靜態資料集的缺點,當圖形資料集有所變動,或是使用者對門檻值的需求有所改變時,不必重新探勘整個資料集,亦可得到使用者所想要的資料。


How to find frequent patterns efficiently in large data sets is an active research topic in the data mining community. In recent years, mining frequent patterns has been extensively applied to different domains.
In the past, mining frequent patterns was focused on itemsets and path patterns. Nevertheless, with the increasing domains applied by the data mining technique and more complex data sets in new domains, the traditional mining technique of the frequent patterns will be not enough. Therefore, it has turned into a main trend to develop an efficient and suitable data mining technique. The power of graphs to model complex data sets has been recognized according to the researches in the past. For this reason, the data mining technique using graphs to represent data can apply not only in complex data sets but also in traditional data sets.
As we know so far, mining frequent subgraph patterns algorithms have already been developed and designed for static datasets. Once the dataset or the thresholds that users assign have been changed, it must re-mine the whole dataset. When facing the high complexity of subgraph patterns, it will burden the data mining.
This thesis proposes the algorithm IMFG(Incremental Mining of Frequent subGraph Patterns), which uses a different data structure to store the dataset information of graphs and removes the defect of algorithm that only can deal with static datasets in the past. When the graph dataset or threshold is changed, we can obtain the new information we want without re-mining the whole dataset.

中文摘要 英文摘要 誌 謝 目 錄 圖索引 表索引 第一章、緒論 1.1 研究背景 1.2 研究目的 1.3 論文架構 第二章、問題描述 2.1 頻繁子圖型樣探勘 2.2漸進式頻繁子圖型樣探勘 第三章、相關研究 3.1 頻繁子圖型樣探勘 3.2 漸進式型樣探勘 3.3 FFSM演算法 第四章、漸進式頻繁子圖型樣探勘演算法 4.1 演算法主要架構 4.2 Edge Embedding Index資料結構 4.3 IMFG-Build演算法 4.4 IMFG-Update演算法 4.4.1 EEI_Update 演算法 4.4.2 Tree_Update 演算法 第五章、實驗設計與分析 5.1資料集的產生與模擬 5.2漸進式子圖型樣探勘演算法實作與分析 5.2.1圖形資料集的增加 5.2.2門檻值的變動 第六章、結論與未來展望 6.1 結論 6.2 未來展望 參考文獻

[1] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules”, Proc.of the 20th Int. Conf. on Very Large Data Bases (VLDB), pp.487-499, 1994
[2] N. F. Ayan, A. U. Tansel and E. Arkun, “An efficient algorithm to update large itemsets with early pruning”, Proc. of the 5th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.287-291, 1999
[3] C. Borgelt and M. R. Berhold, “Mining molecular fragments: finding relevant substructures of molecules”, Proc. of 2002 IEEE Int. Conf. on Data Mining (ICDM), pp.51-58, 2002
[4] B. Berendt, A. Hotho, and G. Stumme, “Towards semantic web mining”, Proc. of the 1st Int. Semantic Web Conference (ISWC), pp.264-278, 2002
[5] M. Cohen and E. Gudes, “Diagonally subgraphs pattern mining”, Proc. of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp.51-58, 2004
[6] R. Chittimoori, L. B. Holder, and D. J. Cook, “Applying the SUBDUE substructure discovery system to the chemical toxicity domain”, Proc. of the 12th Int. Florida AI Research Society Conf., pp.90-94, 1999
[7] D. W. Cheung, J. Han, V. T. Ng, and C.Y. Wong, “Maintenance of discovered association rules in large databases: an incremental updating technique”, Proc. of the 12th IEEE Int. Conf. on Data Engineering, pp.106-114, 1996
[8] D. J. Cook, L. B. Holder, S. Su, R. Maglothin, and I. Jonyer, “Structural mining of molecular biology data”, IEEE Engineering in Medicine and Biology Magazine, vol.20, no.4, pp.67-74, 2001
[9] D. W. Cheung, S. D. Lee, and B. Kao, “A general incremental technique for maintaining discovered association rules”, Proc. of the 5th Int. Conf. on Database Systems for Advanced Applications (DASFAA), vol.6, pp.185-194, 1997
[10] H. Cheng , X. Yan , J. Han, “IncSpan: incremental mining of sequential patterns in large database”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.527-532, 2004
[11] W. Cheung and O. R. Zaïane, “Incremental mining of frequent patterns without candidate generation or support constraint”, Proc. 7th Int. Database Engineering and Applications Symposium (IDEAS), pp.111-116, 2003
[12] S. Ghazizadeh and S. Chawathe, “SEuS: structure extraction using summaries”, Proc. of the 5th Int. Conf. on Discovery Science, pp.71-85, 2002
[13] L. Holder, D. Cook, and S. Djoko, “Substructure discovery in the SUBDUE system”, Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pp.169-180, 1994
[14] J. Han, and J. Pei, “Mining frequent patterns by pattern-growth: methodology and implications”, ACM SIGKDD Explorations Newsletter, vol.2, no.2, pp.14-20, 2000
[15] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation”, Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of data, pp.1-12, 2000
[16] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha, “Mining family specific residue packing patterns from protein structure graphs”, Proc. of the 8th Annual Int. Conf. on Research in Computational Molecular Biology (RECOMB), pp.308-315, 2004
[17] J. Huan, W. Wang, and J. Prins, "Efficient mining of frequent subgraph in the presence of isomorphism", Proc. of the 3rd IEEE Int. Conf. on Data Mining (ICDM), pp.549-552, 2003
[18] J. Huan, W. Wang, and J. Prins, "Efficient mining of frequent subgraph in the presence of isomorphism", Technical Reports produced by the Department of Computer Science at the University of North Carolina, 2003
[19] L. Huan, W. Wang and J. Prins, “SPIN: mining maximal frequent subgraphs from graph databases”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.581-586, 2004
[20] T. P. Hong, C.Y. Wang and Y. H. Tao, “Incremental data mining based on two support thresholds”, IEEE Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol.1, pp.436-439, 2000
[21] A. Inokuchi, T. Washio, and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data”, Proc. of the 4th European Conf. on Principles of Data Mining and Knowledge Discovery, pp.13-23, 2000
[22] A. Inokuchi, T. Washio, H. Motoda, “Complete mining of frequent patterns from graphs: mining graph data”, Machine Learning, vol.50, no.3, pp.321-354, 2003
[23] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment”, Journal of the ACM (JACM), vol.46, no.5, pp.604-632, 1999
[24] M. Kuramochi and G. Karypis, “Frequent subgraph discovery”, Proc, of the 2001 IEEE Int. Conf. on Data Mining, pp.313-320, 2001
[25] M. Kuramochi and G. Karypis, “An efficient algorithm for discovering frequent subgraphs”, Technical Reports 02-026 produced by the University of Minesota, 2002
[26] M. Kuramochi and G. Karypis, “Finding frequent patterns in a large sparse graph”, SIAM Int. Conf. on Data Mining, 2004
[27] M. Kuramochi and G. Karypis, “An efficient algorithm for discovering frequent subgraphs”, IEEE Trans. on Knowledge and Data Engineering, vol.16, no.9, pp.1038-1051, 2004
[28] M. Kuramochi and G. Karypis, “GREW-a scalable frequent subgraph discovery algorithm”, The 4th IEEE Int. Conf. on Data Mining (ICDM), pp.439-442, 2004
[29] J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins, “The web as a graph: measurements, models and methods”, Proc. of the Int. Conf. on Combinatorics and Computing, 1999
[30] M. Y. Lin and S.Y. Lee, “Incremental update on sequential patterns in large databases”, Proc. of the 10th IEEE Int. Conf. on Tools with Artificial Intelligence, pp.24-31, 1998
[31] T. Matsuda, H. Motoda, T. Yoshida, and T. Washio, “Mining patterns from structured data by beam-wise graph-based induction”, Proc. of the 5th Int. Conf. on Discovery Science, pp.422-429, 2002
[32] F. Masseglia, P. Poncelet, and M. Teisseire, “Incremental mining of sequential patterns in large databases”, Data and Knowledge Engineering, vol.46, no.1, pp.97-121, 2003
[33] S. Nijssen and J. N. Kok, “A quickstart in frequent structure mining can make a difference”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.647-652, 2004
[34] X. Pennec and N. Ayache, “A geometric algorithm to find small but highly similar 3D substructures in proteins”, Bioinformatics, vol.14, pp.516–522, 1998
[35] J.S. Park, M.S. Chen, and P.S. Yu, “An effective hash based algorithm for mining association rules”, ACM SIGMOD Record, vol.24, no.2, pp.175-186, 1995
[36] C. R. Palmer, P. B. Gibbons, and C. Faloutsos, “ANF: a fast and scalable tool for data mining in massive graphs”, Proc. of the 8th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.81-90, 2002
[37] S. Parthasarathy, M. Zaki, M. Ogihara, and S. Dwarkadas, “Incremental and interactive sequence mining”, Proc. of the 8th Int. Conf. on Information and knowledge management, pp.251-258, 1999
[38] N. Vanetik, E. Gudes, and S. E. Shimony, “Computing frequent graph patterns from semistructured data”, Proc. of the 2002 IEEE Int. Conf. on Data Mining (ICDM), pp.458–465, 2002
[39] Q. Wang, E. Chen, and S. Wang, “Efficient incremental pattern mining from semi-structured dataset”, the 6th Asia Pacific Web Conference (APWEB), pp.211-216, 2004
[40] T. Washio and H. Motoda, “State of the art of graph-based data mining”, ACM SIGKDD Explorations Newsletter, vol.5, no.1, pp.59-68, 2003
[41] C. Wang, W. Wang, J. Pei, Y. Zhu and B. Shi, “Scalable mining of large disk-based graph database”, Proc. of the 2004 ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.316-325, 2004
[42] X. Yan and J. Han, “gSpan: graph-based substructure pattern mining”, Technical Reports UIUCDCS-R-2002-2296 produced by the University of Illinois at Urbana-Champaign, 2002
[43] X. Yan and J. Han, “gSpan: graph-based substructure pattern mining”, Proc. of the 2002 Int. Conf. on Data Mining (ICDM), pp.721-724, 2002
[44] X. Yan and J. Han, “CloseGraph: mining closed frequent graph patterns”, Proc. of the 9th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp.286-295, 2003
[45] M. J. Zaki and K. Gouda, “Fast vertical mining using diffsets”, Technical Report 01-1 produced by the Department of Computer Science, 2001

無法下載圖示 全文公開日期 2006/06/09 (校內網路)
全文公開日期 本全文未授權公開 (校外網路)
全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
QR CODE