簡易檢索 / 詳目顯示

研究生: 陳漱瑜
Shu-Yu Chen
論文名稱: 應用最大完全子圖樹於挖掘最大頻繁項集之研究
Efficient Mining of Maximal Frequent Itemsets Based on Maximal Clique Trees
指導教授: 陳維美
Wei-Mei Chen
口試委員: 陳維美
林淵翔
吳晉賢
林昌鴻
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2023
畢業學年度: 112
語文別: 英文
論文頁數: 43
中文關鍵詞: 資料探勘最大頻繁項集挖掘關聯規則雙分圖
外文關鍵詞: data mining, maximal frequent itemset, association rule, graph, bipartite graph
相關次數: 點閱:39下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 頻繁項集挖掘是資料探勘中其中一個重要的議題,它旨在找出交易資料庫中的關聯規則。找出這些頻繁項集可能需要對資料庫進行多次掃描並且也可能會產生冗餘的候選項集,這兩個因素造成了此問題的效能瓶頸。在過去的研究中已經提出各種資料結構來嘗試解決這個問題,然而當利用到基於圖的演算法時,它可能會產生資料庫中不實際存在的頻繁項集。本篇論文我們目的在於挖掘最大頻繁項集,也就是一項集是頻繁的且不存在它的超集也是頻繁的。我們提出最大完全子圖樹能在只掃描交易資料庫一次的情況下有效挖掘所有最大頻繁項集,並且不產生錯誤的項集結果。除此之外,我們提出了一個基於最大完全雙分子圖的優化選項來改善演算法的空間複雜度。實驗結果顯示,所提出的方法能有效挖掘最大頻繁項集並確保挖掘結果的正確性,並且基於最大完全雙分子圖的分而治之方法使得我們的演算法得以處理大規模資料集。


    Frequent itemset mining is one of the important domains in data mining. It aims to find out the association rules in a transaction database. Finding these itemsets may require several times database scanning and generates redundant candidate itemsets. The above two problems cause the performance bottleneck. Previous research on frequent itemset mining has proposed a wide variety of data structures to solve this problem. However, when it comes to the graph-based approaches in the literature, it may generate the itemsets that do not really exist in the database. In this thesis, we focus on maximal frequent itemset rather than frequent itemset, which means the itemset is frequent and none of its immediate supersets are frequent. We propose an efficient maximal clique tree to extract all maximal frequent itemsets without generating error itemsets, and the proposed method only needs to scan the whole transaction once. Moreover, we provide a maximal-bicliquebased optimization choice to improve the space complexity. The experimental results show that our proposed method maintains efficiency while ensuring correctness, and can deal with massive data with maximal-biclique-based partitioning.

    Abstract in Chinese iii Abstract in English iv Acknowledgements v Contents vi List of figures viii List of tables ix List of algorithms x 1 Introduction 1 2 Background 3 2.1 Fundamental of frequent itemset mining 3 2.2 Maximal frequent itemset mining 4 2.3 Graph-based itemset mining 5 2.4 Related issues of itemset mining 7 2.4.1 Parallelization of frequent itemset mining 7 2.4.2 High utility itemset mining 8 2.4.3 Frequent itemset mining from uncertain & incremental data . . . 8 3 Proposed method 10 3.1 Definition 10 3.2 The MCT algorithm 12 3.3 Build 2-itemsets support count matrix 12 3.4 Bitmap representation 13 3.5 Construct maximal clique tree 14 3.5.1 Transaction AND operation 15 3.5.2 XOR operation for pruning 2-itemsets 16 3.5.3 Construct sibling tree 17 3.5.4 Discover MFIs via searching MCT 18 3.6 Space complexity optimization 19 3.6.1 Bipartite graph representation 20 3.6.2 Mining closed frequent itemsets via maximal biclique 21 4 Experiments 28 4.1 Experimental environment 28 4.2 Real datasets 29 4.3 Synthetic datasets 34 4.4 Space complexity improvement 36 5 Conclusions 39 References 40

    [1] N. Deng, X. Chen, D. Li, and C. Xiong, “Frequent patterns mining in DNA sequence,” IEEE Access, vol. 7, pp. 108400 – 108410, 2019.
    [2] N. Chu, A. Williams, R. Alhajj, and K. Barker, “Data stream mining architecture for network intrusion detection,” in Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration, 2004. IRI 2004., pp. 363–368, 2004.
    [3] M. Ilayaraja and T. Meyyappan, “Efficient data mining method to predict the risk of heart diseases through frequent itemsets,” Procedia Computer Science, vol. 70, pp. 586–592, 2015.
    [4] J. Luna, P. Fournier-Viger, and S. Ventura, “Frequent itemset mining: A 25 years review,” in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, 2019.
    [5] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (ICDM ’93), vol. 22, pp. 207–216, 1993.
    [6] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in In Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499, 1994.
    [7] R. J. Bayardo, “Efficiently mining long patterns from databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, vol. 27, pp. 85–93, 1998.
    [8] D. Burdick, M. Calimlim, and J. Gehrke, “MAFIA: A maximal frequent itemset algorithm for transactional databases,” in Proceedings 17th International Conference on Data Engineering, pp. 443–452, 2001.
    [9] F. Nadi, S. G. Hormozi, A. Foroozandeh, and M. H. N. Shahraki, “A new method for mining maximal frequent itemsets based on graph theory,” in 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 183–188, 2004.
    [10] G. Yang, “The complexity of mining maximal frequent itemsets and maximal frequent patterns,” in KDD ’04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 344–353, 2004.
    [11] M. J. Zaki and W. Meira, Jr. in Data mining and analysis, 2014.
    [12] C. Lucchese, S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: A multi-strategy algorithm for mining frequent sets,” in IEEE ICDM Workshop Frequent Itemset Mining Implementations, vol. 80, 2003.
    [13] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New algorithms for fast discovery of association rules,” in In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283–286, 1997.
    [14] M. J. Zaki and K. Gouda, “Fast vertical mining using diffsets,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 326–335, 2003.
    [15] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000.
    [16] G. Grahne and J. Zhu, “Fast algorithms for frequent itemset mining using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1347–1362, 2005.
    [17] J. Lu, W. Xu, K. Zhou, and Z. Guo, “Frequent itemset mining algorithm based on linear table,” Journal of Database Management, vol. 34, no. 1, p. 1–21, 2023.
    [18] Z.-H. Deng and S.-L. Lv, “PrePost+ : An efficient N-lists-based algorithm for mining frequent itemsets via children–parent equivalence pruning,” Expert Systems with Applications, vol. 42, no. 13, pp. 5424–5432, 2015.
    [19] Z.-H. Deng and S.-L. Lv, “Fast mining frequent itemsets using Nodesets,” Expert Systems with Applications, vol. 41, no. 10, pp. 4505–4512, 2014.
    [20] B. Vo, T.-P. Hong, and B. Le, “DBV-Miner: A dynamic bit-vector approach for fast mining frequent closed itemsets,” Expert Systems with Applications, vol. 39, no. 8, pp. 7196–7206, 2012.
    [21] M. Zaki and C.-J. Hsiao, “Efficient algorithms for mining closed itemsets and their lattice structure,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 462–478, 2005.
    [22] N. Aryabarzan and B. Minaei-Bidgoli, “NECLATCLOSED: A vertical algorithm for mining frequent closed itemsets,” Expert Systems with Applications, vol. 174, no. 114738, 2021.
    [23] J. Liu, Z. Ye, X. Yang, X. Wang, L. Shen, and X. Jiang, “Efficient strategies for incremental mining of frequent closed itemsets over data streams,” Expert Systems with Applications, vol. 191, no. 116220, 2022.
    [24] H. Moonesinghe, S. Fodeh, and P.-N. Tan, “Frequent closed itemset mining using prefix graphs with an efficient flow-based pruning strategy,” in IEEE Sixth International Conference on Data Mining (ICDM’06), pp. 426–435, 2006.
    [25] A. Salleb, Z. Maazouzi, and C. Vrain, “Mining maximal frequent itemsets by a boolean based approach,” in ECAI’02: Proceedings of the 15th European Conference on Artificial Intelligence, pp. 385–389, 2002.
    [26] G. Grahne and J. Zhu, “High performance mining of maximal frequent itemsets,” in Proc. 6th Int. Workshop High Perform. Data Mining, vol. 16, 2003.
    [27] K. Gouda and M. J. Zaki, “GenMax: An efficient algorithm for mining maximal frequent itemsets,” Data Mining and Knowledge Discovery, pp. 223–242, 2005.
    [28] T. Uno, M. Kiyomi, and H. Arimura, “LCM ver. 2: Efficient mining algorithms for frequent/closed/ maximal Itemsets,” in Proc. Workshop Frequent Itemset Mining Implementations, vol. 126, 2004.
    [29] F.-Z. Chen and M.-Q. Li, “A two-way hybrid algorithm for maximal frequent itemsets mining,” in Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 3, pp. 499–503, 2007. [30] J. Qian and F. Ye, “Mining maximal frequent itemsets with frequent pattern list,” Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 1, p. 628–632, 2007.
    [31] M. M. J. Kabir, S. Xu, B. H. Kang, and Z. Zhao, “GeneticMax: An efficient approach to mining maximal frequent itemsets based on genetic algorithms,” Journal of Information Technology in Industry, vol. 3, no. 3, pp. 64–73, 2015.
    [32] B. Vo, S. Pham, T. Le, and Z.-H. Deng, “A novel approach for mining maximal frequent patterns,” Expert Systems with Applications, vol. 73, pp. 178–186, 2017.
    [33] S. M. Fatemi, S. M. Hosseini, A. Kamandi, and M. Shabankhah, “CL-MAX: A clustering-based approximation algorithm for mining maximal frequent itemsets,” International Journal of Machine Learning and Cybernetics, vol. 12, no. 2, p. 365–383, 2021.
    [34] J. Liu, J. Li, F. Ni, X. Xia, S. Li, and W. Dong, “An algebraic semigroup method for discovering maximal frequent itemsets,” Open Mathematics, vol. 20, no. 1, pp. 1432–1443, 2022.
    [35] D. J. Chai, L. Jin, B. Hwang, and K. H. Ryu, “Frequent pattern mining using bipartite graph,” in 18th International Workshop on Database and Expert Systems Applications (DEXA 2007), pp. 182–186, 2007.
    [36] Manju and C. Kant, “Graph based approach for finding frequent itemsets to discover association rules,” International Journal of Innovations in Engineering and Technology (IJIET), vol. 3, no. 2, pp. 269– 274, 2013.
    [37] K. Singh, H. K. Shakya, and B. Biswas, “Discovery of multi-frequent patterns using directed graph,” in Emerging Research in Computing, Information, Communication and Applications, vol. 2, pp. 153– 162, 2016.
    [38] B. Liu and J. Pan, “A graph-based algorithm for mining maximal frequent itemsets,” in Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 7, pp. 263–267, 2007.
    [39] Z. Halim, O. Ali, and M. G. Khan, “On the efficient representation of datasets as graphs to mine maximal frequent itemsets,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1674–1691, 2021.
    [40] Y. Xie and P. S. Yu, “Max-Clique: A top-down graph-based approach to frequent pattern mining,” in 2010 IEEE International Conference on Data Mining, pp. 1139–1144, 2010.
    [41] K.-W. Chon, S.-H. Hwang, and M.-S. Kim, “GMiner: A fast GPU-based frequent itemset mining method for large-scale data,” Information Sciences, vol. 439, pp. 19–38, 2018.
    [42] K.-W. Chon, E. Yi, and M.-S. Kim, “SGMiner: A fast and scalable GPU-based frequent pattern miner on SSDs,” IEEE Access, vol. 10, pp. 62502–62519, 2022.
    [43] A. A. Zoraghchian, M. K. Sohrabi, and F. Yaghmaee, “Exploiting parallel graphics processing units to improve association rule mining in transactional databases using butterfly optimization algorithm,” Cluster Computing: The Journal of Networks, Software Tools and Applications, vol. 24, no. 4, pp. 3767–3778, 2021.
    [44] A. A. Zoraghchian, M. K. Sohrabi, and F. Yaghmaee, “Parallel frequent itemsets mining using distributed graphic processing units,” Multimedia Tools and Applications, vol. 81, no. 30, pp. 43873– 43895, 2022.
    [45] X. Han, X. Liu, J. Li, and H. Gao, “Efficient top-k high utility itemset mining on massive data,” Information Sciences, vol. 557, pp. 382–406, 2021.
    [46] J. He, X. Han, J. Wang, and K. Zhang, “Efficient high-utility occupancy itemset mining algorithm on massive data,” Expert Systems with Applications, vol. 210, no. 118329, 2022.
    [47] H. Kim, U. Yun, Y. Baek, J. Kim, B. Vo, E. Yoon, and H. Fujita, “Efficient list based mining of high average utility patterns with maximum average pruning strategies,” Information Sciences, vol. 543, pp. 85–105, 2021.
    [48] M. Ashraf, T. Abdelkader, S. Rady, and T. F. Gharib, “TKN: An efficient approach for discovering topk high utility itemsets with positive or negative profits,” Information Sciences, vol. 587, pp. 654–678, 2022.
    [49] A. Shah and Z. Halim, “On efficient mining of frequent itemsets from big uncertain databases,” Journal of Grid Computing, vol. 17, no. 4, pp. 831–850, 2019.
    [50] H. Nguyen, D. Vo, H. Bui, T. Le, and B. Vo, “An efficient approach for mining weighted uncertain interesting patterns,” Information Sciences, vol. 615, pp. 1–23, 2022.
    [51] R. Davashi, “ILUNA: Single-pass incremental method for uncertain frequent pattern mining without false positives,” Information Sciences, vol. 564, pp. 1–26, 2021.
    [52] J. Li, H. Li, D. Soh, and L. Wong, “A correspondence between maximal complete bipartite subgraphs and closed patterns,” PKDD 2005: Knowledge Discovery in Databases: PKDD 2005, pp. 146–156, 2005.
    [53] Y. Zhang, C. A. Phillips, G. L. Rogers, E. J. Baker, E. J. Chesler, and M. A. Langston, “On finding bicliques in bipartite graphs: A novel algorithm and its application to the integration of diverse biological data types,” BMC Bioinformatics, vol. 15, no. 110, 2014.
    [54] “Frequent Itemset Mining Dataset Repository.” http://fimi.uantwerpen.be/data/.

    無法下載圖示 全文公開日期 2034/01/15 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE