簡易檢索 / 詳目顯示

研究生: 林東蓁
Tung-chen Lin
論文名稱: P2P網路上預測式大項目集探勘的研究
A Predictive Maximal Frequent Itemset Mining Algorithm On P2P Network
指導教授: 陳秋華
Chyou-Hwa Chen
口試委員: 李育杰
none
項天瑞
none
鄧惟中
none
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2012
畢業學年度: 100
語文別: 中文
論文頁數: 51
中文關鍵詞: Apriori.p2pmaximal frequent itemset
外文關鍵詞: maximal frequent itemset, predictive, P2P Networks
相關次數: 點閱:123下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在資料探勘的領域中,探勘關連規則扮演著相當重要的地位。本篇論文以Apriori為基礎,在p2p(Peer to Peer)網路上探勘最大頻繁項目集(Maximal Frequent Itemsets)。其演算法分為兩階段,第一階段利用收集各節點局部的最大頻繁項目集(Local Maximal Frequent Itemsets),並增加預測方式以產生部份的頻繁與非頻繁項目集,以利於在第二階段對產生的候選項目集做刪除的動作,來減少需要驗證的候選項目集來達到減少網路上封包的數量。而第二階段演算法則是以Apriori為主,但不同於Apriori的是增加預測較長的候選項目集,得到較長的頻繁項目集以增加下一個長度的候選項目集刪除的個數。從實驗結果顯示,不論是在密集資料庫或者稀疏資料庫,都能大量的減少網路上封包的傳輸量。


    Association rule mining is one of most popular problems in the field of data mining. We propose a novel algorithm for distributed maximal frequent itemset mining in unstructured peer-to-peer networks based on a bottom-up Apriori-like process. The algorithm combines with two part. First, the master node collects local maximal frequent itmesets from others and discovers a set of maximal frequent itemsets and minimal non-frequent itmesets for subset infrequency pruning strategy and superset frequency pruning strategy. In the second stage, it is based on a bottom-up Apriori-like process. Unlike Apriori, this stage generates longer candidates at each level. Correctly predicted long itemsets are used for superset frequency pruning. An extensive set of experiments show that our proposed strategy is effective in candidate prediction, thus requiring significantly less message exchange than traditional approaches.

    誌謝i 中文摘要ii ABSTRACTiii CONTENTSiv LIST OF FIGURESvi LIST OF TABLESvii Chapter 1介紹1 Chapter 2相關議題2 2.1相關定理2 2.2Apriori 演算法4 Chapter 3演算法9 3.1分散式CD演算法9 3.2演算法11 3.2.1第一階段演算法12 3.2.2第二階段演算法13 3.3演算法範例11 Chapter 4實驗效能15 Chapter 5結語與未來展望21 REFERENCE27

    [1]R. Agrawal and J. Shafer. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 8(6):962 . 969, 1996
    [2]R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proceedings of the 20th International Conference on Very Large Data Bases, (VLDB'94), pp. 487-499, Sep. 1994
    [3]Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l. Conference on Management of Data, pages 207.216, Washington, D.C., June 1993
    [4]R. J. Bayardo Jr. Efficiently mining long patterns from databases. In SIGMOD, 1998.
    [5]Y. Birk, L. Liss, A. Schuster, and R. Wolff. A local algorithm for ad hoc majority voting via charge fusion. In Proc. of DISC, October 2004
    [6]B. Bloom. Space/time tradeoffs in in hash coding with allowable errors. Communications of the ACM, 13(7):422-426, 1970
    [7]Bonomi, Flavio; Mitzenmacher, Michael; Panigrahy, Rina; Singh, Sushil; Varghese, George (2006), "An Improved Construction for Counting Bloom Filters", Algorithms – ESA 2006, 14th Annual European Symposium, 4168, pp. 684–695
    [8]S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In ACM SIGMOD, 1997
    [9]S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Info. Theory, vol. 52, no. 6, 2006
    [10]D. Cheung and Y. Xiao. Effect of data skewness in parallel mining of association rules. In 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 48 . 60, April 1998.

    無法下載圖示 全文公開日期 2017/08/06 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE