簡易檢索 / 詳目顯示

研究生: 黃協瑜
SHEA-YU YELLOW
論文名稱: P2P網路上maximal 項目集探勘之研究
Predictive Search Approach to Maximal Frequent Itemset Mining on P2P Networks
指導教授: 陳秋華
Chyou-Hwa Chen
口試委員: 邱舉明
none
李育杰
none
戴碧如
none
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2010
畢業學年度: 98
語文別: 中文
論文頁數: 45
中文關鍵詞: maximal frequent itemsetpredictiveP2P Networks
外文關鍵詞: maximal frequent itemset, predictive, P2P Networks
相關次數: 點閱:216下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本論文擬針對分散式最高的大項目集(Maximal Frequent Itemsets )探勘的問題作深入研究。文獻中對並行式最高的大項目集探勘的問題有較多探討,但對分散式最高的大項目集探勘的問題甚少著墨,唯一提出之方法乃是一個在非階層式的網路上,以流言謠傳(/傳染)通信類協定(gossip/epidemic protocol) 為基礎,達到項目集探勘的演算法。但此方法效率非常不佳,演算法具有收斂時間很長、過程中產生指數(2n)個candidate項目集、僅能處理很小的資料集、且僅能在很小的網路上執行等缺點。 本論文探討以階層式資料匯集樹為核心,作為發展分散式最高的大項目集探勘演算法基礎的相關研究議題。依據階層式資料匯集樹,本論文提出數個值得探討的方向,及需要解決的相關問題,初步實驗結果顯示我們的設計具有速度快、產生的candidate項目集數量少,且可以處理大型資料集等優點。 本論文以模擬器為實驗的基礎,分析比較在分散式系統中求取最高的大項目集的方法。


    Predictive Search Approach to Maximal Frequent Itemset Mining on P2P Networks

    誌謝 摘要 目錄 圖表目錄 1 Introduction …………………………………………………………………………1 1.1 前言 ………………………………………………………………………………1 1.2 動機 ………………………………………………………………………………4 2 分散式網路架構 ………………………………………………………………………9 3 現行方法 ………………………………………………………………………………12 3.1 集中式系統 ………………………………………………………………12 3.2 分散式系統 ………………………………………………………………13 4 相關議題 ………………………………………………………………………………14 5 改善方法 ………………………………………………………………………………15 5.1 想法 ......................................................15 5.2 預測方法 …………………………………………………………………16 5.2.1 support平均值預測法 ……………………………16 5.2.2 support gap 預測法 ………………………………18 5.3 兩種預測法法的比較 ……………………………………………………19 5.4 support 平均預測法的探究 ……………………………………………21 5.5 support gap 預測方法的探究 …………………………………………24 5.6 預測方法的改進 …………………………………………………………26 6 改善演算法 ……………………………………………………………………………28 6.1 符號表示 …………………………………………………………………28 6.2 演算法架構 ………………………………………………………………28 6.3 Apriori+Topk演算法 ……………………………………………………30 6.4 演算法範例 ………………………………………………………………32 7 實驗效能 ………………………………………………………………………………36 IV 7.1 環境參數 ……………………………………………………………36 7.2 實驗效能評估 ………………………………………………………37 8 結語 …………………………………………………………………………………42 9 未來展望 ……………………………………………………………………………42 10 參考文獻 ………………………………………………………………………… 43

    參考文獻
    1. R. Agrawal and J. Shafer. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 8(6):962 . 969, 1996
    2. R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proceedings of the 20th International Conference on Very Large Data Bases, (VLDB'94), pp. 487-499, Sep. 1994
    3. Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l. Conference on Management of Data, pages 207.216, Washington, D.C., June 1993
    4. R. J. Bayardo Jr. Efficiently mining long patterns from databases. In SIGMOD, 1998.
    5. Y. Birk, L. Liss, A. Schuster, and R. Wolff. A local algorithm for ad hoc majority voting via charge fusion. In Proc. of DISC, October 2004
    6. B. Bloom. Space/time tradeoffs in in hash coding with allowable errors. Communications of the ACM, 13(7):422-426, 1970
    7. Bonomi, Flavio; Mitzenmacher, Michael; Panigrahy, Rina; Singh, Sushil; Varghese, George (2006), "An Improved Construction for Counting Bloom Filters", Algorithms – ESA 2006, 14th Annual European Symposium, 4168, pp. 684–695
    8. S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In ACM SIGMOD, 1997
    9. S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Info. Theory, vol. 52, no. 6, 2006
    10. Toon Calders, Bart Goethals,.Non-derivable itemset mining.- In: Data mining and knowledge discovery, 14:1(2007), p. 171-206
    11. Toon Calders, Bart Goethals, Depth-first non-derivable itemset mining.- In: Proceedings of the SIAM International Conference on Data Mining, s.l., , 2005, p. 250-261
    12. Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker. Making Gnutella-like P2P Systems Scalable. In SIGCOMM, pages 407–418, 2003
    13. D. Cheung and Y. Xiao. Effect of data skewness in parallel mining of association rules. In 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 48 . 60, April 1998.
    14. D.W. Cheung, J. Han, V. Ng, A. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In Proc. of the Int'l. Conf. on Parallel and Distributed Information Systems, pages 31 . 44, Miami Beach, Florida, December 1996
    15. Laukik Chitnis, Alin Dobra, Sanjay Ranka: Fault tolerant aggregation in heterogeneous sensor networks, Journal of Parallel and Distributed Computing 69(2):(Feb 2009) 210--219, 10
    16. Laukik Chitnis, Alin Dobra, Sanjay Ranka: Aggregation methods for large-scale sensor networks. IEEE Transactions of Sensor Networks 4(2):(2008)
    17. Laukik Chitnis, Alin Dobra, Sanjay Ranka. Analyzing the multiple aggregation trees technique for fault tolerance in sensor networks. In proceedings of International Conference on Information Systems, Technology and Management (ICISTM 2007), New Delhi, India, March 2007. pg. 269-279.
    18. Soon Myoung Chung, Congnan Luo, Efficient mining of maximal frequent itemsets from databases on a cluster of workstations. Knowledge and Information Systems, 16(3):359–391, (2008)
    19. Demers, A.J., Greene, D.H., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H.E., Swinehart, D.C., Terry, D.B.: Epidemic algorithms for replicated database maintenance. In: Proceedings of the Principles of Distibuted Computing (PODC), pp. 1–12 (1987)
    20. L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable widearea Web cache sharing protocol. IEEE/ACM Trans. on Networking, 8(3):281-293, 2000
    21. Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar. Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):352 . 377, 2000
    22. J. Han, and M. Kamber, Data Mining Concepts and Techniques, San Francisco, Morgan Kaufmann, 2001
    23. J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1-12, May 2000
    24. Gian Paolo Jesi, Alberto Montresor, and Ozalp Babaoglu, Proximity-aware superpeer overlay topologies, IEEE Transactions on Network and Service Management, 4(2):74-83, September 2007
    25. Lujun Jia, Guevara Noubir, Rajmohan Rajaraman, Ravi Sundaram, Group-Independent Spanning Trees for Data Aggregation in Sensor Networks, IEEE International Conference on Distributed Computing in Sensor Systems, June 2006
    26. Michael Kleis, Eng Keong Lua, Hierarchical Peer-to-Peer Networks Using Lightweight SuperPeer Topologies, 10th IEEE Symposium on Computers and Communications (IEEE ISCC'05), Cartagena, Spain, June 27-30, 2005, pp. 143-148
    27. Denis Krivitski, Assaf Schuster, Ran Wolff, “A Local Facility Location Algorithm for Large-scale Distributed Systems,” Journal of Grid Computing, 5(4): 361-378, 2007
    28. P Luo, H Xiong, K Lü, Z Shi, “Distributed classification in peer-to-peer networks,” in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007
    29. A. Montresor, “A robust protocol for building superpeer overlay topologies,” in Proc. of the 4th Int. Conf. on Peer-to-Peer Computing. Zurich, Switzerland: IEEE, Aug. 2004
    30. Osmar R. Zaiane, Mohammad El-Hajj, and Paul Lu. Fast parallel association rules mining without candidacy generation. In IEEE 2001 International Conference on Data Mining, pages 665.668, 2001
    31. A. Schuster and R. Wolff. Communication-efficient distributed mining of association rules. In Proc. of the ACM SIGMOD Int'l. Conference on Management of Data, pages 473 . 484, Santa Barbara, California, May 2001.
    32. Ran Wolff and Assaf Schuster, “Association Rule Mining in Peer-to-Peer Systems,” IEEE Transactions on Systems, Man, and Cybernetics, part B, 34(6), 2004
    33. B. Yang and H. Garcia-Molina, “Designing a super-peer network,” in Proc. IEEE International Conference on Data Engineering (ICDE’03),2003
    34. Li Xiao, Zhenyun Zhuang, and Yunhao Liu, "Dynamic Layer Management in Superpeer Architectures", IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 11, November 2005, Pages 1078-1091
    35. J. Xu Yu, Z. Chong, H. J. Lu and A. Zhou. False Positive or False Negative: Mining Frequent Itemsets form High Speed Transactional Data Streams. In Proc. of 30th Intl. Conf. on Very Large Data Bases, pages 204-215, 2004
    36. G. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, Massachusetts, 1949.
    37. http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html
    38. http://fimi.cs.helsinki.fi/data/
    39. http://www.ics.uci.edu/~mlearn

    無法下載圖示 全文公開日期 2015/08/02 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE