簡易檢索 / 詳目顯示

研究生: 何豫雯
Yu-Wen Ho
論文名稱: 連續的時效性頻繁樣本演算法研究
An Efficient Algorithm for Continuously Maintaining Temporal Frequent Patterns
指導教授: 邱舉明
Ge-Ming Chiu
口試委員: 金台齡
Tai-Lin Chin
項天瑞
Tien-Ruey Hsiang
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2012
畢業學年度: 100
語文別: 中文
論文頁數: 42
中文關鍵詞: 時效性頻繁樣本漸進式資料探勘
外文關鍵詞: Temporal Frequent Pattern, Incremental data mining
相關次數: 點閱:249下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

近年來手持智慧行動裝置的普及,使得記錄使用者資訊(如行動軌跡、購買商品等)的資料庫越來越多,且因使用者的數量龐大,資料紀錄將不斷增加,資料庫也會面臨必須將過於老舊的資料刪除,因此如何從這些紀錄中找出連續且有時序性的資訊即是一門重要的議題。目前已有許多針對資料庫進行資料探勘的技術與研究,但在其中大多在資料庫進行更新後皆必須重新掃描,以排除不必要之候選項目集,因此,本文提出「連續的時效性頻繁樣本演算法研究」,針對改善現有的技術做改進,將處理過的資料以建表的方式做適當的保存,使得資料庫更新時不需重新掃描舊有資料,以更有效率的得到連續的時效性頻繁樣本結果。論文最後也以實驗結果說明,本論文所提出的方法相較於現有的方法能更有效率地找出頻繁樣本。


In recent years, with the advanced development of mobile devices and smart phones, information associated with users' trajectory paths and purchase lists are increasing. The more users, the more data needed to be recorded to the databases. Moreover, finding some useful information such as sequential pattern or frequent pattern from the large amount of data has been a key issue in several years. In incremental data mining, there have been lots of related researches in the literature. But, they still need to rescan the data segment which will be removed. The operations of rescanning the data segment can avoid them excluding the frequent pattern which they want to get, but some of the operations are useless. Therefore, we propose a method named “an efficient algorithm for continuously maintaining temporal frequent patterns”, which can have a good performance on solving the problem. We utilize the candidates table to store the information which can help us find the temporal frequent Pattern efficiently. Finally, experiment results show that our algorithm outperforms other algorithms.

摘要................................................................................................................................ I Abstract ......................................................................................................................... II 致謝............................................................................................................................... III 圖目錄 ...........................................................................................................................V 第一章 緒論............................................................................................................ 1 1.1 背景............................................................................................................ 1 1.1.1 資料探勘 ....................................................................................... 1 1.1.2 漸進式資料探勘 ........................................................................... 3 1.2 論文目標 ................................................................................................... 6 1.3 論文架構 ................................................................................................... 7 第二章 相關研究 ................................................................................................... 9 2.1 以Apriori 為架構 ..................................................................................... 9 2.2 以區段處理 ............................................................................................. 11 2.3 以FUP 為架構 ........................................................................................ 11 2.4 SWF 演算法 ............................................................................................ 14 2.5 小結.......................................................................................................... 16 第三章 連續的時效性頻繁樣本演算法 ............................................................. 17 3.1 定義.......................................................................................................... 17 3.1.1 移動路徑 ..................................................................................... 17 3.1.2 時效性資料處理區塊 ................................................................. 18 3.1.3 頻繁路徑 ..................................................................................... 18 3.1.4 候選項目表 ................................................................................. 19 3.1.5 問題定義 ..................................................................................... 19 3.2 方法概述 ................................................................................................. 20 3.3 例子說明 ................................................................................................. 23 3.4 方法正確性驗證 ..................................................................................... 26 第四章 實驗與效能分析 ..................................................................................... 28 4.1 實驗環境設定 ......................................................................................... 28 4.2 實驗結果 ................................................................................................. 30 4.3 小結.......................................................................................................... 38 第五章 結論與未來展望 ..................................................................................... 39 重要參考文獻 ............................................................................................................. 40

[1] J. Han, H. Cheng, D. Xin, and X. Yan, "Frequent pattern mining: current status and future directions", Data Mining and Knowledge Discovery, 2007.
[2] D. W. Cheung, J. Han, V. T. Ng and C. Y. Wong, "Maintenance of discovered association rules in large databases: An incremental updating technique", in Proceedings 12th Int. Conf. on Data Engineering, New Orleans, Louisiana, 1996.
[3] G. Piatetsky-Shapiro and W. J. Frawley, "Knowledge Discovery in Databases", AAAI/MIT Press, 1991.
[4] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, "Advances in Knowledge Discovery and Data Mining", AAAI/MIT Press, 1995.
[5] R. Agrawal, T. Imielinski, and A. Swami. "Mining Association Rules between Sets of Items in Large Databases", in Proceedings 1993 ACM-SIGMOD Int. Conf. Management of Data, pp. 207-216, May 1993.
[6] R. Agrawal, and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases", in Proceedings of the 20th VLDB conference, pp. 487-499, Santiago, Chile, 1994.
[7] R. Agrawal, T. Imielinski, and A. Swami, "Mining sequential patterns", in Proceedings 11th International Conference on Data Engineering, pp. 3-14, March 1995.
[8] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo, "Fast Discovery of Association Rules in Large Database", Advance in Knowledge Discovery and Data Mining, pp. 307-328. AAAI Press, 1996.
[9] J. S. Park, M. S. Chen, P. S. Yu, "Using hash-based method with transaction trimming for mining association rules", IEEE Trans, Knowledge and Data Engineering, 9 (5), pp. 813-825, 1997.
[10] S. Brin, R. Motwani, J.D. Ullman, S. Tsur, "Dynamic itemset counting and implication rules for market basket data", ACM SIGMOD Rec. 26 (2), pp.255-264, 1997.
[11] R. Srikant, R. Agrawal, "Mining generalized association rules", in Proceedings of the 21st International Conference on Very Large Bases, pp. 407-419, September 1995.
[12] H. Toivonen, "Sampling large databases for association rules", in Proceedings of the 22nd VLDB Conference, pp. 134-145, September 1996.
[13] J. S. Park, M. S. Chen, and P. S. Yu, "An effective hash-based algorithm for mining association rules", in Proceedings 1995 ACM-SIGMOD Int. Conf. Management of Data, San Jose, CA, May 1995.
[14] A. Savasere, E. Omiecinski, S. Navathe, "An efficient algorithm for mining association rules in large database", in Proceedings of the 21st International Conference on Very Large Data Bases, pp. 432-444, September 1995.
[15] J. L. Lin, M.H. Dunham, "Mining association rules: anti-skew algorithm", in Proceedings of 1998 International Conference on Data Engineering, pp.486-493, 1998.
[16] A. Mueller, "Fast sequential and parallel algorithms for association rule mining: a comparison", Technical Report CS-TR-3515, Department of Computer Science, University of Maryland, College Park, MD, 1995.
[17] C. H. Lee, M. S. Chen, C. R. Lin, "Progressive Partition Miner: An Efficient Algorithm for Mining General Temporal Association Rules", IEEE Transaction on Knowledge and Data Engineering, Vol. 15, No. 4, July/August 2003.
[18] D.W. Cheung, S.D. Lee and B. Kao, "A General Incremental Technique for Maintaining Discovered Association Rules", in Proceedings of the 5th International Conference on Database Systems for Advanced Applications, Melbourne, Australia, pp. 185–194, April 1997.
[19] C.H. Lee, C.R. Lin and M.S. Chen, "Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining", in Proceedings of the 10th International Conference on Information and Knowledge Management, Atlanta, GA, pp. 263–270, November 2001 .
[20] C.H. Lee, C.R. Lin and M.S. Chen, "Sliding window filtering: an efficient method for incremental mining on a time-variant database", Information Systems, 30, pp. 227-244, 2005.
[21] J. W. Huang, B. R. Dai, M. S. Chen, "Twain: Two-End Association Miner with Precise Frequent Exhibition Periods", ACM Transaction on Knowledge Discovery from Data, vol. 1, Number 2, Article 8, August 2007.
[22] T. F. Gharib, M. Taha, H. Nassar, "An efficient technique for incremental updating of association rules", the International Journal of Hybrid Intelligent Systems 5, pp. 45-53, January 2008.
[23] T. F. Gharib, H. Nassar, M. Taha, A. Abraham, "An efficient algorithm for incremental mining of temporal association rules", Data & Knowledge Engineering 69, pp.800-815, 2010.
[24] http://www.almaden.ibm.com/cs/disciplines/iis/

QR CODE