資料串流中尋找關聯法則之研究｜國立臺灣科技大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	宋子康 Zi-kong Song
論文名稱：	資料串流中尋找關聯法則之研究 A research on mining association rules in data stream
指導教授：	呂永和 Yung-Ho Lu
口試委員:	楊維寧 Wei-Ning Yang 葉耀明 Yao-Ming Yeh
學位類別：	碩士 Master
系所名稱：	管理學院 - 資訊管理系 Department of Information Management
論文出版年：	2006
畢業學年度：	94
語文別：	中文
論文頁數：	62
中文關鍵詞：	資料串流、異動頻繁、資料探勘、關聯規則
外文關鍵詞：	Data Stream
相關次數：	點閱：229 下載：8
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

資料串流是現今的一個熱門研究主題。相較於傳統靜態資料庫，資料串流的資料擁有資料異動頻繁、資料流入量龐大、使用者需要即時回應等特性。本文提出一個適用於資料串流的關聯規則演算法，利用將交易資料庫編碼成少量的資料，然後直接對主記憶體中的壓縮資料進行資料探勘，探勘過程並不需要解壓縮。該編碼容易更新維護，對於每次資料模型的改變我們只需要對更新的部份做編碼的維護即可達到探勘的需求。經實驗證明，在資料串流異動頻繁的環境假設下，本文提出的演算法優於目前常見的演算法。

Due to its large number of applications, association rules mining on data streams has received much attention in the recent years. Compared to a static database, a data stream is characterized by its high data update frequency, unlimited data rate, and real-time response requirements. These requirements make the traditional association rules mining algorithms fail. In this paper, we propose a novel algorithm, called “Transaction- Run-Length-Encoding”, to encode and to mine useful association rules from data streams. To study the performance of the proposed algorithm, we have conducted a series of experiments. The experimental results show that the proposed algorithm outperforms other algorithms for mining association rules in a fast changing data stream.

中文摘要	I
Abstract	II
目錄	III
圖表目錄	V
第一章　序論	1
1研究背景	1
2資料串流與資料探勘	3
Landmark model	4
Time-fading model	5
Pattern tree and Tilted-time window model	5
Sliding-Ｗindow model	7
１.3資料探勘的分類	8
關聯規則(Assocation Rule)	9
分群(Clustering)	9
序列型樣(Sequential Patterns)	9
分類(Classification)	10
4研究動機	10
5論文架構	11
第二章 相關研究	13
1 Apriori演算法探討	15
1.1Apriori演算法運作	15
1.2 Apriori演算法範例	18
1.3 Apriori演算法缺點	21
2 Fp-Growth演算法探討	21
2.1 Fp-Tree的建置與演算法流程	21
2.2 FP-Growth演算法之缺失	26
第三章  研究方法	27
1 編碼前置處理－建構、新增、與刪除	28
2資料探勘演算法	39
3系統流程	45
第四章 效能分析與實驗結果	47
1實驗環境建置	47
1.1軟硬體環境	47
1.2 測試資料庫	48
2交易變動長度編碼與Fp-tree比較	50
3 TRLE之效能分析評估	52
3.1不同平均交易長度下的評估	52
3.2不同項目數量下的評估	53
3.3不同支持度下的評估	54
第五章	結論與未來展望	56
參考文獻	58

                                

[1] R. Agawal, T. Imielinski and A. Swami. “Mining Association Rules between Sets of Items in Large Databases,” Proc. Of ACM SIGMOD, pages 207 – 216 ,May 1993.

[2] G. Manku and R. Motwani, “Approximate Frequency Counts over Data Streams,” Proc. Of VLDB Conf,. pp.346-357 ,2002 .

[3] R. Agrawal and R. Srikant (1995) , “Mining sequential patterns , “ Proceeding of The International Conference of Data Engineering , pp.3-14.　

[4] J.H. Chang and W.S. Lee, “Finding Recent Frequent Itemsets Adaptively over Online Data Streams,” Proc. Of ACM SIGKDD Conf., pp.487-492,2003.

[5] E. Cohen and M. Strauss, “Maintaining Time Decaying Stream Aggregates,” Proc. Of ACM PODS Symp., 2003.
[6]R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. Of VLDB Journal, 12(2):120-139,2003
[7] A. Arasu and G. S. Manku, “Approximate Counts and Quantiles over Sliding Windows,” Proc. Of ACM PODS Symp., 2004.
[8] D. Cheung, J. Han, V. Ng, and C.Y. Wong,”Maintenance of Discovered Association Rules in Large Databases: An Ibncremental Udating Technique,” Proc. Of ICDE Conf., 1996.
[9] G.Cormode and S. Muthukrishnaan, “What’s Hot and What’s Not: Tracking Most Frequent tems Dynamically,” Proc. Of ACM PODS Symp., pp. 296-306,2003
[10] C.Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities,” H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha(eds.), Next Generation Data Mining, pp. 191-212,2003.

[11]邱士軍(2002)，”關聯規則演算法之實作和效能評估”，國立台灣科技大學資訊管理研究所碩士論文。
[12]徐雅琪(2003)，“適用於連續探勘的關聯規則演算法”，國立台灣科技大學資訊管理研究所碩士論文。
[13] J. Han and K. Kamber, Simon Fraser University (2001), Data Mining:Concepts and Techniques, 5nd ed., San Francisco: Morgan Kaufmann, pp. 230-235
[14]王守田（2002），”一個使用高頻串列具有可擴充性的關聯規則演算法“，國立台灣科技大學資訊管理研究所碩士論文。

簡易檢索 / 詳目顯示

相關論文