研究生: |
宋子康 Zi-kong Song |
---|---|
論文名稱: |
資料串流中尋找關聯法則之研究 A research on mining association rules in data stream |
指導教授: |
呂永和
Yung-Ho Lu |
口試委員: |
楊維寧
Wei-Ning Yang 葉耀明 Yao-Ming Yeh |
學位類別: |
碩士 Master |
系所名稱: |
管理學院 - 資訊管理系 Department of Information Management |
論文出版年: | 2006 |
畢業學年度: | 94 |
語文別: | 中文 |
論文頁數: | 62 |
中文關鍵詞: | 資料串流 、異動頻繁 、資料探勘 、關聯規則 |
外文關鍵詞: | Data Stream |
相關次數: | 點閱:229 下載:8 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
資料串流是現今的一個熱門研究主題。相較於傳統靜態資料庫,資料串流的資料擁有資料異動頻繁、資料流入量龐大、使用者需要即時回應等特性。本文提出一個適用於資料串流的關聯規則演算法,利用將交易資料庫編碼成少量的資料,然後直接對主記憶體中的壓縮資料進行資料探勘,探勘過程並不需要解壓縮。該編碼容易更新維護,對於每次資料模型的改變我們只需要對更新的部份做編碼的維護即可達到探勘的需求。經實驗證明,在資料串流異動頻繁的環境假設下,本文提出的演算法優於目前常見的演算法。
Due to its large number of applications, association rules mining on data streams has received much attention in the recent years. Compared to a static database, a data stream is characterized by its high data update frequency, unlimited data rate, and real-time response requirements. These requirements make the traditional association rules mining algorithms fail. In this paper, we propose a novel algorithm, called “Transaction- Run-Length-Encoding”, to encode and to mine useful association rules from data streams. To study the performance of the proposed algorithm, we have conducted a series of experiments. The experimental results show that the proposed algorithm outperforms other algorithms for mining association rules in a fast changing data stream.
[1] R. Agawal, T. Imielinski and A. Swami. “Mining Association Rules between Sets of Items in Large Databases,” Proc. Of ACM SIGMOD, pages 207 – 216 ,May 1993.
[2] G. Manku and R. Motwani, “Approximate Frequency Counts over Data Streams,” Proc. Of VLDB Conf,. pp.346-357 ,2002 .
[3] R. Agrawal and R. Srikant (1995) , “Mining sequential patterns , “ Proceeding of The International Conference of Data Engineering , pp.3-14.
[4] J.H. Chang and W.S. Lee, “Finding Recent Frequent Itemsets Adaptively over Online Data Streams,” Proc. Of ACM SIGKDD Conf., pp.487-492,2003.
[5] E. Cohen and M. Strauss, “Maintaining Time Decaying Stream Aggregates,” Proc. Of ACM PODS Symp., 2003.
[6]R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. Of VLDB Journal, 12(2):120-139,2003
[7] A. Arasu and G. S. Manku, “Approximate Counts and Quantiles over Sliding Windows,” Proc. Of ACM PODS Symp., 2004.
[8] D. Cheung, J. Han, V. Ng, and C.Y. Wong,”Maintenance of Discovered Association Rules in Large Databases: An Ibncremental Udating Technique,” Proc. Of ICDE Conf., 1996.
[9] G.Cormode and S. Muthukrishnaan, “What’s Hot and What’s Not: Tracking Most Frequent tems Dynamically,” Proc. Of ACM PODS Symp., pp. 296-306,2003
[10] C.Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities,” H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha(eds.), Next Generation Data Mining, pp. 191-212,2003.
[11]邱士軍(2002),”關聯規則演算法之實作和效能評估”,國立台灣科技大學資訊管理研究所碩士論文。
[12]徐雅琪(2003),“適用於連續探勘的關聯規則演算法”, 國立台灣科技大學資訊管理研究所碩士論文。
[13] J. Han and K. Kamber, Simon Fraser University (2001), Data Mining:Concepts and Techniques, 5nd ed., San Francisco: Morgan Kaufmann, pp. 230-235
[14]王守田(2002),”一個使用高頻串列具有可擴充性的關聯規則演算法“,國立台灣科技大學資訊管理研究所碩士論文。