簡易檢索 / 詳目顯示

研究生: 姜弘霖
Hung-lin Jiang
論文名稱: 在資料串流環境下的Top-K序列型樣探勘
Top-K Sequential Pattern Mining in the Data Stream
指導教授: 戴碧如
Bi-Ru Dai
口試委員: 楊得年
De-Nian Yang
吳怡樂
Yi-Leh Wu
陳建錦
Chien-Chin Chen
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 英文
論文頁數: 49
中文關鍵詞: 資料探勘資料串流Top-K序列型樣探勘
外文關鍵詞: Data mining, data stream, Top-K sequential patterns
相關次數: 點閱:186下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 序列型樣探勘是一個從資料序列中找出有用型樣的探勘方式。最常見的例子為網路記錄存取序列,業主通常可藉由這樣的資訊找出使用者在瀏覽網頁共有的瀏覽方式,進而分析內容而採取商用行為。目前存在的在資料串流中Top-K型樣探勘通常是為非序列型樣,因此在我們的架構中,主要是針對Top-K型樣探勘,讓使用者決定合適的候選者型樣。我們提供的方法為在探勘資料串流中不同批的過程中自動調整最小支持度的大小以達到取得想要的候選者型樣,並將候選者型樣以樹狀結構維持,方便取得最後Top-K型樣的結果。實驗結果說明所提出的方式為有效且規模可變的。


    Sequential pattern mining is a process of extracting useful patterns in data sequences. A popular example is web log access sequences. Proprietors usually use such information to find the habits of the users by observing the web pages that they often surf to perform commercial activities. Existing works on mining Top-K patterns on data streams are mostly for non-sequential patterns. In our framework, we focus on the topic of Top-K sequential pattern mining, where users can obtain adequate amount of interesting patterns. The proposed method can automatically adjust the minimum support during mining each batch in the data stream and obtain candidate patterns. Then candidate patterns are maintained by a tree structure for extracting Top-K sequential patterns. Empirical results show that the proposed method is efficient and scalable.

    摘要 Abstract List of Contents List of Tables List of Figures Chapter 1 Introduction 1.1 Background 1.2 Summaries of methods 1.3 The Organization of this Thesis Chapter 2 Related Works 2.1 Traditional Mining in Large Database 2.2 Mining in Incremental Environment and Data Stream Chapter 3 Methodology 3.1 Predefine Minimum Support of a Batch 3.2 Generate Candidates and Adjust Threshold Table 3.3 Construct Candidate Tree and Mine Top-K Patterns 3.4 An Example Application Chapter 4 Experiment 4.1 Experiment of Threshold Estimation 4.2 The Efficiency Experiment in Dense Dataset 4.3 The Efficiency Experiment in Sparse Dataset 4.4 Summaries Chapter 5 Conclusion References

    [1] R. Agrawal and R. Srikant, “Mining sequential patterns,” In Proc. 1995 Int. Conf. Data Engineering (ICDE’95), pp. 3–14, Taipei, Taiwan, Mar. (1995)
    [2] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” In Proc. 5th Int. Conf. Extending Database Technology (EDBT’96), pp. 3–17, Avignon, France, Mar. (1996)
    [3] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “Freespan: Frequent pattern-projected sequential pattern mining,” In Proc. 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pp. 355–359, Boston, MA, Aug. (2000)
    [4] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth,” In Proc. 2001 Int. Conf. Data Engineering (ICDE’01), pp. 215–224, Heidelberg, Germany, April (2001)
    [5] Ming-Yen Lin , Suh-Yin Lee, “Fast Discovery of Sequential Patterns by Memory Indexing,” In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, pp. 150-160, September 04-06, (2002)
    [6] M. J. Zaki, M.J. Zaki, “SPADE: an efficient algorithm for mining frequent sequences,” Machine Learning, Vol. 1, No. 1~2, pp. 31-60, (2001)
    [7] Zhenglu Yang, Masaru Kitsuregawa, Yitong Wang, “PAID: Mining Sequential Patterns by Passed Item Deduction in Large Databases,” ideas, pp. 113-120, 10th International Database Engineering and Applications Symposium (IDEAS'06), (2006)
    [8] Jiawei Han, Jianyong Wang, Ying Lu, Petre Tzvetkov, “Mining Top.K Frequent Closed Patterns without Minimum Support,” icdm, pp. 211, Second IEEE International Conference on Data Mining (ICDM'02), (2002)
    [9] Petre Tzvetkov, Xifeng Yan, Jiawei Han, “TSP: Mining Top-K Closed Sequential Patterns,” icdm, pp. 347, Third IEEE International Conference on Data Mining (ICDM'03), (2003)
    [10] Jianyong Wang, Jiawei Han, Ying Lu, Petre Tzvetkov, “TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 652-664, May. (2005)
    [11] Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque, “CanTree: A Tree Structure for Efficient Incremental Mining of Frequent Patterns,” icdm, pp. 274-281, Fifth IEEE International Conference on Data Mining (ICDM'05), (2005)
    [12] Carson Kai-Sang Leung, Quamrul I. Khan, “DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams,” icdm, pp. 928-932, Sixth IEEE International Conference on Data Mining (ICDM'06), (2006)
    [13] Syed Khairuzzaman Tanbeer, Chowdhury Farhan Ahmed, Byeong-Soo Jeong, and Young-Koo Lee, “CP-Tree: A Tree Structure for Single-Pass Frequent Pattern Mining,” Advances in Knowledge Discovery and Data Mining, Volume 5012/2008, pp. 1022-1027, Springer-Verlag Berlin Heidelberg, (2008)
    [14] M. El-Sayed, C. Ruiz, and E. Rundensteiner, “FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs,” In 6th ACM CIKM WIDM International Workshop on Web Information and Data Management, pp. 128–135, (2004)
    [15] C.I. Ezeife, Mostafa Monwar, “SSM: A Frequent Sequential Data Stream Patterns Miner,” Computational Intelligence and Data Mining (CIDM 2007), pp 120-126, (2007)
    [16] Ezeife, C.I. and Lu, Yi., “Mining Web Log sequential Patterns with Position Coded Pre-Order Linked WAP-tree,” the International Journal of Data Mining and Knowledge Discovery (DMKD), Vol. 10, pp. 5-38, Kluwer Academic Publishers, June. (2005)
    [17] Giannella, C. and Han, J. and Pei, J. and Yan, X. and Yu, P.S., “Mining Frequent Patterns in Data Streams at Multiple Time Granularities,” in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, (2003)
    [18] Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H, “Mining access patterns efficiently from web logs,” In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’00). Kyoto, Japan, (2000)
    [19] Mohamed Medhat Gaber, Arkady Zaslavsky, and Shonali Krishnaswamy, “Mining Data Streams: A Review,” ACM SIGMOD Record, v.34 n.2, pp. 18 – 26, June. (2005)
    [20] Hua-Fu Li, Suh-Yin Lee, Man-Kwan Shan, “DSM-PLW: Single-pass mining of path traversal patterns over streaming Web click-sequences,” Computer Networks: Special Issue on Web Dynamics, 50(10), pp. 1474-1487, (2006)
    [21] Hua-Fu Li, Suh-Yin Lee, Man-Kwan Shan, “DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams,” wi, pp.326-329, 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), (2005)
    [22] Chin-Chuan Ho, Hua-Fu Li, Fang-Fei Kuo, Suh-Yin Lee, “Incremental Mining of Sequential Patterns over a Stream Sliding Window,” icdmw, pp.677-681, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), (2006)
    [23] Jiawei Han , Jian Pei , Yiwen Yin, “Mining frequent patterns without candidate generation,” Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, Dallas, Texas, United States. (2000)
    [24] J. Han, H. Cheng, D. Xin, and X. Yan, “Frequent pattern mining: current status and future directions,” In Data Mining and Knowledge Discovery, p. 55-86, (2007)
    [25] Mendes, L.F., Bolin Ding, and Jiawei Han, “Stream Sequential Pattern Mining with Precise Error Bounds,” Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08), pp. 941-946, (2008)
    [26] G. Manku and R. Motwani, “Approximate frequency counts over data streams,” Proceedings of the 28th international conference on Very Large Data Bases (VLDB’02), pp. 346-357, (2002)
    [27] A. Metwally, D. Agrawal, and A. E. Abbadi, “Efficient computation of frequent and top-k elements in data streams,” In Proceedings of the 10th ICDT International Conference on Database Theory(ICDT’05), pp. 398-412, (2005)
    [28] Raymond Chi-Wing Wong and Ada Wai-Chee Fu, “Mining Top-K Itemsets over a Sliding Window Based on Zipfian Distribution,” SIAM International Conference on Data Mining, on April 21-23, (2005)
    [29] Raymond Chi-Wing Wong and Ada Wai-Chee Fu, “Mining Top-K Frequent Itemset from Data Streams,” Journal of Data Mining and Knowledge Discovery, Volume 13, pp193-217, (2006)
    [30] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques,” 2nd edition, Morgan Kaufmann, pp. 482, (2006)
    [31] En-Zheng Guan, Xiao-Yu Chang, Zhe Wang, and Chun-Guang Zhou, “Mining Maximal Sequential Patterns,” International Conference on Neural Networks and Brain(ICNN&B), pp. 525-528, (2005)
    [32] Xilu Wang and Weili Yao, “Sequential Pattern Mining: Optimum Maximum Sequential Patterns and Consistent Sequential Patterns,” International Conference on Integration Technology (ICIT), pp. 365-368, (2007)
    [33] Xilu Wang and Weili Yao, “Efficient Algorithms for Mining Maximal Frequent Concatenate Sequences in Biological Datasets,” In Proceedings of the 2005 The Fifth International Conference on Computer and In-formation Technology (CIT’05), pp. 98-104, (2005)
    [34] Huffman Code, http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node59.html
    [35] BMS-WebView, http://www.ecn.purdue.edu/KDDCUP/

    無法下載圖示 全文公開日期 2014/07/23 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE