簡易檢索 / 詳目顯示

研究生: 陳玟君
Wen-jun Chen
論文名稱: 在動態資料庫中以較少運算重建頻繁樣式樹之方法
A Less Computation Approach to Reconstructing Frequent Pattern Trees in a Dynamic Database
指導教授: 楊英魁
Ying-Kuei Yang
口試委員: 陳俊良
Jiann-Liang Chen
黎碧煌
Bih-Hwang Lee
孫宗瀛
Tsung-Ying Sun
學位類別: 碩士
Master
系所名稱: 電資學院 - 電機工程系
Department of Electrical Engineering
論文出版年: 2014
畢業學年度: 102
語文別: 中文
論文頁數: 90
中文關鍵詞: 頻繁樣式樹關聯規則資料探勘異動資料庫
外文關鍵詞: FP-tree, association rules, data mining, dynamic database
相關次數: 點閱:142下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 資訊科技發展快速,使得企業公司擁有越來越多獲取資訊的管道,各行各業都有其過去累積的龐大交易資料,如何利用資料探勘技術挖掘出潛藏在龐大資料庫中的有用資訊,已儼然成為一個重要的議題。交易資料庫中的交易筆數會隨著時間而日漸增加,資料量也就越來越巨大,傳統的關聯規則演算法每當交易資料庫有異動時,難免因為資料庫變動而必需重新探勘,而掃描資料庫往往是資料探勘中最花費時間的階段,若能避免重覆掃描原始資料庫,並利用上一階段所保留的探勘資訊,將有助於提升探勘的效率。

    本研究以AFPIM (Adjusting FP-Tree Structure for Incremental Mining)演算法為基礎,以保留原資料庫中頻繁項目集資訊的概念,並提出IFP-tree (Infrequent Pattern-tree) 記錄非頻繁項目,稱做 LCRFP-tree (A Less Computation Approach of Reconstructing FP-Trees) 演算法,該演算法可將所有交易資料全部壓縮在FP-tree (Frequent Pattern-tree) 與IFP-tree 結構中,因此當資料庫發生異動後只需針對異動的部分進行掃描,然後對樹狀結構進行調整,完全無需重新掃描原始資料庫且不會浪費前次探勘的結果。


    Due to rapid development of information technology that enables the enterprises to have more and more ways to get information, all industries have huge storage of their past accumulated transaction data. How to use the data mining techniques to dig out useful information hidden in large database has become an important issue. As time goes by, the number of transactions in the database will gradually increase and the amount of data will become enormously large. When transaction database has been changed, that happens often in most applications, it is necessary to perform the very time-consuming task of re-mining (meaning the re-scanning) the whole databse if traditional mining algorithm is used. The mining efficiency will be greatly improved if the task of re-scanning whole data base can be voided by reserving the information obtained at previous stage of setting up the database.

    Based on the concept of AFPIM (Adjusting FP-Tree Structure for Incremental Mining) algorithm, this thesis proposes a LCRFP-tree (A Less Computation Approach of Reconstructing FP-Trees) that not ony retains the frequent itemsets in original database but also records infrequent items by an IFP-tree (Infrequent Pattern-tree). The whole transaction database can therefore be incorporated in the proposed FP- and IFP-tres. Consequently, there is no need to re-scan the whole database when it is changed or updated, making the mining performance more efficient in processing time.

    摘要 I Abstract II 誌謝 III 圖目錄 VI 表目錄 IX 第一章 緒論 1 1-1 研究背景與動機 1 1-2 研究目的 3 1-3 研究流程 3 1-4 論文架構 4 第二章 相關文獻探討 6 2-1 資料探勘簡介 6 2-2 關聯規則探討 10 2-3 靜態探勘演算法 12 2-3.1 Apriori演算法 12 2-3.2 FP-Growth演算法 15 2-4 動態探勘演算法 21 第三章 LCRFP-tree演算法 23 3-1 基本概念 24 3-2 首次探勘處理 26 3-3 移除FP-tree中1- 非頻繁項目集節點 32 3-4 加入1- 頻繁項目集至FP-tree中 34 3-5調整FP-tree項目節點順序 35 3-6將異動資料加入或刪除 49 3-7 LCRFP-tree演算法步驟 52 第四章 實驗結果 55 4-1 實驗環境與實驗資料 55 4-2 實驗評估 56 第五章 結論與未來展望 71 5-1 研究結論 71 5-2 未來研究方向 72 參考文獻 73

    [1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2nd edition, (2000)
    [2] 廖述賢、溫志皓,資料探勘理論與應用,博碩文化股份有限公司(2012)。
    [3] M.S. Chen, J. Han, and P.S. Yu, “data Mining: An Overview from a Database Perspective, ” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No.6, (1996)
    [4] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” in Proc. of the 20th International Conference on Very Large DataBases, (1994).
    [5] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” in Proc. of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 1-12(2000)
    [6] Grahne, G. and Zhu, J., “Fast Algorithms for Frequent Itemset Mining Using FP-Trees,” IEEE Trans. on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, (2005).
    [7] Koh, Jia-Ling, and Shui-Feng Shieh. “An efficient approach for maintaining association rules based on adjusting FP-tree structures,” Database Systems for Advanced Applications. Springer Berlin Heidelberg, (2004).
    [8] U.Fayyad, G.Piatetsky-Shapiro and P.smyth, “ From Data Mining to Knowledge Discovery in Database,” Cambridge, AAAI/MIT Press, (1996).
    [9] Agrawal, R., Imielinski, T. and Swami, A., “Mining Association Rules between the Sets of Items in Large Database,” in Proc. of the 1993 ACM SIGMOD, pp. 207-216(1993)
    [10] 鄭滄祥、魏志平、蕭漢威和齊玉美,「非對稱性分類分析之實證評估」,第十四屆國際資訊管理學術研討會(2003)。
    [11] Weiss, Sholom M., and Casimir A. Kulikowski. “ Computer Systems That Learn: Classification And Prediction Methods From Statistics, Neural Nets, Machine Learning And Exp.” (1990).
    [12] 陳垂呈,「以有效率分群化演算法發掘消費者最適性之產品項目」,TANET 2002 (台灣區網際網路研討會),台灣(2002)。
    [13] 陳垂呈,「應用布林運算快速分群化交易項目」,TANET 2001(台灣區網際網路研討會),台灣(2001)。
    [14] 顏秀珍、李御璽和王思穎,「從交易資料庫中挖掘客戶的購物行為」,第十四屆物件導向技術及應用研討會(2003)。
    [15] Ng, R. and Han, J., “Efficient and Effective Clustering Method for Spatial Data Mining”, in Proc. of International Conference Very Large Data Bases, pp. 144-155, (1994).
    [16] 連宏明,「從交易資料庫中發掘含有時間間隔的序列型樣」碩士論文,輔仁大學(2001)。
    [17] 賴瓊惠,「使用布林演算法找出在大型資料庫中多階層序列型樣」,碩士論文,台灣科技大學,台北(2001)。
    [18] Agrawal, R. and Srikant, R., “Mining Sequential Patterns,” in Proc. of the 7th International Conference on Data Engineering, pp. 3-14, (1995).
    [19] 顏秀珍、李御璽、陳培炯和劉又誠,「從時序性資料中發掘非同步部份週期性規則」,2005NCS 全國計算機會議,民國94 年。
    [20] 賴春松,「提昇資料探勘效率於企業行銷之應用」,碩士論文,南台科技大學(2002)。
    [21] Ayan, N. F., Tansel, A. U. and Arkun, E., “An Efficient Algorithm to Update Large Itemsets with Early Pruning,” in Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287-291(1999)
    [22] Coenen, F., Leng, P. and Ahmed, S., “Data Structure for Association Rule Mining: T-trees and P-trees,” in Proc. of the IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 6, pp. 774-778(2004)
    [23] Srikant, R. and Agrawal, R., “Mining Generalized Association Rules,” in Proc. of the 21th International Conference on Very Large Data Bases, pp. 407-419(1995)
    [24] Liu, P. Q., Li, Z. Z. and Zhao, Y. L., “Effective Algorithm of Mining Frequent Itemsets for Association Rules,” in Proc. of the Third International Conference on Machine Learning and Cybernetics, pp. 1447-1451(2004)
    [25] Tsay, Y. J. and Chiang, J. Y., “CBAR: an Efficient Method for Mining Association Rules,” Knowledge-Based Systems, Vol. 18, pp. 99-105(2005)
    [26] Li, Z. C., He, P. L. and Lei, M., “A High Efficient AprioriTid Algorithm for Mining Association Rule,” in Proc. of the Fourth International Conference on Machine Learning and Cybernetics, pp. 1812-1815(2005)
    [27] Park, J. S., Chen, M. S. and Yu, P. S., “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” in Proc. of the IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 813-825, (1997).
    [28] 王子健,「漸進式區塊深度優先關聯法則探勘之研究」,碩士論文,中華大學(2005)。
    [29] 王慶堯,「利用準大項目集之漸進式資料挖掘」,碩士論文,義守大學(1999)。
    [30] 許智豪,「在動態資料庫中作線上挖掘關聯式法則」,碩士論文,國立中興大學資訊科學研究所(2000)。
    [31] 陳可欣,「在動態交易資料庫中探勘線上關聯法則之設計與分析」,碩士論文,臺南師範學院(2001)。
    [32] 陳俊堯,「在異動資料庫中有效率地動態探勘關聯規則」,碩士論文,南台科技大學(2003)。
    [33] 劉玉敏,「線上漸進式網站使用探勘之研究-以產品推薦為例」,碩士論文,雲林科技大學(2003)。
    [34] Adnan, M., Alhajj, R. and Barker, K., “Alternative Method for Increnentally Constructing the FP-Tree,” in Proc. of the 3rd International IEEE Conference Intelligent Systems, pp. 494-499, (2006).
    [35] Cheung, D. W., Lee, S. D. and Kao, B., “A General Incremental Technique for Maintaining Discovered Association Rules,” in Proc. of the 5th International Conference on Database Systems for Advanced Applications, pp. 185-194, (1997).
    [36] Duan, Z., Cai, Z. and Yu, J., “Incrementally-Updating Association Rules Based on Multiple Previously Mined Results,” in Proc. of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, pp. 741-745, (2005).
    [37] Hong, T.P., Lin, J.W. and Wu, Y.L., “A Fast Updated Frequent Pattern Tree, ”in Proc. of the IEEE International Conference on Systems, Man and Cybernetics, Vol. 9, pp. 2167-2172, (2006).
    [38] Lin, C.W., Hong, T.P. and Lu, W.H., “Using the Pre-FUFP Algorithm for Handling New Transactions in Incremental Mining,” in Proc. of the IEEE Symposium on Computational Intelligence and Data Mining, pp. 598-603, (2007).
    [39] Lan, Qihua, Defu Zhang, and Bo Wu. “ A new algorithm for frequent itemsets mining based on Apriori and FP-tree. ” Intelligent Systems, 2009. GCIS'09. WRI Global Congress on. Vol. 2. IEEE, (2009).
    [40] KUMAR, B. Santhosh; RUKMANI, K. V. Implementation of web usage mining using Apriori and FP Growth algorithms. Int. J. of Advanced Networking and Applications, (2010).
    [41] Cheung, David W., et al. “Maintenance of discovered association rules in large databases: An incremental updating technique.” Data Engineering, 1996. in Proc. of the 12th International Conference on. IEEE, (1996).
    [42] D. Cheung, S.D. Lee, and B. Kao. “A general incremental technique for updaing discovered association rules.” in Proc. of the International Conference on Database Systems for Advanced Applications, April , (1997).
    [43] Leung, C. K. S., Khan, Q. I. and Hoque, T., “CanTree: A Tree Structure for Efficient Incremental Mining of Frequent Patterns,” in Proc. of the 5th IEEE International Conference on Data Mining, pp. 274-281, (2005).
    [44] Qiu, Y. and Lan, Y. J., “Efficient Improvement of FP-tree Based Frequent Itemsets Mining Algorithms,” in Proc. of the First International Conference on Innovative Computing, Information and Control, Vol. 3, pp. 374-377, (2006).
    [45] 謝維珍,「在異動資料庫中以完全 FP-Tree 為基礎更新關聯規則」,碩士論文,南台科技大學(2008)。
    [46] 黃祺閎,「異動資料庫中基於擴張 FP-Tree 結構之更新關聯規則」,碩士論文,立德大學(2010)
    [47] 黃仁鵬、林曉薇和郭煌政,「適用於大型資料庫的漸增式關聯規則演算法」,第六屆(2004)永續發展管理研討會,國立屏東科技大學,第679-686頁(2004)。
    [48] Fournier-Viger, P. “Spmf: A sequential pattern mining framework.” URL http://www. philippe-fournier-viger. com/spmf (2011).
    [49] S. Y. Wur and Y. Leu, “An Effective Boolean Algorithm for Mining Association Rules in Large Databases”, in Proc. of the 6th International Conference on Database Systems for Advanced Applications, pp179-186, (1999).
    [50] 高文瑞,「在異動資料庫中有效率探勘關聯規則之演算法」,碩士論文,南台科技大學(2007)。
    [51] 江彥呈,「有效率演算法探勘動態最小支持度之關聯規則」,碩士論文,南台科技大學(2008)。

    無法下載圖示 全文公開日期 2019/07/28 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE