簡易檢索 / 詳目顯示

研究生: 姜禮翔
Li-Hsiang Chiang
論文名稱: 在漸進式資料庫中隱藏頻繁項目集
Frequent Pattern Hiding in Incremental Database
指導教授: 戴碧如
Bi-Ru Dai
口試委員: 胡誌麟
Chih-Lin Hu
葉彌妍
Mi-Yen Yeh
李育杰
Yuh-Jye Lee
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2010
畢業學年度: 98
語文別: 英文
論文頁數: 42
中文關鍵詞: 隱私保護更新式資料庫漸進式資料庫頻繁項目集資料修剪
外文關鍵詞: updated database, incrementa
相關次數: 點閱:182下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在資訊爆炸性成長以及網際網路快速的發展下,往往要從巨大的資料中找尋特定的、有影響的訊息,已經相當的不容易。而在這樣的需求下,使得資料探勘技術快速的發展起來。至今,已有許多成熟的資料探勘演算法,像是頻繁項目集的探勘、關聯法則的尋找、對資料做分類或做分群,都已是相當成熟的技術。而我們都能夠使用這樣子的技術去尋找我們感興趣、我們覺得重要的資訊。但同時許多原本隱藏在資料中的敏感資訊也會同時的被探勘出來,進而對資料的擁有者造成嚴重的損失。
    然而近幾年,學者們也在開始關注於隱私保護(Privacy Preserving)的議題。許多議題也開始指向如何在資料探勘演算法下保護資料擁有者的敏感資訊。其中隱藏敏感高頻項目集在保護隱私的資料探勘中也是一項很重要的議題,但在資訊爆炸性成長以及網際網路快速的發展下。儲存在資料庫中的資料通常都是不斷的加入新資料、不斷的對資料庫做更新,然而現有的隱藏敏感高頻項目集的演算法是針對靜態資料庫而設計以至於現有的演算法不能夠有效且有效率的去處理這一類動態的資料庫。若直接使用現有用在靜態資料庫上的演算法。反而會有副作用提升和效率不彰的結果。
    在論文中,我們也以例子來說明直接應用之前針對靜態環境所設計的演算法到動態的資料庫上,會有什麼樣的問題。在論文中我們提出了以Template Based Framework為基礎的漸增型的機制SPITF。我們使用提出的資料結構來儲存我們在隱藏高頻項目集所需的資訊,其中我們也使用了Sensitive Indexer來加速對資料的查詢及修改。最後我們也在實驗中也證明此資料結構能夠在不犧牲效能的情況以及對資料做最少的修改來隱藏敏感項目集。另外我們也對SPITF做改善,提出SPITF+,由於Template Based Framework可能會聯結出多餘的Template,進而提出根據取得的交易資料來產生Template,來避免產生出多餘的Template。最後我們對整個資料結構提出對記憶體使用率的的優化。並在實驗中證明我們所提出的演算法的確對漸近式的資料庫有良好的適應性。


    In recent year, researchers are starting to focus on the issue of privacy. Many issues are starting to focus how to protect the sensitive information of dataset owner under data mining technology too. The frequent pattern hiding is also an important issue in privacy preserving data mining technology. However, in the explosion of information and rapid development of the internet the data which stored in database is usually continually add and update. On the other hand, the exits algorithms are designed addressed to the static database such that the exits algorithms cannot handle the static database in efficiently and effectively. If apply the exits algorithm to the static database directly will increase the side effect and low efficiency.
    In order to solve this problem, we propose an incremental mechanism and design a data structure in this paper to hide sensitive frequent patterns in the incremental environment. In this mechanism, the transaction data and sensitive patterns are stored in two types of trees. The proposed algorithm can efficiently find related transactions by links between these two types of trees. Experiment results show that the proposed method can efficiently hide sensitive frequent patterns in the incremental environment.

    ABSTRACT 1 中文摘要 2 TABLES OF CONTENTS 3 LIST OF TABLES 4 LIST OF FIGURES 5 CHAPTER 1 INTRODUCTION 6 1.1 BACKGROUND 6 1.2 MOTIVATION 6 1.3 CONTRIBUTIONS 8 1.4 THESIS ORGANIZATION 8 CHAPTER 2 RELATED WORKS 9 CHAPTER 3 FRAMEWORK 11 3.1 PRELIMINARIES 11 3.2 SANITIZATION FRAMEWORK 12 3.3 TEMPLATE TABLE 14 3.4 SENSITIVE PATTERN INDEXED TRANSACTION FOREST 16 3.5 THE IMPROVEMENT OF SPITF 23 3.5.1 IMPROVEMENT OF TEMPLATES 25 3.5.2 MEMORY USAGE IMPROVEMENT 29 3.6 RECOVER 31 CHAPTER 4 PERFORMANCE 32 CHAPTER 5 CONCLUSION 38 REFERENCE 39

    [1] R. Agrawal, T. Imielinski, and A.N. Swami. Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, volume 22(2) of SIGMOD Record, pages 207–216. ACM Press, 1993.
    [2] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53–87, January 2004.
    [3] C. Clifton and D. Marks, “Security and Privacy Implications of Data Mining,” in ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, (Montreal, Canada), pp. 15–19, University of British Columbia Department of Computer Science, 1996.
    [4] Saygin, Y., Verykios, V., & Clifton, C. (2001). Using unknowns to prevent discovery of association rules. SIGMOND Record, 30(4), 45–54.
    [5] Chih-Chia Weng; Shan-Tai Chen; Hung-Che Lo, "A Novel Algorithm for Completely Hiding Sensitive Association Rules," Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on , vol.3, no., pp.202-208, 26-28 Nov. 2008
    [6] Stanley R. M. Oliveira and Osmar R. Zaiane, Privacy preserving frequent itemset mining, In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining (2002), 43–54.
    [7] Chen, X., Orlowska, M. E. and Li, X. (2004). A new framework of privacy preserving data sharing. In: S. Matwin, L. Chang, C. Adams and J. Zhan, Proceedings of the Fourth IEEE International Conference on Data Mining. The Fourth IEEE International Conference on Data Mining, Brighton, UK, (47-56). 1 November, 2004.
    [8] Ya-Ping Kuo, Pai-Yu Lin and Bi-Ru Dai, "Hiding Frequent Patterns under Multiple Sensitive Thresholds," Proceedings of the 19th International Conference on Database and Expert Systems Applications (DEXA 2008), Turin, Italy, September 1-5, 2008. (Lecture Notes in Computer Science 5181 Springer 2008, ISBN 978-3-540-85653-5) (EI)
    [9] Verykios, V., Elmagarmid, A., Bertino, E., Saygin, Y., & Dasseni, E. (2004). Association rules hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 434–447.
    [10] Verykios, V., Bertino, E., Fovino, I. G., Provenza, L. P., Saygin, Y., & Theodoridis, Y.(2004). State-of-the-art in privacy preserving data mining. SIGMOD Record,33(1), 50–57.
    [11] Wang, S. 2009. Maintenance of sanitizing informative association rules. Expert Syst. Appl. 36, 2 (Mar. 2009), 4006-4012
    [12] Wang, K., Fung, B.C.M., Yu, P.S.: Template-Based Privacy Preservation in Classification Problems. Proc. - IEEE Int. Conf. on Data Mining (2005) 466–473
    [13] Y. Sucahyo and P. Gopalan, “CT-ITL: Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth,” Proc. of ADC, 2003.
    [14] Wang, S. L., Maskey, R., Jafari, A., & Hong, T. P. (2007b). Efficient sanitization of informative association rules. Expert Systems with Applications. doi:10.1016/j.eswa.2007.07.039.
    [15] K. Geurts, G. Wets, T. Brijs, and K. Vanhoof. Profiling high frequency accident locations using association rules. In Proceedings of the 82nd Annual Transportation Research Board, page 18, 2003.
    [16] V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni, “Asso-ciation Rule Hiding,” IEEE Transactions on Knowledge and Data Engineering,vol. 16, pp. 434–447, April 2004.
    [17] Wang, S.-L., Lai, T.-Z., Hong, T.-P., Wu, Y.-L. Hiding collaborative recommendation association rules on horizontally partitioned data (2010) Intelligent Data Analysis, 14 (1), pp. 47-67.

    無法下載圖示 全文公開日期 2015/07/29 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE