簡易檢索 / 詳目顯示

研究生: 郭雅平
Ya-ping Kuo
論文名稱: 高頻項目集在不同門檻值下的保護技術
Protecting Frequent Pattern with Variant Sensitive Thresholds
指導教授: 戴碧如
Bi-Ru Dai
口試委員: 李育杰
Yuh-Jye Lee
鄧維光
Wei-Guang Teng
黃俊龍
Jiun-Long Huang
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2008
畢業學年度: 96
語文別: 英文
論文頁數: 82
中文關鍵詞: 隱私多重門檻值關聯法則的保護高頻項目集的保護資料淨除
外文關鍵詞: multiple thresholds, data sanitization, frequent pattern hiding
相關次數: 點閱:168下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 現今資料探勘的技術受到很大的重視,其中高頻項目集和關聯法則的探勘可以應用在很多不同的領域上,但是隨著技術越來越進步,所帶來隱私的議題也越來越受到重視。在過去保護交易資料庫內敏感資訊的技術上,都是以單一的最小支持度或是隱私係數來作為保護的基準,並且從中取得機密資訊保護和非機密資訊保存上得平衡點。但是在實際生活中,不同的項目集因為特性上得不同,可能需要運用不同的最小支持度去判斷他是否為頻繁的,因此在本論文中,我們預期加入考慮多門檻值,以及在多門檻值下頻繁項目集以及關聯法則的特性,來提出兩個新的架構TBF和GTBA來保護機密的資訊,使得對敏感資訊的保護更切合於它本身的特性並且藉此在修改過後的資料庫內保留更多的非機密資訊。另外加入一些機制對效率、擴充性和對資料庫最小影響方面作最佳化。在實驗的部份,本論文提出在多門檻值下對淨除演算法的新的評估公式,並且選用不同特性的資料集,以及不同特性的機密資訊去評估和過去相關保護方法在單一門檻值、隱私係數、以及多重門檻值下作比較,GTBA和TBF都能夠在保護隱私的情況下,保存更多原有非機密的資訊。在效率上也有不錯的表現。


    Frequent pattern and association rule mining are popular topics in data mining. With the advance of these techniques, privacy issues attract more and more attentions in recent years. In this field, previous works hide sensitive information based on a uniform support threshold or using a privacy disclosure threshold. However in practical applications, we probably need to apply different support thresholds on different itemsets to reflect their significance. In this study, two sanitization frameworks, TBF and GTBA, are proposed to protect sensitive patterns and sensitive association rules with the concept of non-uniform sensitive thresholds, respectively. Based on different sensitive thresholds, the hiding result is closer to user requirements and real applications. Besides, the strategy of choosing rules, victim items, and victim transactions are proposed to preserve more information of the original dataset. In addition, we apply hash table, inverted file and some index skills to speed up the whole sanitization process. As shown in the experimental results, our frameworks can protect sensitive knowledge well not only under multiple thresholds but also a uniform threshold. Furthermore, these frameworks preserve more information, and reach higher similarity between the original dataset and the modified dataset. Moreover, the efficiency and the scalability can also be maintained.

    目錄 指導教授推薦書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i 論文口試委員審定書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii 摘要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv 誌謝. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v 目錄. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 The Organization of this Thesis . . . . . . . . . . . . . . . . . . . . . . 4 2 Problem Formulation and RelatedWork . . . . . . . . . . . . . . . . . . . . 6 2.1 Multiple Sensitive Thresholds . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Hiding Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.4 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.4.1 Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.4.2 Multiple Thresholds . . . . . . . . . . . . . . . . . . . . . . . . 18 3 The Framework of the Template-Based Frequent Itemset Hiding Method (TBF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.1 Border-Based Concept under Multiple Thresholds and Sensitive Pattern Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Template Table and Choosing Strategy . . . . . . . . . . . . . . . . . . 24 3.3 Updating Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.4 Efficiency Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.1 Inverted File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.2 Pattern Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.4.3 Hash Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4 The Framework of the Grouping Template-Based Association Rule Hiding Method (GTBA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.1 Sensitive Rule Table and Border-Rule Concept . . . . . . . . . . . . . . 32 4.2 Grouping Template Generation . . . . . . . . . . . . . . . . . . . . . . 36 4.3 Sensitive Transaction with Bound and Template Division . . . . . . . . 39 4.4 Reducing Side Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.4.1 Rule: Choosing Strategy for Template . . . . . . . . . . . . . . 42 4.4.2 Transaction: Priority . . . . . . . . . . . . . . . . . . . . . . . . 43 4.4.3 Item: Minimum Difference . . . . . . . . . . . . . . . . . . . . . 44 4.5 Efficiency Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.5.1 Bound Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.5.2 Rule index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.1 Support Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.2 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.3 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.4.1 Frequent Pattern Hiding . . . . . . . . . . . . . . . . . . . . . . 54 5.4.2 Association Rule Hiding . . . . . . . . . . . . . . . . . . . . . . 64 5.4.3 Real Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.5 Efficiency and Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . 74 6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 授權書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

    [1] R. Agrawal, T. Imielinski, and A. N. Swami, “Mining Association Rules between Sets of Items in Large Databases,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (P. Buneman and S. Jajodia,
    eds.), vol. 22, (Washington, D.C.), pp. 207–216, ACM, May 1993.
    [2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” in VLDB ’94: Proceedings of the 20th International Conference on Very Large Data Bases (J. B. Bocca, M. Jarke, and C. Zaniolo, eds.), (San Francisco, CA, USA), pp. 487–499, Morgan Kaufmann, 1994.
    [3] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,”in SIGMOD ’00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data (W. Chen, J. Naughton, and P. A. Bernstein, eds.), vol. 29, (New York, NY, USA), pp. 1–12, ACM, May 2000.
    [4] G. Ertek and A. Demiriz, “A Framework for Visualizing Association Mining Results,”in Computer and Information Sciences - ISCIS 2006, vol. 4263 of Lecture Notes in Computer Science, pp. 593–602, 2006.
    [5] R. J. Kuo, S. Y. Lin, and C.W. Shih, “Mining association rules through integration of clustering analysis and ant colony system for health insurance database in Taiwan,” Expert Systems with Applications, vol. 33, no. 3, pp. 794–808, 2007.
    [6] T. Wang and C. Cheng, “A Mining Algorithm for Discovering Association Rules in Stock Linkage Information Guided by Meta-Rule,” Jisuanji Gongcheng/Computer Engineering, vol. 32, no. 5, pp. 260–262, 2006.
    [7] K. A. Pray and C. Ruiz, “Mining Expressive Temporal Associations from Complex Data,” in Machine Learning and Data Mining in Pattern Recognition, vol. 3587 of Lecture Notes in Computer Science, pp. 384–394, 2005.
    [8] S.-H. Liao, C.-L. Hsieh, and S.-P. Huang, “Mining product maps for new product development,” Expert Systems with Applications, vol. 34, no. 1, pp. 50–62, 2008.
    [9] C. Clifton and D. Marks, "Security and Privacy Implications of Data Mining," in ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, (Montreal, Canada), pp. 15-19, University of British Columbia Department of Computer Science, 1996.
    [10] V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni, "Association Rule Hiding," IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 434-447, April 2004.
    [11] S. R. M. Oliveira and O. R. Zaiane, "Privacy Preserving Frequent Itemset Mining," in CRPIT'14: Proceedings of the IEEE international conference on Privacy, security and data mining, (Darlinghurst, Australia, Australia), pp. 43-54, Australian Computer Society, Inc., 2002.
    [12] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
    [13] M. Atallah, A. Elmagarmid, M. Ibrahim, E. Bertino, and V. Verykios, "Disclosure Limitation of Sensitive Rules," in KDEX'99: Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange, (Washington, DC, USA), p. 45, IEEE Computer Society, 1999.
    [14] S. R. Oliveira, O. R. Zaiane, and Y. Saygin, "Secure Association Rule Sharing," in Advances in Knowledge Discovery and Data Mining, vol. 3056 of Lecture Notes in Computer Science, (Sydney, Australia), Springer Berlin / Heidelberg, 2004.
    [15] E. T. Wang and G. Lee, "An efficient sanitization algorithm for balancing information privacy and knowledge discovery in association patterns mining," Data Knowledge Engineering, vol. 65, pp. 463-484, June 2008.
    [16] S. Menon, S. Sarkar, and S. Mukherjee, "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information System Research, vol. 16, no. 3, pp. 256-270, 2005.
    [17] X. Sun and P. S. Yu, "A Border-Based Approach for Hiding Sensitive Frequent Itemsets," in ICDM'05: Proceedings of the Fifth IEEE International Conference on Data Mining, (Washington, DC, USA), pp. 426-433, IEEE Computer Society, 2005.
    [18] A. Gkoulalas-Divanis and V. S. Verykios, "An Integer Programming Approach for Frequent Itemset Hiding," in CIKM'06: Proceedings of the 15th ACM international
    conference on Information and knowledge management, (New York, NY, USA), pp. 748-457, ACM, 2006.
    [19] G. V. Moustakides and V. S. Verykios, "A Max-Min Approach for Hiding Frequent Itemsets," in ICDMW'06: Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops, (Washington, DC, USA), pp. 502-506, IEEE Computer Society, 2006.
    [20] A. Amiri, "Dare to share: Protecting sensitive knowledge with data sanitization," Decision Support Systems, vol. 43, no. 1, pp. 181-191, 2007.
    [21] A. Gkoulalas-Divanis and V. S. Verykios, "A Hybrid Approach to Frequent Itemset Hiding," in ICTAI'07: Proceedings of the 19th IEEE International Conference
    on Tools with Artificial Intelligence, vol. 1, (Washington, DC, USA), pp. 297-304, IEEE Computer Society, 2007.
    [22] Y.-H. Wu, C.-M. Chiang, and A. L. Chen, "Hiding Sensitive Association Rules with Limited Side Effects," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 29-42, 2007.
    [23] Y.-C. Li, J.-S. Yeh, and C.-C. Chang, "MICF: An effective sanitization algorithm for hiding sensitive patterns on data mining," Advanced Engineering Informatics,
    vol. 21, no. 3, pp. 269-280, 2007.
    [24] S. R. M. Oliveira and O. R. Zaiane, "Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining," Proceedings of the 7th International Database Engineering and Applications Symposium (IDEAS'03), pp. 54-
    63, July 2003.
    [25] S. R. M. Oliveira and O. R. Zaiane, "An Efficient One-Scan Sanitization For Improving The Balance Between Privacy And Knowledge Discovery," Tech. Rep. TR03-15, Computer Science Department, University of Alberta, Canada, 2003.
    [26] V. S. Verykios, E. D. Pontikakis, Y. Theodoridis, and L. Chang, “Efficient algorithms for distortion and blocking techniques in association rule hiding,” Distributed and Parallel Databases, vol. 22, no. 1, pp. 85–104, 2007.
    [27] S.-L. Wang and A. Jafari, “Hiding Sensitive Predictive Association Rules,” Systems, Man and Cybernetics, 2005 IEEE International Conference on, vol. 1, pp. 164–169, Oct. 2005.
    [28] S. J. Rizvi and J. R. Haritsa, “Maintaining data privacy in association rule mining,”in VLDB ’02: Proceedings of the 28th international conference on Very
    Large Data Bases, pp. 682–693, VLDB Endowment, 2002.
    [29] X. Zhang, “Knowledge Hiding in Data Mining by Transaction Adding and Removing,”Computer Software and Applications Conference, 2007. COMPSAC 2007. 31st Annual International, vol. 1, pp. 233–240, 2007.
    [30] X. Zhang, “Research on Rule Hiding Algorithm of Transaction Adding and Removing,”FSKD ’07: Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 3, pp. 643–647, 2007.
    [31] T. Mielik‥ainen, “On Inverse Frequent Set Mining,” in Proceedings of 2nd Workshop on Privacy Preserving Data Mining (PPDM), pp. 18–23, IEEE Computer Society, 2003.
    [32] G. Ramesh, W. A. Maniatty, and M. J. Zaki, “Feasible itemset distributions in data mining: theory and application,” in PODS ’03: Proceedings of the twentysecond
    ACM SIGMOD-SIGACT-SIGART symposium on Principles of database
    systems, (New York, NY, USA), pp. 284–295, ACM, 2003.
    [33] G. Ramesh, M. J. Zaki, and W. A. Maniatty, “Distribution-Based Synthetic Database Generation Techniques for Itemset Mining,” Database Engineering and Application Symposium, 2005. IDEAS 2005. 9th International, pp. 307–316, July 2005.
    [34] X. Chen, M. Orlowska, and X. Li, “A New Framework of Privacy Preserving Data Sharing,” in Proceedings of IEEE 4th International Conference on Data Mining(ICDM’04) Workshop: Privacy and Security Aspects of Data Mining, pp. 47–56, IEEE Computer Society, 2004.
    [35] X. Wu, Y. Wu, Y. Wang, and Y. Li, “Privacy-Aware Market Basket Data Set Generation: A Feasible Approach for Inverse Frequent Set Mining,” in Proceedings 5th SIAM International Conference on Data Mining (SDM), 2005.
    [36] Y. Guo, Y. Tong, S. Tang, and D. Yang, “A FP-Tree-Based Method for Inverse Frequent Set Mining,” in Flexible and Efficient Information Handling, pp. 152–163, 2006.
    [37] Y. Guo, “Reconstruction-Based Association Rule Hiding,” in Proceedings of SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007(IDAR2007), (Beijing, China), 2007.
    [38] Z. Wang, W. Wang, and B. Shi, “Blocking Inference Channels in Frequent Pattern Sharing,” Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pp. 1425–1429, April 2007.
    [39] B. Liu, W. Hsu, and Y. Ma, “Mining Association Rules with Multiple Minimum Supports,” in KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, (New York, NY, USA), pp. 337–341, ACM, 1999.
    [40] M.-C. Tseng and W.-Y. Lin, “Maintenance of generalized association rules with multiple minimum supports,” Intelligent Data Analysis, vol. 8, no. 4, pp. 417–436, 2004.
    [41] Y.-H. Hu and Y.-L. Chen, “Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism,” Decision Support Systems, vol. 42, no. 1, pp. 1–24, 2006.
    [42] K. Wang, Y. He, and J. Han, “Mining Frequent Itemsets Using Support Constraints,”in VLDB ’00: Proceedings of the 26th International Conference on Very Large Data Bases, (San Francisco, CA, USA), pp. 43–52, Morgan Kaufmann Publishers Inc., 2000.
    [43] M. Seno and G. Karypis, “Finding Frequent Patterns Using Length-Decreasing Support Constraints,” Data Mining and Knowledge Discovery, vol. 10, pp. 197–228, 2005.
    [44] Y.-C. Lee, T.-P. Hong, and W.-Y. Lin, “Mining association rules with multiple minimum supports using maximum constraints,” International Journal of Approximate
    Reasoning, vol. 40, pp. 44–54, July 2005.
    [45] Y.-C. Lee, T.-P. Hong, and T.-C. Wang, “Mining Fuzzy Multiple-level Association Rules under Multiple Minimum Supports,” Systems, Man and Cybernetics, 2006. SMC ’06. IEEE International Conference on, vol. 5, pp. 4112–4117, 2006.
    [46] M.-C. Tseng and W.-Y. Lin, “Efficient mining of generalized association rules with non-uniform minimum support,” Data Knowledge Engineering, vol. 62, no. 1, pp. 41–64, 2007.
    [47] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. 1999.
    [48] A. Asuncion and D. Newman, “UCI machine learning repository [http://www.ics.uci.edu/.mlearn/MLRepository.html],” 2007.
    [49] T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets, “Using association rules for product assortment decisions: A case study,” in Knowledge Discovery and Data Mining, pp. 254–260, 1999.
    [50] K. Wang, Y. He, and J. Han, “Pushing support constraints into association rules mining,” IEEE Transaction on Knowledge and Data Engineering, vol. 15, no. 3, pp. 642–658, 2003.

    無法下載圖示 全文公開日期 2013/07/29 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE