簡易檢索 / 詳目顯示

研究生: 余威逸
Wei-Yi Yu
論文名稱: 結合分類器與漸進式學習決策樹應用於異常入侵偵測系統
An Incremental-Learning Method for Supervised Anomaly Detection by Cascading Service Classifier and ITI Decision Tree Methods
指導教授: 李漢銘
Hahn-Ming Lee
口試委員: 項天瑞
Tien-Ruey Hsiang
劉聰德
Tsung-De Liou
李育杰
Yuh-Jye Lee
黃淇竣
Chi-Chun Huang
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2009
畢業學年度: 97
語文別: 英文
論文頁數: 49
中文關鍵詞: Anomaly Detection System (ADS)K-Means clusteringKohonens' Self-Organizing Maps (SOM)Incremental Tree inductionKDD' 99
外文關鍵詞: Anomaly Detection System (ADS), K-Means clustering, Kohonens' Self-Organizing Maps (SOM), Incremental Tree induction, KDD' 99
相關次數: 點閱:375下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在過去數年中,入侵偵測系統已經是資訊安全中不可或缺的一部分,由其是運用資料探勘及分析方法論已被用來探勘及學習各類攻擊的特性。又,正確性及適應性為入侵偵測系統所需要件,為了讓入侵偵測系統適應各種型式的資料,應用入侵偵測系統之方法論需有能力可以處理遺漏值及屬性值相同但類別不同的資料集之問題,為了入侵偵測系統產出具正確性,如何運用不同的方法論來有效及正確地偵測異常行為仍然是個深具挑戰性的研究課題,由於漸近式學習方法論係為入侵偵測系統方法論之一小部分,該漸近式學習方法論應用於入侵偵測系統,也遭遇到相同問題。
    在這篇論文裡,我們結合漸近式學習之聚類及決策樹方法論。我們選擇Incremental Tree Induction (ITI)決策樹取代ID3決策樹的主要原因是它具有漸近式學習的能力及處理數值分割點、屬性值相同但類別不同的資料集及遺漏值之問題。當只使用聚類方法來偵測異常行為時,Forced Assignment及Class Dominance問題便會產生,為了減輕這兩個問題發生,結合聚類及決策樹(K-Means+ID3)方法論首次在西元2007年被提出。我們提出結合漸近式學習Self Organizing Map (SOM)聚類與ITI決策樹方法論來減輕這兩個問題並處理漸近式學習問題,在實驗中,我們分析這個方法論不能處理漸近式學習能問題,然後我們再提出另一個具有漸近是學習能力的方法,稱為結合分類器與漸進式學習決策樹方法論可以避免處理未曾看過服務資料集。實驗結果呈現ITI決策樹在偵測率方面表現比其他方法要好,我們所提的結合分類器與漸進式學習決策樹方法論在偵測率及誤報率上,都表現比ITI決策樹要好,而且我們的方法也類似ITI決策樹一樣,提供處理數值分割點、屬性值相同但類別不同的資料集及遺漏值的機制並且提供批次及漸近式學習。


    Over the past years, an intrusion detection system (IDS) has been an essential part of computer system security. Especially data mining and analysis methodologies have been used to mining and learning data characteristics of attacks. Also, IDSs need to adaptive and accurate. For IDSs to be adaptive to different forms of data, these methodologies of IDSs have the capability of handling missing values, inconsistent instances. For IDSs to be accurate, how to use different kinds of methodologies to detect anomalies efficiently and accurately is still a main challenging problem. Because an incremental leaning methodology is a part of IDS methodologies, an incremental leaning methodology encounter the same problems as IDS.
    In this thesis, we cascade the incremental learning clustering and decision tree methods. We chose the incremental tree induction (ITI) decision tree instead of ID3 decision tree because it can handle these problems: 1) splitting point selection for numeric variables; 2) incremental learning; 3) inconsistent instances; 4) missing values. When the single clustering method detects anomalies, the Forced Assignment and Class Dominance problems will arise. In order to alleviate two problems, the clustering with decision tree cascading method, called “K-Means+ID3”, was first proposed in 2007. Our method, the incremental learning Self Organizing Map (SOM) with ITI cascading method, is proposed to mitigate these two problems and to provide the incremental learning capability. During experiments, we analyze that the SOM+ITI cascading method cannot provide the incremental learning capability. Then, we propose the other incremental learning service classifier with ITI cascading method, called “SC+ITI”, to provide this incremental learning capability and to avoid instances with unseen service class labels. The results show that the ITI method has better performance than the K-Means, SOM, K-Means+ITI and SOM+ITI methods in terms of the overall detection rate. Our method, the service classifier with ITI cascading method outperforms the ITI method in terms of the detection rate and false positive rate (FPR) and shows better detection rate as compared with other methods. Like the ITI method, our method also provides the additional options of handling splitting point selection for numeric variables, batch/incremental learning, inconsistent instances and missing values.

    Abstract v Content vii List of Figures ix List of Tables x Chapter 1 Introduction 1 1.1 Motivations 3 1.2 Related Work 3 1.3 Problems Definition 4 1.4 Our Goals and Design 5 1.5 Contributions 6 1.6 Outline of the Thesis 6 Chapter 2 Background 7 2.1 Anomaly Detection with K-Means 7 2.2 Anomaly Detection with Incremental-Learning Self-Organizing Map 8 2.2.1 Bubble and Gaussian Neighborhood Function 9 2.2.2 Map Topologies of SOM Units 10 2.3 Anomaly Detection with the ITI Decision Tree 10 2.3.1 Numeric Variables 12 2.3.2 Inconsistent Training Instances (Clashes) 12 2.3.3 Missing Values 13 2.3.4 Gain Ratio 13 2.3.5 Fayyad’s Splitting Point Selection for Numeric Variables 14 2.3.6 Limitations 16 2.4 K-Means with ITI Cascading Method for Anomaly Detection 16 Chapter 3 System Architectures 18 3.1 Anomaly Detection using Self-Organizing Map with ITI Cascading Method 18 3.2 Anomaly Detection using Service Classifier with ITI Cascading Method 21 3.3 Analysis of the SOM with ITI and Service Classifier with ITI Cascading Methods 22 Chapter 4 Experiments 25 4.1 Experimental Setup 25 4.2 Data Sets 26 4.3 Performance Measures 30 4.4 Results 30 4.5 Discussions 34 Chapter 5 Conclusions and Further Work 36 5.1 Conclusions 36 5.2 Future Work 37 References 39 APPENDIX A KDD’99 Feature List 46 APPENDIX B Mapping Table 48

    [1] D. Anderson, T. Frivold, and A. Valdes, Next-Generation Intrusion Detection Expert System (NIDES): A Summary, Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, California, USA, May 1995.
    [2] A. A. Cardenas, J. S. Baras, and K. Seamon, “A Framework for the Evaluation of Intrusion Detection Systems,” IEEE Symposium on Security and Privacy, Berkeley, Oakland, California, USA, May 2006, pp. 63-77.
    [3] A. Chan and E. Pampalk, “Growing Hierarchical Self Organising Map (GHSOM) Toolbox: Visualisations and Enhancements,” Proceedings of the 9th International Conference on Neural Information Processing, ICONIP, vol. 5, Singapore, Nov. 2002, pp. 2537-2541.
    [4] S. N. Chari and P. C. Cheng, “BlueBox: A Policy-Driven, Host-based Intrusion Detection System,” ACM Transactions on Information and System Security, vol. 6, no. 2, pp. 173-200, May 2003.
    [5] S. B. Cho, “Incorporating Soft Computing Techniques into a Probabilistic Intrusion Detection System,” IEEE Transactions on systems, man, and cybernetics-Part C: Applications and reviews, vol. 32, no. 2, pp. 154-160, May 2002.
    [6] S. B. Cho and H. J. Park, “Efficient Anomaly Detection by Modeling Privilege Flows using Hidden Markov Model,” Computers and Security, vol. 22, no. 1, pp 45-55, Jan. 2003.
    [7] H. Debar, M. Dacier, and A. Wespi, “Towards A Taxonomy of Intrusion-Detection Systems,” Computer Networks, vol. 31, no. 8, pp. 805-822, Apr. 1999.
    [8] D. E. Denning, “An Intrusion-Detection Model,” IEEE Transactions on Software Engineering, vol. se-13, no. 2, pp. 222-232, Feb. 1987.
    [9] M. Dittenbach, D. Merkl and A. Rauber, “The Growing Hierarchical Self-Organizing Map,” Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, IJCNN, vol. 6, Como, Italy, July 2000, pp. 15-19.
    [10] K. M. Faraoun and A. Boukelif, “Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions,” International Journal of Computational Intelligence, vol. 3, no. 2, pp. 161-168, 2006.
    [11] T. Fawcett, “An Introduction to ROC Analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, June 2006.
    [12] U. M. Fayyad and K. B. Irani, “On the Handling of Continuous-Valued Attributes in Decision Tree Generation,” Machine Learning, vol. 8, no. 1, pp. 87-102, Jan. 1992.
    [13] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345-354, Mar. 2007.
    [14] S. J. Han and S. B. Cho, “Evolutionary Neural Networks for Anomaly Detection based on the Behavior of a Program,” IEEE Transactions on Systems, Man, and Cybernetics-PART B: Cybernetics, vol. 36, no. 3, pp. 559-570, June 2006.
    [15] J. A. Hartigan and M. A. Wong, “A K-Means Clustering Algorithm,” Applied Statistics, vol. 28, no. 1, pp. 100-108, 1979.
    [16] V. J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Review, vol. 22, no. 2, pp. 85-126, Oct. 2004.
    [17] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State Transition Analysis: A Rule-based Intrusion Detection Approach,” IEEE Transactions on Software Engineering, vol. 21, no. 3, pp. 181-199, Mar. 1995,.
    [18] D. Joo, T. Hong, and I. Han, “The Neural Network Models for IDS based on the Asymmetric Costs of False Negative Errors and False Positive Errors,” Expert Systems with Applications, vol. 25, no. 1, pp. 69-75, July 2003,.
    [19] H. G. Kayacik, Hierarchical self organizing map based IDS on KDD benchmark, Master’s thesis, Dalhousie University, Faculty of Computer Science, 2003.
    [20] H. G. Kayacik, A. N. Zincir-Heywood, M. I. Heywood, “A Hierarchical SOM-based Intrusion Detection System,” Engineering Applications of Artificial Intelligence, vol. 20, no. 4, pp. 439-451, June 2007.
    [21] K. Kendall, A Database of Computer Attacks for the Evaluation of Intrusion Detection Systems, Master's Thesis, Massachusetts Institute of Technology, 1998.
    [22] Knowledge discovery in databases DARPA archive. Task Description. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/task.html.
    [23] T. Kohonen, “Self-Organizing Maps,” Proceedings of the IEEE, vol. 78, no. 9, pp.1464-1480, Sep. 1990.
    [24] T. Kohonen, J. Hynninen, J. Kangas, and J. Laaksonen. SOM_PAK: The self-organizing map program package. [Online]. Available: http://www.cis.hut.fi /research/som_lvq_pak.shtml.
    [25] P. Laskov, P. Dussel, C. Schafer, and K. Rieck, “Learning Intrusion Detection: Supervised or Unsupervised,” International Conference on Image Analysis and Processing, ICIAP, vol. 3617, Sep. 2005, pp. 50-57.
    [26] S. C. Lee and D. V. Heinbuch, “Training a Neural-Network based Intrusion Detector to Recognize Novel Attacks,” IEEE Transaction on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 31, no. 4, pp. 294-299, July 2001.
    [27] W. Lee, W. Fan, M. Miller, S. J. Stolfo, and E. Zadok, “Toward Cost-Sensitive Modeling for Intrusion Detection and Response,” Journal of Computer Security, vol. 10, no. 1-2, pp. 5-22, 2002.
    [28] W. Lee and S. J. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems,” ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227-261, Nov. 2000.
    [29] W. Lee, S. J. Stolfo, and K. W. Mok, “A Data Mining Framework for Building Intrusion Detection Models,” IEEE Symposium on Security and Privacy, Oakland, California, USA, 1999, pp. 120-132.
    [30] W. Lee, S. J. Stolfo, and K. W. Mok, “Adaptive Intrusion Detection: A Data Mining Approach,” Artificial Intelligence Review, vol. 14, no. 6, pp. 533-567, Dec. 2000.
    [31] I. Levin, “KDD-99 Classifier Learning Contest LLSoft's Results Overview,” ACM SIGKDD Explorations Newsletter, ACM SIGKDD, vol. 1, no. 2, Jan. 2000, pp. 67-75.
    [32] Y. Liao and V. R. Vemuri, “Use of K-Nearest Neighbor Classifier for Intrusion Detection,” Computers Security, vol. 21, no. 5, pp. 439-448, Oct. 2002.
    [33] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, “Evaluating Intrusion Detection Systems: the 1998 DARPA Off-Line Intrusion Detection Evaluation,” Proceedings of the 2000 DARPA Information Survivability Conference and Exposition, DISCEX, Los Alaminos, California, USA, 2000, vol. 2, pp. 12-26.
    [34] R. P. Lippmann, J. W. Haines, D. Fried, J. Korba, and K. Das, “The 1999 DARPA Off-Line Intrusion Detection Evaluation,” Computers Networks, vol. 34, no. 4, pp. 579-595, Oct. 2000.
    [35] R. A. Maxion and K. M. C. Tan, “Anomaly Detection in Embedded Systems,” IEEE Transactions on Computers, Special Issue on Embedded Fault-Tolerant Systems, vol. 51, no. 2, pp. 108-120, Feb. 2002.
    [36] J. McHugh, “Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory,” ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 262-294, Nov. 2000.
    [37] MIT Lincoln Laboratory, DARPA Intrusion Detection Evaluation. [Online]. Available: http://www.ll.mit.edu/IST/ideval/index.html.
    [38] S. Mukkamala and A. H. Sung, “Identifying Significant Features for Network Forensic Analysis Using Artificial Intelligent Techniques,” International Journal of Digital Evidence, vol. 1, no. 4, pp. 1-17, 2003.
    [39] A. Patcha and J. M. Park, “An Overview of Anomaly Detection Techniques: Existing Solutions and Latest Technological Trends,” Computer Networks, vol. 51, no. 12, pp. 3448-3470, Aug. 2007.
    [40] W. Peng, J. Chen and H. Zhou. An implementation of ID3 --- Decision Tree Learning Algorithm. [Online]. Available: http://web.arch.usyd.edu.au/~wpeng/ DecisionTree2.pdf.
    [41] D. T. Pham, S. S. Dimov and C. D. Nguyen, “An Incremental K-means Algorithm,” Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 218, no. 7, pp. 783-795, 2004.
    [42] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, no. 1, pp. 81-106, Mar. 1986.
    [43] S. R. Safavian and D. Landgrebe, “A Survey of Decision Tree Classifier Methodology,” IEEE Transactions on Systems, Man and Cybernetics, vol. 21, no. 3, pp. 660-674, May 1991.
    [44] S. T. Sarasamma and Q. A. Zhu, “Min-Max Hyperellipsoidal Clustering for Anomaly Detection in Network Security,” IEEE Transactions on systems, man, and cybernetics-part B: Cybernetics, vol. 36, no. 4, pp. 887-901, Aug. 2006.
    [45] S. T. Sarasamma, Q. A. Zhu, and J. Huff, “Hierarchical Kohonenen Net for Anomaly Detection in Network Security,” IEEE Transactions on systems, man, and cybernetics, vol. 35, no. 2, pp. 302-312, Apr. 2005.
    [46] S. Stolfo et al.. (2002). The third international knowledge discovery and data mining tools competition. [Online]. Available: http://kdd.ics.uci.edu/databases /kddcup99/kddcup99.html.
    [47] P .E. Utgoff, “Incremental Induction of Decision Trees,” Machine Learning, vol. 4, no. 2, pp. 161-186, Nov. 1989.
    [48] P. E. Utgoff, N. C. Berkman, and J. A. Clouse, “Decision Tree Induction based on Efficient Tree Restructuring,” Machine Learning, vol. 29, no. 1, pp. 5-44, Oct. 1997.
    [49] Y. Wang, I. Kim, G. Mbateng, and S. Y. Ho, “A Latent Class Modeling Approach to Detect Network Intrusion,” Computer Communications, vol. 30, no. 1, pp. 93-100, Dec. 2006.
    [50] N. Ye, X. Li, Q. Chen, S. M. Emran, and M. Xu, “Probabilistic Techniques for Intrusion Detection based on Computer Audit Data,” IEEE Transactions on Systems, Man, and Cybernetics-Part A:Systems and Humans, vol. 31, no. 4, pp. 266-274, July 2001.
    [51] N. Ye, S. M. Emran, Q. Chen, and S. Vilbert, “Multivariate Statistical Analysis of Audit Trails for Host-based Intrusion Detection,” IEEE Transactions on Computers, vol. 51, no. 7, pp. 810-819, July 2002.
    [52] N. Ye, Y. Zhang, and C. M. Borror, “Robustness of the Markov-Chain Model for Cyber-Attack Detection,” IEEE Transactions on Reliability, vol. 53, no. 1, pp. 116-123, Mar. 2004.
    [53] S. Zhong, T. Khoshgoftaar, and N. Seliya, “Clustering-based Network Intrusion Detection,” International Journal of Reliability, Quality and Safety Engineering, vol. 14, no. 2, pp. 1-17 ,2007.

    QR CODE