Author: 杜鎮宇
Zhen-Yu Du
Thesis title: 基於主成分分析和串接極限學習機器的入侵偵測
Intrusion Detection Based on Principal Component Analysis and Cascaded Extreme Learning Machine
Advisor: 陳維美
Wei-Mei Chen
Oral defense committee: 林淵翔
Yuan-Hsiang Lin
林昌鴻
Chang Hong Lin
Degree: 碩士
Master
Department: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Number of pages: 55
Chinese keywords: 入侵偵測系統, 機器學習, 極限學習機器, 特徵提取
English keywords: Intrusion Detection System, machine learning, extreme learning machine, feature extraction
  • With the rapid development of information networks, information security has become a prominent concern. Intrusion detection systems (IDS) play an important role in detecting network attacks from malicious users accurately and in real time.
    In recent years, machine learning has been widely applied to intrusion detection. However, large training sets require long training times, and the number of samples per class is imbalanced, so intrusion classes with very few samples are hard to recognize. The result is misclassification: the system fails to identify abnormal traffic. To address these two problems, we propose a cascaded extreme learning machine based on principal component analysis (PCE), which improves on the original Cascaded ELM (CE). The training set is first undersampled to balance the number of samples per class. Principal component analysis is then applied to extract key features and remove redundant ones, which reduces the dimensionality of the data. A binary classifier is trained for each intrusion class and for the normal class, and the classifiers are connected in a cascade so that test data are classified layer by layer and abnormal traffic is identified. Compared with other methods, the proposed method offers better effectiveness and identification capability on imbalanced data sets with large numbers of training samples.
    Experiments on four data sets (Ada, Sylva, Gina, and Madelon) show that PCE classifies better than CE. On the KDDCUP99 data set, PCE shows a slight improvement over the other methods compared and more balanced effectiveness.
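The pipeline described in the abstract (undersampling, PCA, one binary ELM per class, cascade) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the undersampling policy (shrink every class to the size of the rarest one), the cascade order, the tanh activation, the hidden-layer size, the ridge constant `c`, and the synthetic three-class data are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np

def undersample(X, y, rng):
    """Randomly keep as many samples per class as the rarest class has (assumed policy)."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), counts.min(), replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

def fit_pca(X, k):
    """Return the training mean and the top-k principal directions (as columns)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

class BinaryELM:
    """Single-hidden-layer ELM: random hidden layer, ridge-regression output weights."""
    def __init__(self, n_hidden, c, rng):
        self.n_hidden, self.c, self.rng = n_hidden, c, rng

    def _hidden(self, Z):
        return np.tanh(Z @ self.W + self.b)

    def fit(self, Z, t):
        # t holds +1 (target class) / -1 (everything else)
        self.W = self.rng.uniform(-1, 1, (Z.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        H = self._hidden(Z)
        A = H.T @ H + np.eye(self.n_hidden) / self.c
        self.beta = np.linalg.solve(A, H.T @ t)
        return self

    def predict(self, Z):
        return self._hidden(Z) @ self.beta > 0

def fit_cascade(Z, y, order, rng, n_hidden=40, c=10.0):
    """Train one binary ELM per class except the last.

    Assumption: each stage only sees the classes not handled by earlier stages.
    """
    stages = []
    for cls in order[:-1]:
        t = np.where(y == cls, 1.0, -1.0)
        stages.append((cls, BinaryELM(n_hidden, c, rng).fit(Z, t)))
        Z, y = Z[y != cls], y[y != cls]
    return stages

def predict_cascade(stages, Z, last_class):
    """Pass samples through the stages; the first stage that fires assigns the label."""
    labels = np.full(len(Z), last_class)
    undecided = np.ones(len(Z), dtype=bool)
    for cls, model in stages:
        fired = undecided & model.predict(Z)
        labels[fired] = cls
        undecided &= ~fired
    return labels

def make_data(counts, rng):
    """Synthetic stand-in data: three Gaussian clusters in 8 dimensions."""
    centers = np.zeros((3, 8))
    centers[1, :4] = 4.0
    centers[2, 4:] = 4.0
    X = np.vstack([rng.normal(centers[c], 1.0, (n, 8)) for c, n in enumerate(counts)])
    return X, np.repeat(np.arange(3), counts)

rng = np.random.default_rng(0)
X_train, y_train = make_data([400, 80, 40], rng)   # imbalanced training set
X_test, y_test = make_data([50, 50, 50], rng)

Xb, yb = undersample(X_train, y_train, rng)        # balance the classes
mean, components = fit_pca(Xb, k=2)                # reduce 8 features to 2
stages = fit_cascade((Xb - mean) @ components, yb, order=[0, 1, 2], rng=rng)
y_pred = predict_cascade(stages, (X_test - mean) @ components, last_class=2)
accuracy = float(np.mean(y_pred == y_test))
print(f"balanced train size: {len(yb)}, test accuracy: {accuracy:.3f}")
```

Undersampling shrinks the training set before any model is fitted, and PCA shrinks the feature dimension, so both steps reduce the size of the linear system each ELM solves. That is one plausible reading of how the approach targets training time and class imbalance together.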

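    Chinese Abstract i
    Abstract ii
    Chapter 1 Introduction 1
    1.1 Research Background 1
    1.2 Research Motivation 1
    1.3 Thesis Organization 2
    Chapter 2 Literature Review 3
    2.1 Extreme learning machine (ELM) 3
    2.2 Feature Dimensionality Reduction Techniques 4
    2.2.1 Feature selection 5
    2.2.2 Feature extraction 6
    2.3 Related Algorithms 6
    Chapter 3 Methodology 9
    3.1 Problem Description 9
    3.2 Algorithm Description 10
    3.2.1 PCE 10
    3.2.2 Preprocessing 11
    3.2.3 Principal Component Analysis (PCA) 13
    3.2.4 Training the Classifier Models 15
    3.2.5 Cascaded Classifier 17
    Chapter 4 Experimental Simulation and Discussion 19
    4.1 Experimental Environment and Data Sets 19
    4.2 Experimental Simulation 25
    4.2.1 Parameter Selection 26
    4.2.2 Method Comparison on the KDDCUP Data Set 33
    4.2.3 Method Comparison on the Practicality of Feature Dimensionality Reduction 36
    4.2.4 Effect of the Filtered KDDCUP99 Data Set on the Model 38
    Chapter 5 Conclusion 42
    References 43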


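    Full-text release date: 2024/08/21 (campus network)
    Full-text release: not authorized for public release (off-campus network)
    Full-text release: not authorized for public release (National Central Library: Taiwan theses and dissertations system)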