Author: Chia-Yu Su (蘇家禹)
Thesis title: Machine Learning Approaches to Malicious IoT Traffic Flow Detection and Behavior Analysis (機器學習應用於物聯網惡意流量檢測與行為分析)
Advisor: Jiann-Liang Chen (陳俊良)
Oral defense committee: Sy-Yen Kuo (郭斯彥), Nen-Fu Huang (黃能富), Chu-Sing Yang (楊竹星), Ing-Yi Chen (陳英一), Jiann-Liang Chen (陳俊良)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2022
Graduation academic year: 110 (ROC calendar)
Language: English
Pages: 71
Keywords: Internet of Things (物聯網), Malicious Packet (惡意封包), Malicious Traffic Flow Detection (惡意流量檢測), Malicious Behavior Analysis (惡意行為分析), Machine Learning (機器學習)
    Advances in technology and the rapid growth of Internet of Things (IoT) applications have made daily life more convenient, but cyberattacks targeting IoT devices are also increasing year by year. According to the 2021 security threat report released by the security company SonicWall, the COVID-19 pandemic led more companies and schools to adopt remote work and remote teaching, and this shift prompted cybercriminals to target home IoT devices. The report shows that IoT attacks grew 6% over the previous year, with Asian countries bearing the brunt at 92% year-over-year growth.

    To guard against the network threats caused by IoT attacks, this study proposes a framework based on packet features and machine learning algorithms. The framework comprises feature creation, feature evaluation, data balancing, and machine learning mechanisms, and it detects malicious packets and their malicious behaviors by analyzing a large volume of network packets. The data source is IoT-23, a public dataset provided by the Stratosphere Laboratory at the Czech Technical University in Prague. The dataset contains 23 network packet captures, of which 20 are malicious and 3 are benign, and it is split into training, validation, and test sets. To keep the model up to date with new attack techniques, this study also collects well-known Common Vulnerabilities and Exposures (CVE) entries from the past three years, analyzes them, and feeds the resulting data into the model for training to improve its reliability.

    To detect malicious packets and analyze their malicious behaviors, this study analyzes raw packet behavior and proposes 53 features in four groups: network-based (network metadata) features, flow-based features, payload-based features, and window size-based features. The best feature combination, selected with the LightGBM feature importance mechanism, is used for model training, and the models are evaluated with 10-fold cross-validation on the pre-split validation set. The binary classification model reaches 92.09% accuracy in detecting malicious packets, and the multiclass classification model reaches 92.30% accuracy in malicious behavior analysis. These results indicate that the proposed binary and multiclass models outperform existing methods for detecting malicious packets and behaviors.
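
    The data balancing and 10-fold cross-validation steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the table of contents lists the Synthetic Minority Oversampling Technique (SMOTE) as the imbalance solution, but the function names `smote` and `k_fold_indices`, the parameters `k` and `seed`, and the toy feature vectors are hypothetical and chosen here only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is held out once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, folds[f]


# Toy feature vectors (hypothetical values, not taken from IoT-23).
benign = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
new_points = smote(benign, n_new=4, k=2)
splits = list(k_fold_indices(n_samples=25, n_folds=10))
```

    In a sketch like this, oversampling would normally be applied only to the training portion of each fold, so that synthetic points never leak into the held-out data. A library implementation such as imbalanced-learn's SMOTE would be the usual choice in practice.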

    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    2 Related Work
        2.1 Global Malicious IoT Attack Trends
        2.2 Malicious IoT Traffic Flow Detection methods with IoT-23 dataset
        2.3 Malicious IoT Traffic Flow Detection using AI Techniques
    3 Proposed System
        3.1 System Overview
        3.2 Data Collection
            3.2.1 Data Source
            3.2.2 Data Preprocessing
        3.3 Feature Extraction
            3.3.1 Network-based Features
            3.3.2 Flow attributes-based Features
            3.3.3 Payload attributes-based Features
            3.3.4 Window size-based Features
        3.4 Imbalance Solution
            3.4.1 Synthetic Minority Oversampling Technique
        3.5 Model Training
        3.6 Prediction & Performance Analysis
    4 Experimental Results
        4.1 System Environment
            4.1.1 Experiment Environment
            4.1.2 Experiment Parameter
        4.2 Performance of ML Algorithms
        4.3 Feature-based Analysis
            4.3.1 Feature Analysis of Network-based Features
            4.3.2 Feature Analysis of Flow attributes-based Features
            4.3.3 Feature Analysis of Payload attributes-based Features
            4.3.4 Feature Analysis of Window size-based Features
        4.4 Performance Analysis
            4.4.1 Prediction by Binary Classifier
            4.4.2 Prediction by Multi-class Classifier
            4.4.3 Unseen dataset prediction result
            4.4.4 Comparing with Different Studies
        4.5 Summary
    5 Conclusions
        5.1 Conclusions
        5.2 Future Works
    References

