
Graduate student: 王柏翔 (Bo-Xiang Wang)
Thesis title: 機器學習應用於識別與分析網路威脅之研究 (Machine Learning Applied to Identify and Analyze Cyber Threats)
Advisor: 陳俊良 (Jiann-Liang Chen)
Oral defense committee: 陳俊良 (Jiann-Liang Chen), 孫雅麗 (Yeali-S. Sun), 黎碧煌 (Bih-Hwang Lee), 廖婉君 (Wan-jiun Liao), 郭耀煌 (Yau-Hwang Kuo)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 75
Keywords (Chinese): 機器學習, 威脅檢測, 意圖分析, 蜜罐, 遠端外殼存取
Keywords (English): Machine Learning, Threat Detection, Intent Analysis, Honeypot, Remote Shell Access

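With advances in technology and the rapid growth of the Internet, the number of devices and the volume of messages that users send and receive are increasing quickly. Because modern users value convenience and resource sharing, remote control has become one of the most widely used mechanisms. Meanwhile, attackers' techniques keep evolving in order to control systems, steal data, and spread malware. Command Injection (CMDi) is the most prominent vulnerability in the Internet of Things; attackers attempt CMDi attacks to gain initial access to a device. Once inside the system, they pursue their final objective through reconnaissance, execution of malicious files, defense evasion, and similar behaviors.
To guard against network threats arising from remote connections, this study builds a system that combines behavioral features with machine learning algorithms. The approach combines sample analysis, feature construction, and feature evaluation to build a detection model, which is expected to give operating systems and related operators a clearer strategy for defending against malicious attacks. The study collects SSH and Telnet interaction data from Cowrie and splits it into training, validation, and test datasets. To ensure the credibility and impartiality of the dataset, the data are labeled according to the Enterprise tactics of MITRE ATT&CK. The study also proposes a system for detecting remote shell threats, which classifies threats into three levels according to their degree of risk.
To detect each level of network threat in remote connections, this study analyzes the behaviors in the dataset and proposes 52 features: message-based features covering various Linux commands, host-based features covering information from the connection process, and geography-based features related to geographic location. The LightGBM feature evaluation mechanism confirms that the custom features proposed in this study are discriminative. Finally, the detection model trained with the best feature combination is evaluated on the test dataset that was split off in advance. The LightGBM-based detection model reaches 99% accuracy, the Random Forest-based model reaches 95.66%, and the K-NN model reaches 94.08%. These results show that the proposed detection model outperforms existing methods for detecting network threats in remote connections.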


The number of devices and the number of messages that users send and receive are increasing rapidly with the development of technology and the Internet. Remote control is widely used for its convenience and its support of resource sharing. Hacking techniques are also evolving in order to control systems, steal data, and spread malicious programs. Command Injection (CMDi) is the most significant vulnerability in the Internet of Things: attackers use CMDi attacks to obtain initial access to a device. After entering a system, an attacker will seek to investigate the environment, execute malicious files, evade system defenses, and perform other tasks.
The aim of this work is to prevent remote network threats by combining behavioral features of attackers with machine learning algorithms. The approach combines sample analysis, feature building, and feature evaluation to build a detection model, which is expected to give operating systems and related operators a clearer strategy for defending against malicious attacks. This study collected SSH and Telnet interaction data from Cowrie and divided it into training, validation, and testing datasets. To ensure the credibility of the dataset, the data are labeled according to the Enterprise tactics of MITRE ATT&CK [1]. A Detect Remote Shell Threat (DEST) system is proposed, which classifies threats into three levels according to the risk they pose.
This study analyzes the behaviors in the dataset and presents 52 features for detecting each level of cyber threat in remote connections. They comprise message-based features covering various Linux commands, host-based features covering information from the connection process, and geography-based features related to geographical location. The LightGBM [2] feature evaluation mechanism was used to verify that the custom features are discriminative. Finally, the detection model trained with the best combination of features is evaluated on the test dataset. The LightGBM model reaches 99% accuracy, the Random Forest model [3] reaches 95.66%, and the K-NN model [4] reaches 94.08%. The results show that the proposed detection model is superior to existing remote network threat detection methods. In addition, the mutual dependencies between features and network threats are presented.
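
To make the idea of message-based features concrete, the following is a minimal sketch of how per-session command counts might be derived from the shell input recorded by a honeypot. It is an illustration under stated assumptions, not the thesis's implementation: the command groups, the feature names, and the helper functions `split_commands` and `message_features` are hypothetical, and the actual 52 features are not listed in this record.

```python
from collections import Counter
import shlex

# Hypothetical command groups (an assumption for illustration only;
# the thesis's real feature definitions are not given in this record).
COMMAND_GROUPS = {
    "discovery": {"uname", "whoami", "id", "ps", "ifconfig", "cat"},
    "download": {"wget", "curl", "tftp"},
    "permission": {"chmod", "chown"},
    "cleanup": {"rm", "history"},
}


def split_commands(line):
    """Split one shell input line on ';', '&&', '||' and '|' into simple commands."""
    for sep in ("&&", "||", "|"):
        line = line.replace(sep, ";")
    return [part.strip() for part in line.split(";") if part.strip()]


def message_features(session_inputs):
    """Count how many commands in a session fall into each hypothetical group."""
    counts = Counter()
    total = 0
    for line in session_inputs:
        for cmd in split_commands(line):
            try:
                tokens = shlex.split(cmd)
            except ValueError:
                # Unbalanced quotes are common in attack traffic; fall back to a plain split.
                tokens = cmd.split()
            if not tokens:
                continue
            total += 1
            # Strip any directory prefix so '/bin/rm' is counted as 'rm'.
            name = tokens[0].rsplit("/", 1)[-1]
            for group, names in COMMAND_GROUPS.items():
                if name in names:
                    counts[group] += 1
    features = {f"n_{group}": counts[group] for group in COMMAND_GROUPS}
    features["n_commands"] = total
    return features


if __name__ == "__main__":
    session = [
        "cd /tmp; wget http://example.com/x.sh && chmod +x x.sh",
        "uname -a | cat",
        "/bin/rm -rf x.sh",
    ]
    print(message_features(session))
```

A feature dictionary of this kind, produced once per session, could be combined with host-based and geography-based values into a single row before being passed to a classifier.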

Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Organization
Chapter 2 Related Work
2.1 Threat Indicator with Honeypots
2.2 Analyzing of Attacker Behaviors Based on SSH Sessions
2.3 Detection of Attacks using ML Techniques
Chapter 3 Proposed Threat Detection System
3.1 System Architecture
3.2 Data Collection
3.2.1 Data Source
3.2.2 Crawler Technology
3.3 Data Processing
3.3.1 Data Cleaning
3.3.2 Label Observing
3.4 Feature Definition
3.4.1 Message-based Features
3.4.2 Host-based Features
3.4.3 Geography-based Features
3.5 Data Distribution
Chapter 4 System Environment and Performance Analysis
4.1 System Environment
4.1.1 Experimental Environment
4.1.2 Experimental Parameter
4.2 Performance of ML Algorithms
4.3 Feature-based Analysis
4.3.1 Feature Analysis of Message-based Features
4.3.2 Feature Analysis of Host-based Features
4.3.3 Feature Analysis of Geography-based Features
4.3.4 Feature Analysis of All Features
4.3.5 Classifier Generation
4.4 Performance Analysis
4.4.1 Prediction by Classifier
4.4.2 Comparison of Different Studies
4.5 Summary
Chapter 5 Conclusions and Future Works
5.1 Conclusions
5.2 Future Works
References

[1] B. E. Strom, A. Applebaum, D. P. Miller, K. C. Nickels, A. G. Pennington and C. B. Thomas, "MITRE ATT&CK™: Design and philosophy," Technical report, 2018.
[2] G. Ke et al., "LightGBM: A highly efficient gradient boosting decision tree," Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), pp. 3146-3154, 2017.
[3] L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[4] P. Cunningham and S. J. Delany, "k-Nearest neighbour classifiers," Multiple Classifier Syst., vol. 34, pp. 1-17, 2007.
[5] S. Kemp, "Digital 2021: Global Overview Report," Retrieved from https://datareportal.com/reports/digital-2021-global-overview-report/ (last visited on 2021/03/07)
[6] Profit From Tech, "The Ultimate List of Internet of Things Statistics for 2021," Retrieved from https://www.profitfromtech.com/internet-of-things-statistics/ (last visited on 2021/03/07)
[7] B. Buntz, "The 10 Most Vulnerable IoT Security Targets," Retrieved from https://www.iotworldtoday.com/2016/07/27/10-most-vulnerable-iot-security-targets/ (last visited on 2021/03/07)
[8] T. Ylonen, "SSH Key Management Challenges and Requirements," 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1-5, 2019.
[9] D. McMillen, "A New Botnet Attack Just Mozied Into Town," Retrieved from https://securityintelligence.com/posts/botnet-attack-mozi-mozied-into-town/ (last visited on 2021/03/07)
[10] Venafi, "Secure Shell (SSH) Security, Vulnerabilities and Exploitation," Retrieved from https://www.venafi.com/education-center/ssh/security-and-vulnerabilities/ (last visited on 2021/03/07)
[11] D. Fraunholz, M. Zimmermann, A. Hafner and H. D. Schotten, "Data Mining in Long-Term Honeypot Data," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 649-656, 2017.
[12] V. Kuskov, M. Kuzin, Y. Shmelev, D. Makrushin and I. Grachev, "Honeypots and the Internet of Things," 2017. (last visited on 2021/03/31)
[13] A. Kyriakou and N. Sklavos, "Container-Based Honeypot Deployment for the Analysis of Malicious Activity," 2018 Global Information Infrastructure and Networking Symposium (GIIS), pp. 1-4, 2018.
[14] S. Kumar, B. Janet and R. Eswari, "Multi Platform Honeypot for Generation of Cyber Threat Intelligence," 2019 IEEE 9th International Conference on Advanced Computing (IACC), pp. 25-29, 2019.
[15] J. M. Pittman, K. Hoffpauir and N. Markle, "Primer–a tool for testing honeypot measures of effectiveness," arXiv:2011.00582, 2020.
[16] E. Kheirkhah, S. M. Poustchi Amin, H. A. Jahanshahi Sistani and H. Acharya, "An experimental study of ssh attacks by using honeypot decoys," Indian Journal of Science and Technology, vol. 6, 2013.
[17] C. Valli, P. Rabadia and A. Woodward, "Patterns and Patter-An Investigation into SSH Activity Using Kippo honeypots," Australian Digital Forensics Conference, 2013.
[18] G. Kambourakis, C. Kolias and A. Stavrou, "The Mirai botnet and the IoT zombie armies," Proc. IEEE Military Commun. Conf. (MILCOM), pp. 267-272, 2017.
[19] C. Kolias et al., "DDoS in the IoT: Mirai and Other Botnets," Computer, vol. 50, no. 7, pp. 80-84, 2017.
[20] T. Bajtoš et al., "Analysis of the infection and the injection phases of the Telnet botnets," Journal of Universal Computer Science, vol. 25, no. 11, pp. 1417-1436, 2019.
[21] L. Hellemons et al., "SSHCure: A flow-based SSH intrusion detection system," Proc. 6th Int. Conf. AIMS, vol. 7279, pp. 86-97, 2012.
[22] G. K. Sadasivam, C. Hota and B. Anand, "Classification of SSH Attacks Using Machine Learning Algorithms," 2016 6th International Conference on IT Convergence and Security (ICITCS), pp. 1-6, 2016.
[23] T. Bajtoš, P. Sokol and T. Mézešová, "Virtual honeypots and detection of telnet botnets," Proceedings of the Central European Cybersecurity Conference, 2018.
[24] R. M. Arifianto, P. Sukarno and E. M. Jadied, "An SSH Honeypot Architecture Using Port Knocking and Intrusion Detection System," 2018 6th International Conference on Information and Communication Technology (ICoICT), pp. 409-415, 2018.
[25] R. K. Shrivastava, B. Bashir and C. Hota, "Attack detection and forensics using honeypot in iot environment," International Conference on Distributed Computing and Internet Technology, pp. 402-409, 2019.
[26] P. Dumont, R. Meier, D. Gugelmann and V. Lenders, "Detection of Malicious Remote Shell Sessions," 2019 11th International Conference on Cyber Conflict (CyCon), pp. 1-20, 2019.
[27] S. Udhani, A. Withers and M. Bashir, "Human vs Bots: Detecting Human Attacks in a Honeypot Environment," 2019 7th International Symposium on Digital Forensics and Security (ISDFS), pp. 1-6, 2019.
[28] T. Lee, L. Chang and C. Syu, "Deep Learning Enabled Intrusion Detection and Prevention System over SDN Networks," 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1-6, 2020.
[29] J. M. J. Valero, M. G. Pérez, A. H. Celdrán and G. M. Pérez, "Identification and classification of cyber threats through ssh honeypot systems," in Handbook of Research on Intrusion Detection Systems, IGI Global, pp. 105-129, 2020.
[30] B. Lingenfelter, I. Vakilinia and S. Sengupta, "Analyzing Variation Among IoT Botnets Using Medium Interaction Honeypots," 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0761-0767, 2020.
[31] J. T. M. Garre, M. G. Pérez and A. R. Martínez, "A novel Machine Learning-based approach for the detection of SSH botnet infection," Future Generation Computer Systems, vol. 115, pp. 387-396, 2021.
[32] T. Chen, T. He and M. Benesty, "Xgboost: Extreme gradient boosting," R Package Version 0.4-2, pp. 1-4, 2015.
[33] Wikipedia, "Comparison of SSH clients," Retrieved from https://en.wikipedia.org/wiki/Comparison_of_SSH_clients (last visited on 2021/04/09)
