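An approach to detect and to classify encrypted malicious internet traffic based on machine learning algorithms | National Taiwan University of Science and Technology Theses and Dissertations System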

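Author:	Meng-Chin Lee (李孟瑾)
Thesis title:	An approach to detect and to classify encrypted malicious internet traffic based on machine learning algorithms (基於機器學習方法的加密惡意流量偵測與分類)
Advisor:	Wei-Chung Teng (鄧惟中)
Committee members:	Tsung-Nan Lin (林宗男), Jheng-Jia Huang (黃政嘉), Shan-Hsiang Shen (沈上翔)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2021
Academic year of graduation:	109
Language:	Chinese
Number of pages:	53
Keywords (Chinese, translated):	Transport Layer Security protocol, security information and event management, traffic classification, machine learning
Keywords (English):	Transport Layer Security Protocol, Traffic Classification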

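Real network traffic is an important asset for much network-related research and development, but the rise of encrypted traffic poses new challenges for internet traffic analysis. Because the payload of encrypted packets cannot be read directly, current analysis techniques mostly focus on TCP/IP header fields; this, however, makes application-layer protocols hard to identify and hinders the detection of malicious traffic. Since SSL/TLS has become the most widely used communication protocol on the Internet, this study starts from the content of the SSL/TLS protocol to distinguish malicious from benign network traffic, and then further classifies the malicious traffic.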
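Building on features used in prior traffic classification research, and combining them with the two categories of powerful features we propose and with state-of-the-art machine learning algorithms, this study establishes a standard operating procedure (SOP) for building classifiers to address this problem. The two proposed feature categories are the packet fields of the transport-layer SSL/TLS protocol and transmission direction changes. The latter relates to how frequently a flow changes its direction of transmission.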
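The abstract says only that the direction-change feature relates to how often a flow switches direction; it does not give a formula. As a minimal sketch, assuming each packet is tagged +1 (client to server) or -1 (server to client), one plausible version is the fraction of consecutive packet pairs whose direction differs. The function name and the encoding are hypothetical, not taken from the thesis.

```python
from typing import Sequence


def direction_change_rate(directions: Sequence[int]) -> float:
    """Fraction of consecutive packet pairs whose direction differs.

    `directions` uses an assumed encoding: +1 for client-to-server,
    -1 for server-to-client. Flows with fewer than two packets have
    no pairs to compare, so they return 0.0.
    """
    if len(directions) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    return changes / (len(directions) - 1)


if __name__ == "__main__":
    # Made-up flow: two outbound packets, two inbound, then one of each.
    flow = [1, 1, -1, -1, 1, -1]
    print(direction_change_rate(flow))
```

Normalizing by the number of packet pairs keeps the value comparable across flows of different lengths; a raw count of direction changes would be an equally reasonable reading of the abstract.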
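The proposed SOP begins by extracting the Pcap files of malware. After noise and specific packets are filtered out of the recorded data, flows can be extracted. The extracted flows can be further cut into shorter flow segments, and we then compute the features of these segments. The SOP defines tunable parameters for how packets are filtered, how flows are segmented, and how features are processed, which provides flexibility. Any machine learning algorithm can therefore use this SOP, together with our proposed transport-layer features, to train a good classifier.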
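The filter, segment, and featurize steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the `Packet` record, the `keep_tls` filter, the fixed-size segmentation, and the three example features are all hypothetical, and the parameters `segment_len` and `min_len` merely stand in for the tunable parameters the abstract mentions. Parsing a real Pcap file is out of scope here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Packet:
    """Hypothetical, already-parsed packet record."""
    timestamp: float
    size: int
    direction: int  # assumed encoding: +1 outbound, -1 inbound
    is_tls: bool


def keep_tls(packet: Packet) -> bool:
    """Example filter: keep only packets flagged as TLS."""
    return packet.is_tls


def split_flow(packets: List[Packet], segment_len: int, min_len: int) -> List[List[Packet]]:
    """Cut a flow into fixed-size segments and drop overly short ones."""
    segments = [packets[i:i + segment_len] for i in range(0, len(packets), segment_len)]
    return [s for s in segments if len(s) >= min_len]


def segment_features(segment: List[Packet]) -> Dict[str, float]:
    """Compute a few illustrative per-segment features."""
    sizes = [p.size for p in segment]
    flips = sum(1 for a, b in zip(segment, segment[1:]) if a.direction != b.direction)
    return {
        "n_packets": len(segment),
        "mean_size": sum(sizes) / len(sizes),
        "direction_changes": flips,
    }


def preprocess(
    packets: List[Packet],
    keep: Callable[[Packet], bool] = keep_tls,
    segment_len: int = 4,
    min_len: int = 2,
) -> List[Dict[str, float]]:
    """Filter packets, segment the remaining flow, and featurize each segment."""
    filtered = [p for p in packets if keep(p)]
    return [segment_features(s) for s in split_flow(filtered, segment_len, min_len)]


if __name__ == "__main__":
    # Made-up flow of seven packets; packet 3 is not TLS and gets filtered out.
    demo = [Packet(i * 0.1, 100 + i, 1 if i % 2 == 0 else -1, i != 3) for i in range(7)]
    for row in preprocess(demo):
        print(row)
```

Exposing the filter predicate and the segment sizes as arguments mirrors the flexibility the SOP is described as providing: the resulting feature rows can be handed to any classifier.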
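To verify the effectiveness of these features, this study conducted classification experiments on five malware families. Compared with most of the feature sets from the literature that we surveyed, our proposed features improve classification accuracy by up to 0.48%. In the detection stage, we tested malicious traffic that does not belong to these five families and achieved a classification accuracy of 85%, which indicates that the machine learning model has the ability to withstand unknown malware.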
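For reference, the evaluation metrics the thesis lists (FPR, FNR, accuracy, precision, recall, and F1-measure) have standard definitions over a binary confusion matrix. The sketch below uses those textbook definitions with malicious traffic as the positive class; the function name and the example counts are made up for illustration and are not results from the thesis.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "fpr": ratio(fp, fp + tn),            # benign flows flagged as malicious
        "fnr": ratio(fn, fn + tp),            # malicious flows that were missed
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=8, fp=2, tn=88, fn=2).items():
        print(f"{name}: {value:.4f}")
```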

With the growth of encrypted network flows in recent years, most internet traffic is now protected by the cryptographic protocol known as Transport Layer Security (TLS). Examples include HyperText Transfer Protocol Secure (HTTPS) and File Transfer Protocol over SSL/TLS (FTPS). Unfortunately, criminals who intend to spread malware have also adopted TLS to secure their data transmission. This trend makes threat detection more difficult, because deep packet inspection (DPI), a common way to detect malware traffic, becomes ineffective when the traffic is encrypted.
Traditional network solutions based on packet analysis cannot, for obvious reasons, decipher the data inside encrypted traffic. Because of this shortcoming, it is difficult to ensure that encrypted traffic does not contain malicious data. Since decryption is not an option, passive detection can instead be used to collect the features that differ between benign and malware traffic, such as cipher suites and other traffic characteristics. These features can then be used to train machine learning models that differentiate benign traffic from malware traffic. This method addresses the shortcomings of traditional packet analysis.
In this thesis, (1) we propose two categories of powerful features with which machine learning algorithms can train classifiers, and (2) we develop a standard operating procedure (SOP) for building a classifier for malware traffic. Through these efforts, classifying malware network flows becomes a far less difficult problem, which helps protect users from malicious data.

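Table of Contents
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Introduction
1 Research Background
2 Research Motivation and Objectives
3 Research Contributions
4 Thesis Organization
Background and Related Work
1 Malicious Traffic Detection
2 Transport Layer Security
3 Machine Learning Algo
Data Preprocessing
1 Data Generation Process
2 Data Segmentation and Filtering
3 Removing Overly Short Flows
4 Feature Computation
4.1 Features Related to the Transport Layer Security Protocol
4.2 Features Related to Traffic Direction
5 Feature Analysis and Selection
5.1 Overlap Rate Statistics
5.2 Overlap Rate Histograms
5.3 Histograms of Unimportant Features
Experimental Results and Analysis
1 Dataset and Experimental Environment
1.1 Dataset Sources and Data Collection
1.2 Experimental Environment
2 Evaluation Metrics
2.1 False positive rate (FPR) and False negative rate (FNR)
2.2 Accuracy
2.3 Precision, Recall, and F1-Measure
3 Comparison of Detection Rates across Algorithms and Machine Learning Models
4 Architecture of the Verification System
5 Verification Test Results
Conclusion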


Full-text release date: 2024/07/08 (off-campus network)
Full-text release date: 2024/07/08 (National Central Library: Taiwan Theses and Dissertations System)
