
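Graduate student: 李孟瑾
Meng-Chin Lee
Thesis title: 基於機器學習方法的加密惡意流量偵測與分類
An approach to detect and to classify encrypted malicious internet traffic based on machine learning algorithms
Advisor: 鄧惟中
Wei-Chung Teng
Oral defense committee: 林宗男
Tsung-Nan Lin
黃政嘉
Jheng-Jia Huang
沈上翔
Shan-Hsiang Shen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 53
Keywords (Chinese): 傳輸層安全性協定, 安全資訊與事件管理, 流量分類, 機器學習 (Transport Layer Security protocol, security information and event management, traffic classification, machine learning)
Keywords (English): Transport Layer Security Protocol, Traffic Classification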
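  • Real network traffic is an important asset for much network-related research and
    development, but the rise of encrypted traffic has brought new challenges to Internet
    traffic analysis. Because the payload inside packets of encrypted traffic cannot be read
    directly, current analysis techniques mostly focus on TCP/IP header fields. This makes it
    hard to parse application-layer protocols and also hinders the detection of malicious
    traffic. As SSL/TLS has become the most popular communication protocol on the Internet,
    this study starts from the contents of the SSL/TLS protocol to distinguish malicious from
    benign network traffic, and goes further to classify the malicious traffic.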
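    This study combines features used in previous traffic-classification research with two
    categories of powerful features that we propose and with current state-of-the-art machine
    learning algorithms, and establishes a standard operating procedure (SOP) for building
    classifiers to solve this problem. The two proposed feature categories are the packet
    fields of the transport-layer SSL/TLS protocol and transmission-direction changes. The
    direction-change features relate to how frequently a flow changes its direction of
    transmission.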
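
The abstract only states that the direction-change features relate to how often a flow switches direction; the exact definition is not given here. As a minimal sketch under that assumption, the following counts direction switches between consecutive packets and normalizes by the number of transitions. The function name, the +1/-1 direction encoding, and the returned field names are illustrative choices, not taken from the thesis.

```python
from typing import Dict, Sequence


def direction_change_features(directions: Sequence[int]) -> Dict[str, float]:
    """Summarize how often a flow switches transmission direction.

    `directions` holds one entry per packet: +1 for client-to-server and
    -1 for server-to-client (an assumed encoding for illustration).
    """
    # A change occurs whenever two consecutive packets travel in opposite directions.
    changes = sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)
    # Number of consecutive packet pairs; guard against division by zero.
    transitions = max(len(directions) - 1, 1)
    return {
        "packets": len(directions),
        "direction_changes": changes,
        "change_rate": changes / transitions,
    }


if __name__ == "__main__":
    print(direction_change_features([1, 1, -1, -1, 1, -1]))
```

A rate normalized by flow length is used here so that short and long flow segments remain comparable; a raw count alone would grow with segment length.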
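    The proposed SOP first extracts the Pcap files of the malware. After noise and specific
    packets are filtered out of the recorded data, flows can be cut out. The extracted flows
    can be further split into shorter flow segments, and we then compute the features of
    these segments. The SOP defines tunable parameters for how packets are filtered, how
    flows are split, and how features are processed, which provides flexibility. Any machine
    learning algorithm can therefore use this SOP, together with our proposed transport-layer
    features, to train a good classifier.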
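
The steps above (filter packets, cut flows, split them into segments, compute per-segment features, all governed by tunable parameters) can be sketched roughly as follows. This is not the thesis implementation: the `Packet` record, the `SopParams` fields and their defaults, and the chosen features are hypothetical, and packets are assumed to be already parsed out of a Pcap file.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Packet:
    flow_id: str      # hypothetical flow key, e.g. a 5-tuple rendered as a string
    timestamp: float  # seconds
    size: int         # bytes
    direction: int    # +1 client-to-server, -1 server-to-client (assumed encoding)


@dataclass(frozen=True)
class SopParams:
    min_size: int = 1         # filter step: drop packets smaller than this
    segment_len: int = 4      # split step: packets per flow segment
    min_segment_len: int = 2  # discard segments shorter than this


def group_flows(packets: Iterable[Packet], params: SopParams) -> Dict[str, List[Packet]]:
    """Filter packets, then group them by flow in timestamp order."""
    flows: Dict[str, List[Packet]] = {}
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        if p.size >= params.min_size:
            flows.setdefault(p.flow_id, []).append(p)
    return flows


def split_segments(flow: List[Packet], params: SopParams) -> List[List[Packet]]:
    """Cut a flow into fixed-length segments and drop the ones that are too short."""
    step = params.segment_len
    segments = [flow[i:i + step] for i in range(0, len(flow), step)]
    return [s for s in segments if len(s) >= params.min_segment_len]


def segment_features(segment: List[Packet]) -> Dict[str, float]:
    """Compute a few simple per-segment features."""
    changes = sum(1 for a, b in zip(segment, segment[1:]) if a.direction != b.direction)
    return {
        "packets": len(segment),
        "bytes": sum(p.size for p in segment),
        "duration": segment[-1].timestamp - segment[0].timestamp,
        "direction_changes": changes,
    }


def run_sop(packets: Iterable[Packet], params: SopParams = SopParams()) -> List[dict]:
    """Run filter -> flow grouping -> segmentation -> feature computation."""
    rows = []
    for flow_id, flow in group_flows(packets, params).items():
        for segment in split_segments(flow, params):
            rows.append({"flow_id": flow_id, **segment_features(segment)})
    return rows
```

Keeping the parameters in one record means the same pipeline can be rerun with different filtering or segmentation settings, and the resulting rows can be passed to whichever learning algorithm is being evaluated.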
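    To verify the effectiveness of these features, this study ran classification experiments
    on five malware families. Compared with the features from most of the literature we
    surveyed, our proposed features improved classification accuracy by up to 0.48%. In the
    detection stage, when we tested malicious traffic that does not belong to these five
    families, the model reached 85% classification accuracy, which indicates that the machine
    learning model has some ability to withstand unknown malware.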
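
The table of contents lists FPR, FNR, accuracy, precision, recall, and F1-measure as the thesis's evaluation metrics. For reference, a small sketch of their standard definitions for binary detection (1 = malicious, 0 = benign) follows. The function names are illustrative, and the labels in the usage example are made-up toy data, not results from the thesis.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute standard binary detection metrics from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "fpr": _ratio(fp, fp + tn),   # benign flows flagged as malicious
        "fnr": _ratio(fn, fn + tp),   # malicious flows that were missed
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(detection_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```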


    With the growth of encrypted network flows in recent years, most internet traffic is now
    protected by the cryptographic protocol known as Transport Layer Security (TLS). Examples
    include HyperText Transfer Protocol Secure (HTTPS) and File Transfer Protocol over SSL/TLS
    (FTPS). Unfortunately, criminals who intend to spread malware have also adopted TLS to
    secure their data transmission. This trend makes threat detection more difficult because
    a common way to detect malware traffic, deep packet inspection (DPI), becomes ineffective
    if the traffic is encrypted.
    Traditional network solutions based on packet analysis cannot, for obvious reasons,
    decipher the data inside encrypted traffic. Because of this shortcoming, it is difficult
    to ensure that encrypted traffic does not contain malicious data. Since decryption is not
    an option, passive detection can instead be used to collect the differences between benign
    and malware traffic in cipher suites and other traffic features. These collected features
    can then be used to train machine learning models that differentiate benign traffic from
    malware traffic. This method addresses the shortcomings of traditional packet analysis.
    In this thesis, (1) we propose two categories of powerful features for machine learning
    algorithms to train classifiers, and (2) we develop a standard operating procedure (SOP)
    to build a classifier for malware traffic. Through these efforts, we aim to make traffic
    classification of malware network flows a more tractable problem and to protect users from
    malicious data.

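    Table of Contents
    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Research Contributions
      1.4 Thesis Organization
    2 Background Knowledge and Related Work
      2.1 Malicious Traffic Detection
      2.2 Transport Layer Security
      2.3 Machine Learning Algo
    3 Data Preprocessing
      3.1 Data Generation Process
      3.2 Data Splitting and Filtering
      3.3 Removing Overly Short Flows
      3.4 Feature Computation
        3.4.1 Features Related to the Transport Layer Security Protocol
        3.4.2 Features Related to Flow Direction
      3.5 Feature Analysis and Selection
        3.5.1 Overlap Rate Statistics
        3.5.2 Overlap Rate Histograms
        3.5.3 Histograms of Unimportant Features
    4 Experimental Results and Analysis
      4.1 Dataset and Experimental Environment
        4.1.1 Dataset Sources and Data Collection
        4.1.2 Experimental Environment
      4.2 Evaluation Metrics
        4.2.1 False positive rate (FPR) and False negative rate (FNR)
        4.2.2 Accuracy
        4.2.3 Precision, Recall, and F1-Measure
      4.3 Comparison of Detection Rates across Algorithms and Machine Learning Models
      4.4 Architecture of the Verification System
      4.5 Verification Test Results
    5 Conclusion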


    Full-text release date: 2024/07/08 (off-campus network)
    Full-text release date: 2024/07/08 (National Central Library: Taiwan theses and dissertations system)