Author: 莊翔宇
Hsiang-Yu Chuang
Thesis title: 基於圖卷積網路與函式呼叫圖之惡意軟體偵測與分類
Malware Detection and Classification Based on Graph Convolutional Networks and Function Call Graphs
Advisor: 陳俊良
Jiann-Liang Chen
Oral defense committee: 郭斯彥
黃能富
楊竹星
陳英一
陳俊良
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Pages: 59
Keywords (Chinese): 惡意程式, 圖卷積網路, 深度學習, 沙盒分析, 動態行為
Keywords (English): Malware, Graph Convolutional Networks, Deep Learning, Sandbox Analysis, Dynamic Behavior

    In recent years, advances in information technology have made the Internet an indispensable part of modern life, bringing many conveniences. At the same time, as information is digitized, attackers can more easily exploit software and system vulnerabilities to steal it. Digital transformation has also become a necessary strategy for governments and major corporations.
    New cyber threats, however, are gradually spreading from the traditional information domain into the Internet of Things (IoT). As the IoT grows, many companies are adopting big data and artificial intelligence technologies to improve their competitiveness, and more devices are transmitting data over the Internet, including commercially valuable confidential information. Adopting these new digital technologies also increases the information security risks that companies face. Malware is a significant threat among them: a successful attack can seriously affect both businesses and the general public.
    Because malware evolves rapidly and new types and variants appear constantly, identifying malware effectively and quickly has become a primary goal for analysts. This study therefore proposes a malware detection and classification model based on graph convolutional networks and function call graphs. Malware is executed in a sandbox to analyze its behavior, and the function calls and the relationships between functions are extracted to build a graph that represents that behavior. The APIs called by the program serve as nodes, the call relationships between APIs as edges, and the latent semantics of the APIs as node features; subgraph integration is used to preserve the malware's behavior.
    Finally, the model's performance was tested and compared with previous studies. The graph methods proposed in previous studies were also tested and compared on the classification task. Under the same criteria, the detection model reached an accuracy of 0.945 and a precision of 0.95, and the classification model reached an accuracy of 0.926 and a precision of 0.93. These results indicate that the proposed method outperforms the previous studies.
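    The graph construction described above can be illustrated with a minimal sketch. It is not the thesis implementation: the API trace is a hypothetical example, edges are assumed to link consecutive calls in the trace, the two-dimensional node features are toy values standing in for learned API embeddings, and the single propagation step follows the standard graph convolution rule ReLU(D^-1/2 (A + I) D^-1/2 H W) with an identity weight matrix.

```python
import math


def build_call_graph(api_trace):
    """Return (nodes, edges); edges link consecutive API calls (an assumption)."""
    nodes = sorted(set(api_trace))
    index = {name: i for i, name in enumerate(nodes)}
    edges = set()
    for src, dst in zip(api_trace, api_trace[1:]):
        if src != dst:  # skip immediate repeats of the same API
            edges.add((index[src], index[dst]))
    return nodes, sorted(edges)


def normalized_adjacency(num_nodes, edges):
    """Compute D^-1/2 (A + I) D^-1/2, treating the graph as undirected."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loops
    for s, d in edges:
        a[s][d] = 1.0
        a[d][s] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_hat, features, weights):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical sandbox trace of Windows API calls.
trace = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtReadFile", "NtClose"]
nodes, edges = build_call_graph(trace)

# Toy 2-d node features (placeholders for API embeddings), one row per node.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, for illustration only

a_hat = normalized_adjacency(len(nodes), edges)
hidden = gcn_layer(a_hat, features, weights)
```

    Each row of `hidden` mixes a node's own features with those of its neighbors; a graph-level prediction would additionally need a pooling step and a classifier, which this sketch omits.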
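    For reference, the two reported metrics can be computed as in the sketch below. It assumes the usual binary definitions, accuracy = (TP + TN) / N and precision = TP / (TP + FP), with label 1 meaning malware; the labels are made-up values, not thesis data, and the abstract does not say how precision is averaged for the multi-class task.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; return 0.0 then.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Made-up labels for illustration (1 = malware, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
acc, prec = accuracy_and_precision(y_true, y_pred)
```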

    Abstract (Chinese)
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    Chapter 2 Related Work
        2.1 Malware Concept
        2.2 Malware Analysis
            2.2.1 Static Analysis
            2.2.2 Dynamic Analysis
        2.3 Malware Detection Techniques
            2.3.1 Signature Based
            2.3.2 Heuristic Based
            2.3.3 Machine/Deep Learning
    Chapter 3 Proposed Methods
        3.1 System Overview
        3.2 Data Collection
        3.3 Dataset Building
            3.3.1 Sandbox Analysis
            3.3.2 API Embedding
            3.3.3 Data Preprocessing
        3.4 Model and Prediction
            3.4.1 Graph Neural Networks
            3.4.2 Classification Model Architecture
    Chapter 4 Performance Analysis
        4.1 Experimental Environment
        4.2 Experimental Performance
            4.2.1 Detection Task
            4.2.2 Malware Classification Task
            4.2.3 Multi Classification Task
        4.3 Performance Comparison
    Chapter 5 Conclusions and Future Works
        5.1 Conclusions
        5.2 Future Works
    References

