Graduate student: | 鄧仁豐 Ren-Feng Deng |
Thesis title: | 基於圖神經網路之惡意程式分類與相似性分析 Malware Family Classification and Similarity Based on Graph Neural Networks |
Advisor: | Jiann-Liang Chen |
Oral examination committee: | Yau-Hwang Ku; 孫雅麗 Yea-li Sun; 廖婉君 Wan-jiun Liao; 黎碧煌 Bih-Hwang Lee |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2021 |
Graduation academic year: | 109 (ROC calendar) |
Language: | English |
Number of pages: | 74 |
Keywords (Chinese): | 惡意程式家族, 圖神經網路, 暹羅網路, 深度學習, 表徵學習 |
Keywords (English): | Malware Families, Graph Neural Networks, Siamese Network, Deep Learning, Representation Learning |
In this era of globalization, information and communication technologies are booming, and computer networks have rapidly become part of everyday life. The United Nations has even passed a resolution declaring that the use of the Internet is a fundamental human right and a basic need of modern life, and it has called on countries to address information security issues and protect the freedom to use the Internet. Meanwhile, the COVID-19 pandemic has forced much of the world to telecommute. This shift has increased the information security risk for most businesses, and malware is a major threat that can cause serious harm to businesses and individuals alike.
Large security companies receive many malware samples every day, and the continual mutation of malware places a heavy burden on malware analysts, so identifying variants of known malware is an important task. This study therefore proposes a malware family identification model based on a graph neural network. Each malware sample is analyzed to extract its function call relationships and the assembly content of each function, from which a graph representing the functional structure of the malware is generated. The functions of the malware and the calling relationships between them are the nodes and edges of the graph, respectively. In addition, a representation learning model learns the latent semantics of the assembly code, and the resulting functional behavior embedding vector serves as the feature of each node.
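The graph structure described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class name `FunctionCallGraph`, the function names, and the 2-dimensional embedding values are all hypothetical, and a real pipeline would obtain the embeddings from a trained representation learning model.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionCallGraph:
    """Functions are nodes; caller -> callee relations are directed edges."""
    names: list = field(default_factory=list)      # node index -> function name
    features: list = field(default_factory=list)   # node index -> embedding vector
    edges: list = field(default_factory=list)      # (caller index, callee index)

    def add_function(self, name, embedding):
        """Register a function as a node and return its index."""
        self.names.append(name)
        self.features.append(list(embedding))
        return len(self.names) - 1

    def add_call(self, caller, callee):
        """Record a directed edge from caller to callee, both given by name."""
        self.edges.append((self.names.index(caller), self.names.index(callee)))


# Hypothetical sample: three functions with made-up 2-dimensional embeddings.
graph = FunctionCallGraph()
graph.add_function("start", [0.1, 0.9])
graph.add_function("decrypt_payload", [0.8, 0.2])
graph.add_function("connect_c2", [0.7, 0.3])
graph.add_call("start", "decrypt_payload")
graph.add_call("start", "connect_c2")
```

The node feature list and the edge list are the two inputs a graph neural network library would typically expect, usually as a feature matrix and an edge index.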
In addition to a multi-class classification model that predicts a fixed set of classes, this study implements a similarity model based on distance metric learning. The classification model is limited when it encounters new classes, because it must be retrained on the entire dataset. The similarity model instead measures the distance between two samples in the vector space to assess whether they belong to the same class, and it gradually adjusts the distances between samples during training to improve performance. Therefore, when the model needs to be extended, the similarity model offers better flexibility and performance.
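The idea of comparing two samples by their distance in the vector space can be sketched as follows. This is an illustration under stated assumptions, not the thesis method: the abstract does not name the loss function, so a common contrastive loss form is used here, and the `margin` and `threshold` values are arbitrary placeholders.

```python
import math


def euclidean_distance(a, b):
    """Distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_class, margin=1.0):
    """Pull same-class pairs together; push other pairs at least `margin` apart."""
    d = euclidean_distance(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def is_same_family(a, b, threshold=0.5):
    """Decide family membership by thresholding the embedding distance."""
    return euclidean_distance(a, b) < threshold
```

Because the decision depends only on a distance between embeddings, a sample from a new family can be compared against known samples without changing the output layer of a fixed-class classifier.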
Finally, the performance of the similarity model is analyzed and its output is visualized. The similarity model achieved accuracies of 92% on the testing dataset and 70.4% on an unseen dataset. According to these results, the method proposed in this study outperforms those of previous studies.