Graduate Student: | 鍾昌霖 Paul Elijah Setiasabda |
---|---|
Thesis Title: | Study of Malware Detection Based on Deep Learning Algorithms |
Advisor: | 呂政修 Jenq-Shiou Leu |
Oral Defense Committee: | 方文賢 Wen-Hsien Fang, 陳郁堂 Yie-Tarng Chen, 陳省隆 Hsing-Lung Chen |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering |
Year of Publication: | 2021 |
Academic Year of Graduation: | 109 (ROC academic year) |
Language: | English |
Number of Pages: | 57 |
Keywords: | Machine Learning, Deep Learning, Malware Detection, Malware Images, Convolutional Neural Network (CNN) |
This research investigates how effectively several deep learning algorithms detect malware from malware images. Every year, the amount of malware found in network systems increases sharply, causing substantial financial damage and loss of privacy. Malware has also grown more sophisticated: its creators obfuscate the code so that signature-based and heuristic methods cannot detect it swiftly. This created the need for a new approach, which led to the malware image method. In this method, malware data is first converted into malware images, which are then fed into machine learning or deep learning architectures. Several algorithms were trained and tested, including the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN), the CNN Long Short-Term Memory (CNN-LSTM), and the CNN Support Vector Machine (CNN-SVM). The study finds the CNN to be the most suitable of the compared methods for malware images, reaching an accuracy of up to 97.3% on one dataset with a training time of only 43 seconds.
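The abstract says that malware data is converted into images before classification but does not give the procedure. The following is a minimal sketch of one common way to do this, in which each byte of a file becomes one grayscale pixel and the byte stream is wrapped into rows of fixed width. The function names, the default width of 16, and the zero padding of the last row are assumptions made for illustration, not details taken from the thesis.

```python
from typing import List


def bytes_to_grayscale_rows(data: bytes, width: int = 16) -> List[List[int]]:
    """Map each byte to one grayscale pixel (0-255) and wrap into rows.

    Hypothetical helper: `width` and the zero padding are illustrative
    assumptions, not parameters reported in the thesis.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the final row with zero bytes (black pixels) so all rows are equal length.
    remainder = len(data) % width
    if remainder:
        data = data + bytes(width - remainder)
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


def normalize(rows: List[List[int]]) -> List[List[float]]:
    """Scale pixel values from 0-255 to 0.0-1.0, a common step before a neural network."""
    return [[pixel / 255.0 for pixel in row] for row in rows]


if __name__ == "__main__":
    sample = bytes(range(10))  # stand-in for the raw bytes of a binary file
    image = bytes_to_grayscale_rows(sample, width=4)
    print(image)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

In practice the resulting rows would typically be resized to a fixed shape and stacked into a batch before being passed to a classifier such as a CNN. Those steps depend on the dataset and framework and are left out here.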