
Graduate student: 鍾昌霖 (Paul Elijah Setiasabda)
Thesis title: Study of Malware Detection Based on Deep Learning Algorithms
Advisor: 呂政修 (Jenq-Shiou Leu)
Oral defense committee: 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen), 陳省隆 (Hsing-Lung Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 57
Keywords: Machine Learning, Deep Learning, Malware Detection, Malware Images, Convolutional Neural Network (CNN)
  • This research investigates how effectively several deep learning algorithms detect malware from malware images. The amount of malware found in network systems grows substantially every year, causing considerable financial and privacy damage. Malware has also become more sophisticated: its creators obfuscate the code so that signature-based and heuristic methods cannot detect it quickly, which motivates a new approach based on malware images. In this approach, malware data is first converted into malware images, which are then fed into machine learning or deep learning architectures. Several algorithms were trained and tested, including the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN), the CNN Long Short-Term Memory (CNN-LSTM), and the CNN Support Vector Machine (CNN-SVM). Compared with the other methods, the CNN proved the most suitable for malware images, reaching an accuracy of up to 97.3% on one dataset with a training time of only 43 seconds.
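The conversion step mentioned in the abstract can be illustrated with a minimal sketch. This record does not state how the thesis builds its images, so the code below assumes the common byte-per-pixel scheme, in which each byte of a file becomes one grayscale pixel (0-255) and the byte stream is wrapped into rows of a fixed width. The function name `bytes_to_grayscale`, the default width of 256, and the zero-padding of the last row are all assumptions made for illustration.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Wrap raw bytes into rows of `width` grayscale pixels (values 0-255).

    Assumed scheme, not taken from the thesis: one byte per pixel,
    with the final row zero-padded to the full width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # bytes(n) yields n zero bytes, used here as padding for the last row.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical input: in practice this would be the contents of an executable.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
for row in image:
    print(row)
```

The resulting rows form a 2-D grid that could be saved as an image or passed to an image classifier such as a CNN; resizing to a fixed input shape would typically follow, but is omitted here.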



    CONTENTS
    ABSTRACT
    ACKNOWLEDGEMENTS
    LIST OF FIGURES
    LIST OF TABLES
    LIST OF EQUATIONS
    CHAPTER 1 INTRODUCTION
      1.1 Research Background
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Outline and Report
    CHAPTER 2 RELATED WORKS
      2.1 Malware Images
      2.2 Machine Learning and Deep Learning
        2.2.1 Convolutional Neural Network (CNN)
        2.2.2 Multi-Layers Perceptron (MLP)
        2.2.3 Long Short-Term Memory (LSTM)
        2.2.4 Overfitting
        2.2.5 Swish Activation Function
        2.2.6 Optuna
        2.2.7 Tensorflow 2
      2.3 Evaluation Metrics
        2.3.1 Training Time
        2.3.2 Accuracy
        2.3.3 F1-Score
      2.4 Related Research
    CHAPTER 3 PROPOSED METHOD
      3.1 Data Collection
      3.2 Data Pre-Processing
      3.3 Model Training
        3.3.1 CNN Architecture
        3.3.3 CNN-LSTM
        3.3.4 Daniel Gilbert’s CNN
      3.4 Validation
      3.5 Optimization
    CHAPTER 4 EXPERIMENT AND RESULT
      4.1 Dataset
      4.2 Experiment Result
        4.2.1 Malimg Dataset
        4.2.2 Malevis Dataset
    CHAPTER 5 EVALUATION AND DISCUSSION
      5.1 Evaluation Result
        5.1.1 Accuracy
        5.1.2 F1-Score
        5.1.3 Training time
      5.2 Discussion
    CHAPTER 6 CONCLUSION AND FUTURE RESEARCH
      6.1 Conclusion
      6.2 Future Research
    REFERENCES
    APPENDIX A: Precision
    APPENDIX B: Recall

