Addressing Dataset Imbalance in Network Traffic-based Malware Detection Using Beyond-SMOTE Synthetic Data Generation

Graduate student:	Andrew Febrian Miyata
Thesis title:	Addressing Dataset Imbalance in Network Traffic-based Malware Detection Using Beyond-SMOTE Synthetic Data Generation
Advisors:	鄭欣明 Shin-Ming Cheng, 柯拉飛 Rafael Kaliski
Oral defense committee:	鄭欣明 Shin-Ming Cheng, 柯拉飛 Rafael Kaliski, 王志宇 Tomky Wang, 徐瑞壕 Richard Hsu
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2023
Graduation academic year:	112
Language:	English
Number of pages:	76
Keywords:	GAN, malware detection, network traffic

Malware detection is a critical aspect of cybersecurity. This research addresses the challenges of image-based malware detection. Recognizing the imbalanced nature of malware datasets, the study investigates the application of synthetic data generation methods. The research involves selecting and preprocessing datasets and transforming raw data into images suitable for model training. Convolutional neural networks (CNNs) are then used to classify the network traffic.
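
As an illustration of the raw-data-to-image step described above, the following is a minimal sketch, not the thesis's actual conversion pipeline. It assumes one byte per grayscale pixel, a fixed square image, truncation of long payloads, and zero-padding of short ones. The function names `bytes_to_image` and `normalize` and the default side length of 28 are hypothetical choices made for this example.

```python
from typing import List


def bytes_to_image(payload: bytes, side: int = 28) -> List[List[int]]:
    """Map raw bytes to a side x side grayscale image (one byte per pixel).

    Assumption for this sketch: payloads longer than side*side bytes are
    truncated, shorter ones are zero-padded at the end.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    n_pixels = side * side
    # Force the payload to exactly side*side bytes.
    fixed = payload[:n_pixels].ljust(n_pixels, b"\x00")
    # Each byte (0-255) becomes one pixel intensity; rows are filled in order.
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]


def normalize(image: List[List[int]]) -> List[List[float]]:
    """Scale pixel intensities from 0-255 to the range 0.0-1.0 for model input."""
    return [[px / 255.0 for px in row] for row in image]


if __name__ == "__main__":
    # Four bytes fill a 2 x 2 image exactly.
    print(bytes_to_image(b"\x01\x02\x03\xff", side=2))
```

A fixed image size keeps every sample the same shape for a CNN, at the cost of discarding bytes beyond the cutoff; the right cutoff depends on the traffic being modelled.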

To mitigate class imbalance, an over-sampling technique, SMOTE, GAN, WGAN, and DCGAN are used to reduce model bias against the minority class. Furthermore, the research introduces a novel approach that modifies DCGAN to improve malware detection. DCGAN contributes to data augmentation, enriching the training set and enhancing the model's robustness against varying malware instances.
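
To make the over-sampling idea concrete, the sketch below shows the core interpolation step of standard SMOTE: a synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a generic illustration, not the configuration used in the thesis. The function name `smote`, the defaults `k=5` and `seed=0`, and the use of flattened feature vectors are assumptions made for this example.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: squared_distance(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote(points, n_new=3, k=2):
        print(sample)
```

Because every synthetic point lies between two existing minority samples, this kind of interpolation cannot produce patterns outside the span of the minority data, which is one common motivation for also considering GAN-based generators.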

Simulation results demonstrate the effectiveness of the proposed methodology, showcasing the CNN's capacity for accurate malware detection. The over-sampling technique proves instrumental in addressing imbalanced datasets, while the GAN-enhanced model exhibits improved performance in discerning subtle malware features. This research provides insights into the evolving landscape of malware detection, leveraging advanced techniques to enhance detection accuracy. The findings contribute to the field by offering a comprehensive methodology and novel approaches for improving the robustness of image-based malware detection models.

Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
Chapter I Introduction
1.1 Background
1.2 Research Contributions
Chapter II Literature Review
2.1 Evolution of Malware Detection Techniques
2.2 Challenges in Image-Based Malware Detection
2.3 Synthetic Data Generation for Imbalanced Datasets
2.4 Previous Malware Detection Techniques
2.5 Malware Detection Using Convolutional Neural Networks (CNN)
Chapter III Methodology
3.1 Dataset Selection
3.2 Converting Raw Data to Images
3.3 Preprocessing for Model Input
Chapter IV Simulation and Results
4.1 CNN Model Architecture
4.2 Over-sampling Technique
4.3 GAN improvement for malware detection
Chapter V Conclusions
5.1 Future Work
References
                                

Full-text release date: 2034/02/05 (off-campus network)
Full-text release date: 2025/02/05 (National Central Library: Taiwan Theses and Dissertations System)
