Author: Wei-Cheng Kang (康帷晟)
Thesis title: Improving Robustness against Detection of IoT Malware Variants Using Optimization-based Data Augmentation (使用基於優化的資料增強方法提升物聯網惡意軟體變體檢測的穩健性)
Advisors: Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Oral defense committee: Chia-Mu Yu (游家牧), Yi-Ting Huang (黃意婷)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Pages: 51
Keywords: IoT malware detection, malware variants, data augmentation, robustness, artificial intelligence
    With the rapid rise of Internet of Things (IoT) devices, these devices have become targets for malware attacks. Attackers often create new variants by modifying existing malware, which produces a large number of diverse variants that can evade detectors. When detectors built on different features return conflicting results, labeling the malware correctly becomes difficult.
    To improve the robustness of detectors against malware variants, we propose a new data augmentation method. Augmentation in the feature vector space is formulated as an optimization problem that incorporates three carefully designed penalty functions. Instead of imposing explicit constraints on the generated data, the penalty functions adjust and optimize it, and the BFGS algorithm is used to solve the problem, which improves the efficiency of generating augmented data. This lets the augmented data simulate variants that would be misclassified, which is crucial for coping with the diversity of malware variants.
    Through comparative experiments with existing methods, we demonstrate the effectiveness of this strategy. The results show that the method improves model generalizability across different static-feature detection tasks and robustness against variants, while keeping augmented-data generation efficient. Ablation experiments confirm the contribution of each penalty function to model performance. In summary, this research offers an effective new tool for detecting IoT malware variants.
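
The optimization described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation: this record does not specify the three penalty functions, so the evasion, proximity, and range penalties below, the linear surrogate detector (`w`, `b`), the weights `lam_*`, and all function names are hypothetical stand-ins. The only element taken from the abstract is the overall shape: an unconstrained objective built from three penalties and minimized with BFGS, here through SciPy's `minimize`.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linear surrogate detector: a positive score means "malware".
w = np.array([1.5, -2.0, 0.5, 1.0])
b = -0.25


def detector_score(x):
    """Score of a feature vector under the assumed surrogate detector."""
    return float(w @ x + b)


def objective(x, x_orig, lam_evade=1.0, lam_near=0.5, lam_range=10.0):
    """Sum of three assumed penalties; no explicit constraints are used."""
    # Penalty 1 (assumed): softplus of the score, which is small when the
    # detector leans "benign", so the sample mimics a misclassified variant.
    evade = np.logaddexp(0.0, detector_score(x))
    # Penalty 2 (assumed): squared distance to the original sample, so the
    # augmented point stays close to the malware it was derived from.
    near = np.sum((x - x_orig) ** 2)
    # Penalty 3 (assumed): soft box penalty keeping features within [0, 1].
    out_of_range = np.sum(np.maximum(0.0, -x) ** 2
                          + np.maximum(0.0, x - 1.0) ** 2)
    return lam_evade * evade + lam_near * near + lam_range * out_of_range


def augment(x_orig):
    """Generate one augmented feature vector by minimizing with BFGS."""
    result = minimize(objective, x_orig, args=(x_orig,), method="BFGS")
    return result.x


x_orig = np.array([0.9, 0.1, 0.6, 0.8])  # made-up malware feature vector
x_aug = augment(x_orig)
print("original score: ", detector_score(x_orig))
print("augmented score:", detector_score(x_aug))
```

In a setup like this, the augmented vector would be added to the training set with the malware label, so that retraining pushes the decision boundary to cover it. Using penalties rather than hard constraints keeps the problem unconstrained, which is the setting BFGS is designed for.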

    Abstract (Chinese)
    Abstract
    Acknowledgements
    1 Introduction
      1.1 Motivation
      1.2 Challenges and Goals
      1.3 Contributions
      1.4 Outline of the thesis
    2 Background and Related Work
      2.1 IoT Malware Variants
      2.2 IoT Static Malware Detection
      2.3 Data Augmentation Approaches in Malware Detection
      2.4 BFGS Optimization Algorithm
    3 Proposed Method
      3.1 Optimization Process
        3.1.1 To Simulate Misclassified Malware Variants
        3.1.2 Retain the Original Performance
      3.2 Augmented Data Optimization
    4 Experimental Setup
      4.1 Dataset
      4.2 Model Evaluation Metrics
      4.3 Experiments of Hyperparameters
    5 Evaluation
      5.1 Ablation Study
      5.2 Experiment Results
      5.3 Comparison with other methods
    6 Limitations & Future Work
      6.1 Limitations
      6.2 Future Work
    7 Conclusion


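    Full text release date: 2029/08/13 (campus network)
    Full text release date: 2034/08/13 (off-campus network)
    Full text release date: 2029/08/13 (National Central Library: Taiwan theses and dissertations system)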