Adversarial Attacks Against Opcode-based Malware Detectors Using Transformer at the Assembly Level


Graduate student:	Jia-Yang Pan (潘家洋)
Thesis title:	Adversarial Attacks Against Opcode-based Malware Detectors Using Transformer at the Assembly Level
Advisors:	Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Oral defense committee:	Yuh-Jye Lee (李育杰), Yi-Ting Huang (黃意婷)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2023
Academic year of graduation:	111
Language:	English
Number of pages:	58
Keywords:	Adversarial Attack, Machine Learning, Transformer, Malware Detection, Static Analysis

With the rapid development of the digital world, malware has become a significant threat to cybersecurity. Machine learning plays a crucial role in malware detection, but attackers continually look for ways to craft adversarial examples that evade detectors, which makes the robustness of malware detectors a critical concern. In this study, we target opcode-based malware detectors and employ a Transformer to generate benign-looking payloads. These payloads are inserted into the target binary without compromising its executability or functionality, misleading the detector into a wrong decision. The Transformer is a model built on a self-attention mechanism, which lets it capture long-range interactions in long texts and other sequences while supporting parallel computation. We therefore train the Transformer on legitimate, meaningful benign opcode sequences, which accurately describe software behavior and effectively influence the model's predictions. Moreover, to reduce the number of iterations in sample generation, this study adopts the Weighted Sampling Optimization Algorithm (WSOA), which aims to improve injection efficiency and reduce the amount of injected code. We evaluate detectors under four different opcode feature settings, and the experimental results show that our approach cuts attack cost by more than half compared with existing methods. In conclusion, we propose a new way to test the robustness of malware detectors: using a Transformer to generate diverse candidate payloads, which can help strengthen detector defenses and offers useful insights to defenders.
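To make the idea of an opcode-based feature concrete, the following minimal Python sketch computes relative opcode n-gram frequencies and shows how appending a benign-looking opcode sequence dilutes the share of an opcode in the feature vector. It is an illustration only, not the thesis's feature extractor, Transformer, or WSOA: the function name `ngram_frequencies`, the toy opcode sequences, and the use of plain concatenation in place of real binary code injection are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngram_frequencies(opcodes: List[str], n: int = 2) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each opcode n-gram in the sequence."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequences, made up for illustration only.
original = ["push", "mov", "xor", "xor", "call", "xor", "jmp"]
payload = ["mov", "add", "mov", "add", "mov", "add", "nop"]

# Concatenation stands in for code injection here (an assumption of this sketch).
before = ngram_frequencies(original, n=1)
after = ngram_frequencies(original + payload, n=1)

xor_before = before[("xor",)]
xor_after = after[("xor",)]
print(f"xor share before: {xor_before:.3f}, after: {xor_after:.3f}")
# prints "xor share before: 0.429, after: 0.214"
```

In this toy case the payload halves the relative frequency of `xor` without touching the original instructions, which is the kind of feature shift a frequency-based detector would observe. Which payload to insert, and how much of it, is the part the abstract attributes to the Transformer and to WSOA.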

Abstract (Chinese)
Abstract (English)
Acknowledgments
Introduction
1 Motivation
2 Challenges and Goals
3 Contributions
4 Outline of the Thesis
Background and Related Work
1 Static Malware Detection
2 ELF File Format
3 Adversarial Attack
3.1 Challenge
3.2 Functionality-preserving
4 Sequential Model
4.1 Recurrent Neural Networks (RNNs)
4.2 Transformer
Methodology
1 System Model
1.1 Threat Model
1.2 Problem Definition
2 Code Injection
3 Training Transformer
4 Payload Generation
5 Attack Algorithm
Experimental Results and Robustness Analysis
1 Dataset and Experiment Setting
2 Malware Detector
3 Analyzing Attack Results
Limitations and Future Work
1 Limitations
2 Future Work
Conclusions


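Full-text release date: 2026/08/10 (campus network)
Full-text release date: 2026/08/10 (off-campus network)
Full-text release date: 2026/08/10 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)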
