Author: 王勃淵
Po-Yuan Wang
Thesis Title: 利用執行順序進行惡意軟體檢測以提升穩健性
Robustness Enhancement of Malware Detection Using Execution Order
Advisor: 鄭欣明
Shin-Ming Cheng
Committee: 李漢銘
Hahn-Ming Lee
李育杰
Yuh-Jye Lee
黃意婷
Yi-Ting Huang
Degree: 碩士
Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 48
Keywords (in Chinese): 惡意軟體, 穩健性, 人工智慧, 機器學習, 控制流程圖
Keywords (in other languages): malware, robustness, artificial intelligence, machine learning, CFG

伴隨物聯網(IoT)的快速發展,針對物聯網設備的惡意軟體也因此大量產生。儘管藉由機器學習模型,人們已經可以自動化的檢測惡意軟體的存在與否。但仍然存在極大的隱憂,也就是針對機器學習模型的對抗式攻擊。對抗式攻擊可以藉由模型的反饋,對惡意軟體做出改良,進而產生能夠繞過模型的對抗式樣本,也因此模型的穩健性成為了重要的議題之一。本文中,我們透過考量執行順序來獲取惡意軟體中的惡意意圖,以增進惡意軟體檢測的穩健度。為了衡量模型對於惡意樣本的穩健度,我們實做了兩種對抗式攻擊的手法,並產出真實的對抗式樣本來進行穩健度的驗證。我們的結果說明了執行順序的考量可以在我們的資料集上獲取到相對正確的結果。同時,我們的模型也可以因此保持相對高的模型穩健度。此外,我們進一步驗證了我們的方法對於攻擊程度的穩健性,而該攻擊程度是以攻擊擾動的插入次數所決定的。我們發現我們的方法可以在不同的攻擊程度下保持一致且相對較低的水準。


With the rapid growth of the Internet of Things (IoT), a large amount of malware targeting IoT devices has emerged. Although machine learning models can automatically detect the presence of malware, adversarial attacks against these models remain a significant concern. Such attacks can use feedback from the model to modify malware and generate adversarial samples that evade detection. As a result, model robustness has become an important issue.
In our work, we use the execution order to preserve the semantic information of the malicious intent hidden in malware programs, in order to enhance the robustness of malware detection. To evaluate robustness against adversarial samples, we implement two adversarial attack methods and generate real adversarial samples for verification. The results demonstrate that considering the execution order yields relatively accurate results on our dataset while maintaining relatively high robustness. Furthermore, we assess our method across different attack levels, where the attack level is determined by the number of payload injections. We observe that our method resists escalation of the attack level, resulting in consistently low evasion rates.
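The abstract reports evasion rates at different attack levels but does not define the computation here. A minimal sketch, assuming the evasion rate is the fraction of adversarial samples that the detector fails to flag, grouped by the number of payload injections. The function name `evasion_rate_by_level`, the `is_detected` callback, and the toy scores are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

Sample = TypeVar("Sample")


def evasion_rate_by_level(
    samples: Iterable[Tuple[Hashable, Sample]],
    is_detected: Callable[[Sample], bool],
) -> Dict[Hashable, float]:
    """Return, per attack level, the fraction of samples that evade detection.

    `samples` yields (attack_level, adversarial_sample) pairs, where the
    attack level is assumed to be the number of payload injections.
    `is_detected` is a stand-in for the detector and returns True when
    the sample is flagged as malware.
    """
    totals = defaultdict(int)
    evaded = defaultdict(int)
    for level, sample in samples:
        totals[level] += 1
        if not is_detected(sample):
            evaded[level] += 1
    return {level: evaded[level] / totals[level] for level in sorted(totals)}


# Toy data: each "sample" is just a hypothetical detector score in [0, 1].
toy_samples = [(1, 0.9), (1, 0.8), (5, 0.7), (5, 0.4), (10, 0.6), (10, 0.3)]
rates = evasion_rate_by_level(toy_samples, lambda score: score >= 0.5)
print(rates)
```

A detector that is robust in the sense described above would show rates that stay low and roughly flat as the attack level increases.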

1 Introduction
    1.1 Challenges
    1.2 Motivation
    1.3 Contributions
2 Background and Related Work
    2.1 Static Analysis
        2.1.1 Binary-based
        2.1.2 Signature-based
        2.1.3 Structure-based
    2.2 Adversarial Attacks on Malware
        2.2.1 Adversarial Attacks Scenarios
        2.2.2 Functionality Preserving Problem
        2.2.3 Functionality Preserving Attacks
3 Methodology
    3.1 Framework
    3.2 Reverse Engineering
    3.3 Embeddings
        3.3.1 Node Embedding
        3.3.2 Graph Embedding
    3.4 Adversarial Sample Generation
4 Experimental Results
    4.1 Dataset
    4.2 Model Training
    4.3 Evaluations
    4.4 Robustness against Adversarial Samples
5 Conclusion


Full text public date 2025/08/15 (Intranet public)
Full text public date 2025/08/15 (Internet public)
Full text public date 2025/08/15 (National library)