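Backdoor Attack against Structure-based Malware Detector Using Partition | National Taiwan University of Science and Technology Theses and Dissertations System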

Student:	Yu-Cheng Chiu (邱昱誠)
Thesis Title:	Backdoor Attack against Structure-based Malware Detector Using Partition
Advisors:	Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Committee Members:	Yuh-Jye Lee (李育杰), Yi-Ting Huang (黃意婷)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication:	2023
Academic Year of Graduation:	111 (ROC calendar, i.e., 2022–2023)
Language:	English
Number of Pages:	64
Keywords:	Machine Learning (ML), Graph Neural Network (GNN), Backdoor Attack, Malware, Malware Detector, Structural Feature, Control Flow Graph (CFG), Function Call Graph (FCG)

In static malware detection, structural features such as Control Flow Graphs (CFGs) and Function Call Graphs (FCGs) are regarded as key to identifying malicious software because they encode both execution flow and program structure. Graph Neural Networks (GNNs) effectively capture dependencies among nodes; with an added attention mechanism, they can assign different weights to nodes, allowing the model to pinpoint the most important nodes or subgraphs and thereby detect malware more accurately. However, to keep pace with evolving malware, AI-based malware detectors must routinely collect new samples and retrain their models, and this process exposes them to backdoor attacks. In a backdoor attack, the adversary implants a trigger into a subset of the training samples so that the model learns a backdoor during training. To the best of our knowledge, no existing backdoor attack specifically targets structural features. This study therefore proposes a novel backdoor attack designed specifically for malware detectors based on structural features. We use CFGExplainer to analyze graphs and identify high-weight benign subgraphs and malicious nodes, and we use these as the basis for selecting triggers. We further introduce a novel Partition strategy that effectively disperses the weight of malicious nodes. Experimental results show that, regardless of trigger strength, the proposed strategy significantly increases the success rate of backdoor attacks. We also expect this work to help inform more effective countermeasures against such attacks.
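The abstract names two graph-level operations, implanting a benign subgraph as a trigger and a Partition step that spreads out a malicious node's weight, but does not say how either is implemented. The following is a minimal Python sketch under stated assumptions, not the thesis's method: a CFG is modeled as a dict of basic blocks (instruction list plus successor ids), the hypothetical `inject_trigger` grafts a benign subgraph beneath an existing block, and the hypothetical `partition_block` splits one block into a chain of smaller blocks while preserving fall-through order. The function names, the `trg_` prefix, and the `_p{i}` naming scheme are all illustrative; the sketch works only on the abstract graph and neither rewrites a binary nor computes explainer weights.

```python
from typing import Dict, List

# Hypothetical representation: block id -> {"insns": [...], "succ": [...]}
CFG = Dict[str, Dict[str, List[str]]]


def copy_cfg(cfg: CFG) -> CFG:
    """Copy the block dicts so the input graph is never mutated."""
    return {n: {"insns": list(b["insns"]), "succ": list(b["succ"])} for n, b in cfg.items()}


def inject_trigger(cfg: CFG, trigger: CFG, entry: str, attach_to: str,
                   prefix: str = "trg_") -> CFG:
    """Graft a (benign) trigger subgraph under `attach_to`.

    Trigger blocks are renamed with `prefix` to avoid id clashes, and one
    edge attach_to -> prefix+entry is added. Binary-level concerns (keeping
    the injected code side-effect free) are out of scope for this sketch.
    """
    out = copy_cfg(cfg)
    for n, b in trigger.items():
        out[prefix + n] = {"insns": list(b["insns"]),
                           "succ": [prefix + s for s in b["succ"]]}
    out[attach_to]["succ"].append(prefix + entry)
    return out


def partition_block(cfg: CFG, node: str, k: int) -> CFG:
    """Split block `node` into at most k chained blocks.

    The first chunk keeps the original id, so predecessors are unchanged;
    each chunk falls through to the next, and the last chunk inherits the
    original successors (including any self-loop back to `node`).
    """
    out = copy_cfg(cfg)
    insns = out[node]["insns"]
    if k < 2 or len(insns) < 2:
        return out  # nothing to split
    old_succ = out.pop(node)["succ"]
    k = min(k, len(insns))
    size = -(-len(insns) // k)  # ceiling division
    chunks = [insns[i:i + size] for i in range(0, len(insns), size)]
    names = [node] + [f"{node}_p{i}" for i in range(1, len(chunks))]
    for i, (name, chunk) in enumerate(zip(names, chunks)):
        succ = [names[i + 1]] if i + 1 < len(names) else old_succ
        out[name] = {"insns": chunk, "succ": succ}
    return out


# Toy usage with made-up blocks.
cfg = {
    "entry": {"insns": ["push ebp", "mov ebp, esp"], "succ": ["mal"]},
    "mal": {"insns": ["call socket", "call connect", "call send", "call recv"],
            "succ": ["exit"]},
    "exit": {"insns": ["leave", "ret"], "succ": []},
}
benign = {"b0": {"insns": ["nop"], "succ": ["b1"]},
          "b1": {"insns": ["nop"], "succ": []}}

g = inject_trigger(cfg, benign, entry="b0", attach_to="entry")
g = partition_block(g, "mal", k=2)
print(sorted(g))  # ['entry', 'exit', 'mal', 'mal_p1', 'trg_b0', 'trg_b1']
```

Splitting a block this way keeps the original instruction order and exit edges, so the graph still describes the same execution path; whether it actually lowers a particular GNN's attention on that code depends on the model and is not demonstrated here.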

Chinese Abstract
Abstract
Acknowledgments
Introduction
1 Motivation
2 Challenges and Goals
3 Contributions
4 Outline of the Thesis
Background and Related Work
1 ELF File Manipulation
1.1 ELF File Format
1.2 Code Injection
2 Static Malware Detection
2.1 Structure-based Malware Detection
3 Backdoor Attacks
3.1 Backdoor Attacks on Malware Detection
4 Explainability Analysis and Applications of Graph Neural Network Models
4.1 CFGExplainer
Backdoor Attack against Structure-based Malware Detector Using Partition
1 System Model
1.1 Threat Model
1.2 Problem Formulation
1.3 Feature Sets
2 Methodology
2.1 Feature Importance Analysis
2.2 Trigger Generation
2.3 Partition
Experimental Results
1 Dataset
2 Target Model and Experiment Setting
3 Results of Backdoor Attack
Limitations and Future Work
1 Limitations
2 Future Work
Conclusions
                                

[1] C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas, “DDoS in the IoT: Mirai and other botnets,” IEEE Computer, vol. 50, pp. 80–84, Jul. 2017.
[2] I. Makhdoom, M. Abolhasan, J. Lipman, R. P. Liu, and W. Ni, “Anatomy of threats to the Internet of Things,” IEEE communications surveys & tutorials, vol. 21, no. 2, pp. 1636–1675, Oct. 2018.
[3] S.-M. Cheng, P.-Y. Chen, C.-C. Lin, and H.-C. Hsiao, “Traffic-aware patching for cyber security in mobile IoT,” IEEE Communications Magazine, vol. 55, no. 7, pp. 29–35, Jul. 2017.
[4] A. D. Raju, I. Y. Abualhaol, R. S. Giagone, Y. Zhou, and S. Huang, “A survey on cross-architectural IoT malware threat hunting,” IEEE Access, vol. 9, pp. 91 686–91 709, Jun. 2021.
[5] Y. Ai, C. Lei, J. Cheng, and J. Mei, “Prediction of weld area based on image recognition and machine learning in laser oscillation welding of aluminum alloy,”Optics and Lasers in Engineering, vol. 160, p. 107258, 2023.
[6] V. Mahipal, S. Ghosh, I. T. Sanusi, R. Ma, J. E. Gonzales, and F. G. Martin, “Doo46dleit: A novel tool and approach for teaching how cnns perform image recognition,” in Proc. ACE 2023, Jan. 2023, pp. 31–38.
[7] X. Li, M. Liu, S. Gao, and W. Buntine, “A survey on out-of-distribution evaluation of neural nlp models,” arXiv preprint arXiv:2306.15261, 2023.
[8] Q.-D. Ngo, H.-T. Nguyen, V.-H. Lec, and D.-H. Nguyen, “A survey of IoT malware and detection methods based on static features,” ICT Express, vol. 6, no. 4, pp. 280–286, Dec. 2020.
[9] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas, “Malware detection by eating a whole EXE,” in Proc. AAAI 2018, Jun. 2018.
[10] H. S. Anderson and P. Roth, “EMBER: An open dataset for training static PE malware machine learning models,” arXiv preprint arXiv:1804.04637, Apr. 2018.
[11] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in Proc. IEEE MALWARE 2015. IEEE, Feb. 2015, pp. 11–20.
[12] J. Su, D. V. Vasconcellos, S. Prasad, D. Sgandurra, Y. Feng, and K. Sakurai, “Lightweight classification of IoT malware based on image recognition,” in Proc. IEEE COMPSAC 2018, Jul. 2018, pp. 664–669.
[13] X. Liu, Y. Lin, H. Li, and J. Zhang, “A novel method for malware detection on ML-based visualization technique,” Computers & Security, vol. 89, p. 101682, Feb. 2020.
[14] H. HaddadPajouh, A. Dehghantanha, R. Khayami, and K.-K. R. Choo, “A deep recurrent neural network based approach for internet of things malware threat hunting,” FGCS, pp. 88–96, Aug. 2018.
[15] E. M. Dovom, A. Azmoodeh, A. Dehghantanha, D. E. Newton, R. M. Parizi, and H. Karimipour, “Fuzzy pattern tree for edge malware detection and categorization in IoT,” Journal of Systems Architecture, vol. 97, pp. 1–7, Aug. 2019.
[16] T.-L. Wan, T. Ban, S.-M. Cheng, Y.-T. Lee, B. Sun, R. Isawa, T. Takahashi, and D. Inoue, “Efficient detection and classification of internet-of-things malware based on byte sequences from executable files,” IEEE Open Journal of the Computer Society, vol. 1, pp. 262–275, 2020.
[17] X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, and D. Song, “Neural network-based graph embedding for cross-platform binary code similarity detection,” in Proc. ACM SIGSAC 2017, Oct. 2017, pp. 363–376.
[18] H. Alasmary, A. Khormali, A. Anwar, J. Park, J. Choi, A. Abusnaina, A. Awad, D. Nyang, and A. Mohaisen, “Analyzing and detecting emerging Internet of Things malware: A graph-based approach,” IEEE IoT-J, vol. 6, no. 5, pp. 8977–8988, Oct. 2019.
[19] J. Yan, G. Yan, and D. Jin, “Classifying malware represented as control flow graphs using deep graph convolutional neural network,” in Proc. IEEE/IFIP DSN 2019, Jun. 2019, pp. 52–63.
[20] B. Wu, Y. Xu, and F. Zou, “Malware classification by learning semantic and structural features of control flow graphs,” in Proc. IEEE TrustCom 2021, Oct. 2021, pp. 540–547.
[21] S. Zhao, X. Ma, W. Zou, and B. Bai, “DeepCG: classifying metamorphic malware through deep learning of call graphs,” in Proc. EAI SecureComm 2019, Oct. 2019, p. 171–190.
[22] C.-Y. Wu, T. Ban, S.-M. Cheng, B. Sun, and T. Takahashi, “IoT malware detection using function-call-graph embedding,” in Proc. IEEE PST 2021, Dec. 2021, pp.1–9.
[23] X.-W. Wu, Y. Wang, Y. Fang, and P. Jia, “Embedding vector generation based on function call graph for effective malware detection and classification,” Neural Computing and Applications, pp. 1–14, Feb. 2022.
[24] “Virustotal,” https://www.virustotal.com.
[25] G. Severi, J. Meyer, S. Coull, and A. Oprea, “Explanation-Guided backdoor poisoning attacks against malware classifiers,” in Proc. USENIX 2021, Aug. 2021, pp. 1487–1504.
[26] J. Lin, L. Xu, Y. Liu, and X. Zhang, “Composite backdoor attack for deep neural network by mixing existing benign features,” in Proc. ACM SIGSAC 2020, Oct. 2020, p. 113–131.
[27] Y. He, Z. Shen, C. Xia, J. Hua, W. Tong, and S. Zhong, “SGBA: A stealthy scapegoat backdoor attack against deep neural networks,” arXiv preprint arXiv:2104.01026, 2021.
[28] Y. Zeng, W. Park, Z. M. Mao, and R. Jia, “Rethinking the backdoor attacks’ triggers: A frequency perspective,” in Proc. IEEE/CVF ICCV, Oct. 2021, pp. 16 473–16 481.
[29] L. Gan, J. Li, T. Zhang, X. Li, Y. Meng, F. Wu, Y. Yang, S. Guo, and C. Fan, “Triggerless backdoor attack for nlp tasks with clean labels,” arXiv preprint arXiv:2111.07970, 2021.
[30] C. Li, X. Chen, D. Wang, S. Wen, M. E. Ahmed, S. Camtepe, and Y. Xiang, “Backdoor attack on machine learning based android malware detectors,” IEEE TDSC, vol. 19, no. 5, pp. 3357–3370, 2021.
[31] M.-W. Tsang, “Analysis of invisible data poisoning backdoor attacks against malware classifiers,” Master, NTUST, Taipei, Taiwan, Jul. 2021.
[32] C.-M. Lai, “Backdoor attack against malware detector based on data extremum analysis,” Master, NTUST, Taipei, Taiwan, Jan. 2022.
[33] W. Qiu, “A survey on poisoning attacks against supervised machine learning,” arXiv preprint arXiv:2202.02510, 2022.
[34] J. Wang, G. M. Hassan, and N. Akhtar, “A survey of neural trojan attacks and defenses in deep learning,” arXiv preprint arXiv:2202.07183, 2022.
[35] L. Yang, Z. Chen, J. Cortellazzi, F. Pendlebury, K. Tu, F. Pierazzi, L. Cavallaro, and G. Wang, “Jigsaw puzzle: Selective backdoor attack to subvert malware classifiers,” arXiv preprint arXiv:2202.05470, 2022.
[36] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017.
[37] A. Turner, D. Tsipras, and A. Madry, “Label-consistent backdoor attacks,” arXiv preprint arXiv:1912.02771, 2019.
[38] P. Xia, Z. Li, W. Zhang, and B. Li, “Data-Efficient backdoor attacks,” arXiv preprint arXiv:2204.12281, 2022.
[39] X. Chen, Y. Dong, Z. Sun, S. Zhai, Q. Shen, and Z. Wu, “Kallima: A clean-label framework for textual backdoor attacks,” in Proc. ESORICS 2022. Springer, Sep. 2022, pp. 447–466.
[40] J. Xu, M. Xue, and S. Picek, “Explainability-based backdoor attacks against graph neural networks,” in Proc. ACM WiseML Workshop 2021, Jun. 2021, pp. 31–36.
[41] Z. Zhang, J. Jia, B. Wang, and N. Z. Gong, “Backdoor attacks to graph neural networks,” in Proc. ACM SACMAT 2021, Jun. 2021, pp. 15–26.
[42] “Executable and linking format (ELF) specification version 1.2,” Tool Interface Standard (TIS), (1995, May). [Online]. Available: https://refspecs.linuxbase.org/elf/elf.pdf
[43] J. D. Herath, P. P. Wakodikar, P. Yang, and G. Yan, “CFGExplainer: Explaining graph neural network-based malware classification from control flow graphs,” in Proc. IEEE/IFIP DSN 2022, Jun. 2022, pp. 172–184.
[44] T.-Y. Chen, “Structural attack against graph-based IoT malware detection at assembly level,” Master, NTUST, Taipei, Taiwan, Jan. 2022.
[45] M. Alhanahnah, Q. Lin, Q. Yan, N. Zhang, and Z. Chen, “Efficient signature generation for classifying cross-architecture IoT malware,” in Proc. IEEE CNS 2018, May 2018.
[46] Y.-T. Lee, T. Ban, T.-L. Wan, S.-M. Cheng, R. Isawa, T. Takahashi, and D. Inoue, “Cross platform IoT-malware family classification based on printable strings,” in Proc. IEEE TrustCom 2020, Dec. 2020, pp. 775–784.
[47] F. Shahzad and M. Farooq, “ELF-Miner: Using structural knowledge and data mining methods to detect new Linux malicious executables,” KAIS, vol. 30, no. 3, pp. 589–612, Mar. 2012.
[48] L. Onwuzurike, E. Mariconti, P. Andriotis, E. D. Cristofaro, G. Ross, and G. Stringhini, “MaMaDroid: Detecting android malware by building markov chains of behavioral models (extended version),” ACM TOPS, vol. 22, no. 2, Apr. 2019.
[49] T. N. Phu, L. Hoang, N. N. Toan, N. D. Tho, and N. N. Binh, “C500-CFG: A novel algorithm to extract control flow-based features for IoT malware detection,” in Proc. IEEE ISCIT 2019, Sep. 2019, pp. 568–573.
[50] L.-B. Ouyang, “Robustness evaluation of graph-based malware detection using code-level adversarial attack with explainability,” Master, NTUST, Taipei, Taiwan, Jul. 2021.
[51] L. Massarelli, G. A. D. Luna, F. Petroni, L. Querzoni, and R. Baldoni, “Investigating graph embedding neural networks with unsupervised features extraction for binary analysis,” in Proc. NDSS BAR Workshop 2019, Feb. 2019.
[52] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[53] W. W. Lo, S. Layeghy, M. Sarhan, M. Gallagher, and M. Portmann, “Graph neural network-based android malware classification with jumping knowledge,” in IEEE DSC 2022, Jun. 2022, pp. 1–9.
[54] M. Toneva, A. Sordoni, R. T. des Combes, A. Trischler, Y. Bengio, and G. J. Gordon, “An empirical study of example forgetting during deep neural network learning,” in Proc. ICLR 2018, Apr. 2018.
[55] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec, “GNNExplainer: Generating explanations for graph neural networks,” in Proc. NeurIPS 2019, vol. 32, Dec. 2019.
[56] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji, “On explainability of graph neural networks via subgraph explorations,” in Proc. ICML 2021, vol. 139, Jul. 2021, pp. 12 241–12 252.
[57] D. Luo, W. Cheng, D. Xu, W. Yu, B. Zong, H. Chen, and X. Zhang, “Parameterized explainer for graph neural network,” in Proc. NeurIPS 2020, vol. 33, Dec. 2020, pp. 19 620–19 631.
[58] F. Pierazzi, F. Pendlebury, J. Cortellazzi, and L. Cavallaro., “Intriguing properties of adversarial ML attacks in the problem space,” in Proc. IEEE S&P 2020, May 2020, p. 1332–1349.
[59] Angr, “http://angr.io/.”
[60] “Radare2,” https://rada.re/r/.
[61] A. Hagberg, P. Swart, and D. Chult, “Exploring network structure, dynamics, and function using NetworkX,” in Proc. SciPy 2008, Aug. 2008, p. 11–15.

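Full-text release date: 2025/08/15 (on-campus network)
Full-text release date: 2025/08/15 (off-campus network)
Full-text release date: 2025/08/15 (National Central Library: Taiwan Theses and Dissertations System)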
