
Author: Chi-Ming Lai (賴啓明)
Title: Backdoor Attack against Malware Detector based on Data Extremum Analysis (基於資料極值分析的惡意軟體檢測器後門攻擊)
Advisors: Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Committee Members: Chia-Mu Yu (游家牧), Chun-Ying Huang (黃俊穎), Yuh-Jye Lee (李育杰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 43
Chinese Keywords: 後門攻擊、惡意軟件檢測
Keywords: backdoor attack, malware detection
Hits: 145 views, 0 downloads


    In recent years, with the sharp increase in the number and variety of malware, machine learning (ML) has gradually been applied to the field of malware detection. As a first-stage protection mechanism, it can significantly improve analysts' detection efficiency. ML-based malware detection methods usually rely on large amounts of correctly labeled training data to maintain high detection accuracy, which introduces potential vulnerabilities such as the infamous backdoor attack. Operating in the model training phase, an adversary can craft a few training samples to poison the malware classifier so that samples carrying a backdoor trigger are misclassified, without degrading the original detection performance. Instead of selecting features as backdoor triggers according to their importance to a specific detection model, this paper proposes a model-independent feature selection method for backdoor attacks. In particular, we analyze the behavior of the training data and select extreme values in the target features as the backdoor trigger. The trigger remains invisible since it is chosen from values that already occur in the feature. The experimental results show that our method outperforms existing model-dependent attacks in both efficiency and effectiveness.
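    The extremum-based trigger selection and poisoning described above can be sketched roughly as follows. This is a minimal illustration on synthetic tabular data, not the thesis's actual EMBER pipeline; the feature indices, the 1% poisoning rate, and the use of each column's observed maximum as the "extremum" are assumptions made for the sketch only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an EMBER-like tabular dataset: rows are samples,
# columns are static PE features; y = 1 for malware, 0 for benign.
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

def extremum_trigger(X, target_features):
    """For each target feature, pick an extreme value that already
    occurs in the data (here: the observed maximum), so the trigger
    stays inside the feature's natural range and remains inconspicuous."""
    return {f: X[:, f].max() for f in target_features}

def poison(X, y, trigger, rate=0.01):
    """Stamp the trigger into a small fraction of benign-labeled samples,
    so a classifier trained on the result associates the trigger with
    the 'benign' label while the labels themselves stay clean."""
    Xp, yp = X.copy(), y.copy()
    benign = np.flatnonzero(yp == 0)
    chosen = rng.choice(benign, size=max(1, int(rate * len(yp))), replace=False)
    for f, v in trigger.items():
        Xp[chosen, f] = v
    return Xp, yp, chosen

trigger = extremum_trigger(X, target_features=[2, 5])
Xp, yp, idx = poison(X, y, trigger, rate=0.01)

# At inference time, malware carrying the same trigger values would be
# more likely to slip past a detector trained on (Xp, yp).
```

    Note that nothing here queries a detection model: the trigger is derived from the training data alone, which is what makes the selection model-independent in the sense used by the abstract.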

    Chinese Abstract
    Abstract
    Acknowledgements
    1 Introduction
      1.1 Outline of the Thesis
    2 Background and Related Work
      2.1 Existing Research
        2.1.1 Malware Detection
        2.1.2 Vulnerability in Transfer Learning and Federated Learning
        2.1.3 Attacks against Machine Learning Model
        2.1.4 Poisoned Model Detection
    3 Methodology
      3.1 Threat Model
        3.1.1 Data Extremum Analysis
        3.1.2 Model-Independent Backdoor Attack
    4 Experimental Results
      4.1 Dataset
      4.2 Target Model
      4.3 Attack Evaluation
    5 Limitations and Future Work
      5.1 Limitations
      5.2 Future Work
    6 Conclusions

    List of Figures
    Fig. 2.1 Overview of data poisoning attack.
    Fig. 3.1 Overview of our attack.

    List of Tables
    Table 4.1 The detector's accuracy in different settings.
    Table 4.2 Feasible features of EMBER.
    Table 4.3 Comparison with other methods on LightGBM.
    Table 4.4 Comparison with other data analysis methods on LightGBM.
    Table 4.5 Comparison with other methods on XGBoost.
    Table 4.6 Comparison with other methods on RF.
    Table 4.7 Comparison of bypass rates for different poisoning rates on DNN.

