
Graduate Student: Shih-Ting Huang (黃詩婷)
Thesis Title: Detection of DNN Backdoor Model Using Layer-Wise Relevance Propagation based Neurons Relationship Analysis (利用 LRP 分析神經元關係檢測 DNN 後門攻擊)
Advisors: Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Oral Defense Committee: Feng-Tse Lin (林豐澤), Wei-Chung Teng (鄧惟中), Ching-Hao Mao (毛敬豪), Shin-Ming Cheng (鄭欣明), Hahn-Ming Lee (李漢銘)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Academic Year of Graduation: 108 (2019–2020)
Language: English
Number of Pages: 76
Keywords (Chinese): 深度神經網路、木馬攻擊、後門、神經元關係、特徵歸因分析
Keywords (English): deep neural network, trojan attacks, backdoor, relationship of neurons, attribution analysis
    Although applications based on neural networks can effectively solve many complex modern problems, training a deep neural network model requires substantial computational cost and training data, which small and medium-sized enterprises or individuals with limited resources may find difficult to afford. As a result, it is increasingly common to outsource the training of complex models to third parties or to download pre-trained models from public platforms. However, an attacker may implant a backdoor during model training, resulting in a highly stealthy Trojan attack. Because a trojaned model behaves completely normally until the input contains the trigger embedded by the attacker, effectively detecting whether a model carries a backdoor is an indispensable issue for improving the security of widely deployed artificial intelligence. Most previous Trojan-model detection methods rest on the impractical assumption that the defender possesses the poisoned training dataset. This thesis therefore proposes a Trojan-model detection method based on neuron relationship analysis that requires no training dataset. We statistically analyze the activations of the neurons in the model's last convolutional layer and use Layer-Wise Relevance Propagation (LRP) to perform attribution analysis between the output and the neurons of that layer, thereby locating Trojan neurons. Experimental results show that the proposed method can analyze the model's parameters and neuron activation behavior without any poisoned data and effectively discovers more trigger-related neurons.
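    For reference, a generic form of the LRP redistribution rule from the LRP literature ([41], [43]) propagates the relevance R_k of an upper-layer neuron k back to a lower-layer neuron j; the thesis may use a different LRP variant, and the notation below (activations a_j, weights w_{jk}, biases b_k) follows that literature rather than the thesis itself:

        R_j = \sum_k \frac{a_j w_{jk}}{z_k + \epsilon\,\mathrm{sign}(z_k)} R_k, \qquad z_k = \sum_{j'} a_{j'} w_{j'k} + b_k

    Relevance is approximately conserved from layer to layer, \sum_j R_j \approx \sum_k R_k, so a neuron's relevance score quantifies its contribution to the chosen output for a given input.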


    Although Deep Neural Network (DNN) models can effectively help people solve complicated problems, the intensive computation cost of training them may not be affordable for individuals or companies with limited resources. This gives rise to a business model in which customers buy, at an acceptable price, a pre-trained model whose training was outsourced to a provider. However, an adversary could supply a polluted model in which a backdoor was injected during the training phase, which is known as a Trojan attack on neural networks. A trojaned model is very difficult to detect since it behaves normally until it receives a crafted trigger, so efficiently detecting trojaned models is a significant topic for improving the security of deployed models. Prior defenses, however, require access to the poisoned dataset or demand extensive computing resources, which is impractical. In this thesis, we propose a measure that investigates only the pre-trained model itself, without any information about the poisoned dataset. In particular, we statistically examine the activations of the model's last convolutional layer and adopt Layer-Wise Relevance Propagation (LRP) to perform attribution analysis on this layer to find trojan neurons. Free of any requirement for poisoned data, the method investigates only the relationships among the neurons of the suspicious model. The experimental results show that our method can efficiently identify more neurons related to triggers.
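    As an illustration of the kind of analysis described above, the sketch below implements one LRP-ε backward step through a dense layer with NumPy and ranks neurons by the relevance they receive for one output class. It is a minimal sketch under simplifying assumptions: the function and variable names (lrp_epsilon_step, rank_neurons_by_relevance) are hypothetical, and the thesis applies the analogous propagation at the last convolutional layer rather than this simplified dense-layer case.

```python
import numpy as np

def lrp_epsilon_step(a, W, b, R_upper, eps=1e-6):
    """One backward step of the LRP-epsilon rule through a dense layer:
    redistribute the relevance R_upper of the upper layer onto the
    activations a of the layer below.

    a        : (n_lower,)  activations of the lower layer
    W, b     : (n_lower, n_upper), (n_upper,)  layer parameters
    R_upper  : (n_upper,)  relevance assigned to the upper-layer neurons
    returns  : (n_lower,)  relevance of each lower-layer neuron
    """
    z = a @ W + b                            # pre-activations of the upper layer
    z = np.where(z >= 0, z + eps, z - eps)   # epsilon stabilizer avoids division by zero
    s = R_upper / z                          # relevance per unit of pre-activation
    c = W @ s                                # send contributions back to the lower layer
    return a * c                             # neuron j receives relevance proportional to a_j * w_jk

def rank_neurons_by_relevance(a, W, b, target_class):
    """Initialize relevance at the logit of one class and rank the
    lower-layer neurons by how much relevance flows back to them."""
    logits = a @ W + b
    R_out = np.zeros_like(logits)
    R_out[target_class] = logits[target_class]
    R_lower = lrp_epsilon_step(a, W, b, R_out)
    return np.argsort(R_lower)[::-1]         # neuron indices, most relevant first
```

    Neurons that consistently receive unusually high relevance for one specific (target) label across many inputs are the kind of candidates that a trojan-neuron analysis would flag for further inspection.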

Table of Contents:
    Chinese Abstract i
    ABSTRACT ii
    Acknowledgements iii
    1 Introduction 1
      1.1 Motivation 2
      1.2 Challenges and Goals 4
      1.3 Contributions 5
      1.4 Outline of the Thesis 6
    2 Background and Related Work 8
      2.1 Neural Networks 8
      2.2 Neural Trojan Attacks 10
      2.3 Threat Model 12
      2.4 Existing Defense on DNN Trojan Attacks 14
        2.4.1 Trojan Attacks Detection 14
        2.4.2 Fixing Trojaned Model 16
      2.5 The Current Status of Detection Limits 16
    3 Backdoor Attack on DNN Detection 18
      3.1 Suspicious Neurons Selection 20
      3.2 Neuron Activation Values Adjustment 22
      3.3 Neuron Relationship Score Estimator 24
        3.3.1 Layer-wise Relevance Propagation, LRP 24
        3.3.2 LRP Score of Trojan Data 27
      3.4 Neurons Relation Analysis 28
      3.5 Damaged Neurons Identification 30
    4 Experimental Results and Effectiveness Analysis 31
      4.1 Environment Setup 31
        4.1.1 Experiment Concept 31
        4.1.2 Experiment Dataset and Environment 32
      4.2 Detection Effectiveness 34
        4.2.1 Trojan Model Detection Effectiveness 34
        4.2.2 Target Label Identification 35
        4.2.3 Trojan Neurons Identification 37
      4.3 Additional Experiment 39
        4.3.1 Multi-target Category Attack 39
        4.3.2 The Statistics of Neuron Activation Through Different Proportions of Poisonous Data 40
        4.3.3 The Process of Finding Trojan Neurons without Filtering Neurons 42
    5 Discussion and Limitations 44
      5.1 Observations 44
      5.2 Limitations 46
    6 Conclusions and Further Work 48

References:
    [1] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A Survey on Deep Transfer Learning,” in Artificial Neural Networks and Machine Learning – ICANN 2018, 2018, pp. 270–279.
    [2] “Keras,” accessed on: Jul. 08, 2020. [Online]. Available: https://keras.io
    [3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
    [4] “Google Cloud Machine Learning Engine,” accessed on: Jul. 08, 2020. [Online]. Available: https://cloud.google.com/ai-platform
    [5] “Amazon SageMaker,” accessed on: Jul. 08, 2020. [Online]. Available: https://aws.amazon.com/sagemaker/
    [6] “Microsoft’s Azure Batch AI training,” accessed on: Jul. 08, 2020. [Online]. Available: https://azure.microsoft.com/en-us/services/machine-learning/
    [7] Y. Ji, X. Zhang, S. Ji, X. Luo, and T. Wang, “Model-reuse attacks on deep learning systems,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 349–363.
    [8] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “BadNets: Evaluating Backdooring Attacks on Deep Neural Networks,” IEEE Access, vol. 7, pp. 47230–47244, 2019.
    [9] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning Attack on Neural Networks,” in 25th Annual Network and Distributed System Security Symposium (NDSS), 2018.
    [10] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering,” CoRR, vol. abs/1811.03728, 2019. [Online]. Available: https://arxiv.org/abs/1811.03728
    [11] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “STRIP: A Defence against Trojan Attacks on Deep Neural Networks,” in Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC’19), 2019, pp. 113–125.
    [12] B. Tran, J. Li, and A. Madry, “Spectral Signatures in Backdoor Attacks,” in Advances in Neural Information Processing Systems 31, 2018, pp. 8000–8010.
    [13] X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, “Detecting AI Trojans Using Meta Neural Analysis,” CoRR, vol. abs/1910.03137, 2019. [Online]. Available: https://arxiv.org/abs/1910.03137
    [14] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks,” in 2019 IEEE Symposium on Security and Privacy (S&P), 2019, pp. 707–723.
    [15] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), 2019, pp. 4658–4664.
    [16] Y. Liu, W.-C. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang, “ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS ’19), 2019, pp. 1265–1282.
    [17] W. Samek, T. Wiegand, and K.-R. Müller, “Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models,” ITU Journal: ICT Discoveries - Special Issue 1 - The Impact of Artificial Intelligence (AI) on Communication Networks and Services, vol. 1, pp. 1–10, 2017.
    [18] B. K. Iwana, R. Kuroki, and S. Uchida, “Explaining Convolutional Neural Networks using Softmax Gradient Layer wise Relevance Propagation,” in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 4176–4185.
    [19] J. Stilgoe, “Machine learning, social learning and the governance of self-driving cars,” Social Studies of Science, vol. 48, pp. 25–56, 2018.
    [20] G. López, L. Quesada, and L. A. Guerrero, “Alexa vs. Siri vs. Cortana vs. Google Assistant: A Comparison of Speech-Based Natural User Interfaces,” in Advances in Human Factors and Systems Interaction, 2018, pp. 241–250.
    [21] A. Vaswani, S. Bengio, E. Brevdo, F. Chollet, A. N. Gomez, S. Gouws, L. Jones, L. Kaiser, N. Kalchbrenner, N. Parmar, R. Sepassi, N. Shazeer, and J. Uszkoreit, “Tensor2tensor for neural machine translation,” CoRR, vol. abs/1803.07416, 2018. [Online]. Available: http://arxiv.org/abs/1803.07416
    [22] Y. Lecun, Y. Bengio, and G. E. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
    [23] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung, “A Survey on Security Threats and Defensive Techniques of Machine Learning: A Data Driven View,” IEEE Access, vol. 6, pp. 12103–12117, 2018.
    [24] K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, p. 9, 2016.
    [25] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324, 1998.
    [26] H. Zhong, C. Liao, A. C. Squicciarini, S. Zhu, and D. Miller, “Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation,” in Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy (CODASPY ’20), 2020, pp. 97–108.
    [27] A. Saha, A. Subramanya, and H. Pirsiavash, “Hidden Trigger Backdoor Attacks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11957–11965, 2020.
    [28] X. Wang, J. Li, X. Kuang, Y.-a. Tan, and J. Li, “The security of machine learning in an adversarial setting: A survey,” Journal of Parallel and Distributed Computing, vol. 130, pp. 12–23, 2019.
    [29] S. Udeshi, S. Peng, G. Woo, L. Loh, L. Rawshan, and S. Chattopadhyay, “Model Agnostic Defence against Backdoor Attacks in Machine Learning,” CoRR, vol. abs/1908.02203, 2019. [Online]. Available: https://arxiv.org/abs/1908.02203
    [30] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks,” in Research in Attacks, Intrusions, and Defenses, 2018, pp. 273–294.
    [31] H. Cheng, K. Xu, S. Liu, P.-Y. Chen, P. Zhao, and X. Lin, “Defending against Backdoor Attack on Deep Neural Networks,” CoRR, vol. abs/2002.12162, 2020. [Online]. Available: https://arxiv.org/abs/2002.12162
    [32] K. Lee, K. Lee, H. Lee, and J. Shin, “A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks,” in Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 2018, pp. 7167–7177.
    [33] P. Yang, J. Chen, C.-J. Hsieh, J.-L. Wang, and M. I. Jordan, “ML-LOO: Detecting Adversarial Examples with Feature Attribution,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 6639–6647.
    [34] X. Qiao, Y. Yang, and H. Li, “Defending Neural Backdoors via Generative Distribution Modeling,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 14004–14013.
    [35] W. Guo, L. Wang, X. Xing, M. Du, and D. Song, “TABOR: A highly accurate approach to inspecting and restoring trojan backdoors in AI systems,” CoRR, vol. abs/1908.01763, 2019. [Online]. Available: https://arxiv.org/abs/1908.01763
    [36] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” in Computer Vision – ECCV 2014, 2014, pp. 818–833.
    [37] A. Adadi and M. Berrada, “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI),” IEEE Access, vol. 6, pp. 52138–52160, 2018.
    [38] P. Cortez and M. J. Embrechts, “Using sensitivity analysis and visualization techniques to open black box data mining models,” Information Sciences, vol. 225, pp. 1–17, 2013.
    [39] M. T. Ribeiro, S. Singh, and C. Guestrin, “Anchors: High-Precision Model-Agnostic Explanations,” in AAAI Conference on Artificial Intelligence, 2018, pp. 1527–1535.
    [40] S. García, A. Fernández, and F. Herrera, “Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems,” Applied Soft Computing, vol. 9, pp. 1304–1314, 2009.
    [41] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, and K.-R. Müller, Layer-Wise Relevance Propagation: An Overview. Springer International Publishing, 2019, pp. 193–209.
    [42] “LRP’s heatmapping tools,” accessed on: Jul. 08, 2020. [Online]. Available: http://www.heatmapping.org/
    [43] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation,” PLOS ONE, vol. 10, pp. 1–46, 2015.
    [44] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” University of Toronto, 2012.
    [45] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
    [46] M. Fredrikson, S. Jha, and T. Ristenpart, “Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15), 2015, pp. 1322–1333.
