
Researcher: 張勝富 (Sheng-Fu Chang)
Thesis Title: Detection of Query-based Adversarial Attacks Using Intrinsic Dimension Changes (利用本徵維度變化偵測查詢式對抗式攻擊)
Advisors: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng)
Committee Members: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng), 鄧惟中 (Wei-Chung Teng), 林豐澤 (Feng-Tse Lin), 毛敬豪 (Ching-Hao Mao)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 73
Keywords (Chinese): 對抗式樣本、神經網路、查詢式對抗式攻擊、查詢行為分析、異常偵測
Keywords (English): adversarial examples, neural networks, query-based adversarial attacks, query behavior analysis, anomaly detection
Chinese Abstract (中文摘要): Recently, adversarial attacks have drawn increasing attention from researchers; because they are hard to perceive, they pose a potential security threat to machine learning applications. In the past, an adversarial attack required access to a model's internal parameters. In recent years, however, researchers have shown that adversarial examples can also be crafted through repeated model queries when those parameters are unavailable. Earlier defenses against adversarial attacks mainly inspected each input sample individually, which is impractical and inefficient against current query-based adversarial attacks. Because query-based adversarial attacks generate sequences of highly similar images, we argue that temporal data analysis reveals the correlation between model queries more effectively than per-sample inspection. This thesis therefore proposes a query behavior analysis method based on changes in intrinsic dimension and conducts extensive experiments to train a classifier for sequences of consecutive model queries. The experimental results show that, under the assumptions of existing query-based adversarial attacks, the proposed method achieves a detection rate close to 100%. The main contributions of this thesis are: (1) detecting anomalous query behavior from consecutive query sequences; (2) evaluating on several well-known datasets, with results confirming that the method remains highly effective even for very complex models; and (3) substantially reducing the model's false positive rate through temporal analysis.


Abstract: Recently, adversarial examples have received considerable attention because they can pose a security threat to machine learning applications without being detected. In general, the adversary has to know the internal parameters of the neural model. However, with query-based attacks, an adversary is able to craft adversarial examples without any knowledge of the targeted model. Prior defense mechanisms that aim to detect each query input individually are inefficient and impractical against current query-based attacks. Since a query-based attack successively produces similar images, we believe that a temporal analysis approach is better suited to capturing the correlation between these queries. In this thesis, we propose a system that detects anomalous query behavior based on temporal changes in intrinsic dimension values, and we conduct extensive experiments to train a query-behavior classifier. The experimental results show that, under the assumptions of current query-based adversarial attacks, our detection system achieves a detection rate of nearly 100%. This thesis makes the following contributions: (1) detecting anomalous query behavior through the correlation of query sequences; (2) evaluating on multiple well-known datasets, with results confirming that our approach remains effective even for complex models; and (3) significantly reducing the classification model's false positive rate through temporal analysis.
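
Both abstracts describe the same detection pipeline: group incoming model queries into sliding windows, extract features from each window, estimate the window's intrinsic dimension, and classify the temporal pattern of those estimates as benign or attack behavior. The sketch below is only a minimal illustration of that idea, not the thesis's implementation: it assumes a TwoNN-style ratio estimator for the intrinsic dimension, uses flattened query inputs in place of the model activations the thesis extracts, and all names and parameters here (twonn_intrinsic_dimension, id_sequence, window=50) are hypothetical.

import numpy as np
from collections import deque
from sklearn.neighbors import NearestNeighbors


def twonn_intrinsic_dimension(points):
    """TwoNN-style maximum-likelihood estimate of intrinsic dimension.

    For each point, mu = r2 / r1 is the ratio of the distances to its
    second and first nearest neighbours; the estimate is N / sum(log mu).
    """
    points = np.asarray(points, dtype=np.float64)
    dist, _ = NearestNeighbors(n_neighbors=3).fit(points).kneighbors(points)
    r1, r2 = dist[:, 1], dist[:, 2]      # dist[:, 0] is the point itself
    mu = np.maximum(r2 / np.maximum(r1, 1e-12), 1.0 + 1e-12)  # guard duplicates
    return len(points) / np.sum(np.log(mu))


def id_sequence(query_stream, window=50):
    """Map a stream of query feature vectors to a sequence of
    intrinsic-dimension estimates over a sliding window."""
    buf, ids = deque(maxlen=window), []
    for q in query_stream:
        buf.append(np.ravel(q))          # flattened input as a stand-in for model activations
        if len(buf) == window:
            ids.append(twonn_intrinsic_dimension(np.stack(buf)))
    return np.array(ids)

# Hypothetical usage: benign and attack query streams would each yield an ID
# sequence; fixed-length windows of those sequences could then be fed to an
# off-the-shelf classifier (e.g., a random forest) as a stand-in for the
# thesis's query-behaviour classifier.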

Contents
  Chinese Abstract (中文摘要)
  Abstract
  Acknowledgements (誌謝)
  1 Introduction
    1.1 Motivation
    1.2 Challenges and Goals
    1.3 Contributions
    1.4 Outline of the Thesis
  2 Background
    2.1 Preliminaries
    2.2 Threat Model
    2.3 Adversarial Attacks
      2.3.1 White-box Attacks
      2.3.2 Black-box Attacks
    2.4 Existing Defensive Mechanisms against Black-box Adversarial Attacks
      2.4.1 Model Hardening
      2.4.2 Input Preprocessing
      2.4.3 Detection
  3 Temporal Query Behavior Analysis for Adversarial Attacks Detection
    3.1 Query Behavior Constructor
    3.2 Query Behavior Feature Extractor
      3.2.1 Activation Extraction
      3.2.2 Intrinsic Dimension Estimation (IDE)
    3.3 Query Behavior Analysis
    3.4 Attack Pattern Detector
      3.4.1 Dataset Generation
      3.4.2 Classifier Training
  4 Experimental Results and Effectiveness Analysis
    4.1 Environment Setup and Dataset
      4.1.1 Experiment Concept
      4.1.2 Query Sequences Datasets
      4.1.3 Analysis Environment
    4.2 Evaluation Metrics
    4.3 Effectiveness Analysis
      4.3.1 Effectiveness of temporal query behavior analysis
      4.3.2 Comparison of different parameter settings
  5 Conclusions and Further Work
    5.1 Conclusions
    5.2 Limitations and Future Work

