| Field | Value |
|---|---|
| Graduate student | 李恩 (En Lee) |
| Thesis title | 利用流形生成之對抗樣本預測深度學習模型的泛化能力 (Predicting Generalization of Deep Learning Models through On-Manifold Adversarial Examples) |
| Advisors | 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng) |
| Committee members | 黃俊穎 (Chun-Ying Huang), 游家牧 (Chia-Mu Yu), 陳品諭 (Pin-Yu Chen) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication year | 2021 |
| Academic year | 109 |
| Language | English |
| Pages | 54 |
| Keywords | Generalization Prediction, Generalization, Adversarial Attack, Adversarial Example, Decision Boundary |
Deep learning (DL) has been successfully applied in many fields, such as image recognition and object detection, and has achieved remarkable results across a variety of tasks. However, without the ability to accurately predict a model's generalization, there is no way to guarantee its worst-case performance on unseen data, which constitutes a major safety concern when applying DL to critical infrastructure. Accurately predicting the generalization of DL models is therefore an important and urgent need.

Most previous generalization-prediction methods break the semantic structure of the original training set, which degrades prediction accuracy. To address this, this thesis proposes a generalization-prediction method based on perturbed samples that preserves semantic structure. By constraining the perturbed samples to lie in the manifold space, the resulting adversarial examples retain the original semantics and spatial distribution of the data, overcoming the limitations of previous work. Notably, the adversarial examples generated this way are highly similar to the training set in semantics, so they can serve as a simulated test set for predicting the generalization of the model. Experimental results show that the simulated test set generated by our method preserves more of the training set's original features and semantic distribution, and thus predicts the generalization of DL models more accurately.
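The core idea of the abstract, constraining adversarial perturbations to the data manifold so that the perturbed samples keep their semantics, can be sketched in a few lines. The sketch below is illustrative, not the thesis's actual implementation: it uses a 2-D linear subspace as a stand-in for a learned manifold, a linear scorer as a stand-in for a trained DL model, and an FGSM-style step projected onto the subspace. All names here (`on_manifold_fgsm`, `basis`, `acc_sim`) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: data lies on a 2-D linear "manifold" inside a 10-D ambient space.
# The thesis works with learned, nonlinear manifolds; a linear subspace is the
# simplest stand-in that still shows the projection idea.
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal manifold basis
z = rng.normal(size=(500, 2))                      # latent coordinates
X = z @ basis.T                                    # on-manifold "training set"
y = (z[:, 0] > 0).astype(int)                      # labels from a latent coordinate

# Hypothetical "trained model": a linear scorer aligned with the label direction.
w = basis[:, 0]

def predict(X, w):
    return (X @ w > 0).astype(int)

def on_manifold_fgsm(x, label, w, basis, eps=0.3):
    """One FGSM-style step whose perturbation is projected onto the manifold
    basis, so the adversarial example never leaves the manifold."""
    grad_dir = -w if label == 1 else w      # direction that increases the loss
    delta = eps * np.sign(grad_dir)         # ambient FGSM perturbation
    delta = basis @ (basis.T @ delta)       # project perturbation onto manifold
    return x + delta

# The on-manifold adversarial examples act as a simulated test set; accuracy
# on them is the generalization estimate.
X_adv = np.array([on_manifold_fgsm(x, yi, w, basis) for x, yi in zip(X, y)])
acc_train = (predict(X, w) == y).mean()
acc_sim = (predict(X_adv, w) == y).mean()   # predicted generalization
```

Because the perturbation is projected into the span of `basis`, `X_adv` stays exactly on the toy manifold while still lowering the model's accuracy; `acc_sim` then plays the role of the simulated-test-set accuracy described in the abstract.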