| Field | Value |
|---|---|
| Graduate Student | 袁倫大 Lun-Da Yuan |
| Thesis Title | 基於對比式學習之知識蒸餾於瑕疵偵測之應用 (Defect Detection with Contrastive Learning Based Knowledge Distillation) |
| Advisor | 郭景明 Jing-Ming Guo |
| Oral Defense Committee | 楊士萱 Shih-Hsuan Yang, 王乃堅 Nai-Jian Wang, 范志鵬 Chih-Peng Fan, 黃敬群 Ching-Chun Huang |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2023 |
| Academic Year | 111 |
| Language | Chinese |
| Pages | 75 |
| Keywords | defect detection, knowledge distillation, contrastive learning, self-supervised learning, unsupervised learning |
This thesis presents a method that employs contrastive-learning pre-trained models as auxiliary tools for knowledge distillation to improve defect detection performance. The study uses the MVTec dataset, which contains 15 categories of industrial production images (10 object and 5 texture categories) with 5,354 samples in total: 3,629 training images and 1,725 test images. The training set contains only defect-free normal samples, while the test set consists of defective samples together with their corresponding annotated ground-truth images. On the modeling side, the thesis first examines the contrastive learning model SimCLR under two training strategies, training from scratch and fine-tuning; experiments show that fine-tuning reaches a lower loss and a higher Top-1 accuracy than training from scratch. Next, the reverse knowledge distillation defect detection model (RD model) is modified by replacing the teacher encoder with a backbone trained by SimCLR. Surprisingly, the fine-tuned backbone attains an average AUC of only 83.07%, while the backbone trained from scratch reaches an average AUC of 90.51%. Finally, the pre-trained backbone is used as an auxiliary model when training the RD model; at test time the auxiliary model is removed and only the trained projection head is retained. With this scheme, the fine-tuned backbone yields an average AUC of 91.86% at test time, and the backbone trained from scratch yields an even higher average AUC of 92.56%.

The study adopts self-supervised and unsupervised learning, training only on defect-free, unannotated normal samples, which makes the proposed method practical and reliable for current industrial production-line applications.
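The contrastive pre-training step mentioned above optimizes SimCLR's NT-Xent (normalized temperature-scaled cross-entropy) objective. The following is a minimal NumPy sketch of that loss, not the thesis code: the pair layout, the temperature value, and the toy check are all illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over 2N embeddings of N augmented pairs.

    z: array of shape (2N, d), arranged so that rows i and i + N are the
    two augmented views of the same sample.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    n = z.shape[0] // 2
    sim = z @ z.T / tau                                # temperature-scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # index of each row's positive (the other view of the same sample)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = logsumexp - sim[np.arange(2 * n), pos]      # -log softmax at the positive
    return loss.mean()

# toy check with 4 samples of dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
aligned = np.vstack([x, x])            # perfect positive pairs
random_ = rng.normal(size=(8, 8))      # unrelated "pairs"
# aligned views give a markedly lower loss than random pairs
print(nt_xent_loss(aligned), nt_xent_loss(random_))
```

Minimizing this loss pulls the two views of each image together while pushing all other images in the batch apart, which is what lets the backbone learn features from unlabeled normal samples alone.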
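At test time, reverse-distillation methods of the kind modified here localize defects by comparing teacher and student feature maps: where the student fails to reproduce the teacher's features, the region is scored as anomalous. A NumPy sketch of that scoring idea, under the assumption that all scales share one spatial size (function and variable names are invented, not from the thesis):

```python
import numpy as np

def anomaly_map(teacher_feats, student_feats):
    """Per-pixel anomaly map: one minus the channel-wise cosine similarity
    between teacher and student feature maps, averaged over scales.

    Each list entry is an array of shape (C, H, W).
    """
    maps = []
    for t, s in zip(teacher_feats, student_feats):
        num = (t * s).sum(axis=0)                                  # dot product over channels
        den = np.linalg.norm(t, axis=0) * np.linalg.norm(s, axis=0) + 1e-8
        maps.append(1.0 - num / den)                               # 0 = agree, 2 = opposite
    return np.mean(maps, axis=0)

# toy check: the student matches the teacher everywhere except one corner
t = np.ones((4, 8, 8))
s = np.ones((4, 8, 8))
s[:, 4:, 4:] *= -1.0               # simulated defect region: features flipped
amap = anomaly_map([t], [s])
print(amap[0, 0], amap[6, 6])      # ~0.0 in the normal region, ~2.0 in the defect region
```

An image-level score can then be taken as the maximum of the map, and sweeping a threshold over these scores against the ground-truth labels yields the AUC values reported in the abstract.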