| Graduate Student: | 鄭伊捷 Yi-Chieh Cheng |
| --- | --- |
| Thesis Title: | 在嵌入空間中擴增數據點應用於對比表示蒸餾 Augmentation in the Embedding Space for Contrastive Representation Distillation |
| Advisor: | 鮑興國 Hsing-Kuo Pao |
| Committee Members: | 項天瑞 Tien-Ruey Hsiang, 楊傳凱 Chuan-Kai Yang |
| Degree: | 碩士 Master |
| Department: | 電資學院 資訊工程系 Department of Computer Science and Information Engineering |
| Publication Year: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | English |
| Number of Pages: | 40 |
| Chinese Keywords: | 深度學習、知識蒸餾、對比學習、嵌入擴增、埃爾米特插值 |
| English Keywords: | Deep Learning, Knowledge Distillation, Contrastive Learning, Embedding Expansion, Hermite Interpolation |
Deep learning methods usually introduce high time and space complexity in both model training and model prediction because of their complex model structures. To apply deep learning effectively on lightweight devices, such as those found in Internet of Things environments, compressing a model into a lighter version has drawn attention in recent years. Knowledge distillation is one such approach: its goal is to transfer the knowledge offered by a bulky teacher model to the learning of a light student model. In one of the state-of-the-art methods, the combination of knowledge distillation and contrastive learning effectively promotes the learning of the student model. Contrastive learning demands a large number of negative samples, so previous methods either maintain memory banks or rely on heavy data augmentation and large batch sizes to meet that demand. In recent years, many researchers have also devoted themselves to finding informative negative examples. We propose a method that directly generates informative negative samples in the embedding space, and we explore the effect of using different interpolation methods between data points to further improve performance. This avoids using memory banks to store data points while still finding helpful negative samples, which in turn effectively reduces resource consumption. We demonstrate the effectiveness of the proposed approach by reporting performance improvements across different tasks and various datasets. The proposed method also outperforms current state-of-the-art methods on knowledge distillation and cross-model transfer tasks.
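To make the interpolation idea concrete, below is a minimal PyTorch sketch of generating memory-bank-free negatives by interpolating between teacher embeddings within a batch and feeding them into an InfoNCE-style distillation loss. This illustrates the general technique rather than the thesis's exact implementation: the function names, the tangent choice in the Hermite variant, the temperature `tau`, and the number of interpolated points are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def linear_negatives(z, n_points=2):
    """Synthesize negatives by linearly interpolating between random pairs
    of embeddings in the batch, then re-projecting onto the unit
    hypersphere (an embedding-expansion-style augmentation)."""
    perm = torch.randperm(z.size(0), device=z.device)  # random partner per anchor
    z2 = z[perm]
    # interpolation coefficients strictly inside (0, 1)
    ts = torch.linspace(0.0, 1.0, n_points + 2, device=z.device)[1:-1]
    synth = torch.cat([(1.0 - t) * z + t * z2 for t in ts], dim=0)
    return F.normalize(synth, dim=1)


def hermite_negatives(z, n_points=2):
    """Cubic Hermite variant: curves between embedding pairs instead of
    straight segments. The tangents come from a third random batch point,
    a hypothetical choice that bends the curve off the linear path."""
    B = z.size(0)
    z2 = z[torch.randperm(B, device=z.device)]
    zt = z[torch.randperm(B, device=z.device)]
    m0, m1 = zt - z, z2 - zt                           # hypothetical tangents
    ts = torch.linspace(0.0, 1.0, n_points + 2, device=z.device)[1:-1]
    out = []
    for t in ts:
        h00 = 2 * t**3 - 3 * t**2 + 1                  # standard Hermite basis
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        out.append(h00 * z + h10 * m0 + h01 * z2 + h11 * m1)
    return F.normalize(torch.cat(out, dim=0), dim=1)


def contrastive_kd_loss(z_student, z_teacher, tau=0.1, negatives_fn=linear_negatives):
    """InfoNCE-style loss: the student embedding of sample i must match the
    teacher embedding of sample i against all other real and synthetic
    teacher embeddings, with no memory bank involved."""
    z_s = F.normalize(z_student, dim=1)
    z_t = F.normalize(z_teacher.detach(), dim=1)       # teacher is frozen
    bank = torch.cat([z_t, negatives_fn(z_t)], dim=0)  # (B + B * n_points, D)
    logits = z_s @ bank.t() / tau                      # similarity logits
    targets = torch.arange(z_s.size(0), device=z_s.device)
    return F.cross_entropy(logits, targets)
```

The synthetic points are re-normalized because InfoNCE-style losses compare embeddings on the unit hypersphere; keeping all negatives inside the current batch is what removes the need for a memory bank, at the cost of tying the number of negatives to the batch size and `n_points`.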