
Student: Yi-Chieh Cheng (鄭伊捷)
Thesis Title: Augmentation in the Embedding Space for Contrastive Representation Distillation (在嵌入空間中擴增數據點應用於對比表示蒸餾)
Advisor: Hsing-Kuo Pao (鮑興國)
Committee Members: Tien-Ruey Hsiang (項天瑞), Chuan-Kai Yang (楊傳凱)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: English
Number of Pages: 40
Keywords: Deep Learning, Knowledge Distillation, Contrastive Learning, Embedding Expansion, Hermite Interpolation


    Deep learning methods usually incur high time and space complexity in model training and prediction because of their complex model structures. To deploy deep learning effectively on lightweight devices, such as those in Internet-of-Things environments, compressing a model into a lighter version has drawn attention in recent years. Knowledge distillation is one such approach: its goal is to transfer the knowledge offered by a bulky teacher model to the learning of a light student model. In one state-of-the-art method, the combination of knowledge distillation and contrastive learning effectively promotes the learning of the student model. Contrastive learning demands a large number of negative samples, so previous methods either maintain memory banks or rely on heavy data augmentation and large batch sizes. In recent years, many researchers have also devoted themselves to finding informative negative examples. We propose a method that directly generates informative negative samples in the embedding space, and we explore the effect of using different interpolation methods between data points to further improve performance. This avoids using a memory bank to store data points while still yielding helpful negative samples, which in turn effectively reduces resource consumption. We demonstrate the effectiveness of the proposed approach through performance improvements across different tasks and various datasets. The proposed method also outperforms current state-of-the-art methods on knowledge distillation and cross-model transfer tasks.
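
    To make the abstract's idea concrete, the following is a minimal NumPy sketch of the kind of procedure it describes: synthetic negatives are sampled along a cubic Hermite curve between two embedding points and then scored by an InfoNCE-style contrastive term. The function names, the tangent construction, and the temperature value are illustrative assumptions for this sketch, not the thesis's exact implementation (its Hermite-curve calculation is detailed in Appendix A.2 of the thesis).

        # Sketch (assumed details marked below): generate synthetic negative
        # embeddings by cubic Hermite interpolation in the embedding space,
        # then use them in an InfoNCE-style contrastive term.
        import numpy as np

        def hermite_negatives(p0, p1, m0, m1, n=8):
            """Sample n interior points of the cubic Hermite curve from p0 to p1
            with endpoint tangents m0 and m1, then project them back onto the
            unit hypersphere (contrastive embeddings are usually L2-normalized)."""
            t = np.linspace(0.0, 1.0, n + 2)[1:-1, None]   # interior parameters, shape (n, 1)
            h00 = 2*t**3 - 3*t**2 + 1                      # standard Hermite basis functions
            h10 = t**3 - 2*t**2 + t
            h01 = -2*t**3 + 3*t**2
            h11 = t**3 - t**2
            pts = h00*p0 + h10*m0 + h01*p1 + h11*m1
            return pts / np.linalg.norm(pts, axis=1, keepdims=True)

        def contrastive_term(student, teacher, negatives, tau=0.1):
            """One-sample InfoNCE-style loss: pull the student embedding toward
            its teacher embedding, push it away from the synthetic negatives.
            tau is an assumed temperature, not the thesis's tuned value."""
            pos = np.exp(student @ teacher / tau)
            neg = np.exp(negatives @ student / tau).sum()
            return -np.log(pos / (pos + neg))

        # Toy usage with random unit embeddings; the perturbed tangents are an
        # assumed choice that bends the curve away from the straight chord.
        rng = np.random.default_rng(0)
        e0, e1 = rng.normal(size=(2, 128))
        e0, e1 = e0 / np.linalg.norm(e0), e1 / np.linalg.norm(e1)
        chord = e1 - e0
        negs = hermite_negatives(e0, e1,
                                 m0=chord + 0.5 * rng.normal(size=128),
                                 m1=chord + 0.5 * rng.normal(size=128))
        print(contrastive_term(e0, e1, negs))

    Setting both tangents exactly equal to the chord e1 - e0 collapses the Hermite curve onto plain linear interpolation, so the tangent choice is precisely what separates a Hermite-interpolation variant from a linear-interpolation baseline.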

    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Work
        2.1 Knowledge Distillation
        2.2 Contrastive Learning
    3 Methodology
        3.1 Conventional Knowledge Distillation
        3.2 Contrastive Learning in the Embedding Space
        3.3 Diffusion for the Sampled Region
    4 Experiments
        4.1 Experiments on CIFAR100
        4.2 Transferability of Representations
        4.3 Few-shot Scenario
        4.4 Capturing Inter-class Correlations
        4.5 Computation Cost
        4.6 Distillation from an Ensemble
        4.7 Ablation Study
            4.7.1 Positive Samples vs. Negative Samples
            4.7.2 Strategy for Generating Negative Samples
            4.7.3 Hyperparameters of AECRD
            4.7.4 Hyperparameters of the Loss Function
    5 Conclusions
    References
    Appendix A
        A.1 Why We Use the Hermite Curve
        A.2 Calculation of the Hermite Curve


    Full-Text Release Date: 2025/09/28 (campus network)
    Full-Text Release Date: 2027/09/28 (off-campus network)
    Full-Text Release Date: 2027/09/28 (National Central Library: Taiwan NDLTD system)