
Author: 鍾岳霖
Yue-Lin Jhong
Thesis Title: 基於密度之影像表徵原型對比學習
Density-Based Prototypical Contrastive Learning on Visual Representations
Advisor: 鮑興國
Hsing-Kuo Pao
Committee Members: 楊傳凱
Chuan-Kai Yang
項天瑞
Tien-Ruey Hsiang
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: English
Pages: 38
Keywords: self-supervised learning, representation learning, contrastive learning

    Self-supervised learning is rapidly closing the gap between supervised learning with scarce labeled data and supervised learning with fully labeled data. Much of this progress is owed to recent advances in contrastive learning methods. However, researchers have identified several weaknesses of contrastive learning. First, contrastive learning is typically implemented as an instance discrimination task, which can be solved simply by exploiting low-level image differences while ignoring high-level semantic information. Second, contrastive learning treats two representation samples as a negative pair as long as they come from different instances, which may lead to over-clustering. Furthermore, contrastive learning does not encode the dissimilarity between augmented views of the same image, which may result in under-clustering. In this thesis, we introduce Density-Based Prototypical Contrastive Learning (DBPCL), a self-supervised visual representation learning method that combines the instance discrimination task with density-based clustering to encode the semantic structure of the data. Through the proposed ProtoXent loss, the method encourages samples to move closer to their assigned prototypes while pushing their negative prototypes apart. With the computed prototypes, DBPCL can discover both false-positive and false-negative pairs and thereby address the issues of under-clustering and over-clustering. For these reasons, DBPCL improves the quality of the learned representations and outperforms state-of-the-art methods; for example, it achieves higher prediction accuracy than the best comparable methods on CIFAR-100. The code for DBPCL is available at https://github.com/YuehLinChung/DBPCL.
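
    To make the method described above concrete, here is a minimal sketch, in PyTorch with scikit-learn's DBSCAN, of the two stages the abstract names: density-based clustering of embeddings to obtain prototypes, and a ProtoXent-style loss that pulls each sample toward its assigned prototype while pushing the other prototypes away. This is not the authors' released implementation; the function names, the DBSCAN hyperparameters (eps, min_samples), and the temperature value are illustrative assumptions. See the repository linked above for the actual code.

import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

def compute_prototypes(embeddings: torch.Tensor, eps: float = 0.3, min_samples: int = 5):
    # Cluster L2-normalized embeddings with DBSCAN and return one prototype
    # (the normalized cluster mean) per discovered cluster, plus each
    # sample's cluster assignment (-1 marks DBSCAN noise points).
    z = F.normalize(embeddings.detach(), dim=1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z.cpu().numpy())
    labels = torch.as_tensor(labels, device=embeddings.device)
    # DBSCAN labels non-noise points 0..K-1, so a sample's label doubles
    # as the row index of its prototype in the stacked tensor below.
    clusters = sorted(set(labels.tolist()) - {-1})
    if not clusters:
        return None, labels
    prototypes = torch.stack(
        [F.normalize(z[labels == c].mean(dim=0), dim=0) for c in clusters])
    return prototypes, labels

def proto_xent_loss(z, prototypes, assignments, temperature=0.1):
    # Cross-entropy over sample-to-prototype cosine similarities: each
    # sample is attracted to its assigned prototype (the positive) and
    # repelled from all other prototypes (the negatives).
    mask = assignments >= 0  # skip DBSCAN noise points
    if mask.sum() == 0:
        return z.new_zeros(())
    logits = F.normalize(z[mask], dim=1) @ prototypes.t() / temperature
    return F.cross_entropy(logits, assignments[mask])

    In a full training loop one would typically embed a batch (or a memory bank) with the encoder, call compute_prototypes once per clustering round, and add proto_xent_loss to the usual instance discrimination (InfoNCE) loss; the thesis additionally uses the prototypes to identify and correct false-positive and false-negative pairs, which this sketch omits.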

    Table of Contents:
    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    Symbol
    1 Introduction
    2 Related Work
      2.1 Vanilla self-supervised learning
      2.2 Contrastive learning
      2.3 Deep clustering
      2.4 Stop-gradient
    3 Methodology
      3.1 Preliminary
      3.2 Description of the proposed DBPCL
      3.3 Concentration estimation
    4 Experiments
      4.1 Implementation details
        4.1.1 Image data augmentations
        4.1.2 Architecture
      4.2 Image classification benchmarks
        4.2.1 Linear evaluation
        4.2.2 Domain transfer
        4.2.3 Grain
        4.2.4 Finetune
      4.3 Batch size
      4.4 Encoder backbone
      4.5 Hidden nodes in projection head
      4.6 Image classification with limited training data
        4.6.1 Few-shot classification
        4.6.2 Semi-supervised learning
      4.7 Imbalanced dataset
      4.8 Influence of longer training
      4.9 Ablations
        4.9.1 Ablation on density-based clustering
        4.9.2 Ablation on ProtoXent
        4.9.3 Ablation on different choices of epsilons
        4.9.4 Ablation on different design of concentration
        4.9.5 Ablation on addressing false positives and false negatives
      4.10 Visualization of learned representation
    5 Conclusions
    References
    Appendix A
      A.1 Illustration of false positives
      A.2 Illustration of false negatives
      A.3 Demonstration of learned representation

    Full-text release date: 2024/09/28 (campus network)
    Full-text release date: 2024/09/28 (off-campus network)
    Full-text release date: 2024/09/28 (National Central Library: Taiwan NDLTD system)