
Student: Yu-Hao Chen (陳鈺澔)
Thesis Title: Few-shot Classification with Max-Average-Pooling and Attention Mechanism (最大池化和平均池化以及注意力機制消融的少樣本分類)
Advisor: Jing-Ming Guo (郭景明)
Committee Members: Chuan-Yu Chang (張傳育), Chi-Chia Sun (宋啟嘉), Ching-Hu Lu (陸敬互), Wen-Chung Kao (高文忠), Jing-Ming Guo (郭景明)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 82
Keywords: Few-shot classification, Feature matching, Attention mechanism, Ablation experiment, Prototypical Networks
Abstract:

Deep learning research has matured considerably and is now deployed in a wide range of real-world settings. Most models, however, consume a large number of samples, and those samples must be correctly labeled by humans; otherwise the trained model's ability to make correct predictions suffers. In image classification, the most common research area today, training a supervised classifier of meaningful quality often takes millions of images, and rarely fewer than several hundred thousand. In practice, collecting such data is difficult, which makes it harder still to generalize deep learning. How to move deep learning away from its reliance on massive datasets and toward few-shot learning has therefore come to be seen as a major challenge in recent years: very little data can keep the model from converging, and unrepresentative samples can bias the model toward particular images, skewing recognition results or preventing adaptation altogether. These are the central difficulties of few-shot classification.

The experiments in this thesis use the widely recognized few-shot benchmarks Mini-ImageNet, CIFAR-FS, and CUB-200 for validation. Each dataset contains fewer than 100,000 images in total, and the classes in the training and test splits drawn from them do not overlap. Beyond studying how to improve performance under few-shot classification, this thesis proposes a new framework called MAFC (Max Pooling and Average Pooling Feature Combine). Unlike prior work, which trains on each sample's individual embedding produced by the model, this thesis trains on the interaction information between samples. The goal is to let the model learn how the different signals within a sample should be recognized, or to form a more representative shared signal through ablation, rather than relying on a fixed embedding mapping. This also suppresses signal sources that are easily disturbed by imaging conditions while retaining the most stable features. A Prototypical Network then uses the support samples to decide which class each query sample belongs to. Compared with the most accurate current few-shot classification methods, the experiments achieve new state-of-the-art results.
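Since the full text is still embargoed, the exact MAFC design cannot be reproduced here. Purely as a hedged illustration of the idea named in the title, the following PyTorch sketch combines channel-wise max-pooled and average-pooled descriptors through a shared MLP in the style of CBAM channel attention [23]; the module name MAFCBlock, the reduction ratio, and the sum-then-sigmoid fusion are assumptions for illustration, not the thesis's verified architecture.

```python
import torch
import torch.nn as nn

class MAFCBlock(nn.Module):
    """Illustrative sketch only: fuse max-pooled and average-pooled channel
    descriptors with a shared MLP, CBAM-style; not the thesis's exact MAFC."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared two-layer MLP applied to both pooled descriptors (assumption).
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg_desc = x.mean(dim=(2, 3))   # average pooling: stable global signal per channel
        max_desc = x.amax(dim=(2, 3))   # max pooling: strongest local response per channel
        attn = torch.sigmoid(self.mlp(avg_desc) + self.mlp(max_desc))
        return x * attn.view(b, c, 1, 1)  # reweight channels, keep stable features

# Toy check: a batch of 4 feature maps with 64 channels.
feats = torch.randn(4, 64, 10, 10)
print(MAFCBlock(64)(feats).shape)  # torch.Size([4, 64, 10, 10])
```

Average pooling captures the globally stable signal in each channel while max pooling keeps the strongest local response; reweighting channels by both is one plausible way to retain the most stable features while suppressing easily disturbed signal sources, as the abstract describes.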

Abstract (Chinese) I
Abstract (English) II
Acknowledgements IV
Table of Contents V
List of Figures VI
List of Tables VIII
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Deep Learning Architectures and Feature Extraction 3
2.1.1 Introduction to Artificial Neural Networks 5
2.1.2 Convolutional Neural Network Architecture 11
2.1.3 Evolution of Convolutional Neural Networks 13
2.2 Few-shot Learning and Few-shot Classification Models 25
2.2.1 Meta Learning [19] 26
2.2.2 Relation Network [20] 27
2.2.3 Prototypical Network [21] 29
2.2.4 Exploring Simple Meta-Learning for Few-Shot Learning [22] 31
2.3 Channel and Spatial Attention Mechanisms 32
2.3.1 CBAM: Convolutional Block Attention Module [23] 32
2.4 Combining Channel Attention with Few-shot Models 35
2.4.1 Few-shot Classification based on CBAM and prototype network [24] 35
Chapter 3 Few-shot Classification with Max-Average-Pooling and Attention Mechanism Ablation 36
3.1 Architecture Flowchart 37
3.2 Dataset Composition 39
3.3 Few-shot Data Sampling Method 42
3.4 Model Design Discussion 43
Chapter 4 Few-shot Classification Experimental Results 55
4.1 Training and Testing Environment 55
4.2 Testing Procedure and Evaluation Metrics 55
4.3 Experimental Results and Analysis 56
Chapter 5 Conclusion and Future Work 64
Description of Compared Methods 65
References 69
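Sections 2.2.3 and 3.3 of the outline above cover the Prototypical Network and the few-shot episode sampling that the abstract builds on. For context, here is a minimal sketch of N-way K-shot nearest-prototype classification following Snell et al. [21]; the function name and the toy episode are illustrative, and in the thesis's pipeline this step would operate on the MAFC-processed features.

```python
import torch

def prototype_classify(support: torch.Tensor, support_labels: torch.Tensor,
                       query: torch.Tensor, n_way: int) -> torch.Tensor:
    """N-way K-shot nearest-prototype classification per [21]: each class
    prototype is the mean of its support embeddings; queries take the label
    of the closest prototype in Euclidean distance."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                      # (n_way, dim)
    dists = torch.cdist(query, prototypes)  # (n_query, n_way) pairwise distances
    return dists.argmin(dim=1)              # predicted class index per query

# Toy 5-way 1-shot episode with 16-dim embeddings standing in for backbone features.
support = torch.randn(5, 16)
support_labels = torch.arange(5)
query = torch.randn(10, 16)
print(prototype_classify(support, support_labels, query, n_way=5))
```

Each class prototype is simply the mean of its support embeddings, so with K=1 the prototype is the single support sample itself; benchmark accuracy is then averaged over many such randomly sampled episodes.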

    [1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [2] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," arXiv preprint arXiv:1301.3557, 2013.
    [3] C. Gulcehre, K. Cho, R. Pascanu, and Y. Bengio, "Learned-norm pooling for deep feedforward and recurrent neural networks," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014: Springer, pp. 530-546.
    [4] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
    [5] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in International conference on machine learning, 2013: PMLR, pp. 1139-1147.
    [6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, pp. 1097-1105, 2012.
    [7] J. Deng et al., "Imagenet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248-255, 2009.
    [8] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The journal of machine learning research, vol. 15, no. 1, pp. 1929-1958, 2014.
    [9] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
    [10] G. H. Dunteman, Principal components analysis (no. 69). Sage, 1989.
    [11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
    [12] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, 2019.
    [13] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
    [14] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [15] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical vision transformer using shifted windows," in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
    [16] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie, "ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders," arXiv preprint arXiv:2301.00808, 2023.
    [17] J. Deng et al., "Imagenet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248-255, 2009.
    [18] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, Springer, pp. 740-755, 2014.
    [19] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in International conference on machine learning: PMLR, pp. 1126-1135, 2017.
    [20] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, "Learning to compare: Relation network for few-shot learning," in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1199-1208, 2018.
    [21] J. Snell, K. Swersky, and R. Zemel, "Prototypical networks for few-shot learning," Advances in neural information processing systems, vol. 30, 2017.
    [22] Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang, "Meta-Baseline: Exploring simple meta-learning for few-shot learning," in IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9062-9071, 2021.
    [23] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
    [24] S. Xin and H. Liu, "Few-shot classification based on CBAM and prototype network," in 2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS), pp. 1-6, 2022.
    [25] O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, "Matching networks for one shot learning," Advances in neural information processing systems, vol. 29, 2016.
    [26] Y. Hu, V. Gripon, and S. Pateux, "Leveraging the feature distribution in transfer-based few-shot learning," arXiv preprint arXiv:2006.03806, 2021.
    [27] H. Zhang, Z. Cao, Z. Yan, and C. Zhang, "Sill-net: Feature augmentation with separated illumination representation," arXiv preprint arXiv:2102.03539, 2021.
    [28] Y. Hu, V. Gripon, and S. Pateux, "Self-supervised learning for few-shot image classification," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1745-1749, 2021.
    [29] Y. Hu, S. Pateux, and V. Gripon, "Squeezing backbone feature distributions to the max for efficient few-shot learning," Algorithms, vol. 15, no. 5, p. 147, 2022.
    [30] P. Bateni, J. Barber, J.-W. van de Meent, and F. Wood, "Enhancing few-shot image classification with unlabelled examples," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2796-2805, 2022.
    [31] Y. Bendou et al., "EASY: Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients," arXiv preprint arXiv:2201.09699, 2022.
    [32] D. Shalam and S. Korman, "The Self-Optimal-Transport Feature Transform," arXiv preprint arXiv:2204.03065, 2022.
    [33] S. X. Hu et al., "Pushing the limits of simple pipelines for few-shot learning: External data and fine-tuning make a difference," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2022.
    [34] T. Chobola, D. Vašata, and P. Kordík, "Transfer learning based few-shot classification using optimal transport mapping from preprocessed latent space of backbone neural network," in AAAI Workshop on Meta-Learning and MetaDL Challenge: PMLR, pp. 29-37, 2021.
    [35] Y. He, W. Liang, D. Zhao, H.-Y. Zhou, W. Ge, Y. Yu, and W. Zhang, "Attribute surrogates learning and spectral tokens pooling in transformers for few-shot learning," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9119-9129, 2022.
    [36] Y. Hu, S. Pateux, and V. Gripon, "Adaptive dimension reduction and variational inference for transductive few-shot classification," arXiv preprint arXiv:2209.08527, 2022.
    [37] M. Hiller, R. Ma, M. Harandi, and T. Drummond, "Rethinking generalization in few-shot classification," in Conference on Neural Information Processing Systems (NeurIPS), 2022.
    [38] Y. Wang, C. Xu, C. Liu, L. Zhang, and Y. Fu, "Instance credibility inference for few-shot learning," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12836-12845, 2020.

    Full text available from 2025/08/21 (campus network)
    Full text available from 2025/08/21 (off-campus network)
    Full text available from 2025/08/21 (National Central Library: Taiwan thesis and dissertation system)