
Author: Wen-Shiang Chen (陳文祥)
Thesis title: Few-shot Classification with Dual Model Deep Feature Extraction and Similarity Measurement (基於雙模型特徵提取與相似度比對之少樣本分類)
Advisor: Jing-Ming Guo (郭景明)
Committee members: Hsueh-Ming Hang (杭學鳴), Yen-Lin Chen (陳彥霖), Chuan-Yu Chang (張傳育), Jian-Jiun Ding (丁建均)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110 (2021-2022)
Language: Chinese
Pages: 83
Keywords (Chinese): 雙模型機制, 少樣本學習, 少樣本分類, 特徵比對
Keywords (English): Dual Model, Few-shot Learning, Few-shot Classification, Feature Matching
    In past deep learning research and many practical applications, most models require a large and correctly labeled dataset during training to achieve good training performance and application results. However, once the amount of training data shrinks, or the training set contains some mislabeled samples, both the accuracy of a deep learning model and its practical behavior suffer. Taking the typical academic image classification task as an example, the common deep learning approach uses supervised learning to let the model learn class-specific deep image features from a large-scale training set spanning many categories. In real-world image classification tasks, however, correctly labeling large numbers of samples carries a heavy labor cost, so the number of training images actually available is far smaller than in academic settings. To better match real-world conditions, few-shot learning algorithms are strongly motivated for practical applications. Their main focus is training with only a small amount of labeled data, extracting features as fully as possible and comparing features across samples during training, thereby improving performance on few-shot classification tasks.
    The experiments in this thesis use the public datasets Mini-ImageNet, CIFAR-FS, and CUB 200, each containing fewer than 60,000 images in total, and within each dataset the classes in the training split and the test split do not overlap at all. Accordingly, besides studying how to make the trained model perform better on few-shot classification with only a small number of samples, this thesis also examines whether a model trained in the few-shot regime can adapt to few-shot classification on novel classes. Taking the few-shot learning perspective throughout, each training episode begins by randomly drawing a small amount of data from the training set and splitting it into a support set and a query set. A dual-model mechanism then extracts deep features from the support set and the query set separately: the support-set features are used to find the most representative features of each class, while the query-set features are used for classification. Through repeated rounds of sampling and feature comparison between the support set and the query set, the method achieves the best possible classification with few samples. The method proposed in this thesis can (1) learn effectively from a small number of samples, (2) train on the support set and the query set separately and effectively through the dual-model training mechanism, and (3) be applied to a variety of few-shot classification tasks. The proposed dual-model network effectively raises few-shot classification accuracy above current state-of-the-art methods.
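    To make this episodic procedure concrete, the following is a minimal Python sketch of how one such episode could be drawn. The function name sample_episode and the 5-way, 1-shot, 15-query defaults are illustrative assumptions following the common few-shot protocol, not the thesis's exact configuration.

        import random
        from collections import defaultdict

        def sample_episode(labels, n_way=5, k_shot=1, n_query=15):
            """Draw one N-way K-shot episode: index lists for a support set
            and a query set taken from n_way randomly chosen classes.

            labels: list of class ids, one per image in the training split.
            """
            # Group image indices by class.
            by_class = defaultdict(list)
            for idx, c in enumerate(labels):
                by_class[c].append(idx)
            # Pick n_way classes, then k_shot support and n_query query
            # images from each chosen class.
            classes = random.sample(sorted(by_class), n_way)
            support_idx, query_idx = [], []
            for c in classes:
                picked = random.sample(by_class[c], k_shot + n_query)
                support_idx += picked[:k_shot]  # the few labeled examples
                query_idx += picked[k_shot:]    # held out for classification
            return support_idx, query_idx

    Because the training and test splits share no classes, episodes drawn from the test split always confront the trained model with novel categories, which is precisely the adaptation ability examined here.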


    In prior deep learning research and various practical application tasks, most models have needed a large amount of correctly labeled data in the training stage to obtain satisfactory performance in applications. However, if the training dataset is too small, or incorrectly labeled data is included, performance may degrade. For real-life applications, few-shot learning algorithms therefore carry strong research motivation. The main goal is to use only a few labeled samples in the training stage, extracting image features and comparing features across samples during training, thereby improving effectiveness on the few-shot classification task.
    The experiments in this thesis use the public datasets Mini-ImageNet, CIFAR-FS, and CUB 200, all with fewer than 60,000 images. This thesis not only studies how to train a model that yields good performance on few-shot classification, but also explores whether the model can adapt to new categories in practical application. At the beginning of each episode, a few training images are randomly selected from the dataset and separated into a support set and a query set. Subsequently, through the dual-model mechanism, deep features are extracted from the support set and the query set separately. The deep features of the support set are used to identify representative features of each category, while the deep features of the query set are used for classification. Through several rounds of sample selection and feature comparison between the support set and the query set, the best classification result can be achieved with few samples. The proposed method can (1) learn effectively from few samples, (2) train the support and query sets effectively through the dual-model mechanism, and (3) handle most few-shot classification datasets. As examined in the experiments, the dual model obtains superior performance on multiple few-shot classification tasks compared with state-of-the-art methods.
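    The matching step of the dual-model mechanism can be sketched in PyTorch as follows. This is a minimal illustration that assumes, in the style of prototypical networks [34], that each category's representative feature is the mean of its support embeddings and that cosine similarity serves as the matching score; the conv_encoder placeholder backbones and both assumptions stand in for the thesis's actual dual architecture and similarity measurement, which are not reproduced here.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def conv_encoder(out_dim=64):
            # Tiny placeholder backbone; the thesis's encoders differ.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
            )

        support_encoder = conv_encoder()  # one model for the support set
        query_encoder = conv_encoder()    # a second model for the query set

        def classify_episode(support_x, support_y, query_x, n_way):
            # support_x: (n_way*k_shot, 3, H, W); support_y: ids in [0, n_way)
            s_feat = support_encoder(support_x)        # (n_way*k_shot, d)
            q_feat = query_encoder(query_x)            # (n_query, d)
            # Representative feature per class: mean support embedding (assumed).
            prototypes = torch.stack(
                [s_feat[support_y == c].mean(dim=0) for c in range(n_way)]
            )                                          # (n_way, d)
            # Cosine similarity of every query against every prototype (assumed).
            sims = F.cosine_similarity(
                q_feat.unsqueeze(1), prototypes.unsqueeze(0), dim=-1
            )                                          # (n_query, n_way)
            return sims.argmax(dim=1)                  # predicted class ids

    During training, the similarity matrix could be scaled and fed to a cross-entropy loss over the query labels, repeating the sample-and-compare loop the abstract describes.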

    Table of Contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1  Research Background and Motivation
      1.2  Thesis Structure
    Chapter 2  Literature Review
      2.1  Deep Learning Architectures and Feature Extraction
        2.1.1  Artificial Neural Network Architectures
        2.1.2  Convolutional Neural Network Architectures
        2.1.3  Training Methods for Convolutional Neural Networks
        2.1.4  Evolution of Convolutional Neural Networks
        2.1.5  Visualization of Convolutional Neural Networks
      2.2  Few-shot Learning and Few-shot Classification Models
        2.2.1  Meta Learning [31]
        2.2.2  Relation Network [32]
        2.2.3  Cross Attention Network [33]
        2.2.4  Prototypical Network [34]
      2.3  Few-shot Classification Applications
        2.3.1  Few-shot Keyword Spotting with Prototypical Network [35]
        2.3.2  Contrastive Few-shot Learning [36]
        2.3.3  AmdimNet [37]
    Chapter 3  Few-shot Classification Based on Dual Model Feature Extraction and Feature Similarity
      3.1  Architecture Flowchart
      3.2  Dataset Composition
      3.3  Few-shot Data Sampling Method
      3.4  Dual Model Feature Extraction Architecture
    Chapter 4  Few-shot Classification Experimental Results
      4.1  Training and Testing Environment
      4.2  Testing Stage and Evaluation Metrics
      4.3  Experimental Results and Analysis
    Chapter 5  Conclusion and Future Work
    References

    [1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [2] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European conference on computer vision, 2014: Springer, pp. 818-833.
    [3] M.-C. Popescu, V. E. Balas, L. Perescu-Popescu, and N. Mastorakis, "Multilayer perceptron and neural networks," WSEAS Transactions on Circuits and Systems, vol. 8, no. 7, pp. 579-588, 2009.
    [4] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International conference on machine learning, 2013: PMLR, pp. 1310-1318.
    [5] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010.
    [6] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (elus)," arXiv preprint arXiv:1511.07289, 2015.
    [7] W. Shang, K. Sohn, D. Almeida, and H. Lee, "Understanding and improving convolutional neural networks via concatenated rectified linear units," in international conference on machine learning, 2016: PMLR, pp. 2217-2225.
    [8] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," arXiv preprint arXiv:1706.02515, 2017.
    [9] B. Xu, N. Wang, T. Chen, and M. Li, "Empirical evaluation of rectified activations in convolutional network," arXiv preprint arXiv:1505.00853, 2015.
    [10] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011: JMLR Workshop and Conference Proceedings, pp. 315-323.
    [11] C. Gulcehre, M. Moczulski, M. Denil, and Y. Bengio, "Noisy activation functions," in International conference on machine learning, 2016: PMLR, pp. 3059-3068.
    [12] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," arXiv preprint arXiv:1301.3557, 2013.
    [13] C. Gulcehre, K. Cho, R. Pascanu, and Y. Bengio, "Learned-norm pooling for deep feedforward and recurrent neural networks," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014: Springer, pp. 530-546.
    [14] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
    [15] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in International conference on machine learning, 2013: PMLR, pp. 1139-1147.
    [16] A. Botev, G. Lever, and D. Barber, "Nesterov's accelerated gradient and momentum as approximations to regularised update descent," in 2017 International Joint Conference on Neural Networks (IJCNN), 2017: IEEE, pp. 1899-1903.
    [17] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International conference on medical image computing and computer-assisted intervention, 2016: Springer, pp. 424-432.
    [18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
    [19] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 1, pp. 1-48, 2019.
    [20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, pp. 1097-1105, 2012.
    [21] J. Deng et al., "Imagenet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition, 2009: IEEE, pp. 248-255.
    [22] G. H. Dunteman, Principal components analysis (no. 69). Sage, 1989.
    [23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The journal of machine learning research, vol. 15, no. 1, pp. 1929-1958, 2014.
    [24] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
    [25] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9.
    [26] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
    [27] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in 12th USENIX symposium on operating systems design and implementation (OSDI 16), 2016, pp. 265-283.
    [28] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM international conference on Multimedia, 2014, pp. 675-678.
    [29] A. Paszke et al., "Pytorch: An imperative style, high-performance deep learning library," arXiv preprint arXiv:1912.01703, 2019.
    [30] T.-Y. Lin et al., "Microsoft coco: Common objects in context," in European conference on computer vision, 2014: Springer, pp. 740-755.
    [31] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in International conference on machine learning, 2017: PMLR, pp. 1126-1135.
    [32] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, "Learning to compare: Relation network for few-shot learning," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1199-1208.
    [33] R. Hou, H. Chang, B. Ma, S. Shan, and X. Chen, "Cross attention network for few-shot classification," Advances in Neural Information Processing Systems, vol. 32, 2019.
    [34] J. Snell, K. Swersky, and R. Zemel, "Prototypical networks for few-shot learning," Advances in neural information processing systems, vol. 30, 2017.
    [35] A. Parnami and M. Lee, "Few-shot keyword spotting with prototypical networks," in 2022 7th International Conference on Machine Learning Technologies (ICMLT), 2022, pp. 277-283.
    [36] X. Chen, L. Yao, T. Zhou, J. Dong, and Y. Zhang, "Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images," Pattern recognition, vol. 113, p. 107826, 2021.
    [37] D. Chen, Y. Chen, Y. Li, F. Mao, Y. He, and H. Xue, "Self-supervised learning for few-shot image classification," in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: IEEE, pp. 1745-1749.
    [38] O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, "Matching networks for one shot learning," Advances in neural information processing systems, vol. 29, 2016.
    [39] H. Zhang, Z. Cao, Z. Yan, and C. Zhang, "Sill-net: Feature augmentation with separated illumination representation," arXiv preprint arXiv:2102.03539, 2021.
    [40] X. Chen and G. Wang, "Few-shot learning by integrating spatial and frequency representation," in 2021 18th Conference on Robots and Vision (CRV), 2021: IEEE, pp. 49-56.
    [41] T. Chobola, D. Vašata, and P. Kordík, "Transfer learning based few-shot classification using optimal transport mapping from preprocessed latent space of backbone neural network," in AAAI Workshop on Meta-Learning and MetaDL Challenge, 2021: PMLR, pp. 29-37.
    [42] Y. Hu, S. Pateux, and V. Gripon, "Squeezing backbone feature distributions to the max for efficient few-shot learning," Algorithms, vol. 15, no. 5, p. 147, 2022.
    [43] P. Bateni, J. Barber, J.-W. van de Meent, and F. Wood, "Enhancing few-shot image classification with unlabelled examples," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 2796-2805.
    [44] Y. Bendou et al., "EASY: Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients," arXiv preprint arXiv:2201.09699, 2022.
    [45] D. Shalam and S. Korman, "The Self-Optimal-Transport Feature Transform," arXiv preprint arXiv:2204.03065, 2022.

    Full text release date: 2024/08/24 (campus network)
    Full text release date: 2024/08/24 (off-campus network)
    Full text release date: 2024/08/24 (National Central Library: Taiwan NDLTD system)