| Field | Value |
|---|---|
| Author | 陳冠宇 (Kuan-Yu Chan) |
| Title | 基於雙模型一致性與資料增強在半監督學習之錯誤標註效能提昇技術 (Robust Semi-Supervised Learning on Noisy Labels with Multi-Consistency and Data Augmentation) |
| Advisor | 郭景明 (Jing-Ming Guo) |
| Committee Members | 李宗南 (Chung-Nan Lee), 李佩君 (Pei-Jun Lee), 夏至賢 (Chih-Hsien Hsia), 徐位文 (Wei-Wen Hsu) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2022 |
| Academic Year | 110 |
| Language | Chinese |
| Pages | 97 |
| Keywords | Semi-Supervised Learning, Multi-Consistency, Data Augmentation, Noisy-Label Learning, Animal-10N, Clothing1M |
In past deep learning research, most model training has relied on large amounts of correctly annotated data to achieve good performance; however, if a dataset contains even a portion of mislabeled samples, the model's accuracy can be severely degraded. In image classification, the common deep learning approach uses large labeled datasets under supervised learning so that the model learns the features of each class from samples, and obtaining correct labels usually requires an enormous manual re-annotation cost. To reduce this annotation cost, semi-supervised learning algorithms are strongly motivated in practical applications: they train the model with a small amount of labeled data and a large amount of unlabeled data, and further improve performance by matching features across samples.
The experiments in this thesis use two public noisy-label datasets, Clothing1M and Animal-10N. Both were collected by web crawlers, so the extent of their label noise cannot be estimated. This thesis therefore studies how to train models to higher accuracy in the presence of mislabeled data. The method is designed from a semi-supervised learning perspective: the outputs of a warm-up model are partitioned by their losses; samples with small losses are treated as labeled data, while those with large losses are regarded as mislabeled and therefore treated as unlabeled data. The unlabeled data are then further screened by dual-model consistency: samples on which the two models are highly consistent are selected and given pseudo-labels for training, while the rest are treated as noise and excluded from training. This forms a dual screening mechanism that filters noisy samples out of the unlabeled set, reducing the influence of noise and mislabeling. In addition, to improve model robustness, the selected clean data and the pseudo-labeled unlabeled data chosen by dual-model consistency are combined with strong data augmentation and trained with the Mixup algorithm to further improve classification performance.
Experimental results show that the proposed dataset denoising and data-enhancement scheme improves accuracy by about 0.1% over state-of-the-art methods on Clothing1M and by about 2% on Animal-10N. The proposed method (1) introduces strong data augmentation during training to increase model robustness, (2) uses the dual-model consistency mechanism to reduce the influence of noisy data on training, and (3) applies a semi-supervised learning algorithm to improve accuracy.
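The dual screening described above can be sketched as follows. This is a minimal NumPy illustration under assumed inputs, not the thesis's actual implementation: the function name `dual_screening`, the loss threshold, and the agreement-confidence threshold are all hypothetical placeholders.

```python
import numpy as np

def dual_screening(losses, probs_a, probs_b, loss_thresh=0.5, agree_thresh=0.9):
    """Illustrative dual screening.

    losses:  per-sample warm-up losses, shape (N,)
    probs_a: class probabilities from model A, shape (N, C)
    probs_b: class probabilities from model B, shape (N, C)
    Returns the indices of 'clean' samples, plus indices and pseudo-labels
    of unlabeled samples the two models agree on; the rest are dropped as noise.
    """
    losses = np.asarray(losses)
    clean = np.where(losses < loss_thresh)[0]       # small loss -> keep given label
    unlabeled = np.where(losses >= loss_thresh)[0]  # large loss -> discard given label

    pred_a = probs_a[unlabeled].argmax(axis=1)
    pred_b = probs_b[unlabeled].argmax(axis=1)
    # average confidence of the two models on their top prediction
    conf = (probs_a[unlabeled].max(axis=1) + probs_b[unlabeled].max(axis=1)) / 2
    agree = (pred_a == pred_b) & (conf >= agree_thresh)

    pseudo_idx = unlabeled[agree]      # consistent samples receive pseudo-labels
    pseudo_labels = pred_a[agree]
    return clean, pseudo_idx, pseudo_labels
```

Samples rejected by both screens (large loss and inconsistent predictions) simply never enter the training set, which is how the scheme limits the influence of label noise.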
In deep learning, most model training has relied on large-scale, correctly annotated data to achieve promising performance. However, if part of the annotated data is incorrect, the accuracy of the model may be seriously affected. In image classification tasks, deep learning models learn class-relevant features from big data under supervised learning, and correctly labeling every sample to generate ground-truth labels incurs a high labor cost. To reduce the labeling cost, semi-supervised learning algorithms were proposed to exploit unlabeled data and boost model performance through feature matching among samples.
In this study, two public noisy datasets were used: Clothing1M and Animal-10N. Both were generated by web crawlers, so their label quality cannot be estimated. The purpose of this study is therefore to tackle noisy labels and boost model performance. The method is designed as a semi-supervised learning approach. The outputs of the warm-up model are separated according to their losses: samples with small losses are treated as labeled data, while those with large losses are treated as unlabeled data and further processed by multi-consistency. Consistent samples are selected from this set and given pseudo-labels for training, while the remainder are excluded as noisy data. This dual screening scheme removes noisy samples from the unlabeled data, reducing the impact of noise and mislabeling. In addition, to improve the robustness of the model, the clean data and the pseudo-labeled unlabeled data are combined with strong data augmentation and trained with the Mixup algorithm to further enhance classification performance.
Experimental results show that the proposed methods boost classification performance: accuracy is about 0.1% higher than the state-of-the-art on Clothing1M and about 2% higher on Animal-10N. The contributions are summarized as (1) enhancing the model with strong data augmentation, (2) using multi-consistency to reduce the impact of noisy data, and (3) boosting performance via semi-supervised learning.