Author: |
李皓凱 Hao-Kai Lee |
Thesis Title: |
Leveraging the noisy student model and AEDA for improving the prediction accuracy of NLP tasks |
Advisor: |
Yung-Ho Leu |
Committee: |
Wei-Ning Yang 陳雲岫 Yun-Shiow Chen |
Degree: |
Master |
Department: |
School of Management - Department of Information Management |
Thesis Publication Year: | 2022 |
Graduation Academic Year: | 110 |
Language: | English |
Pages: | 48 |
Keywords (in Chinese): | 深度學習 、自然語言處理 、噪聲學生訓練 、資料增強 |
Keywords (in other languages): | Deep Learning, Natural Language Processing, Noisy student training, Data augmentation |
Overfitting can be a serious problem when there is not enough training data for a model to generalize. Data augmentation and semi-supervised learning can help mitigate this problem. Both techniques are mature in the Computer Vision field, but they are still being explored in the Natural Language Processing (NLP) field.
In this paper, we transfer noisy student training [1] from the Computer Vision field to the NLP field. By combining NLP data augmentation methods such as EDA [2] or AEDA [3] with the noisy student training process, we outperform EDA and AEDA on all five datasets used in their experiments. We further show experimentally that using another dataset as the unlabeled dataset can be helpful for this method. We also conduct extensive experiments to study how many generations are needed given the scale and difficulty of each of these five datasets. The results show that, even with only a small amount of labeled data, our method can greatly improve model performance without requiring any additional pre-trained model.
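The combination described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The punctuation set and the one-third insertion ratio follow our reading of the AEDA paper [3] and should be treated as assumptions. The names `aeda`, `noisy_student`, and `train_fn` are hypothetical: `train_fn` stands for any routine that takes a list of `(text, label)` pairs and returns a prediction function. Training each student on both the clean and the augmented examples is also an illustrative choice, not a detail taken from the thesis.

```python
import random

# Punctuation marks inserted by AEDA (assumption: set as reported in [3]).
PUNCTUATION = [".", ";", "?", ":", "!", ","]


def aeda(sentence, rng, ratio=1 / 3):
    """Return a copy of `sentence` with random punctuation marks inserted."""
    words = sentence.split()
    if not words:
        return sentence
    # Insert between 1 and ratio * (number of words) marks, at least 1.
    n_insert = rng.randint(1, max(1, int(len(words) * ratio)))
    for _ in range(n_insert):
        position = rng.randint(0, len(words))  # any gap, including both ends
        words.insert(position, rng.choice(PUNCTUATION))
    return " ".join(words)


def noisy_student(labeled, unlabeled, train_fn, generations, rng):
    """Iterate teacher -> pseudo-labels -> noised student for a few generations.

    labeled:   list of (text, label) pairs
    unlabeled: list of texts without labels
    train_fn:  hypothetical callable; takes (text, label) pairs and
               returns a function mapping a text to a predicted label
    """
    # The first teacher is trained on clean labeled data only.
    teacher = train_fn(labeled)
    for _ in range(generations):
        # The teacher labels the unlabeled texts without any noise.
        pseudo = [(text, teacher(text)) for text in unlabeled]
        combined = labeled + pseudo
        # The student is trained with input noise, here AEDA augmentation.
        noised = [(aeda(text, rng), label) for text, label in combined]
        student = train_fn(combined + noised)
        # The student becomes the teacher of the next generation.
        teacher = student
    return teacher
```

In this sketch, `generations` corresponds to the number of teacher-student rounds whose effect the experiments examine, and `unlabeled` may come from a different dataset than `labeled`. Passing an explicit `random.Random` instance keeps the augmentation reproducible.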
[1] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, "Self-training with noisy student improves ImageNet classification," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10687-10698.
[2] J. Wei and K. Zou, "EDA: Easy data augmentation techniques for boosting performance on text classification tasks," arXiv preprint arXiv:1901.11196, 2019.
[3] A. Karimi, L. Rossi, and A. Prati, "AEDA: An Easier Data Augmentation Technique for Text Classification," arXiv preprint arXiv:2108.13230, 2021.
[4] S. Y. Feng et al., "A survey of data augmentation approaches for NLP," arXiv preprint arXiv:2105.03075, 2021.
[5] C. Shorten, T. M. Khoshgoftaar, and B. Furht, "Text Data Augmentation for Deep Learning," Journal of Big Data, vol. 8, no. 1, p. 101, 2021, doi: 10.1186/s40537-021-00492-0.
[6] R. Sennrich, B. Haddow, and A. Birch, "Improving neural machine translation models with monolingual data," arXiv preprint arXiv:1511.06709, 2015.
[7] F. Mi, W. Zhou, F. Cai, L. Kong, M. Huang, and B. Faltings, "Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems," arXiv preprint arXiv:2108.12589, 2021.
[8] D.-H. Lee, "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks," in Workshop on challenges in representation learning, ICML, 2013, vol. 3, no. 2, p. 896.
[9] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. A. Raffel, "MixMatch: A holistic approach to semi-supervised learning," Advances in Neural Information Processing Systems, vol. 32, 2019.
[10] K. Sohn et al., "FixMatch: Simplifying semi-supervised learning with consistency and confidence," Advances in Neural Information Processing Systems, vol. 33, pp. 596-608, 2020.
[11] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[12] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.
[13] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[14] N. Ng, K. Cho, and M. Ghassemi, "SSMBA: Self-supervised manifold based data augmentation for improving out-of-domain robustness," arXiv preprint arXiv:2009.10195, 2020.
[15] X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proceedings of the 2008 international conference on web search and data mining, 2008, pp. 231-240.
[16] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168-177.
[17] Q. Liu, Z. Gao, B. Liu, and Y. Zhang, "Automated rule selection for aspect extraction in opinion mining," in Twenty-Fourth international joint conference on artificial intelligence, 2015.
[18] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, "Parsing with compositional vector grammars," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013, pp. 455-465.
[19] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," arXiv preprint cs/0409058, 2004.
[20] M. Ganapathibhotla and B. Liu, "Mining opinions in comparative sentences," in Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), 2008, pp. 241-248.
[21] X. Li and D. Roth, "Learning question classifiers," in COLING 2002: The 19th International Conference on Computational Linguistics, 2002.
[22] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," arXiv preprint cs/0506075, 2005.