
Author: Hao-Kai Lee (李皓凱)
Thesis Title: Leveraging the Noisy Student Model and AEDA for Improving the Prediction Accuracy of NLP Tasks
Advisor: Yung-Ho Leu (呂永和)
Committee Members: Wei-Ning Yang (楊維寧), Yun-Shiow Chen (陳雲岫)
Degree: Master
Department: College of Management, Department of Information Management
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Pages: 48
Keywords: Deep Learning, Natural Language Processing, Noisy Student Training, Data Augmentation

    Overfitting can be a serious problem when we do not have enough training data to generalize our model. However, data augmentation and semi-supervised learning can help solve this problem. Data augmentation is a more mature technique in the Computer Vision field than in the Natural Language Processing (NLP) field, and the same holds for semi-supervised learning; the NLP field is still exploring these techniques.
    In this thesis, we transfer noisy student training [1] from the Computer Vision field to the NLP field. By combining the noisy student training process with data augmentation methods from the NLP field, such as EDA [2] or AEDA [3], our method outperforms EDA and AEDA across all five datasets used in their experiments. We further show experimentally that using another dataset as the unlabeled dataset is also helpful in this method. We also conduct extensive experiments to investigate how many generations of training are needed given the scale and difficulty of these five datasets. This thesis shows that even with only a small amount of labeled data, our method can still greatly improve model accuracy, and it does not require any pre-trained model to implement.
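    As a concrete illustration of the AEDA technique [3] mentioned above: AEDA augments a sentence simply by inserting between 1 and n/3 random punctuation marks (where n is the number of words) at random positions. The following is our own minimal sketch of that idea, not the thesis's actual code:

    ```python
    import random

    # Punctuation set used by AEDA (Karimi et al., 2021)
    PUNCTUATIONS = [".", ";", "?", ":", "!", ","]

    def aeda(sentence, punc_ratio=1 / 3, seed=None):
        """Return a copy of `sentence` with 1..max(1, n*punc_ratio) random
        punctuation marks inserted at random word positions."""
        rng = random.Random(seed)
        words = sentence.split()
        n_inserts = rng.randint(1, max(1, int(punc_ratio * len(words))))
        for _ in range(n_inserts):
            pos = rng.randint(0, len(words))
            words.insert(pos, rng.choice(PUNCTUATIONS))
        return " ".join(words)
    ```

    Because the original words are never altered or reordered, the label of the sentence is preserved, which is what makes AEDA safe to apply aggressively during training.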

    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    Chapter 1 Introduction
      1.1 RESEARCH BACKGROUND
      1.2 RESEARCH PURPOSE
      1.3 RESEARCH METHOD
      1.4 RESEARCH OVERVIEW
    Chapter 2 Related Work
      2.1 DATA AUGMENTATION
        2.1.1 Easy Data Augmentation (EDA)
        2.1.2 An Easier Data Augmentation (AEDA)
        2.1.3 Back-Translation
        2.1.4 GradAug
      2.2 SEMI-SUPERVISED LEARNING
        2.2.1 Pseudo-labeling
        2.2.2 Unsupervised Data Augmentation (UDA)
        2.2.3 Noisy Student Training
    Chapter 3 Research Method
    Chapter 4 Experiments and Results
      4.1 EXPERIMENT DETAILS
        4.1.1 Unlabeled Dataset
        4.1.2 Models
        4.1.3 Noisy Student Training
      4.2 DATASETS
      4.3 PERFORMANCE OF COMBINING WITH EDA OR AEDA
      4.4 HOW MANY GENERATIONS DO WE NEED TO ACHIEVE BETTER RESULTS
      4.5 FEASIBILITY OF USING THE OTHER DATASET AS THE UNLABELED DATASET
    Chapter 5 Conclusion and Future Research
      5.1 CONCLUSION
      5.2 FUTURE RESEARCH
    Chapter 6 Appendix
    Reference
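    The noisy student training procedure covered in Sections 2.2.3 and 4.1.3 follows a simple loop: a teacher model is trained on the labeled data, it pseudo-labels the unlabeled pool, and a student is then trained on the combined (labeled + pseudo-labeled) data with noise injected, becoming the teacher for the next generation. The sketch below is a deliberately toy illustration of that loop, substituting a one-dimensional nearest-centroid classifier and Gaussian input noise for the thesis's actual NLP models and text augmentations; all names are ours:

    ```python
    import random

    def train_centroids(X, y):
        """Toy stand-in for model training: one centroid per class."""
        centroids = {}
        for label in set(y):
            points = [x for x, l in zip(X, y) if l == label]
            centroids[label] = sum(points) / len(points)
        return centroids

    def predict(centroids, x):
        """Assign x to the class with the nearest centroid."""
        return min(centroids, key=lambda label: abs(centroids[label] - x))

    def noisy_student(X_labeled, y_labeled, X_unlabeled,
                      generations=3, noise=0.1, seed=0):
        rng = random.Random(seed)
        # Generation-0 teacher: trained on labeled data only.
        model = train_centroids(X_labeled, y_labeled)
        for _ in range(generations):
            # Teacher pseudo-labels the unlabeled pool.
            pseudo = [predict(model, x) for x in X_unlabeled]
            # Student trains on labeled + pseudo-labeled data with input noise
            # (the counterpart of EDA/AEDA noising in the thesis's setting).
            X = X_labeled + [x + rng.gauss(0, noise) for x in X_unlabeled]
            y = y_labeled + pseudo
            model = train_centroids(X, y)  # student becomes the next teacher
        return model
    ```

    The key design point carried over from the original noisy student work [1] is the asymmetry: the teacher labels clean inputs, while the student learns from noised inputs, which forces the student to generalize rather than memorize the teacher's outputs.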

    [1] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, "Self-training with noisy student improves imagenet classification," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10687-10698.
    [2] J. Wei and K. Zou, "EDA: Easy data augmentation techniques for boosting performance on text classification tasks," arXiv preprint arXiv:1901.11196, 2019.
    [3] A. Karimi, L. Rossi, and A. Prati, "AEDA: An Easier Data Augmentation Technique for Text Classification," arXiv preprint arXiv:2108.13230, 2021.
    [4] S. Y. Feng et al., "A survey of data augmentation approaches for nlp," arXiv preprint arXiv:2105.03075, 2021.
    [5] C. Shorten, T. M. Khoshgoftaar, and B. Furht, "Text Data Augmentation for Deep Learning," Journal of Big Data, vol. 8, no. 1, p. 101, 2021, doi: 10.1186/s40537-021-00492-0.
    [6] R. Sennrich, B. Haddow, and A. Birch, "Improving neural machine translation models with monolingual data," arXiv preprint arXiv:1511.06709, 2015.
    [7] F. Mi, W. Zhou, F. Cai, L. Kong, M. Huang, and B. Faltings, "Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems," arXiv preprint arXiv:2108.12589, 2021.
    [8] D.-H. Lee, "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks," in Workshop on challenges in representation learning, ICML, 2013, vol. 3, no. 2, p. 896.
    [9] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. A. Raffel, "Mixmatch: A holistic approach to semi-supervised learning," Advances in Neural Information Processing Systems, vol. 32, 2019.
    [10] K. Sohn et al., "Fixmatch: Simplifying semi-supervised learning with consistency and confidence," Advances in Neural Information Processing Systems, vol. 33, pp. 596-608, 2020.
    [11] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, vol. 2, no. 7, 2015.
    [12] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39-41, 1995.
    [13] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
    [14] N. Ng, K. Cho, and M. Ghassemi, "SSMBA: Self-supervised manifold based data augmentation for improving out-of-domain robustness," arXiv preprint arXiv:2009.10195, 2020.
    [15] X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proceedings of the 2008 international conference on web search and data mining, 2008, pp. 231-240.
    [16] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168-177.
    [17] Q. Liu, Z. Gao, B. Liu, and Y. Zhang, "Automated rule selection for aspect extraction in opinion mining," in Twenty-Fourth international joint conference on artificial intelligence, 2015.
    [18] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, "Parsing with compositional vector grammars," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013, pp. 455-465.
    [19] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," arXiv preprint cs/0409058, 2004.
    [20] M. Ganapathibhotla and B. Liu, "Mining opinions in comparative sentences," in Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), 2008, pp. 241-248.
    [21] X. Li and D. Roth, "Learning question classifiers," in COLING 2002: The 19th International Conference on Computational Linguistics, 2002.
    [22] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," arXiv preprint cs/0506075, 2005.
