
Graduate Student: 盧建榮 (Christopher Albert Lorentius)
Thesis Title: IndoELECTRA: A Pre-Trained Language Model For Indonesian Language Understanding
Advisor: 呂永和 (Yung-Ho Leu)
Committee Members: 楊維寧 (Wei-Ning Yang), 陳雲岫 (Yun-Shiow Chen)
Degree: Master
Department: College of Management - Department of Information Management
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 51
Keywords: Data, NLP, Language Modeling, Summarization, Sentiment Analysis, POS Tagger, ELECTRA


Abstract: Owing to their state-of-the-art performance, transformer-based models such as BERT have become increasingly popular in recent years. To enable effective transfer learning for natural language processing, deep learning-based language models pre-trained on large unannotated text corpora have been developed. IndoBERT has achieved better performance than multilingual BERT (M-BERT) models on the Indonesian language. However, the ELECTRA pre-training approach works well at scale while using less than a quarter of the compute. This thesis proposes a monolingual ELECTRA model for the Indonesian language (IndoELECTRA), which achieves state-of-the-art performance compared to IndoBERT and M-BERT.
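To make the compute argument concrete, the sketch below illustrates ELECTRA's replaced-token-detection objective: a small generator fills in masked positions and a discriminator is trained to label every token as original or replaced, so the learning signal covers all positions rather than only the roughly 15% that are masked. This is a minimal conceptual sketch in PyTorch, not the thesis's actual training code; generator and discriminator stand in for small transformer encoders with the call signatures assumed in the comments, and special-token handling is omitted.

```python
# Conceptual sketch of ELECTRA's replaced-token-detection objective (assumption:
# generator(ids) -> (batch, seq, vocab) logits, discriminator(ids) -> (batch, seq) logits).
import torch
import torch.nn.functional as F

def electra_step(generator, discriminator, input_ids, mask_token_id, mlm_prob=0.15):
    """One pre-training step: the generator fills masked positions, the
    discriminator predicts for every token whether it was replaced."""
    # Randomly mask ~15% of positions, as in BERT-style MLM (special tokens ignored here).
    mask = torch.rand(input_ids.shape) < mlm_prob
    masked_ids = input_ids.clone()
    masked_ids[mask] = mask_token_id

    # Generator proposes plausible tokens for the masked positions.
    gen_logits = generator(masked_ids)                              # (batch, seq, vocab)
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, sampled, input_ids)

    # Discriminator is trained on EVERY position (original vs. replaced),
    # which is why ELECTRA is more sample-efficient than plain MLM.
    is_replaced = (corrupted != input_ids).float()
    disc_logits = discriminator(corrupted)                          # (batch, seq)

    gen_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])   # MLM loss on masked positions
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return gen_loss + 50.0 * disc_loss                              # lambda = 50, as in the ELECTRA paper
```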
We built IndoELECTRA from scratch. The model achieves the best results compared with the multilingual BERT model and IndoBERT on downstream NLP tasks such as sentiment analysis, summarization, and POS tagging.
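As an illustration of the downstream setup, the following hedged sketch fine-tunes an ELECTRA checkpoint for binary sentiment classification with the Hugging Face transformers library. The path "./indoelectra" is a placeholder for the pre-trained IndoELECTRA weights, not a published model name, and the example sentence and label are illustrative only.

```python
# A hedged sketch of sentiment fine-tuning; "./indoelectra" is a placeholder
# directory for the pre-trained IndoELECTRA checkpoint.
import torch
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

tokenizer = ElectraTokenizerFast.from_pretrained("./indoelectra")
model = ElectraForSequenceClassification.from_pretrained("./indoelectra", num_labels=2)

# A single Indonesian example: "Film ini sangat bagus." = "This movie is very good."
batch = tokenizer(["Film ini sangat bagus."], return_tensors="pt",
                  padding=True, truncation=True)
labels = torch.tensor([1])  # 1 = positive, 0 = negative

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss from the classification head
outputs.loss.backward()                  # plug into any optimizer loop (e.g., AdamW)
```

The same pattern extends to token-level tasks such as POS tagging by swapping in ElectraForTokenClassification.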

Table of Contents
ABSTRACT
Acknowledgements
List of Equations
List of Tables
List of Figures
Chapter 1 INTRODUCTION
1.1 Background of the Problem
1.2 Research Objectives
1.3 Research Contributions
Chapter 2 LITERATURE REVIEW
2.1 Related Works
2.2 Theoretical Foundation
2.2.1 Machine Learning
2.2.2 Deep Learning
2.2.3 Natural Language Processing
2.2.4 Transformers
2.2.5 ELECTRA
Chapter 3 RESEARCH METHODOLOGY
3.1 Literature Study
3.2 Tools of Research
3.3 Datasets
3.3.1 Datasets for IndoELECTRA
3.3.2 Datasets for Sentiment Analysis
3.3.3 Datasets for Summarization
3.3.4 Datasets for POS Tagger
3.4 Work Procedure for IndoELECTRA
3.5 Work Procedure for Sentiment Analysis
3.5.1 Model Architecture
3.6 Work Procedure for Summarization
3.6.1 Pre-processing
3.6.2 Summarization
3.7 Work Procedure for POS Tagger
3.7.1 Model Architecture
3.8 Evaluation
3.8.1 Evaluation for Sentiment Analysis and POS Tagger
3.8.2 Evaluation for Summarization
Chapter 4 EXPERIMENTAL RESULT
4.1 Experimental IndoELECTRA
4.1.1 Dataset
4.1.2 Experimental Settings
4.2 Experimental Sentiment Analysis
4.2.1 Datasets
4.2.2 Experimental Settings
4.2.3 Baseline Methods
4.2.4 Results
4.3 Experimental Summarization
4.3.1 Datasets
4.3.2 Experimental Settings
4.3.3 Baseline Methods
4.3.4 Result
4.4 Experimental POS Tagger
4.4.1 Datasets
4.4.2 Experimental Settings
4.4.3 Baseline Methods
4.4.4 Results
Chapter 5 CONCLUSIONS AND FUTURE RESEARCH
5.1 Result Summary
5.2 Limitation
5.3 Future Research
REFERENCES

