Graduate Student: | 盧建榮 Christopher Albert Lorentius |
Thesis Title: | IndoELECTRA: A Pre-Trained Language Model For Indonesian Language Understanding |
Advisor: | 呂永和 Yung-Ho Leu |
Committee Members: | 楊維寧 Wei-Ning Yang, 陳雲岫 Yun-Shiow Chen |
Degree: | 碩士 Master |
Department: | College of Management - Department of Information Management |
Year of Publication: | 2021 |
Graduation Academic Year: | 109 |
Language: | English |
Pages: | 51 |
Keywords: | Data, NLP, Language Modeling, Summarization, Sentiment Analysis, POS Tagger, ELECTRA |
Abstract— Owing to their state-of-the-art performance, transformer-based models such as BERT have become increasingly popular in recent years. To enable effective transfer learning for natural language processing, deep learning-based language models pre-trained on large unannotated text corpora have been developed. IndoBERT has achieved better performance than multilingual BERT (M-BERT) on the Indonesian language. However, the ELECTRA approach works well at scale while using less than 1/4 of the compute. This paper proposes a monolingual ELECTRA model for the Indonesian language (IndoELECTRA), which achieves state-of-the-art performance compared to IndoBERT and M-BERT.
We built IndoELECTRA from scratch. The model outperforms both the multilingual BERT model and IndoBERT on downstream NLP tasks, including sentiment analysis, summarization, and POS tagging.
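ELECTRA's compute efficiency comes from its replaced-token-detection objective: a small generator fills in masked positions, and the discriminator is trained to label every token as original or replaced, so a learning signal comes from all positions rather than only the ~15% that are masked. A minimal sketch of how such discriminator labels could be constructed (plain Python with a toy vocabulary and a random stand-in for the generator; the function name is illustrative, not from the ELECTRA codebase):

```python
import random

def make_rtd_example(tokens, vocab, mask_rate=0.15, seed=0):
    """Build a toy replaced-token-detection (RTD) training example.

    A stand-in 'generator' replaces a fraction of tokens with random
    vocabulary items; the discriminator's target is 1 where the token
    was replaced and 0 where it is original.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            # stand-in for the generator's sampled prediction
            replacement = rng.choice([v for v in vocab if v != tok])
            corrupted.append(replacement)
            labels.append(1)  # replaced -> discriminator should flag it
        else:
            corrupted.append(tok)
            labels.append(0)  # original -> this position still gives a signal

    return corrupted, labels

# Toy Indonesian sentence: "I like eating fried rice"
tokens = "saya suka makan nasi goreng".split()
vocab = tokens + ["minum", "buku", "jalan"]
corrupted, labels = make_rtd_example(tokens, vocab, mask_rate=0.3)

# Every position carries a binary label, unlike BERT's masked-LM loss,
# which is computed only on the masked subset.
assert len(labels) == len(tokens)
```

In the real model, the generator is a small masked language model trained jointly with the discriminator, and the discriminator (the model kept after pre-training) predicts these per-token labels with a sigmoid head.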