Graduate Student: | 盧建榮 Christopher Albert Lorentius |
Thesis Title: | IndoELECTRA: A Pre-Trained Language Model For Indonesian Language Understanding |
Advisor: | 呂永和 Yung-Ho Leu |
Committee Members: | 楊維寧 Wei-Ning Yang, 陳雲岫 Yun-Shiow Chen |
Degree: | 碩士 Master |
Department: | College of Management - Department of Information Management |
Year of Publication: | 2021 |
Graduation Academic Year: | 109 |
Language: | English |
Pages: | 51 |
Keywords: | Data, NLP, Language Modeling, Summarization, Sentiment Analysis, POS Tagger, ELECTRA |
Abstract— Owing to their state-of-the-art performance, transformer-based models such as BERT have become increasingly popular in recent years. To enable effective transfer learning for natural language processing, deep learning-based language models pre-trained on large unannotated text corpora have been developed. IndoBERT has achieved better performance than multilingual BERT (M-BERT) on the Indonesian language. However, the ELECTRA approach works well at scale while using less than 1/4 of the compute. This paper proposes a monolingual ELECTRA model for the Indonesian language (IndoELECTRA), which achieves state-of-the-art performance compared to IndoBERT and M-BERT.
We built IndoELECTRA from scratch. The model outperforms both the multilingual BERT model and IndoBERT on downstream NLP tasks, including sentiment analysis, summarization, and POS tagging.
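ELECTRA's compute efficiency comes from its replaced-token-detection objective: a small generator fills in masked positions, and the discriminator is trained to label every token as original or replaced, so a learning signal comes from all positions rather than only the ~15% that are masked. A minimal sketch of how such discriminator labels could be constructed (plain Python with a toy vocabulary and a random stand-in for the generator; the function name is illustrative, not from the ELECTRA codebase):

```python
import random

def make_rtd_example(tokens, vocab, mask_rate=0.15, seed=0):
    """Build a toy replaced-token-detection (RTD) training example.

    A stand-in 'generator' replaces a fraction of tokens with random
    vocabulary items; the discriminator's target is 1 where the token
    was replaced and 0 where it is original.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            # stand-in for the generator's sampled prediction
            replacement = rng.choice([v for v in vocab if v != tok])
            corrupted.append(replacement)
            labels.append(1)  # replaced -> discriminator should flag it
        else:
            corrupted.append(tok)
            labels.append(0)  # original -> this position still gives a signal

    return corrupted, labels

# Toy Indonesian sentence: "I like eating fried rice"
tokens = "saya suka makan nasi goreng".split()
vocab = tokens + ["minum", "buku", "jalan"]
corrupted, labels = make_rtd_example(tokens, vocab, mask_rate=0.3)

# Every position carries a binary label, unlike BERT's masked-LM loss,
# which is computed only on the masked subset.
assert len(labels) == len(tokens)
```

In the real model, the generator is a small masked language model trained jointly with the discriminator, and the discriminator (the model kept after pre-training) predicts these per-token labels with a sigmoid head.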