
Author: Sarah Lintang Sariwening
Thesis Title: IndoBERT: Transformer-based Model for Indonesian Language Understanding
Advisor: 呂永和 (Yungho Leu)
Committee: 楊維寧 (Wei-Ning Yang), Yun-Shiou Chen
Degree: Master
Department: College of Management, Department of Information Management
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 64
Keywords: Data, NLP, Language Modeling, Summarization, Sentiment Analysis, BERT

Deep learning-based language models pre-trained on large unannotated text corpora enable efficient transfer learning for natural language processing. Transformer-based models such as BERT have recently become increasingly popular because of their state-of-the-art performance. However, most work on these models focuses on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Indonesian language (IndoBERT), which achieves state-of-the-art performance compared with other architectures and with Multilingual BERT (M-BERT) models.
We built IndoBERT from scratch. The model consistently outperforms the multilingual BERT model on downstream NLP tasks such as sentiment analysis and summarization.



TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 Background of the Problem
    Research Question
    Limitation of Research
  1.2 Objective
  1.3 Benefits of Research
CHAPTER 2 LITERATURE REVIEW
  2.1 Related Works
  2.2 Theoretical Foundation
    Language Modelling
    Encoder-Decoder Model
    Attention
    Transformers
CHAPTER 3 RESEARCH METHODOLOGY
  3.1 Literature Study
  3.2 Tools of Research
  3.3 Data for Research
  3.4 Work Procedure for IndoBERT
  3.5 Work Procedure for Summarization
  3.6 Work Procedure for Sentiment Analysis
  3.7 Work Procedure for POS Tagger
  3.8 Evaluation
CHAPTER 4 EXPERIMENTAL RESULT
  4.1 Experimental IndoBERT
  4.2 Experimental Summarization
  4.3 Experimental Sentiment Analysis
  4.4 Experimental POS Tagger
CHAPTER 5
REFERENCES

