Graduate Student: Sarah Lintang Sariwening
Thesis Title: IndoBERT: Transformer-based Model for Indonesian Language Understanding
Advisor: 呂永和 (Yungho Leu)
Committee Members: 楊維寧 (Wei-Ning Yang), Yun-Shiou Chen
Degree: Master
Department: School of Management, Department of Information Management
Year of Publication: 2020
Academic Year of Graduation: 108
Language: English
Number of Pages: 64
Chinese Keywords: Data, NLP, Language Modeling, Summarization, Sentiment Analysis, BERT
Foreign Keywords: Data, NLP, Language Modeling, Summarization, Sentiment Analysis, BERT
Access Count: Views: 435, Downloads: 5

Deep learning-based language models pre-trained on large unannotated text corpora enable efficient transfer learning for natural language processing. Transformer-based models such as BERT have recently become increasingly popular due to their state-of-the-art performance. However, most work on these models focuses on English, leaving other languages to multilingual models trained with limited resources. This paper proposes a monolingual BERT for the Indonesian language (IndoBERT), which achieves state-of-the-art performance compared with other architectures and the Multilingual BERT (M-BERT) model.
We built IndoBERT from scratch. The model consistently outperforms the multilingual BERT model on downstream NLP tasks such as sentiment analysis and summarization.
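As a rough illustration of the downstream fine-tuning described in the abstract, the sketch below fine-tunes a pre-trained BERT encoder for Indonesian sentiment classification with the Hugging Face transformers library. It is not the thesis implementation: the bert-base-multilingual-cased checkpoint (standing in for the M-BERT baseline mentioned above), the two toy sentences, and the hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (not the thesis code): fine-tuning a pre-trained BERT encoder
# for Indonesian sentiment classification with Hugging Face `transformers`.
# The checkpoint below is the public M-BERT baseline; a monolingual IndoBERT
# checkpoint would be substituted here if one were available.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"  # assumption: M-BERT stand-in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy labeled data (1 = positive, 0 = negative); a real run would iterate over
# a full Indonesian sentiment dataset with a DataLoader.
texts = ["Filmnya sangat bagus!", "Pelayanannya sangat mengecewakan."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few illustrative gradient steps, not real training
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    predictions = model(**batch).logits.argmax(dim=-1)
print(predictions.tolist())
```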



TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 Background of the Problem
      Research Question
      Limitation of Research
  1.2 Objective
  1.3 Benefits of Research
CHAPTER 2 LITERATURE REVIEW
  2.1 Related Works
  2.2 Theoretical Foundation
      Language Modelling
      Encoder-Decoder Model
      Attention
      Transformers
CHAPTER 3 RESEARCH METHODOLOGY
  3.1 Literature Study
  3.2 Tools of Research
  3.3 Data for Research
  3.4 Work Procedure for IndoBERT
  3.5 Work Procedure for Summarization
  3.6 Work Procedure for Sentiment Analysis
  3.7 Work Procedure for POS Tagger
  3.8 Evaluation
CHAPTER 4 EXPERIMENTAL RESULT
  4.1 Experimental IndoBERT
  4.2 Experimental Summarization
  4.3 Experimental Sentiment Analysis
  4.4 Experimental POS Tagger
CHAPTER 5
REFERENCES

