
Student: 林小芳 (Gloria Stefani Subagio)
Thesis Title: 應用孿生長短期記憶網路於語境詞意鑑別之研究
(Siamese LSTM Network for Discriminating Word Sense within Context)
Advisor: 林伯慎 (Bor-Shen Lin)
Committee Members: 羅乃維 (Nai-Wei Lo), 楊傳凱 (Chuan-Kai Yang)
Degree: Master
Department: School of Management, Department of Information Management
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 61
Chinese Keywords: 詞義消歧、孿生網絡、長短期記憶、獨立句子模型、雙向LSTM
English Keywords: word sense disambiguation, siamese network, long short-term memory, separate sentence model, bi-directional LSTM
Access Counts: 262 views, 25 downloads

Abstract: A word may carry different meanings depending on its context. In natural language processing, resolving the meaning of a word is a challenging task known as word sense disambiguation (WSD), which focuses on how to identify or discriminate the senses of words. In this study, we use the Word-in-Context (WiC) dataset to investigate how to discriminate word sense within context. Each sample in this dataset contains a pair of sentences sharing a common target word, together with a label indicating whether the target word carries the same sense in both sentences. A Siamese LSTM network is proposed to learn and discriminate the meanings of the sentence pairs from the hidden states accumulated by the LSTM layer. Both uni-directional and bi-directional LSTM structures are evaluated, and the whole-sentence model is compared with the separate-sentence model, in which each sentence is split at the target word. Experimental results show that the bi-directional structure achieves higher classification accuracy than the uni-directional structure, and that the separate-sentence model outperforms the whole-sentence model. The Siamese network is also tested on a semantic search task, where the proposed approach proves effective at retrieving sentences that are semantically similar to the query sentence.
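To make the described architecture concrete, below is a minimal PyTorch sketch, not the author's implementation, of a Siamese bi-directional LSTM for WiC-style sentence pairs: one shared encoder embeds and encodes both sentences, the accumulated hidden states are pooled into sentence vectors, and a small classifier predicts whether the target word keeps the same sense. The mean-pooling, the feature combination, the classifier head, and all hyperparameters are illustrative assumptions, and the same encoder is reused for cosine-similarity ranking in the semantic search setting.

```python
# Minimal sketch (illustrative assumptions throughout, not the thesis code).
# Each WiC-style sample pairs two sentences sharing a target word, e.g.
#   "She sat on the bank of the river." / "He deposited cash at the bank."
# with a label saying whether "bank" keeps the same sense (here it does not).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One shared LSTM encodes both sentences (the Siamese weight sharing).
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Classifier over the two sentence vectors and their interactions;
        # predicts same sense (1) versus different sense (0).
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, token_ids):
        # Accumulate hidden states over the sequence, then mean-pool them
        # into a fixed-size sentence vector of size 2 * hidden_dim.
        states, _ = self.lstm(self.embedding(token_ids))
        return states.mean(dim=1)

    def forward(self, sent1_ids, sent2_ids):
        v1, v2 = self.encode(sent1_ids), self.encode(sent2_ids)
        feats = torch.cat([v1, v2, torch.abs(v1 - v2), v1 * v2], dim=-1)
        return torch.sigmoid(self.classifier(feats)).squeeze(-1)

def semantic_search(model, query_ids, candidate_ids):
    # Semantic search with the trained encoder: rank candidate sentences by
    # cosine similarity between their vectors and the query sentence vector.
    with torch.no_grad():
        q = model.encode(query_ids)                 # (1, 2 * hidden_dim)
        c = model.encode(candidate_ids)             # (N, 2 * hidden_dim)
        scores = F.cosine_similarity(q, c, dim=-1)  # (N,)
    return scores.argsort(descending=True)
```

The separate-sentence variant discussed in the thesis would additionally split each sentence at the target word and encode the resulting segments with the same shared LSTM before combining them; that variant is omitted from this sketch.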

Table of Contents:
摘要 (Chinese Abstract) iv
Abstract v
Acknowledgement vi
Content of Table ix
Content of Figure x
List of Algorithms xi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contribution 1
1.3 Summary 2
Chapter 2 Literature Review 4
2.1 Word Sense Disambiguation 4
2.1.1 Basic Concept 4
2.1.2 WiC: The Word-in-Context Dataset 5
2.2 Preliminary Network and Method 8
2.2.1 Recurrent Neural Network 8
2.2.2 Siamese Network 12
2.2.3 Word Embeddings 13
2.3 Summary 17
Chapter 3 Baseline Approach 18
3.1 Uni-directional Siamese LSTM 18
3.2 Dataset 20
3.3 Experimental Results 21
3.4 Bi-directional Siamese LSTM Network 21
3.5 Implementation 23
3.6 Summary 24
Chapter 4 Improvement and Analysis 25
4.1 Siamese Network with Separate LSTM 25
4.3 Semantic Search 29
4.4 Additional Implementation with GloVe Embedding 36
4.5 Additional Implementation using Attention-based Network 38
4.6 Additional Implementation using Separate Bi-directional LSTM on Each Segment 40
4.7 Additional Implementation using LSTM Segmentation 41
4.8 Summary 42
Chapter 5 Conclusion 44
References 46

