Graduate student: |
Gloria Stefani Subagio (林小芳) |
Thesis title: |
Siamese LSTM Network for Discriminating Word Sense within Context (應用孿生長短期記憶網路於語境詞意鑑別之研究) |
Advisor: |
Bor-Shen Lin |
Oral defense committee: |
Nai-Wei Lo, Chuan-Kai Yang (楊傳凱) |
Degree: |
Master |
Department: |
School of Management - Department of Information Management |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 61 |
Chinese keywords: | 詞義消歧, 暹羅網絡, 長期短期記憶, 獨立句子模型, 雙向LSTM |
English keywords: | word sense disambiguation, siamese network, long short-term memory, separate sentence model, bi-directional LSTM |
A word may carry different meanings depending on its context. In natural language processing, resolving the meaning of a word is a challenging task known as word sense disambiguation (WSD), which focuses on identifying or discriminating word meanings. This study uses the "Word-in-Context" dataset to investigate how to discriminate word senses within context. Each sample in the dataset contains a pair of sentences that share a common target word, together with a label indicating whether the target word has the same sense in both sentences. A siamese network is proposed that learns to discriminate the meanings of the sentence pair from the hidden states accumulated by an LSTM layer. Uni-directional and bi-directional LSTM structures are evaluated, and whole-sentence and separate-sentence models are compared. Experimental results show that the bi-directional structure outperforms the uni-directional structure, and that the separate-sentence model performs better than the whole-sentence model. When the siamese network is tested on a semantic search task, the proposed approaches prove effective at retrieving sentences that are semantically similar to the query sentence.
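The siamese idea described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the thesis implementation. The names `TinyLSTM`, `SiameseBiLSTM`, `embed`, and `semantic_search` are hypothetical, the word vectors are pseudo-random stand-ins for real embeddings, the example sentences are invented rather than taken from the dataset, and the `exp(-L1)` similarity is an assumed scoring function borrowed from common siamese LSTM practice. The sketch only shows that both sentences pass through the same weights and that the final hidden states are compared.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, name, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def encode(self, xs):
        """Run over a list of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            z = list(x) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("c", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class SiameseBiLSTM:
    """Both sentences are encoded by the SAME forward and backward LSTMs."""

    def __init__(self, input_size, hidden_size):
        self.fwd = TinyLSTM(input_size, hidden_size, seed=1)
        self.bwd = TinyLSTM(input_size, hidden_size, seed=2)

    def encode(self, xs):
        # Concatenate the final states of the forward and backward passes.
        return self.fwd.encode(xs) + self.bwd.encode(list(reversed(xs)))

    def similarity(self, a, b):
        # Assumed scoring function: exp(-L1 distance), which lies in (0, 1].
        ha, hb = self.encode(a), self.encode(b)
        return math.exp(-sum(abs(u - v) for u, v in zip(ha, hb)))


def embed(sentence, dim=8):
    """Toy embedding: one pseudo-random vector per lowercased token."""
    vectors = []
    for token in sentence.lower().split():
        rng = random.Random(token)  # string seed, so the vector is repeatable
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def semantic_search(model, query, candidates):
    """Rank candidate sentences by similarity to the query, best first."""
    q = embed(query)
    scored = [(model.similarity(q, embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


model = SiameseBiLSTM(input_size=8, hidden_size=4)
pair = ("he sat on the bank of the river", "she deposited money at the bank")
score = model.similarity(embed(pair[0]), embed(pair[1]))
print(f"similarity of the sentence pair: {score:.3f}")
```

In a trained system the shared weights would be fitted on the labelled sentence pairs, and a threshold on the similarity score would decide whether the target word has the same sense in both sentences. With the random weights used here the score carries no linguistic meaning.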