
Graduate Student: 史恩維 (En-Wei Shih)
Thesis Title: 一個基於自注意力機制結合卷積網路的雙向長短期記憶模型用以實現文本關係分類之方法
A Text Relation Classification Method Using Bidirectional LSTM Based on Self-Attention Mechanism Combined with Convolutional Neural Networks
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 黃榮堂 (Jung-Tang Huang), 林啟芳 (Chi-Fang Lin), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: English
Number of Pages: 63
Chinese Keywords: 關係分類、自注意力機制、雙向長短期記憶、卷積神經網路、深度學習、知識圖譜、專家系統、自然語言處理
English Keywords: Relation Classification, Self-Attention Mechanism, Bidirectional Long Short-Term Memory, Convolutional Neural Network, Deep Learning, Knowledge Graph, Expert System, Natural Language Processing
In recent years, deep learning has developed rapidly in the field of natural language processing, and it has become especially popular in knowledge graphs, expert systems, and question answering systems. Relation classification, an important subtask of these applications, determines which semantic relation holds between two entities according to the contextual information of the text.
In the past, most methods relied on text mining assisted by natural language tools that embody prior knowledge, such as the WordNet dictionary, syntactic parsers, and Part-of-Speech (POS) tags, to obtain lexical or syntactic features from the text, and then performed classification with machine learning. With the evolution of transfer learning, relation classification has gradually reduced its dependence on prior knowledge, but this raises another difficulty: the model parameters and architectures become extremely large, so training and application require considerable resources and cost.
To solve the above problems, this thesis proposes a deep learning model that can be trained on a consumer-grade graphics card. It adopts a bidirectional long short-term memory network based on a self-attention mechanism to capture contextual representations of words, combined with a parallel multi-channel convolutional neural network that extracts word-level information from the word embedding layer with kernels of different sizes; finally, all features are integrated by fully connected layers to determine the relation between the entities.
This thesis studies and experiments on two public datasets, SemEval-2010 Task 8 and KBP-37, and adopts the Macro-F1 score as the evaluation metric. The proposed method achieves 85.8% on SemEval-2010 Task 8 and 61.8% on KBP-37. Under the condition that only word vectors are used as features, our model attains the highest Macro-F1 score compared with other models of the same type.


In recent years, deep learning has flourished in the Natural Language Processing (NLP) community, especially in knowledge graphs, expert systems, and question answering systems. In these applications, relation classification is a vital subtask that determines the semantic relation between two entities based on the contextual information of the text.
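For illustration only (this example does not appear in the thesis), a relation classification instance pairs a sentence, in which the two entities are marked, with a directed relation label; the entity markers and label below follow the SemEval-2010 Task 8 convention, and the field names are hypothetical.

```python
# A made-up relation classification instance in the style of SemEval-2010 Task 8.
# The sentence and label are illustrative assumptions, not data cited in the thesis.
example = {
    "sentence": "The <e1>fire</e1> was caused by exploding <e2>fuel</e2>.",
    "entity_1": "fire",
    "entity_2": "fuel",
    # The label is directional: Cause-Effect(e2,e1) states that e2 (fuel) causes e1 (fire).
    "label": "Cause-Effect(e2,e1)",
}

# A relation classifier maps (sentence, marked entity pair) to one of the predefined relation types.
print(example["sentence"], "->", example["label"])
```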
Previous approaches commonly relied on text mining and on NLP tools that encode prior knowledge, such as WordNet, dependency parsers, and Part-of-Speech (POS) tags, to obtain lexical and syntactic features, and then applied machine learning models to classify the relation types. With the evolution of transfer learning, the dependence on prior knowledge has gradually decreased, but another problem arises: model parameters and architectures become extremely large, resulting in a huge cost for training and deployment.
To address these problems, this thesis proposes a deep learning model that can be trained on consumer-grade GPUs. It adopts a bidirectional Long Short-Term Memory (Bi-LSTM) network with a self-attention mechanism to extract contextual representations of words, combined with a parallel multi-channel Convolutional Neural Network (CNN) that applies kernels of different sizes to the word embedding layer to extract word-level information. Finally, we integrate all the features with fully connected layers to infer the relation between the two entities.
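To make the two-branch design concrete, the following is a minimal PyTorch sketch assuming illustrative hyperparameters (vocabulary size, embedding dimension, kernel sizes, number of classes) that are not taken from the thesis; the attention shown is a simple additive attention pooling standing in for the thesis's self-attention formulation.

```python
# Minimal sketch: Bi-LSTM with attention pooling (contextual branch) in parallel with a
# multi-channel CNN over the word embeddings (word-level branch), fused by fully
# connected layers. All hyperparameters are assumptions, not the thesis's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttBiLSTMCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, hidden_dim=128,
                 kernel_sizes=(2, 3, 4), num_filters=64, num_classes=19):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)

        # Contextual branch: Bi-LSTM followed by additive attention pooling.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)

        # Word-level branch: parallel convolutions with different kernel sizes.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k, padding=k // 2) for k in kernel_sizes
        )

        fused_dim = 2 * hidden_dim + num_filters * len(kernel_sizes)
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)                # (batch, seq_len, emb_dim)

        # Bi-LSTM + attention: score each time step, then take the weighted sum.
        h, _ = self.bilstm(emb)                        # (batch, seq_len, 2*hidden_dim)
        scores = self.attn_score(torch.tanh(h))        # (batch, seq_len, 1)
        weights = F.softmax(scores, dim=1)
        context = (weights * h).sum(dim=1)             # (batch, 2*hidden_dim)

        # Multi-channel CNN over the embeddings, max-pooled over time.
        c = emb.transpose(1, 2)                        # (batch, emb_dim, seq_len)
        pooled = [F.relu(conv(c)).max(dim=2).values for conv in self.convs]
        word_level = torch.cat(pooled, dim=1)          # (batch, num_filters * #kernels)

        # Fuse both branches and score the relation classes.
        return self.classifier(torch.cat([context, word_level], dim=1))


# Example forward pass with random token ids: returns (2, num_classes) relation scores.
model = AttBiLSTMCNN()
logits = model(torch.randint(1, 10000, (2, 40)))
```

The two branches are complementary in this reading of the abstract: the Bi-LSTM with attention summarizes long-range context across the sentence, while the parallel convolutions capture local n-gram features around each word.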
This thesis conducts experiments on two public datasets, SemEval-2010 Task 8 and KBP-37, and adopts the Macro-F1 score as the evaluation metric. Our method reaches 85.8% on SemEval-2010 Task 8 and 61.8% on KBP-37. Compared with other similar models, ours achieves the highest Macro-F1 score under the condition that only word vectors are used as features.
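For reference, the Macro-F1 score is the unweighted mean of the per-class F1 scores, so rare relation classes count as much as frequent ones; a short scikit-learn sketch with made-up labels is shown below (the official SemEval-2010 Task 8 scorer additionally excludes the Other class and takes relation directionality into account).

```python
# Macro-F1 averages the per-class F1 scores with equal weight per class.
# The label lists below are placeholders, not results reported in the thesis.
from sklearn.metrics import f1_score

y_true = ["Cause-Effect", "Component-Whole", "Other", "Cause-Effect", "Entity-Origin"]
y_pred = ["Cause-Effect", "Other", "Other", "Cause-Effect", "Entity-Origin"]

print(f1_score(y_true, y_pred, average="macro"))
```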

Abstract (in Chinese)
Abstract
Contents
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Works
  2.1 Rule-Based and Linguistics Methods
  2.2 Deep Learning Methods
Chapter 3 Datasets and Data Preprocessing
  3.1 Datasets
    3.1.1 Datasets selection
    3.1.2 Introduction to SemEval 2010 Task-8 dataset
    3.1.3 Introduction to KBP-37 dataset
  3.2 Tokenization and Representation
    3.2.1 Word preprocessing
    3.2.2 Word embedding vectors
Chapter 4 Deep Learning for Relation Classification
  4.1 Artificial Neural Networks
    4.1.1 Convolutional neural network
    4.1.2 Bidirectional long short-term memory
    4.1.3 Self-attention
  4.2 Model Architecture for Text Relation Classification
    4.2.1 Extract contextual cues from sentences
    4.2.2 Extract word-level features
Chapter 5 Experimental Results and Discussions
  5.1 Experimental Setup
    5.1.1 Developing tools setup
    5.1.2 Word embeddings setup
  5.2 Evaluation and Visualization on Open Datasets
    5.2.1 Results of SemEval 2010 Task-8
    5.2.2 Results of KBP-37
Chapter 6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
References


Full text available on the campus network from 2025/07/27
Full text available off campus from 2030/07/27
Full text available via the National Central Library (Taiwan NDLTD system) from 2030/07/27