
Graduate Student: Chih-Hsien Fang (方誌賢)
Thesis Title: Joint Entity and Relation Extraction with Multi-head Attention and Token-wise Pointer Network
Advisor: Yi-Ling Chen (陳怡伶)
Oral Defense Committee: 陳怡伶 (Yi-Ling Chen), 陳冠宇, 葉彌妍
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2019
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 49
Keywords: named entity recognition, relation extraction, multitask learning, multi-head attention, pointer network
Access count: views: 295; downloads: 0

Table of Contents:

  • Abstract in Chinese
  • Abstract in English
  • Acknowledgements
  • Contents
  • List of Figures
  • List of Tables
  • 1 Introduction
  • 2 Related Work
  • 3 Model
      3.1 Embedding
      3.2 Named Entity Recognition
      3.3 Stacking Sequence
      3.4 Sequence-wise Multi-head Attention
      3.5 Token-wise Pointer Network
      3.6 Training
  • 4 Experiments
      4.1 Data
      4.2 Evaluation Metrics
      4.3 Hyperparameters and Training Details
      4.4 Results
  • 5 Analysis
      5.1 Ablation Tests
      5.2 Feature-based Approach with BERT
      5.3 Multi-head Attention Variations
      5.4 Analysis for Distance between Entity Pair
      5.5 Accuracy at Sentence Level
  • 6 Conclusion
  • References
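For readers unfamiliar with the attention mechanism named in the title, the following is a minimal NumPy sketch of generic scaled dot-product multi-head attention (in the sense of Vaswani et al., 2017), not the thesis's own sequence-wise variant; all weights are randomly initialized purely for illustration, and the shapes and head count are arbitrary choices.

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Generic scaled dot-product multi-head attention over one sequence.

    x: (seq_len, d_model) token representations.
    Projection weights are random here; in a real model they are learned.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_k = d_model // num_heads

    outputs = []
    for _ in range(num_heads):
        # Each head has its own query/key/value projections.
        w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_v = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        q, k, v = x @ w_q, x @ w_k, x @ w_v

        # softmax(Q K^T / sqrt(d_k)) V, with a numerically stable softmax.
        scores = q @ k.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v)

    # Concatenate the per-head outputs back to d_model dimensions.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 tokens, d_model = 8
out = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape)  # (5, 8)
```

Each head attends over the full sequence independently and the head outputs are concatenated, which is the property the thesis builds on when applying attention per token pair for relation extraction.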


    Full-text release date: 2024/12/17 (campus network)
    Full-text release date: 2024/12/17 (off-campus network)
    Full text: not authorized for public release (National Central Library: Taiwan NDLTD system)