
Author: 王邑倫 (Yi-Lun Wang)
Thesis Title: 一個基於注意力機制與卷積神經網路的中文閱讀理解及問答系統
A Chinese Reading Comprehension and Question Answering System Based on Attention Mechanism and Convolutional Neural Networks
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 傅楸善 (Chiou-Shann Fuh), 王聖智 (Sheng-Jyh Wang), 陳冠宇 (Kuan-Yu Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 45
Keywords (Chinese): 中文機器閱讀理解、自然語言處理、注意力機制、卷積神經網路、深度學習、快速收斂、使用較少記憶體
Keywords (English): Chinese Machine Reading Comprehension, Natural Language Processing, Attention Mechanism, Convolutional Neural Network, Deep Learning, Fast Convergence, Less Memory Usage
Deep learning and natural language processing cover a wide range of research topics, and one of the most popular is machine reading comprehension, in which questions are answered from a given passage. Given a human question, the computer searches an article provided in advance and extracts the answer, which has broad applications in research on robots and intelligent assistants. However, the model architectures published in recent years have grown ever larger over time, so that both training and deployment consume a large amount of resources.
To address these problems, this thesis proposes a new deep learning model for reading comprehension in the Chinese environment; it can be trained on a consumer-grade GPU and converges within a short time. In the pre-processing stage, we use an existing Chinese word segmentation package and a pre-trained word embedding dictionary, and we additionally feed the embedding vector of every individual character into the model, so that the model receives more information and is protected against word segmentation errors.
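
As a rough illustration of this pre-processing step, the sketch below segments a passage and builds both a word-level and a character-level representation. The thesis does not name the exact packages or vector files it uses, so jieba, gensim, the file names, and the dimensions here are assumptions for the example only.

    import numpy as np
    import jieba                                  # assumed segmentation package
    from gensim.models import KeyedVectors        # assumed loader for the vector dictionary

    word_vectors = KeyedVectors.load_word2vec_format('zh_word_vectors.txt')  # hypothetical file
    char_vectors = KeyedVectors.load_word2vec_format('zh_char_vectors.txt')  # hypothetical file

    def encode(text, word_dim=300, char_dim=300, max_chars_per_word=4):
        """Return a word-level matrix and a character-level tensor for one passage."""
        words = list(jieba.cut(text))
        word_mat = np.stack([word_vectors[w] if w in word_vectors else np.zeros(word_dim)
                             for w in words])
        # Character embeddings form a second input channel, so a wrongly
        # segmented word does not lose all of its information.
        char_mat = np.zeros((len(words), max_chars_per_word, char_dim))
        for i, w in enumerate(words):
            for j, ch in enumerate(w[:max_chars_per_word]):
                if ch in char_vectors:
                    char_mat[i, j] = char_vectors[ch]
        return word_mat, char_mat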
In the main architecture, we abandon the traditional recurrent neural network (RNN) and instead combine the recently popular self-attention mechanism with a convolutional neural network (CNN), which saves training time more effectively. In addition, at the interaction layer we apply context-query attention twice to improve the computation of the interaction between the passage and the question, so that the model retrieves question-relevant information from the passage more quickly and effectively and converges rapidly.
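
A minimal sketch of such a convolution-plus-self-attention encoder block is given below, in the spirit of QANet-style blocks built from depthwise separable convolutions and multi-head self-attention. The layer sizes, kernel size, and PyTorch implementation are illustrative assumptions, not the thesis's exact configuration.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, dim, kernel_size=7):
            super().__init__()
            # groups=dim makes the first convolution depthwise (one filter per
            # channel); the 1x1 pointwise convolution then mixes channels.
            self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                       padding=kernel_size // 2, groups=dim)
            self.pointwise = nn.Conv1d(dim, dim, 1)

        def forward(self, x):                     # x: (batch, seq_len, dim)
            x = x.transpose(1, 2)                 # Conv1d expects (batch, dim, seq_len)
            x = self.pointwise(self.depthwise(x))
            return torch.relu(x).transpose(1, 2)

    class EncoderBlock(nn.Module):
        """Convolutions capture local patterns; self-attention captures global
        dependencies, replacing the recurrent layers used by RNN-based readers."""
        def __init__(self, dim=128, num_convs=2, num_heads=8):
            super().__init__()
            self.convs = nn.ModuleList(DepthwiseSeparableConv(dim) for _ in range(num_convs))
            self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_convs + 1))
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):                     # x: (batch, seq_len, dim)
            for conv, norm in zip(self.convs, self.norms[:-1]):
                x = x + conv(norm(x))             # residual convolution sublayer
            h = self.norms[-1](x)
            y, _ = self.attn(h, h, h)             # residual self-attention sublayer
            return x + y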
In our experiments, we use the Delta Reading Comprehension Dataset (DRCD) as the main benchmark for the Chinese environment. Scoring is done with two metrics, the exact match score (EM) and the F1 score. On a Titan XP GPU with relatively little memory, our model reaches 65% EM and 79% F1 on Chinese reading comprehension after about one hour of training, roughly three times faster than other models with similar architectures.


There are many research topics in deep learning and natural language processing, and one of the most popular is machine reading comprehension for question answering. Given a human question, the computer searches the articles provided in advance and extracts the answer, which has a great number of applications in robots and intelligent personal assistants. However, the model architectures published in recent years have grown ever larger over time, requiring a great deal of resources for both training and deployment.
To overcome the above problems, this thesis proposes a new deep learning model for reading comprehension in the Chinese environment. It can be trained on a consumer-level GPU, and convergence is achieved in a short time. In the pre-processing part, we use an existing Chinese word segmentation package and a pre-trained word embedding dictionary. We also provide the embedding vector of each single character as an additional model input, so that the model obtains more information and is robust to word segmentation errors.
In the main architectural design, we abandon the traditional recurrent neural network (RNN) and instead adopt the recently popular self-attention mechanism combined with a convolutional neural network (CNN), which saves training time effectively. At the interaction layer, we apply context-query attention twice to enhance the computation of the interaction between the article and the question, so that the model acquires question-related information in the article more effectively and converges faster.
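
The following sketch shows one pass of context-query attention using the trilinear similarity function popularized by BiDAF and QANet; the thesis applies this computation twice at the interaction layer, and the exact wiring between the two passes, as well as the PyTorch details below, are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextQueryAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # Trilinear similarity: S[i, j] = w_c . c_i + w_q . q_j + w_cq . (c_i * q_j)
            self.w_c = nn.Linear(dim, 1, bias=False)
            self.w_q = nn.Linear(dim, 1, bias=False)
            self.w_cq = nn.Parameter(torch.randn(dim))

        def forward(self, c, q):
            # c: (batch, n, dim) passage, q: (batch, m, dim) question
            s = (self.w_c(c)                              # (batch, n, 1)
                 + self.w_q(q).transpose(1, 2)            # (batch, 1, m)
                 + torch.einsum('bnd,bmd->bnm', c * self.w_cq, q))
            a = F.softmax(s, dim=2)                       # attend from passage to question
            b = F.softmax(s, dim=1)                       # attend from question to passage
            c2q = torch.bmm(a, q)                         # (batch, n, dim)
            q2c = torch.bmm(torch.bmm(a, b.transpose(1, 2)), c)
            # Fused representation that feeds the model encoding layers.
            return torch.cat([c, c2q, c * c2q, c * q2c], dim=2)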
In the experiments, we adopt the Delta Reading Comprehension Dataset (DRCD) as the main test data in the Chinese environment. For scoring, the exact match (EM) and F1 scores are used. The experimental results reveal that our model reaches 65% EM and 79% F1 with less than one hour of training on a Titan XP GPU, which has relatively little memory. This is about three times faster than other models with similar architectures.
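
For reference, a minimal sketch of the two metrics is given below. Treating each Chinese character as a token when computing F1 is the usual adaptation of the SQuAD-style evaluation script and is an assumption here, not necessarily the thesis's exact evaluation code.

    from collections import Counter

    def exact_match(prediction: str, ground_truth: str) -> float:
        # EM is 1 only when the predicted span equals the reference span exactly.
        return float(prediction == ground_truth)

    def f1_score(prediction: str, ground_truth: str) -> float:
        # F1 measures character-level overlap between prediction and reference.
        pred_chars, gt_chars = list(prediction), list(ground_truth)
        common = Counter(pred_chars) & Counter(gt_chars)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_chars)
        recall = overlap / len(gt_chars)
        return 2 * precision * recall / (precision + recall)

    # e.g. f1_score("台達電子", "台達電") == 6/7 ~= 0.857, while exact_match gives 0.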

Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 RNN-based Reading Comprehension
  2.2 Attention-based Reading Comprehension
  2.3 Recently Popular Huge Architecture
Chapter 3 Natural Language Processing and Deep Learning
  3.1 Tokenization and Embedding
  3.2 Convolutional Neural Network (CNN)
    3.2.1 Ordinary convolution
    3.2.2 Depthwise separable convolution
  3.3 Recurrent Neural Network (RNN)
  3.4 Attention Mechanism
    3.4.1 Context-query attention
    3.4.2 Self-attention
Chapter 4 Machine Reading Comprehension Model
  4.1 Input Pre-processing Layer
  4.2 Input Encoding Layer
  4.3 Interaction Layer
  4.4 Model Encoding Layer
  4.5 Output Layer
Chapter 5 Experimental Results and Discussion
  5.1 Experimental Setup
  5.2 Test on Stanford Question Answering Dataset
  5.3 Test on Delta Reading Comprehension Dataset
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
References


Full text available from 2024/07/31 (campus network)
Full text available from 2024/07/31 (off-campus network)
Full text available from 2024/07/31 (National Central Library: Taiwan NDLTD system)