
Author: Bo-Ruei Chen (陳柏睿)
Title: An Encoder-Decoder Structure with Multi-Candidate-based Context Module for Data-to-Text Generation
Advisor: Jing-Ming Guo (郭景明)
Committee: Yen-Lin Chen (陳彥霖), Nai-Jian Wang (王乃堅), Gee-Sern Jison Hsu (徐繼聖), Chung-Nan Lee (李宗南)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109 (2020-2021)
Language: Chinese
Pages: 82
Keywords: Multi-Candidate-based Mechanism, Data-to-Text Generation, Natural Language Generation, Supervised Learning

Abstract:
Over the past few years, deep-learning text generation systems have achieved impressive results on machine translation and summarization tasks. As these systems have moved toward generating longer outputs in response to longer and more complex inputs, the generated text has begun to exhibit inter-sentence incoherence, incorrect statements, and insufficient fidelity to the input information. The same problems appear in data-to-text generation, a task that can be reduced to taking structured data as input and producing text that describes that data as output. Unlike machine translation, whose goal is to translate the source sentence in full, this kind of natural language generation typically has to solve at least two separate challenges: what to say (selecting which structured records to discuss) and how to say it (realizing the selected records as fluent natural language sentences).
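To make the task setup concrete, the following is a minimal sketch of such an input-output pair, assuming a RotoWire-style box-score format; the entity names, attribute keys, and values here are illustrative placeholders, not the exact dataset schema.

```python
# Illustrative data-to-text example in the spirit of RotoWire box scores.
# Each input record is an (entity, attribute, value) triple; the target is
# a human-written summary. All field names below are hypothetical.

records = [
    ("LeBron James", "PTS",         "25"),
    ("LeBron James", "AST",         "14"),
    ("Cavaliers",    "TEAM-WINS",   "10"),
    ("Cavaliers",    "TEAM-LOSSES", "4"),
]

# "What to say": content selection picks the records worth mentioning.
selected = [r for r in records if r[0] == "LeBron James"]

# "How to say it": surface realization turns them into a fluent sentence.
target = "LeBron James led the Cavaliers with 25 points and 14 assists."
```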
Data-to-text generation is usually tackled with an encoder-decoder architecture in which a context module supplies the information the decoder needs at each decoding step. A single sentence, however, often involves multiple entities and their attributes, so we argue that this basic architecture leaves room for improvement on this task. This thesis proposes a Multi-Candidate-based Context Module that uses the concept of multiple candidates to observe several entities and their attributes simultaneously; by concatenating the information gathered at each time step, it effectively helps the model generate text with the correct ordering of entities. Our experiments demonstrate the effectiveness of the multi-candidate concept and improve on the state of the art for the recently released RotoWire dataset. Data-to-text generation has a broad range of applications; in the future the approach can be extended to parsing other kinds of structured data, allowing it to be applied to many types of big data and bringing the technology closer to everyday life.
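The abstract only names the multi-candidate idea, so here is a minimal PyTorch sketch of what such a context module could look like. This is our own illustration, not the thesis implementation: the class name MultiCandidateContext, the top-k candidate selection, and the linear merge layer are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultiCandidateContext(nn.Module):
    """A minimal sketch (not the thesis implementation) of a context module
    that attends over entity representations and keeps the top-k scoring
    entities as parallel candidates instead of a single weighted summary."""

    def __init__(self, hidden_dim: int, num_candidates: int = 3):
        super().__init__()
        self.num_candidates = num_candidates
        self.score = nn.Linear(hidden_dim * 2, 1)  # scores (decoder state, entity) pairs
        self.merge = nn.Linear(hidden_dim * num_candidates, hidden_dim)

    def forward(self, dec_state: torch.Tensor, entity_reps: torch.Tensor) -> torch.Tensor:
        # dec_state:   (batch, hidden)            current decoder hidden state
        # entity_reps: (batch, n_entities, hidden) encoded entity vectors
        batch, n_entities, hidden = entity_reps.shape
        query = dec_state.unsqueeze(1).expand(-1, n_entities, -1)
        scores = self.score(torch.cat([query, entity_reps], dim=-1)).squeeze(-1)

        # Keep the k highest-scoring entities as separate candidates rather
        # than collapsing all entities into one attention-weighted vector.
        topk_scores, topk_idx = scores.topk(self.num_candidates, dim=-1)
        weights = torch.softmax(topk_scores, dim=-1).unsqueeze(-1)
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, hidden)
        candidates = entity_reps.gather(1, idx) * weights  # (batch, k, hidden)

        # Concatenate the candidates so the decoder sees them jointly.
        return self.merge(candidates.reshape(batch, -1))   # (batch, hidden)

# Usage with dummy shapes: 2 sentences, 20 entities, hidden size 256.
module = MultiCandidateContext(hidden_dim=256, num_candidates=3)
ctx = module(torch.randn(2, 256), torch.randn(2, 20, 256))  # -> (2, 256)
```

Compared with standard attention, which averages all entities into one context vector, keeping k separate candidates lets the decoder weigh several entities and their attributes at the same decoding step, which is the intuition the abstract describes.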



Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Research Background and Motivation
  1.2 Overview of the RotoWire Dataset
  1.3 Thesis Organization
Chapter 2: Literature Review
  2.1 Neural Networks
    2.1.1 Forward Propagation
    2.1.2 Back Propagation
  2.2 Machine Translation
    2.2.1 History of Machine Translation
    2.2.2 Overview of Neural Machine Translation
    2.2.3 Development of Neural Machine Translation Models
    2.2.4 Neural Machine Translation Models
    2.2.5 Encoder-Decoder Architecture
    2.2.6 Attention Mechanism
  2.3 Data-to-Text Generation
    2.3.1 Baseline Models
    2.3.2 Template-based Approaches
    2.3.3 Planning-based Approaches
    2.3.4 Entity-based Approaches
    2.3.5 Self-Attention-based Approaches
Chapter 3: Algorithm
  3.1 Encoder Architecture
    3.1.1 Transformer
    3.1.2 Hierarchical Encoding Model
    3.1.3 Hierarchical Attention
  3.2 Decoder-Side Information Extraction
  3.3 Multi-Candidate-based Context Module
    3.3.1 Complete Model Architecture
    3.3.2 Detailed Description
  3.4 Copy Mechanism
Chapter 4: Experimental Design and Discussion of Results
  4.1 Training and Inference
    4.1.1 Training Procedure
    4.1.2 Inference Procedure
  4.2 Evaluation Metrics
  4.3 Compared Methods
  4.4 Ablation Study
  4.5 Experimental Settings
  4.6 Experimental Results and Discussion
    4.6.1 Ablation Study Results
    4.6.2 Comparison with Method-Oriented Approaches
    4.6.3 Comparison with Data-Characteristic-Oriented Approaches
    4.6.4 Overall Comparison
    4.6.5 Case Study
Chapter 5: Conclusion and Future Work
Chapter 6: References

Full-Text Availability:
  Campus network: 2024/09/09
  Off-campus network: not authorized for public release
  National Central Library (Taiwan thesis system): 2041/09/09