| Graduate Student: | 陳柏睿 Bo-Ruei Chen |
|---|---|
| Thesis Title: | 基於多候選人機制之上下文模塊於編解碼器之整合應用於結構化數據文本生成任務 (An Encoder-Decoder Structure with Multi-Candidate-based Context Module for Data-to-Text Generation) |
| Advisor: | 郭景明 Jing-Ming Guo |
| Committee: | 陳彥霖 Yen-Lin Chen, 王乃堅 Nai-Jian Wang, 徐繼聖 Gee-Sern Jison Hsu, 李宗南 Chung-Nan Lee |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2021 |
| Academic Year: | 109 (ROC calendar) |
| Language: | Chinese |
| Pages: | 82 |
| Keywords: | Multi-Candidate-based Mechanism, Data-to-Text Generation, Natural Language Generation, Supervised Learning |
In the past few years, deep learning text generation systems have shown impressive results on machine translation and summarization tasks. As these systems have moved toward generating longer outputs in response to longer and more complex inputs, the generated text has begun to exhibit incoherence between sentences, incorrect statements, and insufficient fidelity to the input information. The same problem appears in data-to-text generation, a task that can be reduced to taking structured data as input and producing text that describes that data as output. Unlike machine translation, which aims to translate the source sentence in full, this kind of natural language generation must address at least two separate challenges: what to say (selecting which structured data to discuss) and how to say it (rendering the selected data as fluent natural language sentences).
Data-to-text generation is mainly tackled with an encoder-decoder architecture, in which a context module supplies the information the decoder needs at each decoding step. However, a single sentence may mention multiple entities and their attributes, so we believe the base architecture has room for improvement on this task. This thesis proposes a Multi-Candidate-based Context Module, which uses multiple candidates to attend to several entities and their attributes simultaneously and, by concatenating the information gathered across time steps, effectively helps the generated text order entities correctly. Experiments demonstrate the effectiveness of the multi-candidate concept and improve on the state of the art on the recently released RotoWire dataset. Data-to-text generation has a wide range of applications; in the future, the approach can be extended to parsing various types of structured data, allowing the task to be applied to many kinds of big data and bringing the technology closer to daily life.
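The multi-candidate idea described above can be sketched as several independent attention passes over the encoded records, whose per-candidate context vectors are concatenated before being handed to the decoder. This is only an illustrative sketch under assumed names (`multi_candidate_context`, plain dot-product attention), not the thesis's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_candidate_context(records, queries):
    """For each candidate query, attend over all record vectors,
    then concatenate the per-candidate context vectors into one
    context for the decoder step."""
    context = []
    for q in queries:
        # attention weights of this candidate over every record
        weights = softmax([dot(q, r) for r in records])
        # weighted sum of records = this candidate's context vector
        ctx = [sum(w * r[i] for w, r in zip(weights, records))
               for i in range(len(records[0]))]
        context.extend(ctx)
    return context

# Toy example: 3 one-hot record encodings of dim 4, 2 candidate queries,
# each query strongly preferring a different record.
records = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
queries = [[5.0, 0, 0, 0], [0, 5.0, 0, 0]]
ctx = multi_candidate_context(records, queries)
# len(ctx) == 8: two candidate contexts of dim 4, concatenated
```

With a single-candidate (standard attention) context module, the decoder would see only one weighted summary per step; concatenating several candidates lets one decoding step condition on multiple entities at once, which is the property the thesis exploits for correct entity ordering.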