| Graduate student: | 施添財 Jeffry Susanto |
|---|---|
| Thesis title: | Measuring the Progression of a Goal-Oriented Dialogue |
| Advisor: | 鮑興國 Hsing-Kuo Pao |
| Committee members: | 鄧惟中 Wei-Chung Teng, 項天瑞 Tien-Ruey Hsiang |
| Degree: | 碩士 Master |
| Department: | 電資學院 資訊工程系 Department of Computer Science and Information Engineering |
| Publication year: | 2019 |
| Graduation academic year: | 107 |
| Language: | English |
| Pages: | 68 |
| Keywords: | Multitask Learning, Conditional Entropy, Dialogue Representation |
In a goal-oriented dialogue, each conversation has at least one topic that the
two speakers are trying to communicate about. In this research, we investigate
how that topic is concluded, and we measure how close the dialogue is to
reaching it. We explore how the conditional entropy of the words spoken by the
two participants converges over the course of the dialogue. However,
conditional entropy alone is not enough to measure the progression of a
dialogue. We therefore propose several tasks, inspired by multitask learning,
to build a better neural representation for this measurement. We show that a
dialogue representation trained across multiple tasks achieves better results
than one trained on each task from scratch, and than representations built on
common pre-trained embeddings such as GloVe or BERT.
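To make the entropy measure concrete, the following is a minimal illustrative sketch of how conditional entropy H(next word | previous word) can be estimated from bigram counts in a dialogue transcript. This is only a toy maximum-likelihood estimator on word pairs, not the thesis's actual model or estimator; the function name and input format are assumptions for illustration.

```python
import math
from collections import Counter

def conditional_entropy(word_pairs):
    """Estimate H(next | prev) in bits from a list of (prev, next) word pairs."""
    pair_counts = Counter(word_pairs)                    # joint counts n(prev, next)
    prev_counts = Counter(prev for prev, _ in word_pairs)  # marginal counts n(prev)
    total = len(word_pairs)
    h = 0.0
    for (prev, nxt), c in pair_counts.items():
        p_joint = c / total                       # P(prev, next)
        p_cond = c / prev_counts[prev]            # P(next | prev)
        h -= p_joint * math.log2(p_cond)          # sum of -P(x,y) log2 P(y|x)
    return h

# Toy check: when each previous word fully determines the next,
# the conditional entropy is zero.
pairs = [("hello", "world"), ("hello", "world"), ("good", "morning")]
print(conditional_entropy(pairs))  # 0.0
```

As speakers converge on a shared topic, each becomes more predictable given the other's words, so an estimate like this would be expected to decrease as the dialogue progresses.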