
Graduate Student: 施添財 (Jeffry Susanto)
Thesis Title: Measuring the Progression of a Goal-Oriented Dialogue
Advisor: 鮑興國 (Hsing-Kuo Pao)
Committee Members: 鄧惟中 (Wei-Chung Teng), 項天瑞 (Tien-Ruey Hsiang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 68
Keywords: Multitask Learning, Conditional Entropy, Dialogue Representation
Access counts: 166 views, 0 downloads
Abstract:
In a goal-oriented dialogue, each dialogue has at least one topic that the two speakers are trying to communicate about. In this research, we investigate how that topic is concluded and measure how close the dialogue is to reaching that conclusion. We explore how the conditional entropy of the words spoken by the two participants converges over the course of the dialogue. However, conditional entropy alone is not enough to measure the progression of a dialogue. Therefore, inspired by multi-task learning, we propose several tasks in order to learn a better neural representation for this measurement. We show that a dialogue representation trained across the various tasks achieves better results than representations trained on each task from scratch and than those built on common pre-trained embeddings such as GloVe or BERT.
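The conditional entropy mentioned in the abstract can be made concrete with the standard per-word estimate, averaging -log2 P(w_i | w_{i-1}) over an utterance. The following is a minimal sketch, not the thesis implementation: it assumes a bigram model with add-one smoothing trained on a toy corpus, and the function names (train_bigram, conditional_entropy) are illustrative only.

```python
# Minimal sketch (illustrative, not the thesis code): per-word conditional
# entropy of each utterance under a bigram model with add-one smoothing.
import math
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigrams and bigrams from a flat list of tokens."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def conditional_entropy(utterance, unigrams, bigrams, vocab_size):
    """Average -log2 P(w_i | w_{i-1}) over the utterance, in bits per word."""
    if len(utterance) < 2:
        return 0.0
    total = 0.0
    for prev, word in zip(utterance, utterance[1:]):
        # Add-one smoothing so unseen bigrams still get a nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(p)
    return total / (len(utterance) - 1)

# Toy usage: per-turn conditional entropy in a two-speaker exchange.
corpus = "i need a hotel in the north of town please".split()
unigrams, bigrams = train_bigram(corpus)
dialogue = [
    "i need a hotel in the north".split(),    # speaker A
    "there is a hotel in the north".split(),  # speaker B
]
for i, turn in enumerate(dialogue):
    h = conditional_entropy(turn, unigrams, bigrams, len(unigrams))
    print(f"turn {i}: {h:.2f} bits/word")
```

Tracking how such per-turn values for the two speakers approach each other over successive turns is one way to operationalize the convergence the abstract refers to.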



Table of Contents:
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Overview
  1.3 Thesis Outline
2 Related Works
3 Methodology
  3.1 Overall Framework
    3.1.1 Problem Definition
    3.1.2 Proposed Method
  3.2 Feature Representation
    3.2.1 Preprocessing
    3.2.2 Tokenization and Padding
    3.2.3 Word Embeddings
    3.2.4 Pre-Trained Embeddings
  3.3 Hand Engineered Features
    3.3.1 Entropy
    3.3.2 Conditional Entropy
  3.4 Models
    3.4.1 Multi-Layer Perceptron
    3.4.2 Long Short Term Memory networks
    3.4.3 Attention Networks
4 Experiments and Results
  4.1 Dataset
  4.2 Experimental Evaluation
  4.3 Task 1: Conclusion Classification
  4.4 Task 2: Booking Classification
  4.5 Task 3: Predict Position in a Dialogue
    4.5.1 Progress as Percentage
    4.5.2 Steps to Reach a Booking
    4.5.3 Error Pattern Comparison
    4.5.4 Model Size Comparison
5 Conclusions
References
Appendix A: Dialogue Examples
Appendix B: Auxiliary Plots
  B.1 Average Entropy of other Dialogue Lengths
  B.2 Average Conditional Entropy of other Dialogue Lengths


Full-text release date: 2024/08/22 (campus network)
Full-text release date: 2024/08/22 (off-campus network)
Full-text release date: 2024/08/22 (National Central Library: Taiwan NDLTD system)