
Graduate Student: 施添財 (Jeffry Susanto)
Thesis Title: Measuring the Progression of a Goal-Oriented Dialogue
Advisor: 鮑興國 (Hsing-Kuo Pao)
Committee Members: 鄧惟中 (Wei-Chung Teng), 項天瑞 (Tien-Ruey Hsiang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 68
Keywords: Multitask Learning, Conditional Entropy, Dialogue Representation
Access counts: 166 views, 0 downloads
Abstract:
In a goal-oriented dialogue, each dialogue has at least one topic that the two speakers are trying to communicate about. In this research, we investigate how that topic is concluded and measure how close the dialogue is to reaching that conclusion. We explore how the conditional entropy of the words spoken by the two participants converges over the course of the dialogue. However, conditional entropy alone is not enough to measure the progression of a dialogue. Therefore, inspired by multi-task learning, we propose several tasks in order to learn a better neural representation for this measurement. We show that a dialogue representation trained across the various tasks achieves better results than representations trained on each task from scratch and than those built on common pre-trained embeddings such as GloVe or BERT.
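The conditional entropy mentioned in the abstract can be made concrete with the standard per-word estimate, averaging -log2 P(w_i | w_{i-1}) over an utterance. The following is a minimal sketch, not the thesis implementation: it assumes a bigram model with add-one smoothing trained on a toy corpus, and the function names (train_bigram, conditional_entropy) are illustrative only.

```python
# Minimal sketch (illustrative, not the thesis code): per-word conditional
# entropy of each utterance under a bigram model with add-one smoothing.
import math
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigrams and bigrams from a flat list of tokens."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def conditional_entropy(utterance, unigrams, bigrams, vocab_size):
    """Average -log2 P(w_i | w_{i-1}) over the utterance, in bits per word."""
    if len(utterance) < 2:
        return 0.0
    total = 0.0
    for prev, word in zip(utterance, utterance[1:]):
        # Add-one smoothing so unseen bigrams still get a nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(p)
    return total / (len(utterance) - 1)

# Toy usage: per-turn conditional entropy in a two-speaker exchange.
corpus = "i need a hotel in the north of town please".split()
unigrams, bigrams = train_bigram(corpus)
dialogue = [
    "i need a hotel in the north".split(),    # speaker A
    "there is a hotel in the north".split(),  # speaker B
]
for i, turn in enumerate(dialogue):
    h = conditional_entropy(turn, unigrams, bigrams, len(unigrams))
    print(f"turn {i}: {h:.2f} bits/word")
```

Tracking how such per-turn values for the two speakers approach each other over successive turns is one way to operationalize the convergence the abstract refers to.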



Table of Contents:
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Overview
  1.3 Thesis Outline
2 Related Works
3 Methodology
  3.1 Overall Framework
    3.1.1 Problem Definition
    3.1.2 Proposed Method
  3.2 Feature Representation
    3.2.1 Preprocessing
    3.2.2 Tokenization and Padding
    3.2.3 Word Embeddings
    3.2.4 Pre-Trained Embeddings
  3.3 Hand Engineered Features
    3.3.1 Entropy
    3.3.2 Conditional Entropy
  3.4 Models
    3.4.1 Multi-Layer Perceptron
    3.4.2 Long Short Term Memory networks
    3.4.3 Attention Networks
4 Experiments and Results
  4.1 Dataset
  4.2 Experimental Evaluation
  4.3 Task 1: Conclusion Classification
  4.4 Task 2: Booking Classification
  4.5 Task 3: Predict Position in a Dialogue
    4.5.1 Progress as Percentage
    4.5.2 Steps to Reach a Booking
    4.5.3 Error Pattern Comparison
    4.5.4 Model Size Comparison
5 Conclusions
References
Appendix A: Dialogue Examples
Appendix B: Auxiliary Plots
  B.1 Average Entropy of other Dialogue Lengths
  B.2 Average Conditional Entropy of other Dialogue Lengths


Full-text release date: 2024/08/22 (campus network)
Full-text release date: 2024/08/22 (off-campus network)
Full-text release date: 2024/08/22 (National Central Library: Taiwan NDLTD system)