
Author: I-Hsiu Tseng (曾一修)
Thesis Title: The Multi-Intent Detection Framework of Intelligent Virtual Assistant (智能虛擬助理之多重意圖偵測框架)
Advisor: Hsi-Peng Lu (盧希鵬)
Committee Members: Sun-Jen Huang (黃世禎), Tain-yi Luor (羅天一)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2018
Academic Year of Graduation: 106 (ROC calendar, 2017-2018)
Language: English
Number of Pages: 95
Keywords (Chinese): 智能虛擬助理, 口語對話系統, 自然語言處理, 人工智慧, 機器學習
Keywords (English): Intelligent Virtual Assistant, Spoken Dialogue Systems, Natural Language Processing, Artificial Intelligence, Machine Learning
As technology has evolved, research on Intelligent Virtual Assistants (IVAs) has grown, and studies on user intent prediction in particular have been increasing. However, existing IVAs typically restrict intent detection to a specific domain and can handle only a single intent at a time, whereas people's intents are often diverse and complex, and fulfilling them requires many different applications. Research on multi-intent detection has emerged in recent years, but very little of it focuses on multi-intent processing in Chinese. This study therefore proposes a multi-intent detection framework for IVAs based on Chinese natural language processing. We divide human intents into explicit intents and implicit intents and propose two intent-processing modules, the Explicit Multi-Intent Processing (EMIP) module and the Implicit Multi-Intent Processing (IMIP) module, which are integrated into a spoken dialogue system. The EMIP module recognizes the multiple explicit intents in a user's utterance, while the IMIP module predicts potential intents beyond the utterance based on the utterance and the user's explicit multi-intent.
    We evaluate the performance of the EMIP module through an experimental study and cross-compare four implicit-intent processing models in two scenarios: one in which the user expresses related multiple intents and one in which the intents are unrelated. The pilot experiment shows that the EMIP module reaches an accuracy of 88.2%, and the two implicit-intent models based on the proposed framework outperform the other models.
    The academic contribution of this study is the first multi-intent detection framework based on Chinese natural language processing, which allows an IVA to simultaneously recognize the multiple explicit intents and the implicit intents in a user's Chinese utterance, addressing the limitation of existing IVAs that can handle only a single intent. The experiments also report how different implicit-intent models perform in different scenarios. In terms of managerial implications, the proposed framework can be applied across the IVA industry, in a variety of domains and scenarios, and even in cross-domain scenarios. In addition, the framework does not require training data labeled with multiple intents; it remains effective when only single-intent data are available. By reducing the dependence on multi-intent-labeled data, it lowers labeling costs such as time and manpower.


In recent years, research interest in Intelligent Virtual Assistants (IVAs) has soared worldwide. However, current IVAs are usually limited to a specific domain and can handle only a single intent at a time, while people's intents are often complex and require several different applications to fulfill. Several studies have explored related issues recently, but only a few focus on multi-intent processing in Chinese. The purpose of this paper is therefore to propose a multi-intent detection framework for IVAs based on Chinese Natural Language Processing. We categorize people's intents into two types, explicit intents and implicit intents, and, building on spoken dialogue systems, propose the Explicit Multi-Intent Processing (EMIP) module and the Implicit Multi-Intent Processing (IMIP) module. EMIP is responsible for recognizing multiple intents in the user's utterance; IMIP predicts the user's potential intents based on the utterance and the explicit multi-intent.
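    To make the two-stage flow concrete, the following is a minimal, self-contained Python sketch of how one utterance might pass through EMIP and then IMIP. The keyword table, intent names, and follow-up lookup are toy stand-ins invented for illustration only; the thesis builds EMIP on a spoken dialogue system backed by the IBM Watson Conversation and Baidu NLP services, and IMIP on embedding-based models.

```python
from typing import Dict, List

# Toy keyword table standing in for EMIP's intent recognizer
# (the real module sits on a spoken dialogue system, not keyword matching).
INTENT_KEYWORDS = {
    "book_flight": ["訂機票"],
    "check_weather": ["查天氣"],
}

# Toy lookup standing in for IMIP; the real module predicts the implicit
# intent from embeddings of the utterance and its explicit intents.
FOLLOW_UPS = {("book_flight", "check_weather"): "book_hotel"}


def recognize_explicit_intents(utterance: str) -> List[str]:
    """EMIP stand-in: return every explicit intent mentioned in the utterance."""
    return sorted(intent for intent, kws in INTENT_KEYWORDS.items()
                  if any(kw in utterance for kw in kws))


def predict_implicit_intent(explicit: List[str]) -> str:
    """IMIP stand-in: guess an unspoken follow-up intent from the explicit ones."""
    return FOLLOW_UPS.get(tuple(explicit), "none")


def handle_utterance(utterance: str) -> Dict[str, object]:
    explicit = recognize_explicit_intents(utterance)
    return {"explicit": explicit, "implicit": predict_implicit_intent(explicit)}


# Example: a single Chinese utterance carrying two explicit intents.
print(handle_utterance("幫我訂機票然後查天氣"))
# -> {'explicit': ['book_flight', 'check_weather'], 'implicit': 'book_hotel'}
```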
    Finally, we evaluate the performance of EMIP and cross-compare different models for processing the user's implicit multi-intent in two scenarios (the user's explicit intents are related to each other, or unrelated to each other). Our pilot experiment shows that EMIP reaches an accuracy of 88.2% and that the models based on IMIP outperform the other models. Moreover, the IMIP-ANN-based model performs better when the user's explicit intents are related to each other, and the IMIP-Cluster-based model performs better when they are unrelated.
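    The two IMIP variants compared here can be illustrated with a hedged sketch. The 2-D vectors below are made-up stand-ins for the Word2Vec/App2Vec/Doc2Vec embeddings used in the thesis, scikit-learn's exact NearestNeighbors stands in for an approximate nearest neighbor index (e.g. Annoy), and AffinityPropagation plays the role of the clustering step listed in the table of contents; the intent names and query construction are illustrative assumptions, not the thesis's actual data.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neighbors import NearestNeighbors

# Toy intent embeddings (illustrative values only).
intent_names = ["book_flight", "book_hotel", "check_weather", "play_music"]
intent_vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9]])

# IMIP query: an aggregate of the user's explicit multi-intent embeddings,
# here simply the mean of book_flight and check_weather.
query = intent_vecs[[0, 2]].mean(axis=0, keepdims=True)

# Variant 1 (ANN-style): look up the closest candidate intent. Exact
# NearestNeighbors is used here in place of an approximate index.
nn = NearestNeighbors(n_neighbors=1).fit(intent_vecs)
_, idx = nn.kneighbors(query)
print("Nearest-neighbour prediction:", intent_names[idx[0][0]])  # book_hotel

# Variant 2 (Cluster-style): group candidate intents with Affinity Propagation
# and read off the cluster that the explicit intents map to.
ap = AffinityPropagation(random_state=0).fit(intent_vecs)
cluster = ap.predict(query)[0]
candidates = [name for name, label in zip(intent_names, ap.labels_)
              if label == cluster and name not in ("book_flight", "check_weather")]
print("Cluster-based candidates:", candidates)  # likely ['book_hotel']
```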
    Regarding theoretical implications, the proposed framework allows an IVA to simultaneously recognize explicit multi-intent and implicit intent in the user's Chinese utterance. It addresses the limitation that existing IVAs can handle only a single intent, even though people often cannot express all of their intents in one sentence. We also report how different implicit multi-intent models perform in different scenarios. Regarding practical implications, the framework can be applied to any industry that uses IVAs, in different fields, different scenarios, and even cross-domain scenarios. In addition, the framework does not require multi-intent-labeled training data: even when only single-intent-labeled data are available, it still works. By reducing the dependence on multi-intent-labeled training data, the framework lowers labeling costs such as time and manpower.

    Table of Contents

摘要 (Abstract in Chinese)
Abstract
誌謝 (Acknowledgements)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Background and motivation
    1.2 Research aims
    1.3 Overview
Chapter 2 Literature
    2.1 Intelligent Virtual Assistant
        2.1.1 Spoken Dialogue Systems
    2.2 Context
    2.3 Research of multi-intent processing
    2.4 Statistical Language Model
    2.5 Distributional Model
        2.5.1 Word2Vec
        2.5.2 App2Vec
        2.5.3 Doc2Vec
    2.6 Approximate Nearest Neighbor
    2.7 Cluster (Affinity Propagation)
Chapter 3 Methods
    3.1 Research Process
    3.2 Proposed Framework
        3.2.1 Explicit Multi-Intent Processing (EMIP)
        3.2.2 Implicit Multi-Intent Processing (IMIP)
Chapter 4 Experiment
    4.1 Training Data Collection
    4.2 System Design and Construction
        4.2.1 IBM Watson Conversation Service API
        4.2.2 Baidu Natural Language Processing API
    4.3 Experimental Design
    4.4 Sample
    4.5 Result
Chapter 5 Conclusion and Discussion
    5.1 Theoretical implications
    5.2 Practice recommendations
    5.3 Limitations
    5.4 Future research
References
Appendix 1. Experimental Rule
Appendix 2. Experimental Data


    Full-Text Release Date: 2023/06/26 (campus network)
    Full-Text Release Date: not authorized for public release (off-campus network)
    Full-Text Release Date: not authorized for public release (National Central Library: Taiwan Thesis and Dissertation System)