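Graduate student: 林廣柏 (Guang-Bo Lin)
Thesis title: 問答系統與最佳答案編輯工具開發研究 (A Study on Question Answering System and Best-answer Editing Tool Development)
Advisor: 戴文凱 (Wen-Kai Tai)
Oral defense committee: 廖文宏; 陳冠宇 (Kuan-Yu Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 72
Keywords (Chinese): 問答系統, 關鍵字生成, 問句生成
Keywords (English): Question Answering System, Keyword generation, Question generation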

During holidays, sightseeing areas and national parks are often short of service staff. Visitors wait longer, the quality of the staff's answers declines, and visitor ratings drop as a result. Our goal is to build a question answering system from the frequently asked questions of a sightseeing area or national park, in order to relieve this understaffing problem.

In this thesis, we cluster question-answer (Q&A) pairs according to the relationship between the question and the answer in each pair. The clusters reveal the keywords shared by questions that have the same intent but are phrased differently, so the user does not need to work out the keywords of the collected questions by hand. We then apply a question generation method to produce additional Q&A pairs from the imported ones, which helps the system handle more diverse questions. Finally, we use the log of visitor inquiries to teach the system the questions it answered poorly, thereby strengthening its answering ability.

The proposed question answering system performs well in our experiments. It can generate the keywords of questions that share an intent but are phrased differently. Given a visitor's question, it returns the answer of the most relevant stored question, which assists service staff in their explanations and eases understaffing. In most cases, the user only needs to import the collected Q&A pairs so that the system can learn the question keywords of each intent. Afterwards, the user only needs to add new answers for visitors' questions and let the system learn from them to improve its answering ability.
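
The pipeline described in the abstract (group Q&A pairs, derive the keywords shared by each group, then answer a new question with the answer of the most similar stored question) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes that pairs with identical answers form one cluster, that keywords are the tokens common to all questions in a cluster, and that similarity is plain Jaccard overlap with a hypothetical threshold of 0.3. The whitespace `tokenize` helper stands in for a Chinese word segmenter, and all function names and the sample data are invented for illustration.

```python
from collections import defaultdict


def tokenize(sentence):
    """Lowercase and split on whitespace (a stand-in for a Chinese word segmenter)."""
    return {w.strip("?.,!").lower() for w in sentence.split()} - {""}


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_answer(qa_pairs):
    """Assumption: questions that share an answer express the same intent."""
    clusters = defaultdict(list)
    for question, answer in qa_pairs:
        clusters[answer].append(question)
    return dict(clusters)


def cluster_keywords(questions):
    """Assumption: keywords are the tokens present in every question of a non-empty cluster."""
    token_sets = [tokenize(q) for q in questions]
    return set.intersection(*token_sets)


def answer_query(query, qa_pairs, threshold=0.3):
    """Return the answer of the most similar stored question, or None below the threshold."""
    q_tokens = tokenize(query)
    best_score, best_answer = 0.0, None
    for question, answer in qa_pairs:
        score = jaccard(q_tokens, tokenize(question))
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None


# Hypothetical sample data, not taken from the thesis.
QA_PAIRS = [
    ("When does the park open?", "The park opens at 8 am."),
    ("What time does the park open?", "The park opens at 8 am."),
    ("Where can I buy tickets?", "Tickets are sold at the main gate."),
    ("Where are tickets sold?", "Tickets are sold at the main gate."),
]

if __name__ == "__main__":
    for answer, questions in group_by_answer(QA_PAIRS).items():
        print(answer, sorted(cluster_keywords(questions)))
    print(answer_query("When is the park open?", QA_PAIRS))
```

Returning `None` when no stored question is similar enough leaves room for the workflow the abstract describes, in which poorly answered questions are collected from the inquiry log and given new answers later.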

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
    1.1 Research Background and Motivation
    1.2 Research Objectives and Research Questions
    1.3 Method Overview
    1.4 Research Contributions
    1.5 Organization of the Thesis
2 Related Work
    2.1 Introduction to Question Answering Systems
    2.2 Similarity Computation
3 Methodology
    3.1 Training
        3.1.1 Input Data
        3.1.2 Sentence Word Segmentation
        3.1.3 Sentence Similarity
        3.1.4 Q&A Pair Clustering
        3.1.5 Keyword Generation
        3.1.6 Sentence Generation
    3.2 Query
        3.2.1 Keyword Matching
        3.2.2 Sentence Similarity
    3.3 Best-answer Editing Tool
    3.4 Question Answering System Demonstration
4 Experimental Results and Analysis
    4.1 Validation Experiments for the Generated Sentences
        4.1.1 Effectiveness of the "是" Rule
        4.1.2 Effectiveness of Yes/No Question Conversion
        4.1.3 Effectiveness of Interrogative-word Questions
        4.1.4 Effectiveness of Alternative Questions
        4.1.5 Effectiveness of A-not-A Questions
    4.2 Experiment on Bonus Scores for Nouns and Verbs When Answering
    4.3 Experiment on Determining the Answer Threshold
    4.4 Comparison with Other Methods
5 Conclusion and Future Work
    5.1 Contributions and Conclusion
    5.2 Limitations and Future Research Directions
Appendix 1: Generation Results of the "是" Rule
Appendix 2: Generation Results for Yes/No Questions
Appendix 3: Generation Results for Interrogative-word Questions
Appendix 4: Generation Results for Alternative Questions
Appendix 5: Generation Results for A-not-A Questions

