Graduate student: | Murman Dwi Prasetio |
---|---|
Thesis title: | 應用單純貝氏機器學習方法調查在工作分析中預測字和子任務類別之間的關係 AN INVESTIGATION OF RELATIONSHIP BETWEEN PREDICTION WORD AND SUBTASK CATEGORY IN TASK ANALYSIS – A NAIVE BAYES BASED MACHINE LEARNING APPROACH |
Advisor: | 林樹強 Shu-chiang Lin |
Oral defense committee: | 林久翔 Chiu-Hsiang Lin; 楊朝龍 Chao Lung Yang |
Degree: | Master |
Department: | School of Management - Department of Industrial Management |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar) |
Language: | English |
Number of pages: | 73 |
Chinese keywords: | 自然語言處理、文本處理、任務分析、機器學習工具、快速的礦工、礦工文本 |
English keywords: | natural language processing, text processing, task analysis, machine learning tool, rapid miner, text miner |
Traditionally, indexing and searching of speech content in task analysis have been achieved by combining separately constructed natural language processing engines. Natural language processing is grounded in speech, which is the primary mode of communication among human beings and the most natural and efficient way for people to exchange information. It is therefore logical that the next technological development for Human-Computer Interaction (HCI) should be natural language speech recognition. Unfortunately, even as computer systems and their user interfaces have developed, a task analysis of users' current activities is not sufficient to predict which tasks the users will perform next. In Lin and Lehto's study, a Bayesian-based semi-automated task analysis tool was developed to help task analysts predict the categories of tasks and subtasks performed by knowledge agents, using telephone conversations in which agents were trying to help customers troubleshoot their problems.
The purpose of this study is to examine the dataset established by Lin and Lehto (2007) and to further analyze the results of the Bayesian-based task analysis model proposed by Lin and Lehto (2009). It does so by comparing the results obtained on the existing dataset by two open-source machine learning programs based on the Bayesian approach: Text Miner and Rapid Miner, the latter credited to Hofmann and Klinkenberg (2009). In this analysis, the Rapid Miner program generated prediction results for a total of fifteen word combinations drawn from telephone conversations between call center agents and customers. The fifteen combinations are: single words; pair words; triple words; quadruple words; single-pair words; single-triple words; single-quadruple words; pair-triple words; pair-quadruple words; triple-quadruple words; single-pair-triple words; single-pair-quadruple words; single-triple-quadruple words; pair-triple-quadruple words; and single-pair-triple-quadruple words. To identify the relationship between prediction words and main subtask categories, this study compares the two open-source machine learning programs, Rapid Miner and Text Miner.
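The fifteen combinations listed above correspond to the non-empty subsets of the four word-group lengths (2^4 − 1 = 15). The following is a minimal sketch of that enumeration, assuming whitespace tokenization and contiguous word sequences; the abstract does not say how word groups were formed, and the names `word_ngrams`, `feature_combinations`, and `extract_features` are illustrative, not taken from the thesis, Rapid Miner, or Text Miner.

```python
from itertools import combinations

# Assumed mapping from the abstract's terms to n-gram lengths.
NGRAM_SIZES = {"single": 1, "pair": 2, "triple": 3, "quadruple": 4}


def word_ngrams(text, n):
    """Return contiguous n-word sequences from a whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_combinations():
    """Enumerate every non-empty subset of the four word-group types."""
    names = list(NGRAM_SIZES)
    return [combo
            for r in range(1, len(names) + 1)
            for combo in combinations(names, r)]


def extract_features(text, combo):
    """Collect the word groups of every type named in one combination."""
    features = []
    for name in combo:
        features.extend(word_ngrams(text, NGRAM_SIZES[name]))
    return features


if __name__ == "__main__":
    for combo in feature_combinations():
        print("-".join(combo))
    # Hypothetical utterance, used only to show the feature extraction.
    print(extract_features("my printer will not print", ("single", "pair")))
```

Each of the fifteen feature sets could then be fed to the same classifier, so that differences in prediction quality can be traced to the word groups used rather than to the model.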
The study finds that the naive Bayes-based Rapid Miner tool performs worse than Text Miner at predicting the relationship between prediction words and main subtask categories. Both tools were analyzed on the same 71 subtask categories across a dataset of 5184 narrative dialogs. Rapid Miner's precision rate was 33% on the full set of narratives, 26% on the testing set, and 35% on the training set, with an average correct-prediction probability of 19.91%. A total of 11 categories had correct predictions above 50%; apart from these 11 categories, 39 had correct predictions below 50%. By comparison, Text Miner's results for the main subtask categories under fuzzy Bayesian task analysis included 13 categories with correct predictions of 80% or above and 34 categories with correct predictions of 50% or above. However, since Text Miner is still under development, further analysis with the same dataset is needed to reconfirm these findings and to compare other text processing tools built on other algorithms or models.
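As a rough illustration of the kind of model being compared, the sketch below implements a multinomial naive Bayes classifier with add-one smoothing and a per-category correct-prediction rate. It is not the Rapid Miner or Text Miner implementation. The abstract does not define how its precision rate or correct-prediction percentages were computed, so the rate used here (the fraction of narratives in each true category that are predicted correctly) is an assumption, and the class name, function name, and toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        self.total = len(labels)
        return self

    def predict(self, features):
        """Return the category with the highest log-posterior score."""
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.feature_counts[label].values()) + vocab_size
            for f in features:
                # Smoothed log likelihood of the feature given the category.
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def per_category_rate(model, docs, labels):
    """Fraction of items in each true category that are predicted correctly."""
    hits, totals = Counter(), Counter()
    for features, label in zip(docs, labels):
        totals[label] += 1
        if model.predict(features) == label:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy data invented for illustration only; not drawn from the thesis dataset.
train_docs = [["printer", "jam"], ["paper", "jam"],
              ["reset", "password"], ["forgot", "password"]]
train_labels = ["hardware", "hardware", "account", "account"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Counting how many categories clear a threshold such as 50% or 80% in the dictionary returned by `per_category_rate` would give summary figures of the same form as those reported above, under the stated assumption about how the rate is defined.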
Annett, J. (2005). Hierarchical Task Analysis (HTA). In Stanton et al. (2005): Chapter 33 (pp. 33-1 – 33-7).
Appelt, D.E. (1999) “Introduction to information extraction technology.” Tutorial, Int Joint Conf on Artificial Intelligence IJCAI’99. Morgan Kaufmann, San Mateo.
Apte, C., Damerau, F.J. and Weiss, S.M. (1994) “Automated learning of decision rules for text categorization.” ACM Trans Information Systems, Vol. 12, No. 3, pp. 233-251.
Baber, C. (1991). Speech Technology in Control Room Systems: A Human Factors Perspective. New York: Ellis Horwood.
Brin, S. and Page, L. (1998) “The anatomy of a large-scale hypertextual Web search engine.” Proc World Wide Web Conference WWW-7. In Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117.
Cunningham, H. (2002) “GATE, a General Architecture for Text Engineering.” Computers and the Humanities, Vol. 36, pp. 223-254.
Davis, J., MacLean, J. and Dampier, D. (2010) “Methods of information hiding and detection in file systems.” 2010 Fifth IEEE International Workshop, 66.
Fisher, D. (1987) “Knowledge acquisition via incremental conceptual clustering.” Machine Learning, Vol. 2, pp. 139–172.
Hearst, M.A. (1999) “Untangling text mining.” Proc Annual Meeting of the Association for Computational Linguistics ACL99. University of Maryland, June.
Keller, E. (1994) Fundamentals of Speech Synthesis and Speech Recognition. John Wiley & Sons, New York, USA.
Lin, S. and Lehto, M.R. (2009). A Bayesian Based Machine Learning Application to Task Analysis, Encyclopedia of Data Warehousing and Mining, Classification B, 1-7, Wang, John (Ed., 2nd Edition).
Liu, L. and Özsu, M.T. (2009) Encyclopedia of Database Systems. Springer Publishing Company, Incorporated.
Nahm, U.Y. and Mooney, R.J. (2002) “Text mining with information extraction.” Proc AAAI-2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases. Stanford, CA.
Porter, M.F. (1980) “An algorithm for suffix stripping.” Program, Vol. 14, No. 3, pp. 130-137.
Rabiner, L. and Juang, B. (1993) Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, New Jersey.
Sebastiani, F. (2002) “Machine learning in automated text categorization.” ACM Computing Surveys, Vol. 34, No. 1, pp. 1–47.
Fomby, T. (2008) “Naive Bayes classifier.” April 2008.
Tkach, D. (editor) (1998) “Text mining technology: turning information into knowledge.” IBM White Paper, Feb 17, 1998.
Witten, I.H. and Frank, E. (2000) Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco, CA.
Zhou, Z. (2010) “Application of data field clustering in computer forensics.” ICIC '10: Proceedings of the 2010 Third International Conference on Information and Computing, Volume 1.