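Applying a Naive Bayes Machine Learning Approach to Investigate the Relationship between Prediction Words and Subtask Categories in Task Analysis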

Graduate student: Murman Dwi Prasetio
Thesis title: AN INVESTIGATION OF RELATIONSHIP BETWEEN PREDICTION WORD AND SUBTASK CATEGORY IN TASK ANALYSIS – A NAIVE BAYES BASED MACHINE LEARNING APPROACH
Advisor: Shu-chiang Lin (林樹強)
Oral examination committee: Chiu-Hsiang Lin (林久翔), Chao Lung Yang (楊朝龍)
Degree: Master
Department: College of Management, Department of Industrial Management
Year of publication: 2012
Graduation academic year: 100 (ROC calendar)
Language: English
Number of pages: 73
Keywords: natural language processing, text processing, task analysis, machine learning tool, rapid miner, text miner


Traditionally, indexing and searching of speech content in task analysis have been achieved by combining separately constructed natural language processing engines. Natural language processing of speech matters because speech is the primary mode of communication among human beings and the most natural and efficient way for people to exchange information. It is therefore logical that the next technological development in Human-Computer Interaction (HCI) is natural language speech recognition. Unfortunately, as computer systems and their user interfaces develop, a task analysis of users' current activities alone is not sufficient to predict which tasks users will perform after their previous ones. In Lin and Lehto's study, a Bayesian-based semi-automated task analysis tool was developed to help task analysts predict the task/subtask categories performed by knowledge agents, using telephone conversations in which agents were helping customers troubleshoot their problems.
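The Bayesian tool of Lin and Lehto is not reproduced here. As a minimal sketch, assuming a plain multinomial naive Bayes classifier with add-one (Laplace) smoothing over bag-of-words features, predicting a subtask category from a dialog narrative could look like the following. The tokenizer, the class name `MultinomialNaiveBayes`, the category names, and the training sentences are all hypothetical toy material, not taken from the 5184-narrative dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # class -> number of narratives
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        # log P(c) + sum of log P(w | c); the shared evidence term P(doc) is dropped
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the category with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy narratives and subtask categories
train_docs = [
    "please restart the printer and try again",
    "the printer shows a paper jam error",
    "can I have your account number please",
    "let me verify your account details",
]
train_labels = ["troubleshoot", "troubleshoot", "identify_customer", "identify_customer"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("my printer has a jam"))  # troubleshoot
```

Working in log space avoids numeric underflow when a narrative contains many words; the "naive" part is the assumption that words are conditionally independent given the category, which is what lets the per-word probabilities simply add up.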
The purpose of this study is to examine the dataset established by Lin and Lehto (2007) and to further analyze the results of the Bayesian-based task analysis model proposed by Lin and Lehto (2009). It does so by comparing results on the existing dataset from two open-source machine learning programs based on the Bayesian approach: Text Miner and RapidMiner (Hofmann and Klinkenberg, 2009). In this analysis, RapidMiner generated prediction results for a total of fifteen word combinations drawn from the telephone dialogs between call center agents and customers. The fifteen combinations are: single words, pair words, triple words, quadruple words, single-pair words, single-triple words, single-quadruple words, pair-triple words, pair-quadruple words, triple-quadruple words, single-pair-triple words, single-pair-quadruple words, single-triple-quadruple words, pair-triple-quadruple words, and single-pair-triple-quadruple words. To identify the relationship between prediction words and main subtask categories, this study compares the two open-source machine learning programs, RapidMiner and Text Miner.
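The count of fifteen follows from taking every non-empty subset of the four n-gram sizes (single, pair, triple, quadruple), that is, 2^4 - 1 = 15. The sketch below enumerates those subsets in the same order as the list above and builds a combined feature list for one narrative. It assumes whitespace tokenization and that a combination's features are simply the union of the n-grams of its member sizes; the helper names are hypothetical, and RapidMiner's own text-processing operators are not used.

```python
from itertools import combinations

SIZE_NAMES = {1: "single", 2: "pair", 3: "triple", 4: "quadruple"}


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, each joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_combinations(sizes=(1, 2, 3, 4)):
    """All non-empty subsets of n-gram sizes, smaller subsets first."""
    return [combo for k in range(1, len(sizes) + 1) for combo in combinations(sizes, k)]


def combined_features(text, combo):
    """Hypothetical helper: union of the n-grams for every size in `combo`."""
    tokens = text.lower().split()
    feats = []
    for n in combo:
        feats.extend(ngrams(tokens, n))
    return feats


combos = feature_combinations()
labels = ["-".join(SIZE_NAMES[n] for n in combo) for combo in combos]
print(len(labels))  # 15
print(labels[4])    # single-pair
print(combined_features("please restart the printer", (1, 2)))
```

Because `itertools.combinations` yields subsets in lexicographic order for each size, the generated labels run from "single" through "single-pair-triple-quadruple" in exactly the order the abstract lists them.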
The results show that the naive Bayes-based RapidMiner tool performs worse than Text Miner at predicting the relationship between prediction words and main subtask categories. Both RapidMiner and Text Miner were applied to 71 subtask categories over 5184 narrative dialog records. RapidMiner's precision rate was 33% for the full set of narratives, 26% for the testing set, and 35% for the training set, and its average correct-prediction probability was 19.91%. Eleven categories have a correct-prediction rate above 50%, while 39 categories have a correct-prediction rate below 50%. By comparison, Text Miner's Fuzzy Bayesian task analysis results for the main subtask categories include 13 categories with correct predictions of 80% or above and 34 categories with correct predictions of 50% or above. However, since Text Miner is still under development, further analysis on the same dataset is needed to reconfirm these findings and to compare them with other text processing tools based on other algorithms or models.
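The per-category figures above (categories above 50% correct, the 80% threshold reported for Text Miner) can be read as per-class hit rates: the fraction of narratives of a given category that the classifier labeled correctly. A minimal sketch of that bookkeeping follows, assuming parallel lists of true and predicted category labels. The labels are hypothetical, and the abstract does not say exactly how its "precision rate" or "average correct prediction probability" were computed, so the overall figure here is plain accuracy and the mean is an unweighted average of the per-category rates.

```python
from collections import Counter


def per_category_rates(y_true, y_pred):
    """Fraction of each true category's narratives that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cat: hits[cat] / totals[cat] for cat in totals}


def summarize(y_true, y_pred, threshold=0.5):
    """Overall accuracy, mean per-category rate, and categories above `threshold`."""
    rates = per_category_rates(y_true, y_pred)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    mean_rate = sum(rates.values()) / len(rates)
    above = sorted(c for c, r in rates.items() if r > threshold)
    return accuracy, mean_rate, above


# Hypothetical true and predicted subtask categories for ten narratives
y_true = ["A", "A", "A", "B", "B", "C", "C", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "C", "C", "C", "A", "B", "B"]

accuracy, mean_rate, above = summarize(y_true, y_pred)
print(f"accuracy={accuracy:.2f} mean_rate={mean_rate:.2f} above_50={above}")
```

Reporting per-category rates alongside overall accuracy matters here because, with 71 categories of uneven size, a tool can score reasonably overall while most individual categories remain poorly predicted.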

ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
1.1. Research Background
1.2. Research Objectives
1.3. Research Outline
LITERATURE REVIEW
2.1. Human-Machine Communication through Voice
2.2. Speech Recognition
2.2.1. The Basic Problem Speech Recognition
2.3. Task Analysis
2.4. The Techniques of Task Analysis
2.4.1. Hierarchical Task Analysis
2.4.2. Cognitive Task Analysis
2.5. Machine Learning
2.6. Text Processing
2.6.1. Text Classification
2.6.2. Information Extraction
2.6.3. Token Identification
2.7. Knowledge Acquisition
2.8. Statistical Methods
2.9. Evaluation Measurement
2.10. Related Work
2.10.1. Data Collection
2.10.2. Pre-processing Data
2.10.3. Define the Main Subtask Categories
2.10.4. Text Miner
METHODOLOGY
3.1. Introduction
3.2. The Existing Data Result
3.3. The New Proposed Framework
3.4. Hardware and Software Devices
3.5. Machine Learning Tool General Environment
3.5.1. Rapid Miner
3.6. Machine Learning Tool Workspace
3.6.1. Rapid Miner
RESULT AND DISCUSSION
4.1. Introduction
4.1.1. Transition Matrix Result
4.1.2. Comparison Machine Learning Tool Environment Result
CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future Works
REFERENCES
APPENDICES
Appendix A Partial Listing of the Table Dialog between Knowledge Agent and Customer that Contains 5184 narratives
Appendix B Partial Listing of the Table Dialog between Knowledge Agent and Customer for All Datasets that Contains 5184 narratives
Appendix C Partial Listing of the Table Dialog between Knowledge Agent and Customer for Training Datasets that Contains 3472 narratives
Appendix D Partial Listing of the Table Dialog between Knowledge Agent and Customer for Testing Datasets that Contains 1712 narratives
Appendix E Partial Listing of the Table Decompositions Subtask Categories
Appendix F Partial Listing of the Single-Words Frequency List That Contains 2,535 Occurrences of Single Words
Appendix G Partial Listing of the Pair-Words Frequency List That Contains 59,571 Occurrences of Pair Words
Appendix H Partial Listing of the Triple-Words Frequency List That Contains 315,741 Occurrences of Triple Words
Appendix I Partial Listing of the Quadruple-Words Frequency List That Contains 441,342 Occurrences of Quadruple Words
Appendix J Partial Listing of Combination Words

REFERENCES

Annett, J. (2005). Hierarchical task analysis (HTA). In Stanton et al. (2005), Chapter 33 (pp. 33-1–33-7).
Appelt, D. E. (1999). Introduction to information extraction technology. Tutorial, International Joint Conference on Artificial Intelligence (IJCAI'99). San Mateo, CA: Morgan Kaufmann.
Apte, C., Damerau, F. J., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233–251.
Baber, C. (1991). Speech technology in control room systems: A human factors perspective. New York: Ellis Horwood.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Proceedings of the World Wide Web Conference (WWW-7). Computer Networks and ISDN Systems, 30(1–7), 107–117.
Cunningham, H. (2002). GATE, a General Architecture for Text Engineering. Computers and the Humanities, 36, 223–254.
Davis, J., MacLean, J., & Dampier, D. (2010). Methods of information hiding and detection in file systems. 2010 Fifth IEEE International Workshop, 66.
Fisher, D. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.
Fomby, T. (2008, April). Naive Bayes classifier.
Hearst, M. A. (1999). Untangling text mining. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL'99). University of Maryland, June.
Keller, E. (1994). Fundamentals of speech synthesis and speech recognition. New York: John Wiley & Sons.
Lin, S., & Lehto, M. R. (2009). A Bayesian based machine learning application to task analysis. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining (2nd ed., Classification B, pp. 1–7).
Liu, L., & Özsu, M. T. (Eds.). (2009). Encyclopedia of database systems. Springer Publishing Company, Incorporated.
Nahm, U. Y., & Mooney, R. J. (2002). Text mining with information extraction. Proceedings of the AAAI-2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases. Stanford, CA.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Rabiner, L., & Juang, B. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47.
Tkach, D. (Ed.). (1998). Text mining technology: Turning information into knowledge. IBM White Paper, February 17, 1998.
Witten, I. H., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco, CA: Morgan Kaufmann.
Zhou, Z. (2010). Application of data field clustering in computer forensics. ICIC '10: Proceedings of the 2010 Third International Conference on Information and Computing, Vol. 1.
