
Graduate Student: 寧忠湘 (Chung-Hsiang Ning)
Thesis Title: 支援自然語言理解系統開發的評測工具
A Vetting System Providing Meaningful Feedback for the Development of Natural Language Understanding System
Advisor: 賈繼中 (Chi-Chung Chia)
Committee Members: 陳献忠 (Shian-Jung Chen), 鄧慧君 (Hui-Chun Teng)
Degree: Master
Department: College of Humanities and Social Sciences - Department of Applied Foreign Languages
Year of Publication: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 146
Chinese Keywords: 自然語言處理評估、審核回饋、意義表示、事件句法剖析、知識為基礎審核
English Keywords: Natural Language Processing evaluation, Vetting feedback, Meaning representation, Event parsing, Knowledge-based vetting
  • State-of-the-art natural language processing evaluation systems such as Treebank and BLEU (Bilingual Evaluation Understudy) are regarded not only as gold-standard data comparable to manually parsed corpora, but also as baseline metrics for measuring language processing between humans and machines. Treebank's applications, however, are limited by its shallow parsing technology, such as part-of-speech (POS) tagging and phrase-structure or dependency-structure annotation, while the BLEU algorithm judges the quality of an NLP system's output only by measuring how closely the target text matches the gold-standard corpus. In his study on training students to discover the language knowledge needed for reading and translation, Chen (2016) reviewed the linguist Shian-Jung Chen's work on machine reading, pointing out that "reading" is the demonstration of a reader's knowledge of the language: only by tracing how a suitable parsing system operates, or by identifying exactly what language knowledge the system lacks, can we objectively evaluate whether the system achieves what its designer intended.
      This study investigates a knowledge-based vetting system built on the meaning representation devised by the linguist Shian-Jung Chen, which comprises the identification of events in a sentence, the building of case frames of Who Did What to Whom, and the determination of inter-event relations. The vetting system examines whether an NLP system is meaning-based or event-based, and explains why its parsing output misses, misplaces, or misinterprets sentence constituents. From the vetting we can readily compute a parser's precision and recall, pinpoint the system's errors, and show the designer clear goals to reach and faults to correct. Because it yields constructive feedback, the vetting system in this study is meaningful.


    State-of-the-art Natural Language Processing (NLP) evaluation systems such as Treebank and BLEU are either evaluated against hand-checked parsed-text corpora as gold-standard data (Santorini & Marcinkiewicz, 1991) or based on correlation metrics between machine and human processing (Papineni et al., 2002). The applications of the former are limited by their shallow parsing technology, such as POS tagging and phrase-structure or dependency-structure annotations, while the BLEU algorithm evaluates text quality only by how close the target texts are to the gold-standard data. According to Chen's review (2016) of his adviser's (Chen, 2013, 2014) definition of reading as the demonstration of a reader's knowledge of the language, only through finding out whether a parsing system possesses or lacks certain language knowledge can we objectively evaluate whether the system has achieved what its designer expects.
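The surface matching that BLEU performs can be illustrated with a minimal sketch of its modified (clipped) n-gram precision, shown here for unigrams and bigrams only and without the brevity penalty; the sentences are invented examples, not data from this study:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it occurs in the reference (BLEU's 'clipping')."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))  # 5 of 6 unigrams match
print(modified_precision(candidate, reference, 2))  # 3 of 5 bigrams match
```

The example makes the abstract's point concrete: the score reflects only string overlap with the gold standard, not whether the system possesses any particular piece of language knowledge.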
    This study introduces a knowledge-based vetting system that requires a parser to output parsing results according to a prescribed meaning representation: event identification, case-frame building of Who Did What to Whom, and the finding of inter-event relations. The vetting system aims at finding out whether a system is meaning-based or event-based, and at explaining why a parser has missed, misplaced, or misinterpreted sentence constituents in its output. From the vetting, we can easily compute the precision and recall of the parser's performance, point out where the system is wrong, and show the designer clear goals to achieve. The vetting system is meaningful because it produces useful feedback in addition to showing how good or bad the system is.
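Once the parser's output and the gold-standard annotation share a representation, the precision and recall the vetting yields follow mechanically. A minimal sketch, assuming a hypothetical encoding in which each event is a (predicate, who, whom/what) case-frame triple; the sentence, the triples, and the `vet` function are invented for illustration, not this study's actual system:

```python
def vet(system_events, gold_events):
    """Compare a parser's events against gold-standard annotation.
    Returns precision, recall, and the misses/over-generations as feedback."""
    system, gold = set(system_events), set(gold_events)
    correct = system & gold
    precision = len(correct) / len(system) if system else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    feedback = {
        "missed": sorted(gold - system, key=str),    # events the parser failed to find
        "spurious": sorted(system - gold, key=str),  # events it over-generated
    }
    return precision, recall, feedback

# "Mary said John left": two events as (predicate, who, whom/what) triples.
gold = [("say", "Mary", "John left"), ("leave", "John", None)]
system = [("say", "Mary", "John left"), ("leave", "Mary", None)]  # wrong agent

p, r, fb = vet(system, gold)
print(p, r)           # 0.5 0.5
print(fb["missed"])   # [('leave', 'John', None)]
```

The feedback lists, rather than the two scores alone, are what make the vetting "meaningful" in the sense above: the designer sees exactly which case frame was mis-built, not just an aggregate number.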

    Table of Contents
    Master's Thesis Advisor Recommendation … ii
    Master's Degree Examination Committee Approval … iii
    Chinese Abstract … iv
    Abstract … v
    Acknowledgement … vi
    Glossary of Terms … vii
    Chapter 1 Introduction … 1
    1.1 Motivation … 1
    1.2 Background … 3
    1.3 Challenges … 6
    1.4 Objectives … 8
    1.5 Research Questions … 8
    Chapter 2 Literature Review … 9
    2.1 Introduction … 9
    2.2 Natural Language Processing … 9
    2.3 Parsing … 14
    2.3.1 Syntactic parsing and ambiguities … 18
    2.3.2 Semantic parsing … 19
    2.4 Parser evaluation … 20
    2.5 Treebank & BLEU … 21
    2.5.1 Treebank … 21
    2.5.2 BLEU … 25
    2.6 Reader's language knowledge & meaning representation … 26
    2.7 English parser used as a computer reader to do automatic annotation … 28
    2.8 NLP technology that makes answering the questions of meaning representation possible … 32
    2.9 Summary … 34
    Chapter 3 Methodology … 35
    3.1 Introduction … 35
    3.2 Data collection … 36
    3.3 Vetting criteria based on meaning representation … 40
    3.4 Duplicating computer reading results for manual annotation … 43
    3.5 Manual annotation to provide gold-standard data … 44
    3.6 Subcategorizing reading errors for language knowledge discovery … 49
    3.7 Feedback from vetting for fixing program bugs or design makeover … 50
    Chapter 4 The vetting of the computer reader's performance and tacit language knowledge discovery … 51
    4.1 Introduction … 51
    4.2 Definition of reading as demonstration of the reader's language knowledge … 51
    4.3 A meaning representation for both computer reading and human reading … 55
    4.4 The computer reader's reading report shows plenty of language knowledge … 59
    4.5 Summary … 73
    Chapter 5 First-run vetting for diagnostic analysis … 74
    5.1 Introduction … 74
    5.2 Finding the most frequent misses first … 75
    5.3 Diagnosis … 100
    5.4 What needs to be done? … 106
    5.4.1 Meaning representation … 107
    5.4.2 Importance of top-down processing (reading) … 110
    5.4.3 Importance of the syntax-semantics interface … 111
    5.4.4 Back to the basics … 112
    5.5 Summary … 114
    Chapter 6 Vetting Result Analysis … 115
    6.1 Introduction … 115
    6.2 An event-based top-down processing based on meaning representation … 116
    6.2.1 Goal-oriented top-down processing … 117
    6.2.2 Division of labor between grammar and lexicon … 118
    6.2.3 Ad hoc strategies for language-specific problems in English … 119
    6.3 Vetting results of a new round after the renovation … 121
    6.3.1 Recall of computer reading - finding events … 121
    6.3.2 Precision of computer reading - case frame building … 124
    6.3.3 Precision of computer reading - identifying inter-event relations … 127
    6.4 The vetting that makes sense for incremental improvement of NLP research … 131
    6.5 Why the vetting in this study is more meaningful to NLP system developers than other vetting systems … 132
    6.5.1 Limitations of non-linguistic vetting … 136
    6.5.2 Significance of linguistic vetting … 137
    Chapter 7 Conclusion … 139
    7.1 Introduction … 139
    7.2 A realization of computer reading … 140
    7.3 What is learned from the vetting … 141
    References … 142

    List of Tables
    Table 2-1: Types of knowledge of the language … 11
    Table 2-2: The categorization of approaches to NLP … 13
    Table 2-3: The list of a whole suite of Chen's NLP technology … 33
    Table 3-1: A screenshot of a reading report … 38
    Table 3-2: An annotated reading report of a sentence … 40
    Table 3-3: Duplicating reading results in alignment … 43
    Table 3-4-a: Manual annotation … 44
    Table 3-4-b: Manual annotation … 47
    Table 6-1: Events correctly identified, missed, or over-generated … 119
    Table 6-2: Precision measurement in the finding of each event's Who Did What to Whom … 122
    Table 6-3: Precision measurement in the identification of inter-event relations and the constituent type of each event … 126
    Table 6-4: Meaning representation … 132

    List of Figures
    Figure 2-1: The classification of NLP … 11
    Figure 2-2: The parsing process … 15
    Figure 2-3: The prescribed grammar rules … 16
    Figure 2-4: The final parse tree … 16
    Figure 2-5: The process of bottom-up parsing … 18
    Figure 2-6: Dependency structure … 23
    Figure 2-7: The tree structures of dependency-based and constituent-based syntactic analysis … 24
    Figure 3-1: Data collection … 36
    Figure 5-1: The problems of diagnostic analysis … 100

    References
    Allen, J. (1995). Natural Language Understanding (2nd ed.). Redwood City, CA: Benjamin/Cummings.
    Bastings, J., & Sima'an, K. (2014). All fragments count in parser evaluation. In Proceedings of LREC (pp. 78-82).
    Black, E., Abney, S., Flickinger, D., Gdaniec, C., Grishman, R., Harrison, P., Hindle, D., Ingria, R., Jelinek, F., Klavans, J., Liberman, M., Marcus, M., Roukos, S., Santorini, B., & Strzalkowski, T. (1991). A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proceedings of the DARPA Speech and Natural Language Workshop (pp. 306-311).
    Bosco, C., & Lombardo, V. (2004). Dependency and relational structure in treebank annotation. In Proceedings of the Workshop on Recent Advances in Dependency Grammar (pp. 9-16). Geneva, Switzerland.
    Charniak, E. (1996). Treebank grammars. In Proceedings of AAAI-96.
    Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of AAAI-97.
    Chen, K., Luo, C., Chang, M., Chen, F., Chen, C., Huang, C., & Gao, Z. (2003). Sinica Treebank. In Abeillé (Ed.) (pp. 231-248).
    Chen, Shian-Jung & Lu, Kevin (2012). Clause boundary detection and relational marking for MT reordering. Special Issue in the Studies in English Language and Literature (2013), National Taiwan University of Science and Technology.
    Chen, Shian-Jung (2013). PADS restoration and its importance in reading comprehension and meaning representation. Proceedings of The 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27).
    Chen, Shian-Jung (2014). Computer reading and human reading. International Conference on Applied Linguistics & Language Teaching (ALLT). National Taiwan University of Science and Technology, Taipei.
    Chen, Shian-Jung. (1996). Analysis of Chinese for Chinese-English Machine Translation (Doctoral Dissertation). Department of Linguistics, Georgetown University, Washington, DC.
    Chen, Shian-Jung. (2010). Linguistic relativity revisit. 2010 International Conference on Cross-Cultural Studies, Fu Jen Catholic University. Retrieved from https://ntust.academia.edu/ShianjungChen.
    Chomsky, N. (1957). Syntactic Structures. The Hague/Paris: Mouton.
    Church, K., & Patil, R. (1982). Coping with syntactic ambiguity or how to put the block in the box on the table. Computational Linguistics, 8(3-4), 139-149.
    De Wit, L., Alexander, D., Ekroll, V., & Wagemans, J. (2016). Is neuroimaging measuring information in the brain? Psychonomic Bulletin & Review, 23. https://doi.org/10.3758/s13423-016-1002-0
    Dimitriadis, A. (1996). When pro-drop languages don’t: Overt pronominal subjects and pragmatic inference. Proceedings of CLS, 32, 33-47.
    Bender, E. M. (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Morgan & Claypool Publishers.
    Fillmore, C. J. (1968). The Case for Case. In Bach and Harms (Ed.): Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston, 1-88.
    Fillmore, C. J., & Baker, C. F. (2001, June). Frame semantics for text understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL.
    Garside, Roger, Geoffrey Leech, and Geoffrey Sampson (1987). The Computational Analysis of English. A Corpus-based Approach. Longman, London.
    Guo, J. (2016). The Washington Post-Wonkblog: Google’s new artificial intelligence can’t understand these sentences. Can you? https://www.washingtonpost.com/news/wonk/wp/2016/05/18/googles-new-artificial-intelligence-cant-understand-these-sentences-can-you/?utm_term=.8e0cfc3ef357
    Haegeman, L. (1994). Introduction to government and binding theory. (2nd ed.). Oxford: Blackwell.
    Hajičová, E. (2000). Dependency Treebank: From analytic to tectogrammatical annotations.
    Pan, H. M. (2016). How BLEU measures translation and why it matters. Slator. https://slator.com/technology/how-bleu-measures-translation-and-why-it-matters/
    Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.
    Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (Vol. 3). London: Pearson.
    Kakkonen, T. (2007). Framework and resources for natural language parser evaluation.
    Kamath, A., & Das, R. (2018). A Survey on Semantic Parsing. arXiv preprint arXiv:1812.00978.
    Khurana, D., Koli, A., Khatter, K., & Singh, S. (2017). Natural language processing: State of the art, current trends and challenges. arXiv preprint arXiv:1708.05148.
    Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 311-318). Philadelphia, July 2002.
    Li, H., & Sheng, X. (2017). A Study on the Garden Path Phenomenon from the Perspective of Generative Grammar. Journal of Language Teaching and Research, 8(6), 1190-1194.
    Liddy, E.D.(2001). Natural Language Processing. In Encyclopedia of Library and Information Science, 2nd Ed. NY. Marcel Decker, Inc.
    Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. (1993) .Building a large annotated corpus of English: the Penn Treebank, Computational Linguistics 19(2):313-330.
    Pazienza, M. T., Pennacchiotti, M., & Zanzotto, F. M. (2006). Terminology extraction: An analysis of linguistic and statistical approaches. https://doi.org/10.1007/3-540-32394-5_20
    Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
    Navigli, R. (2009). Word sense disambiguation: A survey. ACM computing surveys (CSUR), 41(2), 10.
    Nivre, J. (2008). Treebanks. In: Corpus Linguistics: An International Handbook / [ed] Kytö, Merja; Lüdeling, Anke, Mouton de Gruyter, 2008, p. 225-241
    Pulman, S. G. (1991). Basic Parsing Techniques: an introductory survey.
    Rangra, R. (2015). Basic Parsing techniques in natural Language Processing. International Journal, 4(3).
    Brooks, R. A. (1999). Cambrian Intelligence: The Early History of the New AI (1st ed.). A Bradford Book.
    Petrov, S. (2016). Announcing SyntaxNet: The world's most accurate parser goes open source. Google AI Blog. https://ai.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
    Su, Keh-Yih & Jing-Shin Chang (1992). Why corpus-based statistics-oriented machine translation, Fourth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-92), Empiricist vs. rationalist methods in MT, June 25-27, 1992, Montreal, CCRIT-CWARC; pp.249-262.
    Kikas, T., & Treumuth, M. (2007). Automatic parser evaluation.
    Hara, T., Matsuzaki, T., Miyao, Y., & Tsujii, J. (2011). Exploring difficulties in parsing imperatives and questions. In Proceedings of the 5th International Joint Conference on Natural Language Processing (pp. 749-757). Chiang Mai, Thailand, November 8-13, 2011.
    Wasson, M., Loritz, D., Chen, S.-J., et al. (2005). System and method for extracting information from text using text annotation and fact extraction. US Patent US7912705, 19 Jan 2010.
    Zhou, G., Li, J., Fan, J., & Zhu, Q. (2011). Tree kernel-based semantic role labeling with enriched parse tree structure. Information Processing & Management, 47(3), 349-362.
