Author: 陳子軒
ZIH-SYUAN CHEN
Thesis title: 英文文件內流程資訊之擷取與流程模型之建構
The Extraction of Process Information from English Text Document and to Construct Process Model
Advisor: 歐陽超
Chao Ou-Yang
Committee members: 林義貴
Yi-Kuei Lin
侯建良
Jiang-Liang Hou
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Pages: 113
Keywords (translated from Chinese): word segmentation, information extraction, event-driven process chain, process similarity
Keywords (English): English Word Segmentation, Information Extraction, Event-driven Process Chain, Process Equivalence
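  • Enterprises today maintain many business process description documents that cover a wide range of procedures, and these documents are more complex than ever. Until now, the staff concerned have had to read and analyze these documents directly to understand the process information they convey, but individual readers interpret them differently, which leads to misunderstandings. This research therefore develops a formalized, step-by-step method that generates easily understood process diagrams, in order to reduce the errors caused by differences in individual interpretation.
    A process document may also be accompanied by an original process diagram. In some documents, the accompanying diagram does not fully depict the information the text is meant to convey; that is, the original process diagram and the original process description differ, and this difference can also mislead the staff who read them. This research therefore applies process similarity analysis to provide a quantitative method for comparing diagram and text.
    Combining these ideas, the research is divided into two stages. First, natural language text-processing techniques are used to extract the process information in a document and convert it into a process diagram. Second, the constructed diagram is compared with the original process through similarity analysis, which quantifies the discrepancy between text and diagram as an index and reports the various kinds of differences so that the staff concerned can analyze them.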


    With the rapid global growth of enterprises, the documents used to describe business processes have become lengthy, and the required information can be difficult to find in them. In addition, the process diagrams embedded in these documents often do not map properly onto the text description. This may be due either to a very detailed text description paired with a simplified process diagram, or vice versa. In both situations, the inconsistency between text and process diagram may mislead readers.
    This research proposes a method to extract process-related information from text documents and to construct the corresponding process diagram. An approach based on word and sentence segmentation will be developed, and the segmented words and sentences will be extracted and tagged. An algorithm will be developed to compute the precedence and logical relationships among the tagged words and sentences and to construct the corresponding process models. The constructed models will be compared with the process diagrams embedded in the original document using indices such as precision and recall. The computed values can serve as a reference for analyzing the differences between the constructed and original process models.
    This research will address both English and Chinese documents over a two-year period.
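
    The precision and recall comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each process model is reduced to a set of precedence relations written as (predecessor, successor) pairs; this representation, the function name `precision_recall`, and the sample relations are all hypothetical and are not taken from the thesis, whose actual similarity calculation may differ.

```python
def precision_recall(constructed, original):
    """Return (precision, recall) of the constructed set measured against the original set."""
    constructed, original = set(constructed), set(original)
    matched = constructed & original  # relations found in both models
    precision = len(matched) / len(constructed) if constructed else 0.0
    recall = len(matched) / len(original) if original else 0.0
    return precision, recall


# Hypothetical precedence relations read from the diagram embedded in a document.
original_edges = {
    ("check stock", "order toner"),
    ("order toner", "receive toner"),
    ("receive toner", "install toner"),
}

# Hypothetical relations extracted from the text; one has no counterpart in the diagram.
constructed_edges = {
    ("check stock", "order toner"),
    ("order toner", "receive toner"),
    ("order toner", "pay invoice"),
    ("receive toner", "install toner"),
}

precision, recall = precision_recall(constructed_edges, original_edges)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    In this made-up case the text yields one relation that the diagram lacks, so precision falls below 1 while recall stays at 1, which corresponds to the "detailed text versus simplified diagram" situation described above.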

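    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Problem
        1.3 Research Objectives
    Chapter 2 Literature Review
        2.1 Natural Language Processing
            2.1.1 Word Segmentation
                2.1.1.1 Punctuation
                2.1.1.2 English Word Segmentation
                2.1.1.3 Chinese Word Segmentation
        2.2 Temporal Information and Event Ordering
            2.2.1 Temporal Information Expression
            2.2.2 Temporal Information Annotation
        2.3 XML (eXtensible Markup Language)
        2.4 Process Similarity Comparison
        2.5 Evaluation Methods
    Chapter 3 Research Method Framework
        3.1 Conceptual Analysis Phase
        3.2 Design Phase
            3.2.1 Process Information Extraction
                3.2.1.1 Verb Tagging
                3.2.1.2 Sentence Segmentation
                3.2.1.3 Element Decomposition
            3.2.2 Process Element Verification
            3.2.3 Temporal Logic Determination
            3.2.4 Diagram File Conversion
            3.2.5 Comparison and Analysis
                3.2.5.1 Hierarchy Identification of Process Diagrams
                3.2.5.2 Process Element Consolidation
                3.2.5.3 Process Similarity Calculation
                3.2.5.4 Process Similarity Analysis
        3.3 Implementation Phase
            3.3.1 Introduction to the System and the Processed Documents
            3.3.2 Walkthrough of the Implementation Process
    Chapter 4 Program Development and Case Analysis
        4.1 Tools and Development Environment
            4.1.1 Verb Tagging Program (callisto.jar)
            4.1.2 Process Element Decomposition Program
            4.1.3 Process Diagram Generation Tool (EPC Tools)
        4.2 Case Verification and Analysis
            4.2.1 Case Study (1)
            4.2.2 Case Study (2)
                4.2.2.1 Document Analysis
                4.2.2.2 Case 2 Process Analysis
            4.2.3 Case Study (3)
                4.2.3.1 Document Analysis
                4.2.3.2 Example 3 Process Analysis
        4.3 Method Evaluation
            4.3.1 Definition of Evaluation Measures
                4.3.1.1 Structural Correctness
                4.3.1.2 Semantic Correctness
            4.3.2 Evaluation Calculation
                4.3.2.1 Ordering Toner
                4.3.2.2 Planning a Seminar
                4.3.2.3 Car Dealership
                4.3.2.4 Recruiting a New Employee
            4.3.3 Evaluation Analysis
                4.3.3.1 Function
                4.3.3.2 Temporal Logic
    Chapter 5 Conclusions and Suggestions
        5.1 Research Contributions
        5.2 Future Outlook and Development
    References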

