
Graduate student: 林予歆
Yu-Hsin Lin
Thesis title: 中文文件內流程資訊之擷取與流程模型之建構
The Extraction of Process Information from Chinese Text Document and to Construct Process Model
Advisor: 歐陽超
Chao Ou-Yang
Oral examination committee: 林義貴
Yi-Kuei Lin
侯建良
Jiang-Liang Hou
Degree: Master
Department: School of Management - Department of Industrial Management
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 106
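Keywords (Chinese): Chinese word segmentation (中文斷詞), information extraction (資訊擷取), event-driven process chain (事件驅導流程鏈), process similarity (流程相似度)
Keywords (English): Information Extraction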
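  • As business processes grow ever larger, companies document their internal processes and attach process diagrams so that the people involved can understand the relevant operations more quickly. However, when readers obtain a process description together with its diagram, they often find that the two do not correspond. They cannot tell whether the gap arises because the text description is unclear or incomplete, or because the diagram does not faithfully represent the description, and this discrepancy causes them even greater confusion.
    This research therefore establishes a set of rules for extracting the required information from a process description of any length in a process document, and uses Event-driven Process Chains (EPC) as the notation for business process diagrams. A Chinese word segmentation system first simplifies the process description. Rules for extracting the required information, together with a logic algorithm for the ordering of events in the process, then identify the process information and build a complete process model. Performance measures such as precision and recall are used to evaluate whether the information extracted after segmentation is indeed the information the process requires. Finally, the process diagram constructed from the text is compared with the original process diagram by computing their process similarity, so that users can see how large the difference between the process text and the original diagram is.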
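    The precision and recall evaluation described above can be illustrated with a minimal sketch. It assumes that the extracted items and the reference items are plain sets of function labels. The function name `precision_recall` and the sample labels are hypothetical and are not taken from the thesis.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of an extracted item set against a reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # items that are both extracted and required
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: functions extracted from a process text
# versus the functions a human annotator marked as required.
extracted = ["check order", "pick goods", "pack goods", "print invoice"]
reference = ["check order", "pick goods", "pack goods", "ship goods", "confirm delivery"]

p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

    Precision here is the share of extracted items that are actually required, and recall is the share of required items that were extracted. Exact string matching of labels is a simplifying assumption.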


    Owing to the rapid global development of enterprises, the documents used to describe business processes have become very tedious, and it can be difficult to find the required information in them. In addition, the process diagrams embedded in these documents are often not properly mapped to the text description. This may be due either to a very detailed text description paired with a simplified process diagram, or vice versa. In both situations, the incompatible information between text and process diagram may mislead readers.
    This research proposes a method to extract process-related information from Chinese text documents and to construct the corresponding process diagram. An approach using word and sentence segmentation is developed. The segmented words and sentences are extracted and tagged. An algorithm is developed to compute the precedence and logical relationships among the tagged words and sentences and to construct the related process models. The constructed models are compared with the process diagrams embedded in the original document in terms of indices such as precision and recall. The computed values can be used as a reference for analyzing the variance between the constructed and original process models.
    This research will address both English and Chinese documents over a two-year period.
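    The comparison between a constructed process model and the original diagram can be sketched as follows. The abstract does not state the similarity measure used in the thesis, so this example substitutes a simple weighted Jaccard overlap of nodes and precedence edges. The names `jaccard`, `model_similarity`, and `node_weight`, as well as the sample models, are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; two empty collections count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def model_similarity(edges_a, edges_b, node_weight=0.5):
    """Blend node overlap and precedence-edge overlap into one score in [0, 1]."""
    nodes_a = {n for edge in edges_a for n in edge}
    nodes_b = {n for edge in edges_b for n in edge}
    node_sim = jaccard(nodes_a, nodes_b)
    edge_sim = jaccard(edges_a, edges_b)
    return node_weight * node_sim + (1 - node_weight) * edge_sim


# Hypothetical models, each given as a list of (predecessor, successor) pairs.
built = [("receive order", "check stock"), ("check stock", "ship goods")]
original = [("receive order", "check stock"), ("check stock", "notify customer")]

sim = model_similarity(built, original)
print(f"similarity={sim:.3f}")
```

    A score of 1.0 means the two models share every node and every precedence edge, and lower values indicate a larger gap between the text-derived model and the original diagram. This sketch ignores EPC events and logical connectors, which a fuller comparison would have to account for.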

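    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Background and Motivation
    1.2 Research Objectives
    1.3 Definition of the Research Problem
    1.4 Thesis Structure
    Chapter 2 Literature Review
    2.1 Information Processing
    2.1.1 Information Extraction
    2.1.2 Chinese Word Segmentation Methods
    2.2 Knowledge Analysis
    2.2.1 Word Segmentation and Part-of-Speech Tagging
    2.2.2 Temporal Knowledge Representation
    2.2.3 Temporal Relation Inference
    2.3 Performance Evaluation
    2.3.1 Precision and Recall
    2.3.2 Process Similarity
    Chapter 3 Research Method
    3.1 Conceptual Phase
    3.1.1 Problem Domain Analysis
    3.1.2 Explanation of the Conceptual-Phase Process
    3.2 Design Phase
    3.2.1 Information Extraction
    3.2.1.1 Sentence Segmentation of the Text
    3.2.1.2 Finding Verbs
    3.2.1.3 Extracting Functions
    3.2.1.4 Rules for Precedence and Logic Determination
    3.2.1.5 Verification of Functions, Temporal Order, and Logic
    3.2.3 Drawing the Process Diagram
    3.2.4 Comparison and Analysis
    3.2.4.1 Organizing Process Information
    3.2.4.2 Computing Process Diagram Similarity
    3.3 Implementation Phase
    Chapter 4 Empirical Study
    4.1 System Architecture
    4.1.1 Word Segmentation System
    4.1.2 User Interface
    4.1.3 EPC Tools Software
    4.1.4 Implementation Environment
    4.1.5 Limitations of the Implementation
    4.2 Case Validation
    4.2.1 Process Text "Finished Goods Shipment" (成品出貨)
    4.2.2 Process Text "Material Requirements Planning" (物料需求規劃)
    Chapter 5 Conclusions and Suggestions
    5.1 Conclusions
    5.2 Research Contributions
    5.3 Suggestions for Future Research
    References
    Appendix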

