Graduate student: | 陳怡婷 Yee-ting Chen |
---|---|
Thesis title: | 以事件詞彙鏈為基礎之多文件摘要 Multi-Document Summarization Based on Event Lexical Chain |
Advisor: | 徐俊傑 Chun-chieh Hsu |
Oral defense committee: | 王有禮 Yue-li Wang, 王建民 Chien-min Wang |
Degree: | Master |
Department: | College of Management - Department of Information Management |
Year of publication: | 2006 |
Academic year of graduation: | 94 (ROC calendar) |
Language: | Chinese |
Number of pages: | 82 |
Chinese keywords: | 多文件摘要、事件詞彙鏈、事件特徵 |
English keywords: | Multi-Document Summarization, Event Lexical Chain, Event Feature |
As the Internet has become widespread, people can obtain information far more easily than before, but this has also led to information overload and, in addition, to information redundancy. Take online news as an example: news websites abound, and different sites may report the same event from different perspectives. A user who wants the full picture of an event must spend a great deal of time searching and browsing related reports on each site, and can obtain complete information only after collecting and reading all of them. Busy modern readers rarely have that much time. A mechanism that automatically generates high-quality summaries is therefore needed to help people gain the most knowledge in the least time.
This thesis proposes a method called "Event Lexical Chain". The method adds event features, which describe the events in an article, and a relatedness computation that detects the relations between events and concept terms in the documents and links them into event lexical chains. An importance score is computed for each chain so that its significance can be judged objectively, and the important chains are then selected to extract a summary that helps users grasp the key points of the documents. Experiments show that multi-document summarization based on event lexical chains effectively improves the resulting summaries.
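The abstract names three steps: forming chains of related terms, scoring each chain, and extracting sentences from the important chains. The following Python sketch illustrates that general flow only. The relation table `RELATED`, the greedy chain building, the occurrence-count score, and the function names are all assumptions made for illustration. The abstract does not give the thesis's event features, relatedness formula, or importance formula, so none of them is reproduced here.

```python
# Toy input: each sentence is already reduced to a list of terms (assumption).
SENTENCES = [
    ["earthquake", "struck", "city"],
    ["quake", "damaged", "bridge"],
    ["mayor", "visited", "city"],
    ["rescue", "teams", "reached", "bridge", "earthquake"],
]

# Hypothetical relation table; the abstract does not say how relatedness is computed.
RELATED = {frozenset(("earthquake", "quake"))}


def related(a, b):
    """Assumed relatedness test: identical terms or a pair listed in RELATED."""
    return a == b or frozenset((a, b)) in RELATED


def build_chains(sentences):
    """Greedily group terms: a term joins the first chain holding a related term."""
    chains = []  # each chain maps term -> set of sentence indices containing it
    for idx, terms in enumerate(sentences):
        for term in terms:
            for chain in chains:
                if any(related(term, member) for member in chain):
                    chain.setdefault(term, set()).add(idx)
                    break
            else:
                # No existing chain is related to this term, so start a new one.
                chains.append({term: {idx}})
    return chains


def chain_score(chain):
    """Assumed importance: number of (term, sentence) occurrences in the chain."""
    return sum(len(positions) for positions in chain.values())


def summarize(sentences, top_chains=2, max_sentences=2):
    """Return indices of the sentences that best cover the strongest chains."""
    chains = sorted(build_chains(sentences), key=chain_score, reverse=True)[:top_chains]
    totals = [0] * len(sentences)
    for chain in chains:
        covered = set().union(*chain.values())  # sentences touched by this chain
        for idx in covered:
            totals[idx] += chain_score(chain)
    ranked = sorted(range(len(sentences)), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:max_sentences])  # restore document order


print(summarize(SENTENCES))  # [0, 1]
```

In this toy input the chain {earthquake, quake} scores 3 and the chain {city} scores 2, so sentence 0, which touches both, ranks first. A real system would also need redundancy control across documents, which this sketch omits.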