Graduate student: | RESA SEPTIARI |
---|---|
Thesis title: | Keyword Extraction Based on Text Hierarchical Structures |
Advisor: | 鮑興國 Hsing-Kuo Pao |
Oral defense committee: | 李育杰 Yuh-Jye Lee, 項天瑞 Tien-Ruey Hsiang, 蔡宗翰 Tzong-Han Tsai |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC academic year) |
Language: | English |
Number of pages: | 34 |
Keywords: | Keyword extraction, keyword occurrences prediction, text hierarchical structures, Gaussian Process Regression |
Keyword extraction is one of the most popular topics in text mining and information retrieval, where it supports the summarization, recommendation, and categorization of
texts. Given a text, the goal of keyword extraction is to identify the words that are important
and informative enough to represent the text. Keyword extraction has improved rapidly in recent years, with a rich set of methods proposed for the task.
We propose a novel keyword extraction method based on machine
learning techniques and on the hierarchical structure of the text, with the aim of
improving extraction performance over previous methods. We believe that the choice of keywords
depends on the central themes of a text, the author's personal preferences, and the
structure of the text, such as how a main topic is presented in each part. We test the
proposed method on a data set of research papers for the keyword
extraction task, computing recall, precision, and F-measure in each trial. In most cases,
the proposed method is effective in terms of predictive power and efficient
in terms of processing time compared to previous methods.
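
The evaluation measures named in the abstract can be illustrated with a short sketch. This is not the thesis's evaluation code: the function name `evaluate_keywords`, the sample keyword lists, and the choice of exact matching after lowercasing are all assumptions made for illustration, and the thesis may match keywords differently (for example, after stemming).

```python
def evaluate_keywords(predicted, gold):
    """Return (precision, recall, f_measure) for one document.

    Assumption: a predicted keyword counts as correct only if it matches
    a gold keyword exactly after lowercasing.
    """
    predicted_set = {k.lower() for k in predicted}
    gold_set = {k.lower() for k in gold}
    true_positives = len(predicted_set & gold_set)

    # Precision: fraction of predicted keywords that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: fraction of gold keywords that were recovered.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data, not taken from the thesis.
predicted = ["keyword extraction", "text mining", "regression", "hierarchy"]
gold = ["keyword extraction", "text mining", "gaussian process regression"]
precision, recall, f_measure = evaluate_keywords(predicted, gold)
print(precision, recall, f_measure)
```

In this made-up example, two of the four predicted keywords appear in the gold list of three, so precision is 2/4 and recall is 2/3; the F-measure is their harmonic mean.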