Keyword Extraction Based on Text Hierarchical Structures | National Taiwan University of Science and Technology Theses and Dissertations System


Graduate student:	RESA SEPTIARI
Thesis title:	Keyword Extraction Based on Text Hierarchical Structures
Advisor:	Hsing-Kuo Pao (鮑興國)
Oral defense committee:	Yuh-Jye Lee (李育杰), Tien-Ruey Hsiang (項天瑞), Tzong-Han Tsai (蔡宗翰)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Language:	English
Number of pages:	34
Keywords:	Keyword extraction, keyword occurrences prediction, text hierarchical structures, Gaussian Process Regression


Keyword extraction is one of the most popular topics in text mining and information retrieval,
with applications in the summarization, recommendation, and categorization of texts. Given a
text, the goal of keyword extraction is to identify the words that are important and informative
enough to represent it. Keyword extraction has improved rapidly in recent years, as a rich set
of methods has been proposed for the task. We propose a novel keyword extraction method based on
machine learning techniques that takes the hierarchical structures of text into account to
improve extraction performance over previous methods. We believe that the choice of keywords
depends on the focused themes of a text, the author's personal preferences, and the structure
of the text, such as how a main topic is presented in each part. We test the proposed method on
a data set of research papers for the keyword extraction task, computing recall, precision, and
F-measure in each trial. In most cases, the proposed method is more effective in terms of
prediction power and more efficient in terms of processing time than previous methods.
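
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that extracted keywords are compared with author-assigned keywords by exact match after stemming; the function name `evaluate_keywords` and the sample word lists are hypothetical and are not taken from the thesis.

```python
def evaluate_keywords(predicted, gold):
    """Return (precision, recall, f_measure) for one document.

    Assumption: keywords are compared as exact strings after stemming;
    the thesis may use a different matching rule.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(predicted_set & gold_set)

    # Guard against empty inputs so the ratios are always defined.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical stemmed keywords for a single article.
predicted = ["keyword", "extract", "gaussian", "text"]
gold = ["keyword", "extract", "structur"]
precision, recall, f_measure = evaluate_keywords(predicted, gold)
print(precision, recall, f_measure)
```

In this example two of the four predicted keywords are correct, so precision is 2/4, recall is 2/3, and the F-measure is 4/7.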

Recommendation Letter
Approval Letter
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
Symbols
Introduction
  1 Motivation
  2 Research Framework
  3 Thesis Outline
Hierarchical Structures of Text
  1 Section
  2 Sentence
  3 Word
Methods
  1 Mean Computation
  2 Gaussian Process Regression
  3 TFIDF
  4 Implementation of MC, GPR, and TFIDF in Text Document
Experiment Result
  1 Experiment Data
  2 Data Preprocessing
    2.1 Stopwords Removal and Word Stemming
    2.2 Structure Normalization
  3 Experiment, Result, and Analysis
    3.1 Keyword Extraction in Section Structure
    3.2 Keyword Extraction in Sentence Structure
    3.3 Keyword Extraction in Word Structure
    3.4 Keyword Extraction using Section+Word
    3.5 Summary of Best Method and Sample Article for Each Category
    3.6 Case Study
Conclusions
  1 Future Work
References
Letter of Authority

                                

[1] Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document using word co-occurrence statistical information,” International Journal on Artificial Intelligence Tools, vol. 13, no. 01, pp. 157–169, 2004.
[2] M. Cargill and P. O’Connor, Writing scientific research articles. Wiley-Blackwell, 2009.
[3] H. Alani, S. Kim, D. Millard, M. Weal, W. Hall, P. Lewis, and N. Shadbolt, “Automatic ontology-based knowledge extraction from web documents,” IEEE Intell. Syst., vol. 18, no. 1, pp. 14–21, 2003.
[4] K. Spärck Jones, “A statistical interpretation of term specificity and its application in retrieval,” Journal of Documentation, vol. 60, no. 5, pp. 493–502, 2004.
[5] Y. Qin, “Applying frequency and location information to keyword extraction in single document,” 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, 2012.
[6] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. MIT Press, 2006.
[7] Ranks.nl, “Stopwords,” 2015. [Online]. Available: http://www.ranks.nl/stopwords
[8] GitHub, “snkim/automatickeyphraseextraction,” 2013. [Online]. Available: https://github.com/snkim/AutomaticKeyphraseExtraction
[9] M. F. Porter, “Snowball: A language for stemming algorithms,” 2001.
[10] T. D. Nguyen and M.-Y. Kan, “Keyphrase extraction in scientific publications,” in Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. Springer, 2007, pp. 317–326.
[11] S. N. Kim, O. Medelyan, M.-Y. Kan, and T. Baldwin, “Automatic keyphrase extraction from scientific articles,” Language Resources and Evaluation, vol. 47, no. 3, pp. 723–742, 2012.

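Full-text release date: 2021/01/12 (campus network)
Full-text release date: 2041/01/12 (off-campus network)
Full-text release date: 2041/01/12 (National Central Library: Taiwan theses and dissertations system)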
