
Graduate Student: RESA SEPTIARI
Thesis Title: Keyword Extraction Based on Text Hierarchical Structures
Advisor: Hsing-Kuo Pao (鮑興國)
Oral Defense Committee: Yuh-Jye Lee (李育杰), Tien-Ruey Hsiang (項天瑞), Tzong-Han Tsai (蔡宗翰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2016
Graduation Academic Year: 104
Language: English
Number of Pages: 34
Keywords: Keyword extraction, keyword occurrences prediction, text hierarchical structures, Gaussian Process Regression
  • Keyword extraction is one of the most popular topics in text mining and information retrieval, with applications in text summarization, recommendation, and categorization.
    Given a text, the goal of keyword extraction is to identify the words that are important
    and informative enough to represent the text. Keyword extraction has improved rapidly in recent years, as a rich set of methods has been proposed for the task.
    We propose a novel keyword extraction method based on machine learning techniques
    and on the hierarchical structures of text, to further improve extraction performance
    over previous methods. We believe that the choice of keywords depends on the focused
    themes of a text, the authors’ personal preferences, and the structure of the text,
    such as how the main topic is presented in each part. We test the proposed method on
    a data set of research papers, computing the recall, precision, and F-measure of the
    keyword extraction task in each trial. The results show that, in most cases, the
    proposed method is more effective in terms of prediction power and more efficient
    in terms of processing time than previous methods.
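
The evaluation mentioned in the abstract (recall, precision, and F-measure per trial) can be illustrated with a minimal sketch. This is not the thesis code: the function name `evaluate_keywords`, the exact-match comparison after lowercasing, and the sample keyword lists are all assumptions made for illustration.

```python
def evaluate_keywords(predicted, gold):
    """Return (precision, recall, f_measure) for one document.

    Assumption: a predicted keyword counts as correct only if it
    exactly matches a gold keyword after lowercasing. The thesis may
    use a different matching rule (for example, matching on stems).
    """
    predicted_set = {word.lower() for word in predicted}
    gold_set = {word.lower() for word in gold}
    correct = len(predicted_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 2 of 4 predictions appear among 3 gold keywords.
p, r, f = evaluate_keywords(
    ["keyword", "extraction", "gaussian", "text"],
    ["keyword", "extraction", "regression"],
)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# prints: precision=0.500 recall=0.667 F=0.571
```

The F-measure here is the balanced harmonic mean of precision and recall (F1); whether the thesis weights the two differently is not stated in this record.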



    Recommendation Letter
    Approval Letter
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Symbols
    1 Introduction
      1.1 Motivation
      1.2 Research Framework
      1.3 Thesis Outline
    2 Hierarchical Structures of Text
      2.1 Section
      2.2 Sentence
      2.3 Word
    3 Methods
      3.1 Mean Computation
      3.2 Gaussian Process Regression
      3.3 TFIDF
      3.4 Implementation of MC, GPR, and TFIDF in Text Document
    4 Experiment Result
      4.1 Experiment Data
      4.2 Data Preprocessing
        4.2.1 Stopwords Removal and Word Stemming
        4.2.2 Structure Normalization
      4.3 Experiment, Result, and Analysis
        4.3.1 Keyword Extraction in Section Structure
        4.3.2 Keyword Extraction in Sentence Structure
        4.3.3 Keyword Extraction in Word Structure
        4.3.4 Keyword Extraction using Section+Word
        4.3.5 Summary of Best Method and Sample Article for Each Category
        4.3.6 Case Study
    5 Conclusions
      5.1 Future Work
    References
    Letter of Authority


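    Full-text release date: 2021/01/12 (campus network)
    Full-text release date: 2041/01/12 (off-campus network)
    Full-text release date: 2041/01/12 (National Central Library: Taiwan Master's and Doctoral Theses System)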