
Graduate student: Shou-Wei Ho (何紹威)
Thesis title: Mining Fuzzy Domain Ontology Based on Concept Vector from Wikipedia Category Network (基於概念向量萃取維基百科分類網路建構模糊領域本體論)
Advisor: Hahn-Ming Lee (李漢銘)
Oral defense committee: Jan-Ming Ho (何建明)
Yuh-Jye Lee (李育杰)
Jung-Ying Wang (王榮英)
Chih-Ming Chen (陳志銘)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e. 2010-2011)
Language: English
Number of pages: 50
Keywords: Ontology (本體論), Wikipedia Mining (維基百科探勘), Concept Vector (概念向量), Domain Classification (領域分類)
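  • An ontology is a standardized language for expressing domain knowledge and can be used in systems that require human-computer communication (for example, expert recommendation systems and document classification). A domain ontology expresses the particular meanings that key terms take on in different domains. Many researchers use fuzzy domain ontologies to measure the similarity between concepts. However, building a domain ontology is labor-intensive and time-consuming. According to recent research, Wikipedia can be used to build and update ontologies, because it draws on the wisdom of the crowd to provide up-to-date information. In this thesis, we propose a method for constructing a fuzzy domain ontology by mining the Wikipedia Category Network based on concept vectors, and we use concept vectors to establish the fuzzy relations between key terms and concepts. The knowledge of a domain is composed of several concepts, where each concept is represented by a specific Wikipedia category. Fuzzy relations are then used to measure the semantic relatedness among key terms, concepts, and domains. The constructed fuzzy domain ontology comprises a set of concepts, a set of domains, and the fuzzy relations among them. Our method can: (1) use Wikipedia to build an up-to-date fuzzy domain ontology; (2) conceptualize a domain as several concepts, each represented by a Wikipedia category; and (3) use the fuzzy domain ontology to establish fuzzy relations between key terms and domains. Experimental results on a document collection from the Text REtrieval Conference (TREC) show that the fuzzy domain ontology helps improve document retrieval.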


    Ontologies are essential for formalizing domain knowledge to support effective human-computer interaction (e.g., recommendation systems, document classification). In particular, a domain ontology represents the particular meanings of terms in a specific domain. Many researchers have proposed approaches that measure the similarity between concepts by consulting a fuzzy domain ontology. However, constructing domain ontologies is labor-intensive and tedious. Recently, Wikipedia mining has facilitated ontology construction, because Wikipedia provides up-to-date concept information organized in a socially annotated category structure. In this thesis, we propose an approach that mines domain concepts from the Wikipedia Category Network and generates fuzzy relations, based on a concept vector extraction method, to measure the relatedness between a single term and a concept.
    Domain knowledge is composed of several concepts, and each concept is represented by a specific Wikipedia category. The fuzzy relation is then used to measure relatedness scores among key terms, concepts, and domains. The constructed fuzzy domain ontology comprises the concept sets of several domains and the fuzzy relations between terms and domains. Our methodology conceptualizes domain knowledge by mining the Wikipedia Category Network, and ontology-based systems can be built on the resulting fuzzy domain ontology. An empirical experiment evaluates its robustness using a textual dataset from the Text REtrieval Conference (TREC). The experimental results show that the fuzzy domain ontology constructed by the proposed approach is robust and improves performance on information retrieval tasks.
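
    The structure described in the abstract (terms related to concepts, concepts grouped into domains, with graded relatedness between them) can be illustrated with a small sketch. The abstract does not give the thesis's actual formulas, so this example assumes the standard max-min composition of fuzzy relations as a stand-in; the terms, category names, domain names, and membership values are all hypothetical.

```python
from typing import Dict

FuzzyRelation = Dict[str, Dict[str, float]]

# Hypothetical membership degrees in [0, 1]; none of these values come from the thesis.
# Each concept is named after a Wikipedia category, as the abstract describes.
TERM_TO_CONCEPT: FuzzyRelation = {
    "neuron": {"Category:Neuroscience": 0.9, "Category:Machine learning": 0.6},
    "kernel": {"Category:Machine learning": 0.7, "Category:Operating systems": 0.8},
}
CONCEPT_TO_DOMAIN: FuzzyRelation = {
    "Category:Neuroscience": {"Biology": 0.9},
    "Category:Machine learning": {"Computer science": 0.9},
    "Category:Operating systems": {"Computer science": 0.8},
}


def compose_max_min(first: FuzzyRelation, second: FuzzyRelation) -> FuzzyRelation:
    """Compose two fuzzy relations with the max-min rule.

    For a term t and a domain d, the result is the maximum over all
    concepts c of min(first[t][c], second[c][d]).
    """
    result: FuzzyRelation = {}
    for term, concepts in first.items():
        scores: Dict[str, float] = {}
        for concept, mu_term_concept in concepts.items():
            for domain, mu_concept_domain in second.get(concept, {}).items():
                candidate = min(mu_term_concept, mu_concept_domain)
                scores[domain] = max(scores.get(domain, 0.0), candidate)
        result[term] = scores
    return result


relation = compose_max_min(TERM_TO_CONCEPT, CONCEPT_TO_DOMAIN)
for term, domains in relation.items():
    print(term, domains)
```

    With these made-up values, "kernel" reaches "Computer science" through two concepts and keeps the stronger of the two paths. Other composition rules (for example max-product) would be equally plausible choices here.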

    ABSTRACT
    ACKNOWLEDGEMENTS
    1 Introduction
        1.1 Motivation
        1.2 Challenges
        1.3 Goals
        1.4 Contribution
        1.5 Outlines of the Thesis
    2 Background
        2.1 Fuzzy Domain Ontology
        2.2 Ontology-Based Approaches
            2.2.1 Semantic Integration
            2.2.2 Ontology Mapping
            2.2.3 Formal Concept Analysis
        2.3 Measuring Semantic Relatedness in Wikipedia
            2.3.1 Semantic Relatedness
            2.3.2 Community Structure
    3 Fuzzy Ontology Generation
        3.1 Pre-Processing Stage
            3.1.1 POS Tagger
            3.1.2 Lexical Filter
        3.2 Wiki Mapping Stage
            3.2.1 Wiki-Page Mapping
            3.2.2 Wiki-Category Mapping
        3.3 Ontology Building Stage
            3.3.1 Concept Representation Finder
            3.3.2 Fuzzy Relation Generator
    4 Empirical Experiments and Results
        4.1 Wikipedia Data
        4.2 Dataset Reuters-21578
        4.3 Evaluation Metrics and Experimental Design
        4.4 Observation
        4.5 Expert Finding System
            4.5.1 Experimental Design
            4.5.2 Results and Discussions
    5 Conclusion and Further Work
        5.1 Conclusion
        5.2 Further Work

