Author: 周麗玲
Li-Ling Chou
Thesis Title: 超文字與關鍵字相關度為基礎之主題式查詢-應用於網頁資訊檢索
Topic Hierarchy Generation Based on Anchor Text and Term-correlation
Advisor: 李漢銘
Hahn-Ming Lee
Committee: 許清琦
Ching-Chi Hsu
Jan-Ming Ho
Cheng-Seen Ho
Yuh-Jye Lee
Degree: Master's
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2005
Graduation Academic Year: 93
Language: English
Pages: 51
Keywords (in Chinese): 網頁目錄搜尋, 關鍵字, 超文字, 搜尋引擎
Keywords (in other languages): Topic Directory Query, Term-correlation, Anchor Text, Search Engine
  • 隨著網際網路的蓬勃發展,資訊越來越多元化,許多使用者借由網路資訊去取得新技術資料及課程內容。透過搜尋引擎提供網頁目錄查詢服務,並幫助使用者在很短的時間對此技術包含那些相關之子技術有所了解,即便成了一個重要服務。

    With the rapid growth of the Internet, users can obtain a wide variety of information online, such as material on new techniques and course content. Providing a "Topic Directory Query" service has therefore become an important task: it helps users quickly understand which subtopics are relevant to a technique they are interested in.
    In this thesis, we propose an approach that uses anchor text and term correlation to construct and generate a topic hierarchy, so that users can search the scope of a topic of interest effectively and efficiently. This differs from manually constructed topic hierarchies such as the Open Directory Project or the Yahoo Web Directory. Our experimental results showed the proposed approach to be effective at finding relevant hierarchical subtopics, in particular topics that our system can find but that the manually constructed topic directories mentioned above cannot. Nevertheless, many issues with the "Topic Directory Query" service remain to be resolved, such as improving the precision rate and detecting new topics.
    We hope that learning a new technique need no longer be troublesome for anyone, and that promoting the concept of topic hierarchy generation will encourage continued research on the issues mentioned above.

    Content
    Abstract
    Acknowledgements
    Content
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Challenges
        1.2.1 Definition of Query Scope
        1.2.2 Hierarchical Structure Generator Issues
      1.3 Our Goal and Design
      1.4 Outlines
    Chapter 2 Background
      2.1 Basic Definition
      2.2 Introduction of Search Engines
        2.2.1 Google
        2.2.2 Web Crawler
        2.2.3 IBM Focused Crawler
    Chapter 3 System Architecture
      3.1 Concept of Hierarchical Structure Generator
      3.2 Architecture of Hierarchical Structure Generator
        3.2.1 Crawler Agent
        3.2.2 Data Preprocessing Unit
        3.2.3 Noisy Terms Finder
        3.2.4 Candidate Terms Finder
        3.2.5 Correlation Analysis Unit
        3.2.6 Structure Generator
        3.2.7 Interface Agent
      3.3 Hierarchical Structure Generator (HSG) Program
    Chapter 4 Experiment
      4.1 Characteristics of Experimental Datasets
      4.2 Criteria Evaluation
      4.3 Experimental Results
      4.4 Discussion
        4.4.1 Characteristics of our proposed method
        4.4.2 Limitations of our proposed method
    Chapter 5 Conclusion
      5.1 Conclusion
      5.2 Future Work

