
Graduate student: 王文祺 (Wen-chi Wang)
Thesis title: 應用文件重排序與局部查詢擴展於中文文件檢索之研究
Improving Retrieval Effectiveness by Document Re-ranking and Local Expansion
Advisor: 林伯慎 (Bor-shen Lin)
Committee members: 羅乃維 (Nai-wei Lo), 王新民 (Hsin-min Wang)
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 57
Chinese keywords: 資訊檢索, 局部查詢擴展, 查詢偏移, 文件重排序
English keywords: Information Retrieval, Local Expansion, Query Drift, Document Re-ranking

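In information retrieval, query expansion has become one of the important techniques for improving the performance of retrieval systems. Among query expansion techniques, local expansion yields the most notable gains. Local expansion analyzes the top-ranked documents of the initial retrieval result and selects expansion terms from them. The method has one drawback: when most of those documents are not relevant to the query, the selected expansion terms may also be unrelated to the query, and adding them to the query causes query drift, which degrades retrieval performance.
This thesis therefore addresses that drawback with document re-ranking. Before expansion, the document ranking produced by the initial retrieval is re-ordered so that documents relevant to the query move as far toward the top as possible, which makes local expansion more accurate. Three document re-ranking methods are applied, and the experimental results show that each of them improves retrieval performance. We further examine how the three methods complement one another and combine them. The experiments show that the combination improves retrieval performance even more, raising mean average precision (MAP) from 0.3727 to 0.3956 and precision over the top 10 documents (P@10) from 0.4929 to 0.5595.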
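The two measures reported above, MAP and P@10, are standard ranked-retrieval metrics. The following is a minimal sketch of how they are commonly computed. The function names and toy data are illustrative assumptions, not taken from the thesis, and the evaluation tool actually used may follow slightly different conventions.

```python
from typing import Iterable, List, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant (divides by k)."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Average of average_precision over all (ranking, relevant set) pairs."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical rankings for two queries.
    runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
    print(mean_average_precision(runs))
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=2))
```

Because average precision rewards relevant documents that appear early, moving relevant documents upward through re-ranking can raise both measures even when the set of retrieved documents does not change.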


Query expansion is an important technique for improving retrieval effectiveness in text retrieval. Among query expansion techniques, local expansion, which analyzes the top-ranked documents and selects expansion terms from them, achieves the most significant improvements. However, many of the top-ranked documents are often not relevant to the query, and neither are the expansion terms drawn from them. This leads to query drift and degrades retrieval performance. Document re-ranking, which re-orders the top-ranked documents to achieve higher precision before local expansion, is therefore studied as a way to alleviate this problem.
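To make the query drift problem concrete, here is a minimal sketch of local expansion over a toy ranking. It assumes that expansion terms are simply the most frequent non-query terms in the top-ranked documents. The abstract does not specify the actual term-selection formula, so `select_expansion_terms`, its parameters, and the toy documents are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def select_expansion_terms(
    query_terms: Sequence[str],
    ranked_docs: Sequence[Sequence[str]],
    top_k: int = 2,
    n_terms: int = 2,
) -> List[str]:
    """Pick the n_terms most frequent non-query terms from the top_k documents."""
    query = set(query_terms)
    counts: Counter = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(term for term in doc if term not in query)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:n_terms]]


if __name__ == "__main__":
    query = ["jaguar", "speed"]
    car1 = ["jaguar", "car", "engine"]
    car2 = ["jaguar", "car", "dealer"]
    animal1 = ["jaguar", "cat", "speed", "prey"]
    animal2 = ["jaguar", "cat", "speed", "jungle"]

    # Initial ranking: non-relevant documents dominate the top positions.
    print(select_expansion_terms(query, [car1, car2, animal1, animal2]))
    # After a (hypothetical) re-ranking step moves relevant documents up.
    print(select_expansion_terms(query, [animal1, animal2, car1, car2]))
```

With the initial ranking the selected terms come from the off-topic documents, which is the drift described above. With the re-ranked list the same selection routine picks terms from the relevant documents.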
In this thesis, we propose three document re-ranking approaches, based on concept query expansion, document clustering, and local link statistics, respectively. Experimental results show that all three approaches improve retrieval performance, with the one based on local link statistics performing best. In addition, the approaches are found to be complementary to one another, so retrieval performance improves further when all three are combined. On the Chinese NTCIR-3 corpus, the combined approach improves mean average precision (MAP) from 0.3727 to 0.3956 and precision at the top ten documents (P@10) from 0.4929 to 0.5595.
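The abstract does not state how the three re-ranking approaches were combined. One common option, shown here purely as an assumption, is a weighted sum of min-max normalized scores from each re-ranker. The function names, weights, and scores below are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def combine_rerankers(
    score_lists: List[Dict[str, float]], weights: List[float]
) -> List[str]:
    """Return document ids ordered by the weighted sum of normalized scores."""
    combined: Dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    # Highest combined score first; ties are broken by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores from three re-rankers for the same three documents.
    concept = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
    cluster = {"d1": 0.0, "d2": 1.0, "d3": 0.5}
    link = {"d1": 1.0, "d2": 3.0, "d3": 2.0}
    print(combine_rerankers([concept, cluster, link], [0.5, 0.25, 0.25]))
```

Normalizing before summing keeps a re-ranker with a large score range from dominating the others. In practice the weights would have to be tuned on held-out queries.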

Table of Contents
Chapter 1 Introduction
  1.1 Background and Research Motivation
  1.2 Objectives and Summary of Results
  1.3 Organization of the Thesis
Chapter 2 Literature Review
  2.1 Query Expansion
    2.1.1 Global Query Expansion
    2.1.2 Local Query Expansion
  2.2 Document Re-ranking
  2.3 Chapter Summary
Chapter 3 Methodology
  3.1 Experimental Corpus and Evaluation Criteria
    3.1.1 Experimental Corpus
    3.1.2 Evaluation Criteria
  3.2 System Architecture
  3.3 Retrieval Model
  3.4 Local Query Expansion
  3.5 Baseline System Retrieval Results
  3.6 Chapter Summary
Chapter 4 Document Re-ranking
  4.1 Document Re-ranking Using Concept-based Queries
  4.2 Document Re-ranking Using Document Clustering
  4.3 Document Re-ranking Using Local Links among Query Keywords
  4.4 Comparison of the Three Document Re-ranking Methods
  4.5 Combining the Document Re-ranking Methods
  4.6 Chapter Summary
Chapter 5 Further Study of Expansion Term Selection
  5.1 Dynamic Query Term Expansion
  5.2 Expansion Term Filtering
  5.3 Chapter Summary
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

