簡易檢索 / 詳目顯示

研究生: 王政翰
Cheng-Han Wang
論文名稱: 分散式文件索引系統中快取管理之新技巧
Cache Management in Distributed Cache-based Information Retrieval Systems
指導教授: 陳秋華
Chyouhwa Chen
口試委員: 馮輝文
Huei-Wen Ferng
楊凱翔
Kai-Hsiang Yang
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2008
畢業學年度: 96
語文別: 中文
論文頁數: 38
中文關鍵詞: 資訊檢索查詢驅動點對點網路延展網路關鍵字搜尋快取全文索引
外文關鍵詞: query driven index, DHT, overlay network
相關次數: 點閱:213下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 介紹在結構化延展網路(structured overlay network)下一種以查詢趨動(query driven)為基礎的索引模型QDI(Query Driven Index),以及其模型本身的一些問題並提出一系列的改善方法來增進模型的效率。
    在以快取為基礎的索引技術中主要快取儲存的是使用者索引的結果,許多的實驗證實這是一種相當有效率的資訊檢索作法。然而之前的作法皆存在一些缺點,我們提出了一系列的改進方法在這種以快取為基礎的索引方案,同時顯示這些改進是有效率的,而且這些觀念在分散式資訊檢索系統的環境下是重要的。


    Cache-based indexing based on caching query results has been shown to be an effective information retrieval paradigm. However, previous proposals are flawed in a number of ways. We propose a number of improvements to the cache-based indexing scheme and show that the improvements are effective and important in a distributed IR environment.

    摘要................................................................................................................................................................I ABSTRACT ...............................................................................................................................................III 誌謝.............................................................................................................................................................IV TABLE OF CONTENTS............................................................................................................................V LIST OF FIGURES................................................................................................................................. VII 1 INTRODUCTION.............................................................................................................................. 1 2 RELATED WORKS.......................................................................................................................... 3 2.1 DHT NETWORK........................................................................................................................... 3 2.2 INVERTED INDEX.......................................................................................................................... 4 2.3 QDI MODEL................................................................................................................................. 5 2.3.1 Truncated inverted index ........................................................................................................ 6 2.3.2 Cache posting lists for popular subkeys ................................................................................. 6 2.3.3 Key activation........................................................................................................................ 6 2.3.4 ONM( Opportunistic Notification Mechanism) ...................................................................... 6 2.3.5 Discriminating keys ................................................................................................................ 8 2.3.6 Retrieval model....................................................................................................................... 8 2.4 PROBLEMS OF QDI MODEL......................................................................................................... 10 2.4.1 Problem with Active Key Identification ................................................................................ 10 2.4.2 Problem with Active Key activation overhead...................................................................... 11 2.4.3 Problem with Long query processing ................................................................................... 12 3 DISTRIBUTED INDEXING AND RETRIEVAL ARCHITECTURE........................................ 13 3.1 A NEW QUERY-DRIVEN INDEX AND RETRIEVAL SCHEME.......................................................... 13 3.2 A NEW QUERY-DRIVEN KEY ACTIVATION METHOD ................................................................. 14 3.3 A MODIFIED TRADITIONAL NOTIFICATION METHOD ................................................................. 16 3.4 A MODIFIED QUERY METHOD FOR LONGER QUERY KEY ........................................................... 19 4 DISTRIBUTED INDEXING AND RETRIEVAL ALGORITHM............................................... 21 4.1 DOCUMENT PUBLISH .................................................................................................................. 21 4.2 DOCUMENT RETRIEVAL ............................................................................................................. 22 5 PERFORMANCE EVALUATION................................................................................................. 24 5.1 PUBLISH COST ............................................................................................................................ 24 5.2 ACTIVATION COST ..................................................................................................................... 25 5.3 RETRIEVAL COST ....................................................................................................................... 27 5.4 NUMBER OF KEY ACTIVATED ..................................................................................................... 31 5.5 ACCURACY EVALUATION ........................................................................................................... 33 6 CONCLUSIONS AND FUTURE WORK...................................................................................... 36 6.1 CONCLUSIONS ............................................................................................................................ 36 6.2 FUTURE WORK ........................................................................................................................... 36 REFERENCES.......................................................................................................................................... 37

    1. The impact of caching on search engines, Ricardo Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vanessa Murdock, Vassilis Plachouras, Fabrizio Silvestri, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007
    2. Efficient Peer-To-Peer Searches Using Result-Caching, Bhattacharjee, Chawathe, Gopalakrishnan, Keleher and Silaghi, IPTPS, 2003
    3. Evaluating the performance of distributed architectures for information retrieval using a variety of workloads. B. Cahoon, K. S. McKinley, and Z. Lu. ACM Transactions on Information Systems, 18(1): 1–43, January 2000
    4. Efficient top-k query calculation in distributed networks. P. Cao and Z. Wang. PODC.,2004
    5. Analysis of the query logs of a web site search engine, Michael Chau , Xiao Fang , Olivia R. Liu Sheng, Journal of the American Society for Information Science and Technology, v.56 n.13, p.1363-1376, November 2005
    6. Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana (University of California at Berkeley), Richard Karp, Ion Stoica, “Load Balancing in Dynamic Structured P2P Systems,” INFOCOM 04, also in Performance Evaluation: Special Issue on Performance Modeling and Evaluation of Peer-to-Peer Computing Systems, 2005
    7. Using controlled query generation to evaluate blind relevance feedback algorithms, Qigang Gao Jordan, C. Watters, C. , Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries , talk
    8. Xin Li, Young-Jin Kim, Ramesh Govindan, and Wei Hong, “Multi-Dimensional Range Queries in Sensor Networks, “ Sensys'03
    9. Three-level caching for efficient query processing in large Web search engines, Xiaohui Long , Torsten Suel, Proceedings of the 14th international conference on World Wide Web, May 10-14, 2005, Chiba, Japan
    10. Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval,G. Skobeltsyn, T. Luu, I. Podnar Zarko, M. Rajman and K. Aberer: Infoscale: the Second International Conference on Scalable Information Systems, Suzhou, China, June 6-8, 2007, talk
    11. Beyond term indexing: A P2P framework for Web information retrieval, I. Podnar, M. Rajman, T. Luu, F. Klemm, K. Aberer, Informatica, vol. 30, no. 2, pp. 153-161, Special Issue on Specialised Web Search, eds. Stephan Bloehdorn, Wray Buntine and Andreas Hotho, 2006.
    12. Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys, Ivana Podnar, Martin Rajman, Toan Luu, Fabius Klemm, Karl Aberer, Proceedings of the 23rd International Conference on Data Engineering 2007
    13. Efficient Peer-to-Peer Keyword Searching, Patrick Reynolds and Amin Vahdat. In Proceedings of the International Middleware Conference, Rio de Janeiro, Brazil, June 2003
    14. Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica, “Load Balancing in Structured P2P Systems,” IPTPS, Berkeley, CA, February 2003.
    15. Web Text Retrieval with a P2P Query-Driven Index, G. Skobeltsyn, T. Luu, I. Podnar Zarko, M. Rajman and K. Aberer, 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, July 23-27, 2007, talk
    16. Distributed cache table: efficient query-driven processing of multi-term queries in P2P networks, Gleb Skobeltsyn , Karl Aberer, Proceedings of the international workshop on Information retrieval in peer-to-peer networks, November 11-11, 2006, USA
    17. Benno Stein and Martin Potthast, “Applying Hash-based Indexing in Text-based Information Retrieval,” In 7th Dutch-Belgian Information Retrieval Workshop (DIR 2007), pages 29-35, March 2007
    18. In Search of Reliable Retrieval Experiments William Webber and Alistair Moffat, The Tenth Australasian Document Computing Symposium (ADCS 2005)
    19. "Performance of full text search in structured and unstructured peer-to-peer systems", Y. Yang, R. Dunlap, M. Rexroad, and B.F. Cooper, INFOCOM 2006, April 2006
    20. "Proof: A DHT-based Peer-to-Peer Search Engine," Kai-Hsiang Yang, and Jan-Ming Ho, IEICE TRANS. COMMUN., VOL.E90–B, NO.4 APRIL 2007

    無法下載圖示 全文公開日期 2013/07/31 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE