
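Author: Yuan-Ming Tsai (蔡遠銘)
Thesis title: Experiment Design for Peer to Peer Information Retrieval Systems Efficiency Evaluation (分散式文件索引系統實驗設計之研究)
Advisor: Chyouhwa Chen (陳秋華)
Oral examination committee: Huei-Wen Ferng (馮輝文), Kai-Hsiang Yang (楊凱翔)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 41
Chinese keywords: data retrieval, document indexing, index search, distributed (資料檢索文件索引索引搜尋分散式)
English keywords: query, Information Retrieval, IR, DHT, P2P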
  • This thesis describes how to generate query logs that can be used in experiments, and compares the generated query logs with real query logs. The generated logs are required to retain the characteristics of the real data in every respect.
    Every data set is different. It is usually difficult to obtain a query log that matches a given data set, or the available log is too small to be useful. Running an experiment with a mismatched query log can yield no results or incorrect results, and can mislead the conclusions drawn from the experiment. A method is therefore needed to generate a query log for the experimental data set, so that the log can be used in simulation while guaranteeing the correctness of the results and giving the simulation results a certain degree of reliability.

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
        1.1 Motivation
    2 Related Works
        2.1 Information Retrieval
            2.1.1 Inverted list
            2.1.2 Local Index
            2.1.3 Global Index
        2.2 Query-Driven Indexing Search Engine
            2.2.1 Activated
            2.2.2 QDI model
        2.3 RV Search Engine
            2.3.1 Bloom Filter
            2.3.2 RV search model
    3 Generate Query and Query Pool
        3.1 Query Size
            3.1.1 Normal distribution
            3.1.2 Pareto distribution
        3.2 Take Query Term
            3.2.1 Problem
        3.3 Query Pool
        3.4 Query Stream
        3.5 Final Parameter
    4 Simulation Results
        4.1 Query Term Frequency (QTF)
        4.2 Overlap in QDI
            4.2.1 Avg. overlap with query processed
            4.2.2 Avg. overlap with different documents
            4.2.3 Avg. overlap with different DFmax
        4.3 Performance of Answering Each Query in RV
            4.3.1 Bandwidth
            4.3.2 Response time
        4.4 Publish Cost
        4.5 Arrival Rate for Both Term and Queries
    5 Discussions and Conclusions
        5.1 Conclusion
        5.2 Future Works
            5.2.1 Query term correlation
            5.2.2 Answer length control
            5.2.3 Arrival rate for both term and queries
    References


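    Full-text release date: 2013/08/01 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)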