Author: 金冠辰
Kuan-Chen Chin
Thesis Title: 使用平滑支撐向量機達到匯總式搜尋引擎個人化排序之目的
Personalized Ranking for Meta-Search Engine by Using SSVM
Advisor: 李育杰
Yuh-Jye Lee
Committee: 張源俊
Yuan-Chin Chang
李漢銘
Hahn-Ming Lee
何正信
Cheng-Seen Ho
林智仁
Chih-Jen Lin
Degree: 碩士
Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2005
Graduation Academic Year: 93
Language: English
Pages: 67
Keywords (in Chinese): 個人化排序, 網路資訊檢索, 匯總式搜尋引擎, 文件分類, 平滑支撐向量機
Keywords (in other languages): personalized ranking, web information retrieval, meta-search engine, text classification, smooth support vector machines

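With the rapid development of the World Wide Web, the amount of information available on the Web has grown enormously. How a search engine should rank the returned web pages according to the user's search intent has become an important research topic. Current search engines are built to serve all users, so search results are usually ranked by the interests of the general public. Moreover, they return the same results to every user, so the results often fail to satisfy everyone. In this thesis, we build a personalized meta-search engine that lets users rank search results according to their personal interests. Assuming that users save articles into specific folders according to their own preferences, we use text classification techniques and the smooth support vector machine to extract, from the documents in these folders, a model describing the user's interests (a user profile).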

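Because a meta-search engine covers more information than a single ordinary search engine, our system uses the meta-search engine as a data collector dedicated to gathering web pages. For the retrieved pages, we propose two non-personalized meta-ranking methods for presenting search results to the user. For personalized ranking, we re-rank the retrieved pages according to their similarity to the user profile and a search engine preference mechanism. In our experiments, we simulate 6 scenarios to test the personalized ranking performance of our system. The results show that we have successfully established a method that ranks retrieved pages by the user's personal interests, so that users can find the information they need among the first few returned pages.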


With the fast growth of the World Wide Web, the amount of information on the Web has become overwhelming. Since current search engines are built to serve all users, search results are usually ranked according to the interests of the general public. Furthermore, current search engines return the same results to all users, regardless of the field each user works in. Hence, some users have to browse the search results laboriously to find the web pages they want. In this thesis, we build a personalized meta-search engine (PMSE) which allows users to rank web pages according to their personal interests. A user's interests are represented in a user profile, which can be learned from the documents stored on the user's personal computer. Because of the superior performance of support vector machines (SVMs), the smooth support vector machine (SSVM) and several other text-classification techniques are applied to obtain user profiles.
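For orientation, the smooth support vector machine named above can be written as the following unconstrained problem, as formulated by Lee and Mangasarian (2001). The notation is theirs, not this record's, and how the thesis adapts it to text classification is not stated here: $A \in \mathbb{R}^{m \times n}$ holds the $m$ training documents as rows, $D$ is the diagonal matrix of $\pm 1$ class labels, $e$ is a vector of ones, $(w, \gamma)$ defines the separating plane $x^\top w = \gamma$, $\nu > 0$ weights the training error, and $\alpha > 0$ is the smoothing parameter.

```latex
\min_{(w,\gamma)\in\mathbb{R}^{n+1}}\;
  \frac{\nu}{2}\,\bigl\lVert p\bigl(e - D(Aw - e\gamma),\,\alpha\bigr)\bigr\rVert_2^2
  \;+\; \frac{1}{2}\bigl(w^\top w + \gamma^2\bigr),
\qquad
p(x,\alpha) \;=\; x + \frac{1}{\alpha}\log\bigl(1 + e^{-\alpha x}\bigr)
```

Here $p(\cdot,\alpha)$ is applied componentwise and is a smooth approximation of the plus function $(x)_+ = \max\{x, 0\}$. It makes the objective twice differentiable, so a Newton-type method can be applied directly.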

We take advantage of a meta-search engine's better coverage of the Web to collect a wide variety of web pages. The meta-search engine works as a data collector, and two basic meta-ranking algorithms are proposed in our system. For personalized ranking, we re-rank the collected web pages by consulting the user profile and the search engine preference (SEP) mechanism. To evaluate the performance of our system, we simulate 6 scenarios in which conventional search engines cannot provide satisfactory search results. Our experimental results indicate that we successfully provide a way for users to rank web pages according to their interests, and also show that personalized ranking is worth pursuing further.
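The pipeline described above (collect results from several engines, merge them, then re-rank against a user profile and per-engine preferences) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the two meta-ranking algorithms or how SEP is computed, so the Borda-style merge, the cosine similarity standing in for the SSVM-based profile score, the mixing weight `alpha`, and all function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def borda_merge(engine_results):
    """Non-personalized merge (assumed Borda-style): a URL at position
    `pos` in a list of length n earns n - pos points from that engine."""
    scores = defaultdict(float)
    for urls in engine_results.values():
        n = len(urls)
        for pos, url in enumerate(urls):
            scores[url] += n - pos
    # Highest score first; ties broken by URL for a deterministic order.
    return sorted(scores, key=lambda u: (-scores[u], u))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def personalized_rank(engine_results, page_vectors, profile, engine_pref,
                      alpha=0.7):
    """Re-rank merged pages by a weighted mix of profile similarity and
    the best preference weight among the engines that returned the page.
    The mixing rule and `alpha` are assumptions, not the thesis's method."""
    scores = {}
    for url in borda_merge(engine_results):
        sim = cosine(page_vectors.get(url, {}), profile)
        pref = max(engine_pref.get(name, 0.0)
                   for name, urls in engine_results.items() if url in urls)
        scores[url] = alpha * sim + (1 - alpha) * pref
    return sorted(scores, key=lambda u: (-scores[u], u))


# Toy data: two hypothetical engines answering the query "jaguar".
engine_results = {"A": ["u1", "u2", "u3"], "B": ["u2", "u4"]}
page_vectors = {
    "u1": {"jaguar": 1.0, "car": 1.0},
    "u2": {"jaguar": 1.0, "cat": 1.0},
    "u3": {"car": 1.0},
    "u4": {"cat": 1.0, "zoo": 1.0},
}
profile = {"cat": 1.0, "zoo": 0.5}      # user interested in animals
engine_pref = {"A": 0.5, "B": 1.0}      # user has favored engine B

merged = borda_merge(engine_results)
personal = personalized_rank(engine_results, page_vectors, profile, engine_pref)
print(merged)
print(personal)
```

In this toy run the merged list favors pages that several engines rank highly, while the personalized list moves the animal-related pages ahead of the car-related ones, which is the effect the abstract describes.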

1. Introduction
  1.1 Web Information Retrieval
  1.2 Conventional Search Services
  1.3 Ranking Problem
  1.4 Organization of Thesis
2. Text Classification
  2.1 Text Representation
  2.2 Feature Selection
  2.3 Term Weighting
  2.4 Support Vector Machines
  2.5 Multi-Class Classification
  2.6 Performance Measures
3. Framework of Personalized Meta-Search Engine
  3.1 Meta-Search Engine
  3.2 Personalized Filter
  3.3 Off-Line Training
4. Experiments
  4.1 Experimental Setup
  4.2 Numerical Results
5. Conclusion

