
Author: 耿少宏 (KENG, SHAO-HONG)
Thesis title: 應用混合具擾動的萬用演算法為基礎之 K最鄰近及密集插補法於推薦系統之協同過濾
Applying Hybrid Metaheuristic with Perturbation-based K-Nearest-Neighbors and Densest Imputation to Collaborative Filtering in Recommender Systems
Advisors: 陳正綱, 郭人介
Committee members: 王孔政 (Kung-Jeng Wang), 歐陽超 (Chao Ou-Yang)
Degree: Master
Department: School of Management - Department of Information Management
Publication year: 2020
Graduation academic year: 108
Language: English
Pages: 81
Keywords (Chinese): 推薦系統, 協同過濾, 混合萬用演算法, 相似度, 最鄰近值演算法, 最鄰近插補演算法
Keywords (English): Recommender systems, Collaborative filtering, Hybrid metaheuristics, Similarities, K-nearest-neighbors algorithm, K-nearest-neighbors and densest imputation
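    Abstract (translated from the Chinese):
    With the rise of e-commerce companies such as YouTube, Amazon, and Netflix and of many other web services, recommender systems are applied more widely than ever before. They not only increase revenue for service providers but also reduce the time users spend searching a website. Among recommender algorithms, collaborative filtering, which uses customers' preferences to find the products or services that interest them, is the best known, yet it still suffers from data sparsity and from the problem of selecting a similarity measure.
    This study combines the K-nearest-neighbors algorithm with K-nearest-neighbors imputation and the most densely rated users to propose a K-nearest-neighbors and densest imputation algorithm (KDI-KNN). The algorithm is further combined with a proposed similarities union function, and hybrid metaheuristics with perturbation are applied to reduce the effect of data sparsity and to find the similarity weights, thereby optimizing the results.
    The experiments use three rating datasets to validate the proposed algorithms and evaluate the results with the mean absolute error (MAE). The experimental results confirm that the hybrid metaheuristic with perturbation-based KDI-KNN outperforms basic KNN, KDI-KNN, and metaheuristic-based KDI-KNN. In addition, a case study is conducted on a real-world fund transaction dataset. The analysis shows that the content of a dataset strongly affects how similarity measures perform in a recommender system.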


    Recommender systems have been adopted more broadly than ever since the rise of e-commerce companies such as YouTube, Amazon, and Netflix and of many other web services. They not only increase revenue for service providers but also reduce the time service users spend searching a website. Collaborative filtering, which uses customers' preferences to discover their interests, is the best-known approach, but it still suffers from data sparsity and from the difficulty of selecting a similarity measure.
    This study proposes a hybrid metaheuristic with perturbation-based K-Nearest-Neighbors and Densest Imputation (KDI-KNN) algorithm for collaborative filtering, which combines KNN imputation with the densest users in the dataset to alleviate the effects of data sparsity. It also proposes a similarities union function that adopts hybrid metaheuristics with perturbation to determine the fittest similarity and improve prediction performance.
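    The idea of imputing missing ratings from the densest users can be illustrated with a minimal sketch. This record does not give the thesis's actual procedure, so everything below is an assumption made for illustration: "densest" is taken to mean the users with the most ratings, similarity is cosine over co-rated items, and the names `cosine_on_common`, `impute_from_densest`, `n_dense`, and `k` are hypothetical.

```python
import numpy as np

def cosine_on_common(a, b):
    """Cosine similarity over the items both users rated (NaN = missing)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def impute_from_densest(ratings, n_dense=2, k=1):
    """Fill missing ratings from the k most similar of the n_dense densest users.

    Assumption: this is an illustrative reading of "KNN imputation with
    densest users", not the algorithm defined in the thesis.
    """
    filled = ratings.copy()
    counts = (~np.isnan(ratings)).sum(axis=1)
    # Users with the most ratings come first.
    dense_ids = np.argsort(-counts, kind="stable")[:n_dense]
    for u in range(ratings.shape[0]):
        candidates = [d for d in dense_ids if d != u]
        # Keep the k dense users most similar to user u.
        sims = sorted(
            ((cosine_on_common(ratings[u], ratings[d]), d) for d in candidates),
            reverse=True,
        )[:k]
        for i in np.flatnonzero(np.isnan(ratings[u])):
            votes = [(s, ratings[d, i]) for s, d in sims
                     if s > 0 and not np.isnan(ratings[d, i])]
            if votes:
                total = sum(s for s, _ in votes)
                # Similarity-weighted average of the neighbors' ratings.
                filled[u, i] = sum(s * r for s, r in votes) / total
    return filled

# Toy user-item matrix (rows = users, columns = items).
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [5.0, 4.0, 3.0, 1.0],
    [1.0, 2.0, 5.0, np.nan],
])
filled = impute_from_densest(ratings, n_dense=2, k=1)
```

    Entries for which no selected neighbor has a rating are left missing, so the sketch reduces sparsity without guaranteeing a fully dense matrix.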
    This study uses three rating datasets to validate the proposed algorithms and measures their performance with the mean absolute error (MAE). The experimental results indicate that the hybrid metaheuristic with perturbation-based KDI-KNN algorithms outperform basic KNN, the original KDI-KNN, and most single-metaheuristic-based KDI-KNN variants. In addition, a real-world fund transaction dataset is used in a case study. The analysis reveals that the content of the dataset strongly affects how the similarity measures perform.
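    The abstract names MAE as the evaluation metric. Assuming the acronym carries its usual meaning of mean absolute error, the average of |actual - predicted| over the held-out ratings, it can be computed as in the short sketch below. The function name `mae` and the sample ratings are illustrative and are not taken from the thesis.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical held-out ratings and the predictions made for them.
score = mae([4, 3, 5], [3.5, 3, 4])
```

    A lower MAE means the predicted ratings are closer to the true ones, which is the sense in which one algorithm is said to outperform another here.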

    Abstract (Chinese)
    ABSTRACT
    Acknowledgements
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Scope and Constraints
        1.4 Thesis Organization
    CHAPTER 2 LITERATURE REVIEW
        2.1 Recommender systems
            2.1.1 Content-based filtering
            2.1.2 Collaborative filtering
            2.1.3 Summary of recommender system
        2.2 Missing data imputation
            2.2.1 Missing data pattern
            2.2.2 Imputation methods
        2.3 Metaheuristics
            2.3.1 Genetic algorithm (GA)
            2.3.2 Particle swarm optimization (PSO) algorithm
            2.3.3 Sine Cosine Algorithm
            2.3.4 Hybrid metaheuristics
    CHAPTER 3 METHODOLOGY
        3.1 Research framework
        3.2 Data preprocessing
        3.3 Metaheuristic with perturbation-based KDI-KNN algorithm
        3.4 Hybrid Metaheuristic-based KDI-KNN algorithm
    CHAPTER 4 EXPERIMENTAL RESULTS
        4.1 Datasets
        4.2 Performance Measurement
        4.3 Parameter Setting
        4.4 Experiment Result
        4.5 Statistical Hypothesis
    CHAPTER 5 CASE STUDY
        5.1 Dataset
        5.2 Dataset preprocessing
        5.3 Parameters Setting
        5.4 Analysis and Statistical Hypothesis
    CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH
        6.1 Conclusions
        6.2 Contributions
        6.3 Future Research
    REFERENCES
    APPENDIX


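    Full-text release date: 2025/06/05 (campus network)
    Full-text release date: 2025/06/05 (off-campus network)
    Full-text release date: 2025/06/05 (National Central Library: Taiwan theses and dissertations system)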