
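Graduate student: Ghilman - Fatih (吉雷曼)
Thesis title: MOVIE SALES PATTERN CLUSTERING FOR RECOMMENDATION SYSTEM (運用電影銷售趨勢分群於電影推薦系統之研究)
Advisor: Chao-Lung Yang (楊朝龍)
Oral defense committee: Ren-Jieh Kuo (郭人介), Shi-Woei Lin (林希偉)
Degree: Master
Department: College of Management - Department of Industrial Management
Publication year: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 92
Keywords (translated from Chinese): movie recommender system, movie box office, matrix factorization, hierarchical clustering, functional data clustering method
Keywords (English): movie recommender system, sales pattern, matrix factorization, hierarchical clustering, functional data clustering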

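With e-commerce now widespread, using recommender systems to proactively offer products that are relevant to consumers, or likely to interest them, has become part of e-commerce services. A recommender system mainly uses the large rating matrix formed by items and users to estimate a user's ratings of items that the user has not yet rated. Because the rating matrix is usually very large, computing its matrix factorization is extremely time-consuming. This research targets movie recommender systems and uses data clustering to first group movies by their movie-related information and past sales, so that the original very large matrix computation is divided into several smaller matrix computations. In addition, the clustering results help the system recommend movie categories of interest to consumers more accurately. Twitter's movie rating data and the IMDb movie database serve as the research datasets. This research also proposes a two-stage movie clustering method that clusters movies by film characteristics and by box office. To handle the time-series data of box-office changes effectively, this research applies normalization and functional data clustering to divide the collected movies into 11 clusters, and partitions the original rating matrix according to this result. The experimental results show that the smaller rating matrices produced by clustering speed up the recommender system computation and yield more accurate rating estimates.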


Recommender systems (RS), which present consumers with items relevant to them, have attracted considerable attention in e-commerce. Using the explicit feedback that consumers give to the system makes the recommendations more accurate. Mathematically, an RS operates on a matrix whose size is the number of users multiplied by the number of available items. Computing on this very large matrix is expensive and inefficient. In this research, the divide-and-conquer concept is borrowed: items are clustered into several groups to speed up the matrix computation in the RS. Twitter user movie rating data was used to generate the matrix, and IMDb movie data was used to cluster the movies. A two-step clustering is proposed. The first step clusters movies by their internal attributes; the second step clusters movies by their sales patterns. When clustering by sales pattern, the period during which a movie is shown in theaters can be regarded as its product life. To cluster the time-series sales patterns better, the discrete sales information was transformed into functional data. Functional data clustering was then performed, and the accuracy, computation time, and recommendations of a traditional RS and the proposed pre-clustered RS were compared. Clustering the items before matrix factorization gave more accurate predicted ratings and a shorter computation time. Moreover, the resulting recommendations are based on a combination of latent features and item similarity.
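The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the factorization below is a generic regularized matrix factorization fitted by stochastic gradient descent, and the toy rating matrix `R`, the item labels in `clusters`, and all function names and hyperparameters are hypothetical values chosen only for illustration.

```python
import numpy as np


def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Regularized matrix factorization fitted by SGD on observed entries only.

    ratings: 2-D float array, with np.nan marking unknown ratings.
    Returns (P, Q) such that ratings is approximated by P @ Q.T.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    users, items = np.nonzero(~np.isnan(ratings))         # observed cells
    for _ in range(n_epochs):
        for u, i in zip(users, items):
            err = ratings[u, i] - P[u] @ Q[i]
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def factorize_by_cluster(ratings, item_clusters, **kwargs):
    """Split the item columns by cluster label and factorize each block."""
    predicted = np.full(ratings.shape, np.nan)
    for label in np.unique(item_clusters):
        cols = np.flatnonzero(item_clusters == label)
        P, Q = factorize(ratings[:, cols], **kwargs)
        predicted[:, cols] = P @ Q.T
    return predicted


def rmse(ratings, predicted):
    """Root-mean-square error over the observed ratings only."""
    mask = ~np.isnan(ratings)
    return float(np.sqrt(np.mean((ratings[mask] - predicted[mask]) ** 2)))


# Hypothetical toy data: 4 users x 5 movies, np.nan = not rated.
R = np.array([
    [5, 4, np.nan, 1, np.nan],
    [4, np.nan, 1, np.nan, 2],
    [np.nan, 5, 2, 1, 1],
    [1, 2, 5, np.nan, 4],
], dtype=float)

# Hypothetical cluster label for each movie (column), e.g. from a prior
# clustering step on movie attributes or sales patterns.
clusters = np.array([0, 0, 1, 1, 1])

pred = factorize_by_cluster(R, clusters, n_epochs=500)
print("RMSE on observed ratings:", rmse(R, pred))
```

Each cluster yields a smaller users-by-items block, so every factorization touches fewer item columns and the blocks could be processed independently. Whether accuracy also improves depends on how well the clusters group similar items, which is the question the thesis experiments address.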

ABSTRACT (Chinese)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 Introduction
  1.1. Background
  1.2. Problem Definition
  1.3. Objectives
  1.4. Research Scope
  1.5. Thesis Outline
CHAPTER 2 Literature Review
  2.1. Recommendation System
    2.1.1. The Purpose of a Recommender System
    2.1.2. Content-Based filtering
    2.1.3. Collaborative filtering
    2.1.4. User Feedback
    2.1.5. Neighborhood-Based CF
    2.1.6. Model-Based CF
  2.2. Hierarchical Clustering (HC)
    2.2.1. Distance Matrix
    2.2.2. Linkage Methods
    2.2.3. Cluster Evaluation
CHAPTER 3 Research Methodology
  3.1. Overall Methodology
  3.2. Hierarchical Clustering on Movie Internal Attributes
  3.3. Functional Data
    3.3.1. Preprocessing Functional Data
    3.3.2. Functional Data Clustering (FDC)
    3.3.3. Functional Data Clustering (FDC) on Sales Data
  3.4. Regularized Incremental Simultaneous Matrix Factorization
  3.5. Numerical Example on RISMF
  3.6. Recommendation from R:
CHAPTER 4 Experimental Result
  4.1. Datasets
    4.1.1. Twitter Dataset
    4.1.2. Internet Movie Database (IMDb) Datasets
    4.1.3. Data Cleaning
  4.2. Clustering IMDb Movies on Internal Attributes
  4.3. Functional Data Clustering on IMDb Internal Attributes
  4.4. RISMF on Original Matrix
  4.5. RISMF on Clustered Matrix
  4.6. Recommendation on Original Matrix
  4.7. Recommendation on Clustered Matrix
CHAPTER 5 Discussion & Conclusion
  5.1. Discussion
    5.1.1. Movie attribute clustering
    5.1.2. Sales pattern clustering (Functional Data Clustering)
    5.1.3. Matrix Factorization and Recommendation Differences
  5.2. Conclusion
  5.3. Future Research
Appendix
  A. Cluster characteristics
  B. Example R code used to get data of Twitter:
REFERENCES

