
Graduate student: Debby Cintia Ganesha Putri (哲黛比)
Thesis title: Evaluation of Clustering Algorithms in Movie Recommender System with Unsupervised Machine Learning Schemes (以非監督式學習架構評估電影推薦系統中的聚類演算法)
Advisor: Jenq-Shiou Leu (呂政修)
Oral examination committee: Jenq-Shiou Leu (呂政修), Cheng-Fu Chou (周承復), Hsin-Wen Wei (衛信文), Ruei-Tang Wang (王瑞堂)
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2019
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 144
Keywords: Recommender System, Unsupervised Learning, Clustering Algorithms, Machine Learning, Clustering Performance Evaluation

This research aims to identify similarities within groups of people in order to build a movie recommender system for users. Individuals often lack the experience or competence needed to evaluate the large number of alternative items available in a given situation; finding a desired movie is one example. As the amount of movie information grows, users increasingly struggle to find suitable movies. A recommender system helps customers choose a preferred movie based on its existing features, and evaluating clustering algorithms can help researchers determine the best algorithm for clustering.
A recommender system is a simple algorithm that aims to provide users with the most relevant information. It is very useful for customers because it caters to users by offering movie recommendations. In this study, the recommender system is built by clustering with several algorithms: K-Means, Birch, Mini Batch K-Means, Mean Shift, Affinity Propagation, Agglomerative Clustering, and Spectral Clustering. We then propose a method to optimize K, the number of clusters, such that the variance does not rise significantly for each cluster. Clustering is limited to features based on genre, tags, and movie ratings. This study also seeks a better way to evaluate clustering algorithms. To determine which algorithm works better for the recommender system, we employed the mean squared error (MSE), the Dunn index as a cluster validity index, and social network analysis (SNA) measures, namely degree centrality, closeness centrality, and betweenness centrality, to explore the relationships between clusters. We also used average similarity, computational time, association rules mined with the Apriori algorithm, and clustering performance evaluation, measures that have been widely used to compare the performance of recommender systems. Clustering performance is evaluated with the Silhouette Coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index.
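The clustering-and-evaluation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it uses NumPy only, substitutes a small synthetic 2-D dataset for the genre, tag, and rating features, implements only K-Means (with k-means++ seeding), and picks K with an elbow-style rule whose 30% relative-SSE-drop threshold is an arbitrary assumption. The Dunn, Silhouette, Calinski-Harabasz, and Davies-Bouldin indices follow their standard definitions; every function name here is hypothetical.

```python
import time

import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids, sse)."""
    rng = np.random.default_rng(seed)
    # k-means++ seeding: each new centre is drawn with probability proportional
    # to its squared distance from the nearest centre chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # assign points to the nearest centroid, then move centroids to cluster means
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    sse = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, sse


def pairwise(X):
    return np.linalg.norm(X[:, None] - X[None], axis=2)


def dunn_index(X, labels):
    """Smallest between-cluster point distance divided by largest cluster diameter."""
    D = pairwise(X)
    same = labels[:, None] == labels[None]
    return float(D[~same].min() / D[same].max())


def silhouette(X, labels):
    D = pairwise(X)
    s = []
    for i, li in enumerate(labels):
        own = labels == li
        if own.sum() == 1:  # singleton clusters score 0 by convention
            s.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == lj].mean() for lj in np.unique(labels) if lj != li)
        s.append((b - a) / max(a, b))
    return float(np.mean(s))


def calinski_harabasz(X, labels):
    ks, n = np.unique(labels), len(X)
    mean = X.mean(axis=0)
    between = sum((labels == j).sum() * ((X[labels == j].mean(axis=0) - mean) ** 2).sum() for j in ks)
    within = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum() for j in ks)
    return float((between / (len(ks) - 1)) / (within / (n - len(ks))))


def davies_bouldin(X, labels):
    ks = np.unique(labels)
    cents = np.array([X[labels == j].mean(axis=0) for j in ks])
    scatter = np.array([np.linalg.norm(X[labels == j] - c, axis=1).mean() for j, c in zip(ks, cents)])
    sep = pairwise(cents)
    np.fill_diagonal(sep, np.inf)  # ignore each cluster's comparison with itself
    return float(((scatter[:, None] + scatter[None]) / sep).max(axis=1).mean())


# Toy stand-in for movie features: three well-separated 2-D groups (hypothetical data)
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

# Elbow-style scan: SSE and wall-clock time for K = 1..6
sse, seconds = {}, {}
for k in range(1, 7):
    t0 = time.perf_counter()
    _, _, sse[k] = kmeans(X, k)
    seconds[k] = time.perf_counter() - t0

# Smallest K after which one more cluster cuts SSE by less than 30% (assumed threshold)
best_k = next((k for k in range(1, 6) if (sse[k] - sse[k + 1]) / sse[k] < 0.3), 6)
labels, _, _ = kmeans(X, max(best_k, 2))  # the indices below need at least two clusters

scores = {
    "dunn": dunn_index(X, labels),
    "silhouette": silhouette(X, labels),
    "calinski_harabasz": calinski_harabasz(X, labels),
    "davies_bouldin": davies_bouldin(X, labels),
}
print("chosen K:", best_k)
print("SSE by K:", {k: round(v, 1) for k, v in sse.items()})
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Higher Dunn, Silhouette, and Calinski-Harabasz values, and a lower Davies-Bouldin value, indicate more compact and better-separated clusters. Because each index needs only the feature matrix and a label vector, the other algorithms listed above (for instance the estimators in scikit-learn's `sklearn.cluster` module, whose `sklearn.metrics` module also provides `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score`) could be scored with the same functions.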
The results of this study identify similarities within groups of people in order to build a movie recommender system for users. The recommender system helps customers choose a preferred movie based on its existing features. This research seeks a better-performing clustering algorithm and a better way to evaluate clustering algorithms. Evaluating the clustering algorithms gives researchers information about which of the algorithms used performs best.

CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF EQUATIONS
CHAPTER 1 INTRODUCTION
  1.1 Research Background
  1.2 Research Objectives
  1.3 Research Scope and Limitations
  1.4 Contributions
  1.5 Outline and Report
CHAPTER 2 LITERATURE REVIEW
  2.1 System Recommendations
  2.2 Clustering Algorithm
    2.2.1 K-Means Algorithm
    2.2.2 Birch Clustering Algorithm
    2.2.3 Mini Batch K-Means clustering
    2.2.4 Mean Shift clustering
    2.2.5 Affinity Propagation Clustering
    2.2.6 Agglomerative Clustering
    2.2.7 Spectral Clustering
  2.3 Evaluation Criteria
    2.3.1 Mean Squared Errors (MSE)
    2.3.2 Cluster Validity Indices: Dunn Matrix
    2.3.3 Social Network Analysis (SNA)
    2.3.4 Average Similarity
    2.3.5 Computational Time
    2.3.6 Association Rule: Apriori Algorithm
    2.3.7 Clustering Performance Evaluation
CHAPTER 3 METHODOLOGY
  3.1 System Design
  3.2 Clustering Algorithm
    3.2.1 K-Means Clustering Algorithm and Optimize K Number Cluster
    3.2.2 Birch Clustering Algorithm
    3.2.3 Mini Batch K-Means Clustering
    3.2.4 Mean Shift Clustering
    3.2.5 Affinity Propagation Clustering
    3.2.6 Agglomerative Clustering Algorithm
    3.2.7 Spectral Clustering Algorithm
CHAPTER 4 EXPERIMENT AND RESULT
  4.1 Dataset
  4.2 Experiment Result
    4.2.1 Algorithm Clustering Result
    4.2.2 Optimize K number cluster
CHAPTER 5 EVALUATION AND DISCUSSION
  5.1 Evaluation Result
    5.1.1 Mean Squared Error (MSE)
    5.1.2 Cluster Validity Indices: Dunn Matrix
    5.1.3 Social Network Analysis Result (SNA)
    5.1.4 Average Similarity
    5.1.5 Computation Time
    5.1.6 Clustering Performance Evaluation
    5.1.7 Association Rule: Apriori Algorithm
  5.2 Discussion
    5.2.1 K-Means Performance
    5.2.2 Birch Performance
    5.2.3 Performance of the Mini Batch Kmeans
    5.2.4 Mean Shift Performance
    5.2.5 Affinity Propagation Performance
    5.2.6 Agglomerative Clustering Performance
    5.2.7 Spectral Clustering Performance
CHAPTER 6 CONCLUSION AND FUTURE RESEARCH
  6.1 Conclusion
  6.2 Future Research
REFERENCES

