
Author: Xin-You Liu (劉信佑)
Thesis Title: ALBERT4Rec: Sequential Recommendation with A Lite Bidirectional Encoder Representations from Transformer
Advisor: Yi-Leh Wu (吳怡樂)
Committee: Zheng-Yuan Tang (唐政元), Jian-Zhong Chen, Li-Gang Yan
Degree: Master's
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2020
Graduation Academic Year: 109
Language: English
Pages: 30
Keywords (in Chinese): 推薦系統、深度學習、雙向序列模型、多頭自我注意力機制、轉譯器的雙向編碼表示、跨層參數共享
Keywords (in other languages): Recommendation System, Deep Learning, Bidirectional Sequential Model, Multi-Head Self-Attention Mechanism, BERT, ALBERT
Reference times: Clicks: 315, Downloads: 24


Recommendation systems have become an indispensable part of today's commercial websites, and effectively recommending items that users are interested in remains a central goal of recommendation-system research. The data for a recommendation system usually come from interactions between users and items, such as ratings, clicks, and browsing histories; our purpose is to predict the next item from the user's behavior. In recent years, researchers proposed the BERT4Rec recommendation model, which is based on Bidirectional Encoder Representations from Transformer (BERT) and achieved state-of-the-art performance. Inspired by BERT4Rec, we take the successor of BERT, A Lite Bidirectional Encoder Representations from Transformer (ALBERT), and apply it as a sequential recommendation model, hoping to obtain better performance. In our experiments we use a real-world dataset, MovieLens, which is widely used to evaluate the performance of recommendation systems. Furthermore, we analyze the dimensionality of the embedding layer, the dimensionality of the hidden layer, the multi-head self-attention mechanism, cross-layer parameter sharing, the masked proportion, and the maximum sequence length, and then fine-tune the model with the best parameters. Experimental results show that our proposed ALBERT4Rec outperforms BERT4Rec, improving performance by about 20%.
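As context for the masked proportion analyzed above, BERT4Rec-style models are trained with a Cloze objective: a proportion of the items in each user's sequence is replaced by a special mask token, and the model learns to predict the original items at those positions. A minimal sketch of that sample construction, where `MASK_TOKEN` and all function names are hypothetical, not the thesis's actual code:

```python
import random

MASK_TOKEN = 0  # hypothetical id reserved for the [mask] token

def cloze_masking(sequence, mask_prop=0.2, max_len=50, seed=None):
    """Build one Cloze-style training example from a user's item sequence.

    A proportion of positions is replaced by MASK_TOKEN; the model is then
    trained to predict the original item ids at exactly those positions.
    """
    rng = random.Random(seed)
    seq = sequence[-max_len:]          # keep only the most recent items
    masked = list(seq)
    labels = [None] * len(seq)         # None = position is not predicted
    # mask at least one position, otherwise the example carries no signal
    n_mask = max(1, int(round(mask_prop * len(seq))))
    for pos in rng.sample(range(len(seq)), n_mask):
        labels[pos] = seq[pos]
        masked[pos] = MASK_TOKEN
    return masked, labels
```

With `mask_prop=0.4` and a five-item sequence, two positions are masked and the labels record the hidden items, mirroring how the masked proportion trades training signal per sequence against corruption of the input.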

Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2. Related Work
  2.1 General Recommendation
  2.2 Sequential Recommendation
  2.3 Attention Mechanism
Chapter 3. Proposed Method
  3.1 Problem Statement
    Table 3.1: Notation.
  3.2 Model Architecture
  3.3 Embedding Layer
  3.4 Transformer Layer
  3.5 Output Layer
  3.6 Model Learning
Chapter 4. Experiments
  4.1 Datasets
    Table 4.1: Statistics of the datasets.
  4.2 Task Settings & Evaluation Metrics
  4.3 Baselines & Implementation Details
    Table 4.2: Parameter settings for training the model.
  4.4 Overall Performance Comparison
    Table 4.3: Overall performance. In each row, the best score is boldfaced, and the second-best score is underlined. Improvements over ALBERT4Rec are shown in the last column.
  4.5 Performance Comparison with Different Arguments
    Table 4.4: Performance comparison with different arguments.
  4.6 The Impact of Embedding Size
  4.7 The Impact of Hidden Units
  4.8 The Impact of Multi-Head Self-Attention
  4.9 The Impact of Cross-Layer Parameter Sharing
  4.10 The Impact of Masked Proportion
  4.11 The Impact of Max Length
Chapter 5. Conclusions and Future Work
References
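Cross-layer parameter sharing, one of the factors analyzed in this thesis, is the key change ALBERT makes over BERT: instead of every Transformer layer owning its own weights, one set of weights is reused across all layers, so depth stays the same while parameter count shrinks. A toy illustration of that difference (not the thesis code; lists stand in for weight tensors):

```python
# Toy contrast between BERT-style stacks (independent weights per layer)
# and ALBERT-style cross-layer parameter sharing (one weight block reused
# by every layer). Lists stand in for real weight tensors.

def bert_style_layers(n_layers, n_params_per_layer):
    # each layer allocates its own independent parameter block
    return [[0.0] * n_params_per_layer for _ in range(n_layers)]

def albert_style_layers(n_layers, n_params_per_layer):
    # a single parameter block is referenced by every layer
    shared = [0.0] * n_params_per_layer
    return [shared] * n_layers

def n_unique_blocks(layers):
    # count distinct parameter blocks by object identity
    return len({id(block) for block in layers})
```

With 4 layers, the BERT-style stack holds 4 distinct parameter blocks while the ALBERT-style stack holds 1, which is why sharing cuts the model size roughly in proportion to depth.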

[1] Jiaxi Tang, and Ke Wang, "Personalized top-n sequential recommendation via convolutional sequence embedding," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. pp. 565-573, 2018.
[2] Hsiao-Chun Hu, “Recurrent Neural Network based Collaborative Filtering,” Master Thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2018.
[3] Pei-Hsuan Chen, “A Study of Recommender System Based on Recurrent Neural Network Using Scaled Exponential Linear Unit,” Master Thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2019.
[4] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk, “Session-based recommendations with recurrent neural networks,” arXiv preprint arXiv:1511.06939, 2015.
[5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in neural information processing systems. pp. 5998-6008, 2017.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[7] Wilson L Taylor, "'Cloze procedure': A new tool for measuring readability," Journalism Quarterly, vol. 30, no. 4, pp. 415-433, 1953.
[8] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang, "BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management. pp. 1441-1450, 2019.
[9] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut, "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations," arXiv preprint arXiv:1909.11942, 2019.
[10] Yehuda Koren, and Robert Bell, "Advances in collaborative filtering," in Recommender systems handbook, pp. 77-118: Springer, 2015.
[11] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the 10th international conference on World Wide Web. pp. 285-295, 2001.
[12] Yehuda Koren, Robert Bell, and Chris Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30-37, 2009.
[13] Andriy Mnih, and Russ R Salakhutdinov, "Probabilistic matrix factorization," in Advances in neural information processing systems. pp. 1257-1264, 2008.
[14] Steffen Rendle, "Factorization machines," in Proceedings of the 10th IEEE International Conference on Data Mining. pp. 995-1000, 2010.
[15] Zhi-Hong Deng, Ling Huang, Chang-Dong Wang, Jian-Huang Lai, and Philip S. Yu, “DeepCF: A Unified Framework of Representation Learning and Matching Function Learning in Recommender System,” arXiv preprint arXiv:1901.04704, 2019.
[16] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He, “DeepFM: A Factorization-Machine based Neural Network for CTR Prediction,” arXiv preprint arXiv:1703.04247, 2017.
[17] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme, "Factorizing personalized markov chains for next-basket recommendation," in Proceedings of the 19th international conference on World wide web. pp. 811-820, 2010.
[18] Wang-Cheng Kang, and Julian McAuley, "Self-attentive sequential recommendation," in 2018 IEEE International Conference on Data Mining (ICDM). pp. 197-206, 2018.
[19] Dan Hendrycks, and Kevin Gimpel, “Bridging nonlinearities and stochastic regularizers with gaussian error linear units,” 2016.
[20] F Maxwell Harper, and Joseph A Konstan, "The movielens datasets: History and context," ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp. 1-19, 2015.
[21] Rounak Banik, “The Movies Dataset,” Dataset on Kaggle Website, 2017.
[22] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua, "Neural Collaborative Filtering," in Proceedings of the 26th International Conference on World Wide Web. pp. 173-182, 2017.
[23] Diederik P Kingma, and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[24] Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning, "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators," arXiv preprint arXiv:2003.10555, 2020.