
Author: Xin-You Liu (劉信佑)
Thesis Title: ALBERT4Rec: Sequential Recommendation with A Lite Bidirectional Encoder Representations from Transformer
Advisor: Yi-Leh Wu (吳怡樂)
Committee: Zheng-Yuan Tang (唐政元), Jian-Zhong Chen, Li-Gang Yan
Degree: Master's
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2020
Graduation Academic Year: 109
Language: English
Pages: 30
Keywords (in Chinese): 推薦系統、深度學習、雙向序列模型、多頭自我注意力機制、轉譯器的雙向編碼表示、跨層參數共享
Keywords (in other languages): Recommendation System, Deep Learning, Bidirectional Sequential Model, Multi-Head Self-Attention Mechanism, BERT, ALBERT
Reference times: Clicks: 315, Downloads: 24


Recommendation systems have become an indispensable part of today's commercial websites, and effectively recommending items that users are interested in remains a central goal of recommendation-system research. The data for a recommendation system usually come from interactions between users and items, such as ratings, clicks, and browsing histories; our purpose is to predict the next item from the user's behavior. In recent years, researchers proposed the BERT4Rec recommendation model, which is based on Bidirectional Encoder Representations from Transformer (BERT) and achieved state-of-the-art performance. Inspired by BERT4Rec, we take the successor of BERT, A Lite Bidirectional Encoder Representations from Transformer (ALBERT), and apply it as a sequential recommendation model, hoping to obtain better performance. In our experiments we use a real-world dataset, MovieLens, which is widely used to evaluate the performance of recommendation systems. Furthermore, we analyze the dimensionality of the embedding layer, the dimensionality of the hidden layer, the multi-head self-attention mechanism, cross-layer parameter sharing, the masked proportion, and the maximum sequence length, and then fine-tune the model with the best parameters. Experimental results show that our proposed ALBERT4Rec outperforms BERT4Rec, improving performance by about 20%.
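As context for the masked proportion analyzed above, BERT4Rec-style models are trained with a Cloze objective: a proportion of the items in each user's sequence is replaced by a special mask token, and the model learns to predict the original items at those positions. A minimal sketch of that sample construction, where `MASK_TOKEN` and all function names are hypothetical, not the thesis's actual code:

```python
import random

MASK_TOKEN = 0  # hypothetical id reserved for the [mask] token

def cloze_masking(sequence, mask_prop=0.2, max_len=50, seed=None):
    """Build one Cloze-style training example from a user's item sequence.

    A proportion of positions is replaced by MASK_TOKEN; the model is then
    trained to predict the original item ids at exactly those positions.
    """
    rng = random.Random(seed)
    seq = sequence[-max_len:]          # keep only the most recent items
    masked = list(seq)
    labels = [None] * len(seq)         # None = position is not predicted
    # mask at least one position, otherwise the example carries no signal
    n_mask = max(1, int(round(mask_prop * len(seq))))
    for pos in rng.sample(range(len(seq)), n_mask):
        labels[pos] = seq[pos]
        masked[pos] = MASK_TOKEN
    return masked, labels
```

With `mask_prop=0.4` and a five-item sequence, two positions are masked and the labels record the hidden items, mirroring how the masked proportion trades training signal per sequence against corruption of the input.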

Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2. Related Work
  2.1 General Recommendation
  2.2 Sequential Recommendation
  2.3 Attention Mechanism
Chapter 3. Proposed Method
  3.1 Problem Statement
    Table 3.1: Notation.
  3.2 Model Architecture
  3.3 Embedding Layer
  3.4 Transformer Layer
  3.5 Output Layer
  3.6 Model Learning
Chapter 4. Experiments
  4.1 Datasets
    Table 4.1: Statistics of the datasets.
  4.2 Task Settings & Evaluation Metrics
  4.3 Baselines & Implementation Details
    Table 4.2: Parameter settings for training the model.
  4.4 Overall Performance Comparison
    Table 4.3: Overall performance. In each row, the best score is boldfaced, and the second-best score is underlined. Improvements over ALBERT4Rec are shown in the last column.
  4.5 Performance Comparison with Different Arguments
    Table 4.4: Performance comparison with different arguments.
  4.6 The Impact of Embedding Size
  4.7 The Impact of Hidden Units
  4.8 The Impact of Multi-Head Self-Attention
  4.9 The Impact of Cross-Layer Parameter Sharing
  4.10 The Impact of Masked Proportion
  4.11 The Impact of Max Length
Chapter 5. Conclusions and Future Work
References
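Cross-layer parameter sharing, one of the factors analyzed in this thesis, is the key change ALBERT makes over BERT: instead of every Transformer layer owning its own weights, one set of weights is reused across all layers, so depth stays the same while parameter count shrinks. A toy illustration of that difference (not the thesis code; lists stand in for weight tensors):

```python
# Toy contrast between BERT-style stacks (independent weights per layer)
# and ALBERT-style cross-layer parameter sharing (one weight block reused
# by every layer). Lists stand in for real weight tensors.

def bert_style_layers(n_layers, n_params_per_layer):
    # each layer allocates its own independent parameter block
    return [[0.0] * n_params_per_layer for _ in range(n_layers)]

def albert_style_layers(n_layers, n_params_per_layer):
    # a single parameter block is referenced by every layer
    shared = [0.0] * n_params_per_layer
    return [shared] * n_layers

def n_unique_blocks(layers):
    # count distinct parameter blocks by object identity
    return len({id(block) for block in layers})
```

With 4 layers, the BERT-style stack holds 4 distinct parameter blocks while the ALBERT-style stack holds 1, which is why sharing cuts the model size roughly in proportion to depth.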

[1] Jiaxi Tang, and Ke Wang, "Personalized top-n sequential recommendation via convolutional sequence embedding," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. pp. 565-573, 2018.
[2] Hsiao-Chun Hu, “Recurrent Neural Network based Collaborative Filtering,” Master Thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2018.
[3] Pei-Hsuan Chen, “A Study of Recommender System Based on Recurrent Neural Network Using Scaled Exponential Linear Unit,” Master Thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2019.
[4] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk, “Session-based recommendations with recurrent neural networks,” arXiv preprint arXiv:1511.06939, 2015.
[5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in neural information processing systems. pp. 5998-6008, 2017.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[7] Wilson L Taylor, "'Cloze procedure': A new tool for measuring readability," Journalism Quarterly, vol. 30, no. 4, pp. 415-433, 1953.
[8] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang, "BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management. pp. 1441-1450, 2019.
[9] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut, "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations," arXiv preprint arXiv:1909.11942, 2019.
[10] Yehuda Koren, and Robert Bell, "Advances in collaborative filtering," in Recommender systems handbook, pp. 77-118: Springer, 2015.
[11] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the 10th international conference on World Wide Web. pp. 285-295, 2001.
[12] Yehuda Koren, Robert Bell, and Chris Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30-37, 2009.
[13] Andriy Mnih, and Russ R Salakhutdinov, "Probabilistic matrix factorization," in Advances in neural information processing systems. pp. 1257-1264, 2008.
[14] Steffen Rendle, "Factorization machines," in Proceedings of the 10th IEEE International Conference on Data Mining. pp. 995-1000, 2010.
[15] Zhi-Hong Deng, Ling Huang, Chang-Dong Wang, Jian-Huang Lai, and Philip S. Yu, “DeepCF: A Unified Framework of Representation Learning and Matching Function Learning in Recommender System,” arXiv preprint arXiv:1901.04704, 2019.
[16] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He, “DeepFM: A Factorization-Machine based Neural Network for CTR Prediction,” arXiv preprint arXiv:1703.04247, 2017.
[17] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme, "Factorizing personalized markov chains for next-basket recommendation," in Proceedings of the 19th international conference on World wide web. pp. 811-820, 2010.
[18] Wang-Cheng Kang, and Julian McAuley, "Self-attentive sequential recommendation," in 2018 IEEE International Conference on Data Mining (ICDM). pp. 197-206, 2018.
[19] Dan Hendrycks, and Kevin Gimpel, “Bridging nonlinearities and stochastic regularizers with gaussian error linear units,” 2016.
[20] F Maxwell Harper, and Joseph A Konstan, "The movielens datasets: History and context," ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp. 1-19, 2015.
[21] Rounak Banik, “The Movies Dataset,” Dataset on Kaggle Website, 2017.
[22] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua, "Neural Collaborative Filtering," in Proceedings of the 26th International Conference on World Wide Web. pp. 173-182, 2017.
[23] Diederik P Kingma, and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[24] Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning, "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators," arXiv preprint arXiv:2003.10555, 2020.