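Graduate student: Xin-You Liu (劉信佑)
Thesis title: ALBERT4Rec: Sequential Recommendation with A Lite Bidirectional Encoder Representations from Transformer
Advisor: Yi-Leh Wu (吳怡樂)
Oral defense committee: Zheng-Yuan Tang (唐政元), Jian-Zhong Chen, Li-Gang Yan
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2020
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 30
Keywords (Chinese): 推薦系統 (recommendation system), 深度學習 (deep learning), 雙向序列模型 (bidirectional sequential model), 多頭自我注意力機制 (multi-head self-attention mechanism), 轉譯器的雙向編碼表示 (bidirectional encoder representations from Transformers), 跨層參數共享 (cross-layer parameter sharing)
Keywords (English): Recommendation System, Deep Learning, Bidirectional Sequential Model, Multi-Head Self-Attention Mechanism, BERT, ALBERT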
Recommendation systems have become an indispensable part of today's commercial websites, and recommending items that users are actually interested in remains a central research problem. The data for a recommendation system usually come from interactions between users and items, such as ratings, clicks, and browsing histories. Our goal is to predict the next item a user will interact with from that user's past behavior. In recent years, researchers proposed BERT4Rec, a recommendation model based on Bidirectional Encoder Representations from Transformer (BERT) that achieved state-of-the-art results. Inspired by BERT4Rec, we take the next-generation version of BERT, A Lite Bidirectional Encoder Representations from Transformer (ALBERT), and implement it as a sequential recommendation system, aiming for better performance. Our experiments use MovieLens, a real-world dataset that is widely used to evaluate recommendation systems. We also analyze the effect of the embedding dimensionality, the hidden-layer dimensionality, the multi-head self-attention mechanism, cross-layer parameter sharing, the masked proportion, and the maximum sequence length, and then use the best-performing settings to tune the final model. Experimental results show that the proposed ALBERT4Rec outperforms BERT4Rec, improving performance by about 20%.
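The abstract names the embedding dimensionality, the hidden dimensionality, and cross-layer parameter sharing as tuned settings. The sketch below is not taken from the thesis; it only illustrates, with rough arithmetic, how these two ALBERT ideas reduce the parameter count relative to a BERT-style model. The function names, the feed-forward size of 4 x hidden, and all example sizes (3000 items, 256 hidden units, 64 embedding units, 2 layers) are assumptions chosen for illustration, and position embeddings and the output layer are left out.

```python
from typing import Optional


def embedding_params(num_items: int, hidden: int, embed: Optional[int] = None) -> int:
    """Count item-embedding weights.

    Without `embed`, items map straight to the hidden size (V x H).
    With `embed`, the table is factorized into V x E plus an E x H projection.
    """
    if embed is None:
        return num_items * hidden
    return num_items * embed + embed * hidden


def layer_params(hidden: int) -> int:
    """Count weights and biases of one Transformer layer.

    Assumes a feed-forward inner size of 4 * hidden (an assumption, not a
    value reported in the abstract).
    """
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    feed_forward = 2 * hidden * (4 * hidden) + 4 * hidden + hidden
    layer_norms = 2 * 2 * hidden  # two layer norms, each with scale and bias
    return attention + feed_forward + layer_norms


def encoder_params(hidden: int, num_layers: int, shared: bool) -> int:
    """With cross-layer sharing, every layer reuses one set of parameters."""
    per_layer = layer_params(hidden)
    return per_layer if shared else per_layer * num_layers


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the comparison concrete.
    items, hidden, embed, layers = 3000, 256, 64, 2
    bert_like = embedding_params(items, hidden) + encoder_params(hidden, layers, shared=False)
    albert_like = embedding_params(items, hidden, embed) + encoder_params(hidden, layers, shared=True)
    print("BERT-style parameter count:  ", bert_like)
    print("ALBERT-style parameter count:", albert_like)
```

Under these assumptions the savings come from two places: the item table shrinks when the embedding size is smaller than the hidden size, and the encoder cost stops growing with depth once layers share parameters.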

Abstract (Chinese)
Abstract
Contents
LIST OF FIGURES
LIST OF TABLES
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2. Related Work
  2.1 General Recommendation
  2.2 Sequential Recommendation
  2.3 Attention Mechanism
Chapter 3. Proposed Method
  3.1 Problem Statement
    Table 3.1: Notation.
  3.2 Model Architecture
  3.3 Embedding Layer
  3.4 Transformer Layer
  3.5 Output Layer
  3.6 Model Learning
Chapter 4. Experiments
  4.1 Datasets
    Table 4.1: Statistics of the datasets.
  4.2 Task Settings & Evaluation Metrics
  4.3 Baselines & Implementation Details
    Table 4.2: Parameters setting for training model.
  4.4 Overall Performance Comparison
    Table 4.3: Overall performance. In each row, the best score is boldfaced, and the second-best score is underlined. Improvements over the ALBERT4Rec are shown in the last column.
  4.5 Performance Comparison in Different Argument
    Table 4.4: Performance comparison in different argument.
  4.6 The Impact of Embedded Size
  4.7 The Impact of Hidden Units
  4.7 The Impact of Multi-Head Self-Attention
  4.8 The Impact of Cross-layer Parameter Sharing
  4.9 The Impact of Masked Proportion
  4.10 The Impact of Max Length
Chapter 5. Conclusions and Future Work
References

