
Graduate student: Yi-Kai Tsai (蔡亦凱)
Thesis title: A Study of Improving Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Advisor: Yi-Leh Wu (吳怡樂)
Oral defense committee: Yi-Leh Wu (吳怡樂), Jiann-Jone Chen (陳建中), Cheng-Yuan Tang (唐政元), Li-Kang Yen (閻立剛)
Degree: Master
Department: Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e. 2019-2020)
Language: English
Number of pages: 34
Keywords: Recommender System, Deep Learning, Bidirectional Sequential Model, Multi-Head Self-Attention Mechanism, Bidirectional Encoder Representations from Transformer (BERT)
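    Recommender systems are applied in many different settings; e-commerce platforms and video websites, for example, all rely on this technology. Making recommender systems more usable and efficient should therefore bring more convenience to everyday life. A user's historical interactions are the foundation of a recommender system, and many current sequential recommendation methods are built on them. Recently, BERT (Bidirectional Encoder Representations from Transformer), a natural language model, used an encoder with multi-head self-attention to build a bidirectional pre-trained model, and adopted the Cloze task to keep the model from indirectly seeing the tokens it must predict, which gave it strong performance on natural language understanding tasks. A later paper applied BERT to sequential recommendation (BERT4Rec) and achieved state-of-the-art results. However, BERT4Rec is built only from the items each user has interacted with; other features of those items, such as user ratings and item categories, are not taken into account. This information is already present in the datasets, and we expect the model to become more accurate if these features are also used in training. We therefore run experiments on MovieLens, a real-world dataset commonly used to train recommender models, which contains not only the record of movies each user has rated but also the users' ratings of those movies and the movies' genres. We evaluate the model with user ratings only, with movie genres only, and with both, and show that with these features the model outperforms the original BERT4Rec. We further analyze the effects of multi-head self-attention, residual connections, and position embeddings; in our model all three components are necessary, since each of them improves performance.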
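To make the Cloze task concrete for item sequences, here is a minimal Python sketch, assuming the masking scheme described for BERT4Rec: each item in a user's history is independently replaced by a `[mask]` token with some probability and the model is trained to recover the hidden items; at test time a single `[mask]` is appended after the history, so predicting it amounts to predicting the next item. Hiding the targets is what keeps a bidirectional model from simply reading the answer through attention. The names `MASK_ID`, `cloze_mask`, and `next_item_query`, the use of 0 as the mask id, and the rule forcing at least one masked item per sequence are hypothetical illustrations, not the thesis's implementation.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0  # hypothetical: id 0 reserved for [mask]; real item ids assumed to start at 1


def cloze_mask(
    seq: List[int], mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[Optional[int]]]:
    """Hide a random subset of items behind [mask].

    Returns the masked input and the labels: a label is the original item at a
    masked position and None elsewhere (no loss would be computed there).
    """
    masked: List[int] = []
    labels: List[Optional[int]] = []
    for item in seq:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)
            labels.append(item)
        else:
            masked.append(item)
            labels.append(None)
    # Hypothetical rule: every non-empty sequence yields at least one training target.
    if seq and all(label is None for label in labels):
        pos = rng.randrange(len(seq))
        masked[pos] = MASK_ID
        labels[pos] = seq[pos]
    return masked, labels


def next_item_query(seq: List[int], max_len: int) -> List[int]:
    """At test time, append [mask] after the history and keep the last max_len tokens."""
    return (seq + [MASK_ID])[-max_len:]


if __name__ == "__main__":
    rng = random.Random(7)
    history = [5, 12, 7, 33, 9]
    print(cloze_mask(history, 0.2, rng))
    print(next_item_query(history, 4))  # [7, 33, 9, 0]
```

Labels are `None` at unmasked positions, so a training loss would be computed only where an item was hidden.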


    Recommender systems are widely applied, for example on e-commerce and online video-sharing platforms, and progress in recommender systems makes daily life more convenient. A user's historical behavior is the basis of a recommender system, and many current studies of sequential recommendation build on it. Recently, Bidirectional Encoder Representations from Transformer (BERT), a state-of-the-art natural language model built on bidirectional self-attention, was proposed. BERT is trained with the Cloze task so that the model cannot indirectly see the target token, which gives it strong performance on natural language understanding tasks. Later, BERT was applied to sequential recommendation as BERT4Rec, with excellent results. However, BERT4Rec is trained only on users' historical interactions and ignores user ratings and item categories. We hypothesize that these features are useful and that adding them to training can further improve the model. We therefore experiment on the real-world MovieLens dataset, which, in addition to the movies each user has rated, records the users' ratings and the movies' genres. We evaluate our model with user ratings only, with movie genres only, and with both. The results show that with these features our model achieves better recommendation performance than BERT4Rec. Moreover, we analyze the impact of multi-head self-attention, residual connections, and position embeddings; in our model, all three components improve performance.
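The abstract does not describe how ratings and genres enter the model. One common option, shown here as a minimal Python sketch under that assumption, is to give each feature its own embedding table and sum the per-position vectors, the way BERT sums token and position embeddings. The class `FeatureEmbedding`, the averaging of embeddings for movies with several genres, integer ratings from 1 to `n_ratings`, and the `use_position` switch (which mimics removing position embeddings in an ablation) are hypothetical choices for illustration; a real model would use trainable tensors rather than Python lists.

```python
import random
from typing import List, Sequence

Vector = List[float]


def make_table(n_rows: int, dim: int, rng: random.Random) -> List[Vector]:
    """Randomly initialised lookup table standing in for a trainable embedding matrix."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_rows)]


def vec_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def vec_mean(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise average of several vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class FeatureEmbedding:
    """Hypothetical input layer: item + rating + genre (+ position) embeddings, summed."""

    def __init__(self, n_items: int, n_ratings: int, n_genres: int,
                 max_len: int, dim: int, use_position: bool = True, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.use_position = use_position
        self.item = make_table(n_items, dim, rng)
        self.rating = make_table(n_ratings + 1, dim, rng)  # ratings 1..n_ratings; row 0 unused
        self.genre = make_table(n_genres, dim, rng)
        # Always drawn, so toggling use_position leaves the other tables unchanged.
        self.position = make_table(max_len, dim, rng)

    def __call__(self, items: Sequence[int], ratings: Sequence[int],
                 genres: Sequence[Sequence[int]]) -> List[Vector]:
        out: List[Vector] = []
        for t, (i, r, g) in enumerate(zip(items, ratings, genres)):
            v = vec_add(self.item[i], self.rating[r])
            # A movie can have several genres, so their embeddings are averaged.
            v = vec_add(v, vec_mean([self.genre[k] for k in g], self.dim))
            if self.use_position:
                v = vec_add(v, self.position[t])
            out.append(v)
        return out


if __name__ == "__main__":
    emb = FeatureEmbedding(n_items=100, n_ratings=5, n_genres=18, max_len=50, dim=8)
    seq = emb(items=[3, 41, 7], ratings=[5, 3, 4], genres=[[0, 4], [7], [2, 13, 16]])
    print(len(seq), len(seq[0]))  # 3 8
```

Because the position table is drawn even when `use_position` is false, two instances built with the same seed share their item, rating, and genre tables, so their outputs differ only by the position vectors.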

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
        1.1 Research Background
        1.2 Research Motivation
    Chapter 2. Related Work
        2.1 General Recommendation
        2.2 Sequential Recommendation
        2.3 Attention Mechanism
        2.4 Sequential Recommendation with Bidirectional Encoder Representations from Transformer (BERT4Rec)
    Chapter 3. Proposed Method
        3.1 Problem Statement
        3.2 Embedding Layer
        3.3 Output Layer
        3.4 Model Learning
    Chapter 4. Experiments
        4.1 Datasets
        4.2 Task Settings & Evaluation Metrics
        4.3 Baselines & Implementation Details
        4.4 Overall Performance Comparison (RQ1)
        4.5 The Impact of Multi-Head Self-Attention (RQ2)
        4.6 The Impact of Residual Connection (RQ3)
        4.7 The Impact of Position Embedding (RQ4)
    Chapter 5. Conclusions and Future Work
    References
