
Graduate Student: 傅韻帆 (Yun-Fan Fu)
Thesis Title: XLNet4Rec: Sequential Recommendation with Generalized Autoregressive Pretraining Using Pre-Layer Normalization Transformer
Advisor: 吳怡樂 (Yi-Leh Wu)
Committee Members: 唐政元 (Zheng-Yuan Tang), 陳建中 (Jian-Zhong Chen), 閻立剛 (Li-Gang Yan)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 50
Keywords: Recommendation System, Deep Learning, Relative Position Embedding, Two-Stream Self-Attention Mechanism, Pre-LN Transformer
Access Statistics: 249 views, 2 downloads
Abstract:
    Applying deep learning methods across a wide variety of fields has become the trend of the times, including the application of natural-language models to recommendation systems. A recommendation system judges a user's preferences from product content or from the user's habits; accurately and efficiently recommending items of interest based on this information is the training objective of a recommendation model. Inspired by BERT4Rec, we adopt XLNet ("Generalized Autoregressive Pretraining for Language Understanding") and, after some slight adjustments to the model, apply it to the recommendation task. In our experiments, we validate our results on a real-world dataset, MovieLens, which is widely used to evaluate the quality of recommendation models.
    In the experiments, we first adjust the hyperparameters of XLNet4Rec to evaluate their impact on model performance and choose the most suitable settings. Furthermore, we adopt the Pre-LN Transformer ("Pre-Layer Normalization Transformer") to speed up convergence and obtain better performance. Finally, regarding the choice of activation function, we find that the Swish activation improves the performance of our model to a certain extent; overall, the model improves by about 12.5%.
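    To make the two architectural choices named above concrete, here is a minimal PyTorch sketch of a Pre-LN Transformer encoder block using the Swish activation. This is our own illustration, not the thesis's exact implementation; all dimensions, and the use of PyTorch's built-in multi-head attention, are assumptions for the example.

    # Minimal sketch (assumed dimensions, not the thesis's exact code) of a
    # Pre-LN Transformer block with the Swish activation, the two changes the
    # abstract credits for faster convergence and the ~12.5% overall gain.
    import torch
    import torch.nn as nn

    class PreLNTransformerBlock(nn.Module):
        def __init__(self, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.SiLU(),            # Swish with beta = 1: x * sigmoid(x)
                nn.Linear(d_ff, d_model),
            )
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Pre-LN: normalize *before* each sublayer, then add the residual,
            # unlike the original Post-LN design that normalizes afterwards.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
            x = x + self.ffn(self.norm2(x))
            return x

    # Example: a batch of 32 item-embedding sequences of length 200.
    block = PreLNTransformerBlock()
    out = block(torch.randn(32, 200, 256))  # -> shape (32, 200, 256)

    Moving LayerNorm in front of each sublayer keeps gradients better scaled at initialization than the Post-LN arrangement, which is the effect exploited here to shorten convergence time.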

Table of Contents:
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2. Related Work
  2.1 General Recommendation
  2.2 Sequential Recommendation
  2.3 Attention Mechanism
  2.4 Pre-LN Transformer
Chapter 3. Proposed Method
  3.1 Problem Statement
  3.2 Modeling Architecture
    3.2.1 Embedding Layer
    3.2.2 Transformer Layer
    3.2.3 Pre-LN Transformer
    3.2.4 Output Layer
  3.3 Model Learning
Chapter 4. Experiments
  4.1 Datasets
  4.2 Task Settings & Evaluation Metrics
  4.3 Baselines & Training Details
  4.4 Overall Performance Comparison (RQ1)
  4.5 The Impact of Different Argument Settings (RQ2)
    4.5.1 Hidden Dimensionality
    4.5.2 Number of Heads
    4.5.3 Maximum Sequence Length
    4.5.4 Masked Proportion
  4.6 The Impact of Pre-LN Transformer (RQ3)
  4.7 The Impact of Activation Function (RQ4)
  4.8 The Impact of Two-Stream Self-Attention (RQ5)
Chapter 5. Conclusions and Future Work
References
Appendix I: The Output of Each XLNet4Rec Layer

References:
[1] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” arXiv preprint arXiv:1906.08237, 2020.
[2] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang, “BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1441-1450, 2019.
[3] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever, “Improving Language Understanding with Unsupervised Learning,” Technical report, OpenAI, 2018.
[4] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer, “Deep Contextualized Word Representations,” arXiv preprint arXiv:1802.05365, 2018.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv preprint arXiv:1810.04805, 2018.
[6] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention Is All You Need,” arXiv preprint arXiv:1706.03762, 2017.
[7] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov, “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context,” arXiv preprint arXiv:1901.02860, 2019.
[8] Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, and Hugo Larochelle, “Neural Autoregressive Distribution Estimation,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 7184-7220, 2016.
[9] Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu, “On Layer Normalization in the Transformer Architecture,” arXiv preprint arXiv:2002.04745, 2020.
[10] Wang-Cheng Kang and Julian McAuley, “Self-Attentive Sequential Recommendation,” arXiv preprint arXiv:1808.09781, 2018.
[11] Tim Donkers, Benedikt Loepp, and Jürgen Ziegler, “Sequential User-based Recurrent Neural Network Recommendations,” in Proceedings of RecSys, ACM, New York, NY, USA, pp. 152-160, 2017.
[12] Balázs Hidasi and Alexandros Karatzoglou, “Recurrent Neural Networks with Top-k Gains for Session-based Recommendations,” in Proceedings of CIKM, ACM, New York, NY, USA, pp. 843-852, 2018.
[13] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk, “Session-based Recommendations with Recurrent Neural Networks,” in Proceedings of ICLR, 2016.
[14] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, and How Jing, “Recurrent Recommender Networks,” in Proceedings of WSDM, ACM, New York, NY, USA, 2017.
[15] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan, “A Dynamic Recurrent Model for Next Basket Recommendation,” in Proceedings of SIGIR, ACM, New York, NY, USA, pp. 729-732, 2016.
[16] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation,” in Proceedings of EMNLP, Association for Computational Linguistics, pp. 1724-1734, 2014.
[17] Prajit Ramachandran, Barret Zoph, and Quoc V. Le, “Searching for Activation Functions,” arXiv preprint arXiv:1710.05941, 2017.
[18] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” in Proceedings of ICLR, 2016.
[19] Hao Zheng, Zhanlei Yang, Wenju Liu, Jizhong Liang, and Yanpeng Li, “Improving Deep Neural Networks Using Softplus Units,” in International Joint Conference on Neural Networks (IJCNN), 2015.
[20] Wilson L. Taylor, “‘Cloze Procedure’: A New Tool for Measuring Readability,” Journalism Quarterly, vol. 30, no. 4, pp. 415-433, 1953.
[21] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter, “Self-Normalizing Neural Networks,” arXiv preprint arXiv:1706.02515, 2017.
[22] F. Maxwell Harper and Joseph A. Konstan, “The MovieLens Datasets: History and Context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp. 1-19, 2015.
[23] Rounak Banik, “The Movies Dataset,” dataset on Kaggle, 2017.
[24] Shoujin Wang, Liang Hu, Longbing Cao, Xiaoshui Huang, Defu Lian, and Wei Liu, “Attention-based Transactional Context Embedding for Next-Item Recommendation,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[25] Jiaxi Tang and Ke Wang, “Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding,” in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 565-573, 2018.
[26] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme, “Factorizing Personalized Markov Chains for Next-Basket Recommendation,” in Proceedings of the 19th International Conference on World Wide Web, pp. 811-820, 2010.
[27] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua, “Neural Collaborative Filtering,” in Proceedings of the 26th International Conference on World Wide Web, pp. 173-182, 2017.
[28] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen, “DeBERTa: Decoding-enhanced BERT with Disentangled Attention,” in Proceedings of ICLR, 2021.
[29] Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.
[30] Guy Shani, David Heckerman, and Ronen I. Brafman, “An MDP-Based Recommender System,” Journal of Machine Learning Research, vol. 6, pp. 1265-1295, 2005.
[31] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[32] Andrew M. Dai and Quoc V. Le, “Semi-supervised Sequence Learning,” in Advances in Neural Information Processing Systems, pp. 3079-3087, 2015.
[33] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov, “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, 2019.
[34] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua, “Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks,” arXiv preprint arXiv:1708.04617, 2017.
[35] Hsiao-Chun Hu, “Recurrent Neural Network based Collaborative Filtering,” Master's thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2018.
[36] Pei-Hsuan Chen, “A Study of Recommender System Based on Recurrent Neural Network Using Scaled Exponential Linear Unit,” Master's thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2019.
[37] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton, “Restricted Boltzmann Machines for Collaborative Filtering,” in Proceedings of the 24th International Conference on Machine Learning, pp. 791-798, 2007.
[38] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua, “Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335-344, 2017.
