
Graduate student: 洪羽萱 (Yu-Hsuan Hong)
Thesis title: 運用重疊分群與長短期記憶神經網路於網頁瀏覽序列之預測
Sequential Webpage Browsing Behavior Prediction using Overlapping Clustering and Long Short-Term Memory Network
Advisor: 歐陽超 (Chao Ou-Yang)
Committee members: 王孔政 (Kung-Jeng Wang), 郭人介 (Ren-Jieh Kuo)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 70
Chinese keywords: 有序推薦系統, 重疊分群, 長短期記憶神經網路 (LSTM)
English keywords: Sequential Recommendation System, Overlapping Clustering, Long Short-Term Memory Network

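As the Internet continues to advance, more and more e-commerce websites use recommendation systems to help consumers find products to buy. Traditional recommendation systems typically consider the associations among browsed products, or make recommendations through content-based filtering and collaborative filtering. None of these approaches, however, accounts for the fact that consumers browsing products online may produce different product-browsing sequences because of their habits or preferences.
This study analyzes browsing sequence data for five product categories on a case e-commerce platform. Because the data must be mined as sequences, the study combines overlapping clustering with a Long Short-Term Memory (LSTM) network to predict users' product-browsing sequences on the platform.
The study takes the consumer's perspective: as a consumer's browsing sequence on the platform grows longer, the recommendations become increasingly accurate. Besides giving consumers suitable and valuable information, this can also foster consumer reliance on the platform and thereby build the company's long-term competitiveness in the market. The results further show that, although most sequences in the browsing log exhibit the non-strict sequence problem, overlapping clustering resolves it effectively, and prediction accuracy is 25% higher than without overlapping clustering.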


Nowadays, an increasing number of e-commerce websites use recommendation systems to help users find products to purchase and to enrich their shopping potential. Traditional recommendation systems generally either consider the correlation between products or rely on content-based filtering or collaborative filtering, but they usually do not consider the sequence of actions that users perform. Different browsing sequence behaviors may occur because of users' habits or preferences.
To address this issue, this research analyzes five categories of products and treats recommendation as a sequential problem. We use a hybrid approach that employs overlapping clustering and a Long Short-Term Memory (LSTM) network to predict the next products in a browsing sequence.
From the consumer's perspective, the longer consumers browse products, the more accurate the recommendations they receive. When consumers are satisfied with a particular e-commerce site, they purchase more there, which also helps the company stay competitive in the market. The results show that, although most sequences in the log file exhibit the non-strict sequence problem, overlapping clustering solves this problem effectively. The model with overlapping clustering achieves 25% higher prediction accuracy than the model without it.
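The abstract does not state which clustering algorithm or input encoding the thesis uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical set of overlapping clusters (`CLUSTERS`, with made-up product ids), replaces each product in a browsing sequence with its cluster memberships, and then builds fixed-length (window, next item) pairs of the kind a next-item predictor such as an LSTM could be trained on. Under one reading of the non-strict sequence problem, two sequences that differ only in the order of products from the same cluster map to the same encoded sequence.

```python
from collections import defaultdict

# Hypothetical overlapping clusters (illustrative only, not from the thesis):
# a product may belong to more than one cluster, e.g. "p3" is in C1 and C2.
CLUSTERS = {
    "C1": {"p1", "p2", "p3"},
    "C2": {"p3", "p4"},
    "C3": {"p5"},
}

def product_to_clusters(clusters):
    """Invert cluster -> products into product -> sorted tuple of cluster ids."""
    membership = defaultdict(set)
    for cluster_id, products in clusters.items():
        for product in products:
            membership[product].add(cluster_id)
    return {p: tuple(sorted(ids)) for p, ids in membership.items()}

def encode_sequence(sequence, membership):
    """Replace each product with its cluster memberships (empty tuple if unknown)."""
    return [membership.get(product, ()) for product in sequence]

def sliding_windows(sequence, window):
    """Build (input window, next item) pairs for a next-item predictor."""
    return [
        (sequence[i:i + window], sequence[i + window])
        for i in range(len(sequence) - window)
    ]

membership = product_to_clusters(CLUSTERS)

seq_a = ["p1", "p2", "p4", "p5"]
seq_b = ["p2", "p1", "p4", "p5"]  # same products, first two swapped

encoded_a = encode_sequence(seq_a, membership)
encoded_b = encode_sequence(seq_b, membership)

# Training pairs built from the encoded sequence with a window of 2 items.
pairs = sliding_windows(encoded_a, window=2)
```

In this toy data `seq_a` and `seq_b` produce identical encoded sequences, because `p1` and `p2` share the same cluster membership. A real pipeline would additionally map each membership tuple to a numeric vector before feeding it to a network; that step is omitted here.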

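Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Objectives
  1.3 Research Issues
    1.3.1 How to recommend effectively when products have no strict sequence
    1.3.2 How to predict the sequence of browsed products
  1.4 Significance
Chapter 2. Background and Literature Review
  2.1 Webpage Browsing Behavior Prediction
  2.2 Long Short-Term Memory (LSTM) Network
  2.3 Particle Swarm Optimization (PSO)
  2.4 Overlapping Clustering
Chapter 3. Research Method
  3.1 Research Framework and Process
  3.2 Data Collection and Preprocessing
    3.2.1 Data Collection
    3.2.2 Data Preprocessing
  3.3 Building the IPSO-LSTM Model
  3.4 Building Overlapping Clusters of Products
  3.5 Selecting Variable Combinations of Products and Overlapping Clusters
  3.6 Finding Suitable Clusters through Association Analysis
  3.7 Model Evaluation
  3.8 Sensitivity Analysis
Chapter 4. Implementation Results
  4.1 Case Data Collection and Preprocessing
    4.1.1 Introduction to the Case Data
    4.1.2 Case Data Preprocessing
  4.2 Model Parameter Settings
    4.2.1 Number of Clusters
    4.2.2 Iterative Particle Swarm Optimization Parameter Settings
    4.2.3 LSTM Model Parameter Settings
  4.3 Experimental Results and Analysis
    4.3.1 Results of Product and Overlapping Cluster Variable Combinations
    4.3.2 Results of Finding Suitable Clusters through Association Analysis
    4.3.3 Model Prediction and Analysis
  4.4 Sensitivity Analysis
    4.4.1 Effect of Adding or Removing Overlapping Clusters on Prediction
    4.4.2 Effect of the Length of the Predicted Subsequent Browsing Sequence
    4.4.1 Effect of Increasing or Decreasing the Number of Overlapping Clusters on Prediction
  4.5 Method Comparison
    4.5.1 Association Analysis Method
    4.5.2 Method without Overlapping Clustering
Chapter 5. Conclusions and Suggestions
  5.1 Conclusions
  5.2 Research Limitations and Future Suggestions
References

List of Figures
Figure 1. Comparison of existing product recommendation systems and the recommendation system of this study
Figure 2. Research process and framework
Figure 3. LSTM neural network diagram
Figure 4. Positions corresponding to overlapping clusters (Note: the numbers 1 to 10 denote position indices, not actual values)
Figure 5. Construction process of the IPSO-LSTM hybrid model
Figure 6. Confusion matrix for the LOHAS Vouchers test data
Figure 7. Partial screenshot of the e-commerce platform log data
Figure 8. Partial screenshot of preprocessed Snacks data
Figure 9. Partial screenshot of preprocessed Drugstore and Beauty Items data
Figure 10. Partial screenshot of preprocessed Digital 3C data
Figure 11. Partial screenshot of preprocessed LOHAS Vouchers data
Figure 12. Partial screenshot of preprocessed Lifestyle Deals data
Figure 13. Overlapping clustering diagram for Snacks (D1)
Figure 14. Overlapping clustering diagram for Drugstore and Beauty Items (D2)
Figure 15. Overlapping clustering diagram for Digital 3C (D3)
Figure 16. Overlapping clustering diagram for LOHAS Vouchers (D4)
Figure 17. Overlapping clustering diagram for Lifestyle Deals (D5)
Figure 18. Illustration of LSTM prediction results
Figure 19. Overlapping cluster combinations
Figure 20. Accuracy of the LSTM model by predicted sequence length

List of Tables
Table 1. Online sales of Taiwan's overall retail industry, 2019-2020 (ROC years 108-109)
Table 2. Literature on webpage browsing behavior prediction
Table 3. Example format of log file data
Table 4. Dataset variables and corresponding dataset names
Table 5. Preprocessed data and data types
Table 6. Variable definitions for overlapping clustering
Table 7. All permutations of overlapping clusters
Table 8. Results of exhaustively enumerating all sequence permutations
Table 9. Data format for LSTM input
Table 10. Association analysis mining
Table 11. User browsing sequences in the LOHAS Vouchers test data
Table 12. User browsing sequences in the LOHAS Vouchers test data after the overlapping cluster update
Table 13. Test data format for LSTM input
Table 14. Calculation of precision, recall, and f1-score
Table 15. Product number contents
Table 16. IPSO parameter settings
Table 17. Overlapping clustering results for Snacks (D1)
Table 18. Overlapping clustering results for Drugstore and Beauty Items (D2)
Table 19. Overlapping clustering results for Digital 3C (D3)
Table 20. Overlapping clustering results for LOHAS Vouchers (D4)
Table 21. Overlapping clustering results for Lifestyle Deals (D5)
Table 22. Association analysis results for Snack Foods
Table 23. Association analysis results for Drugstore and Beauty Items
Table 24. Association analysis results for Digital 3C
Table 25. Association analysis results for LOHAS Vouchers
Table 26. Association analysis results for Lifestyle Deals
Table 27. LSTM model prediction results
Table 28. LSTM prediction performance on different datasets (unit: %)
Table 29. LSTM model prediction accuracy for different numbers of clusters (unit: %)
Table 30. Accuracy of the LSTM model by predicted sequence length (unit: %)
Table 31. Association analysis with minsup = 0.5%
Table 32. Prediction accuracy of the method without overlapping clustering (unit: %)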

