Graduate student: Shih-Hsuan Hsu (徐世軒)
Thesis title: Sequential Browsing Prediction Combined with Sequential Pattern Mining and Fuzzy Clustering in XGBoost
Original Chinese title: 結合序列樣式探勘與模糊分群於XGBoost之瀏覽序列預測
Advisor: Chao Ou-Yang (歐陽超)
Oral examination committee: Chao Ou-Yang (歐陽超), Ren-Jieh Kuo (郭人介), Kung-Jeng Wang (王孔政)
Degree: Master
Department: Department of Industrial Management, College of Management
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, i.e. 2021/22)
Language: Chinese
Number of pages: 77
Keywords: Sequential Recommendation System, Sequential Pattern Mining, Fuzzy Clustering, XGBoost (eXtreme Gradient Boosting)

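Abstract (translated from the Chinese original):
As e-commerce grows, platforms sell an ever larger range of products. Users therefore face higher search costs when browsing, and more and more platforms adopt recommender systems to reduce those costs. Common recommender systems either evaluate the associations between the products a user browses, or belong to one of two broad families of recommendation engines: content-based filtering and collaborative filtering. None of these approaches considers that each user browses products in a different order, driven by their own interests, habits, or needs. This study takes the user's way of thinking as its starting point and aims to produce recommendations tailored to each individual user.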
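Among the large volume of data on the case e-commerce platform, ordered sequence data is an important research direction with wide commercial and scientific applications. This study uses data mining techniques to extract high-frequency sequential patterns from the sequence data, examining users' browsing sequences across six major product categories. It first applies sequential pattern mining (PrefixSpan) to mine frequent sequences of product categories. It then uses fuzzy clustering to group closely related products into clusters. Finally, it applies the machine learning technique eXtreme Gradient Boosting (XGBoost) to predict users' product-browsing sequence behavior on the platform.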
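The PrefixSpan step mentioned above can be illustrated with a minimal sketch. It is not the author's code. It assumes each browsing session is a list of single product-category labels rather than itemsets. The thesis's actual encoding and minimum-support threshold are not given in this record. Support is counted as the number of sessions that contain a pattern as a possibly non-contiguous subsequence. The function name `prefixspan` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Simplified PrefixSpan over sequences of single items (hypothetical category labels).

    Returns {pattern_tuple: support}, where support is the number of sequences
    containing the pattern as a (not necessarily contiguous) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected: one (sequence index, start position) suffix per supporting sequence
        counts = defaultdict(set)
        for sid, start in projected:
            for item in sequences[sid][start:]:
                counts[item].add(sid)  # a set, so each sequence counts once
        for item, sids in counts.items():
            if len(sids) < min_support:
                continue  # infrequent items cannot extend the prefix
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Project each suffix past the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    # Hypothetical browsing sessions over product categories A, B, C.
    sessions = [["A", "B", "C"], ["A", "C"], ["B", "C"], ["A", "B"]]
    for pattern, support in sorted(prefixspan(sessions, min_support=2).items()):
        print(pattern, support)
```

On the four sample sessions, `("A", "B")`, `("A", "C")` and `("B", "C")` each reach a support of 2. The pattern `("A", "B", "C")` occurs in only one session, so it is pruned. PrefixSpan grows patterns without generating candidate sequences. It does this by projecting each session onto the suffix that follows the first occurrence of the extended prefix.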
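The results show two problems in the users' browsing-sequence logs. Most browsing sequences have no strict ordering, and it is unclear whether consecutive items in a sequence are sequentially related. Sequential pattern mining and fuzzy clustering both handle these problems effectively. The XGBoost model reached a prediction accuracy of 72.77%, higher than every other model tested (AdaBoost: 67.72%, Decision Tree: 65.45%, Random Forest: 67.01%, LSTM: 57.70%). Combining sequential pattern mining and fuzzy clustering with XGBoost therefore predicts more accurately which product a user will browse next. Traditional recommender systems on e-commerce platforms do not consider browsing order, and the results indicate that this method helps address that shortcoming.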


Abstract (English):
E-commerce keeps growing and the variety of products on sale keeps increasing, so users spend more and more time searching as they browse. Many platforms use recommender systems to reduce this search time. Traditional techniques, however, do not take users' interests, habits, or needs into account, even though these can lead each user to browse products in a different order. This research aims to develop a recommender system tailored to individual users.
This research uses data mining to extract high-frequency sequential patterns from sequence data. It explores users' browsing sequences across six product categories and mines frequent sequences of product categories with sequential pattern mining (PrefixSpan). Fuzzy clustering is then used to compare items and group similar ones into clusters. Finally, the machine learning technique eXtreme Gradient Boosting (XGBoost) is used to predict user behavior during item-browsing sequences.
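The fuzzy clustering step can be sketched with the textbook Fuzzy C-Means (FCM) updates, the algorithm named in the thesis outline. Each cluster centre is the mean of the points weighted by u_ij^m. Each membership is u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from point i to centre j. This is an illustration under stated assumptions, not the author's implementation. This record does not describe how products are turned into feature vectors, so the 2-D points and the function name `fuzzy_c_means` are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook Fuzzy C-Means.

    points: list of equal-length float lists (hypothetical product features).
    Returns (centers, u): u[i][j] is the membership of point i in cluster j,
    and every row of u sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: give it crisp membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                              for j in range(c)])
        # Stop once memberships barely change between iterations.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    centers, u = fuzzy_c_means(pts, c=2)
    print(centers)
    print([[round(v, 3) for v in row] for row in u])
```

Unlike k-means, FCM gives every point a graded membership in every cluster. A product that sits between two groups is therefore not forced into only one of them. The fuzzifier m = 2 is a common default, not a value taken from the thesis.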
The results show that the proposed method achieves a prediction accuracy of 72.77%, higher than AdaBoost (67.72%), Decision Tree (65.45%), Random Forest (67.01%), and LSTM (57.70%). The method therefore predicts more accurately which item a user will browse next. It also helps overcome a shortcoming of recommender systems that do not consider browsing order.
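Next-item prediction can be framed as a supervised task: the previous items form the input and the item browsed next is the label. Accuracy is then the share of cases where the predicted next item is correct. This record does not describe the thesis's actual feature construction, so the sketch below is only a generic framing. It assumes integer-encoded category IDs, a fixed look-back window, and accuracy in its usual classification sense. The function names, window size, and data are hypothetical.

```python
def make_next_item_samples(sequences, window):
    """Slide a fixed window over each sequence: previous `window` items -> next item."""
    if window < 1:
        raise ValueError("window must be >= 1")
    X, y = [], []
    for seq in sequences:
        for t in range(window, len(seq)):
            X.append(list(seq[t - window:t]))  # features: the last `window` items
            y.append(seq[t])                   # label: the item browsed next
    return X, y


def accuracy(y_true, y_pred):
    """Share of positions where the predicted next item equals the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical integer-encoded category sequences.
    sessions = [[0, 1, 2, 3], [1, 2, 3]]
    X, y = make_next_item_samples(sessions, window=2)
    # After re-encoding labels as 0..K-1, X and y could be fed to a classifier
    # such as xgboost.XGBClassifier().fit(X, y); that step is omitted here so the
    # sketch needs only the standard library.
    print(X, y)
    print(accuracy(y, [2, 3, 1]))
```

With a window of 2, the two sessions yield three samples. Predicting `[2, 3, 1]` for these samples scores 2/3. Under the assumption above, each percentage in the model comparison is this ratio computed over the thesis's test data.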

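Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Objectives
  1.3 Research Issues
  1.4 Significance
Chapter 2 Literature Review
  2.1 Predicting Product-Browsing Behavior on E-commerce Platforms
  2.2 Fuzzy C-Means Clustering (FCM)
  2.3 Particle Swarm Optimization (PSO)
  2.4 eXtreme Gradient Boosting (XGBoost)
Chapter 3 Research Methodology
  3.1 Research Process and Framework
  3.2 Data Collection and Preprocessing
    3.2.1 Data Collection
    3.2.2 Data Preprocessing: Extending the Product Classification Scheme
    3.2.3 Data Preprocessing: Deleting, Merging and Splitting User Browsing Sequences
  3.3 Sequential Pattern Mining of Product-Browsing Sequences: PrefixSpan
  3.4 Selecting Product and Fuzzy Cluster Combinations: Fuzzy C-Means
  3.5 Building the XGBoost Model
  3.6 Model Evaluation
  3.7 Sensitivity Analysis
Chapter 4 Implementation Results
  4.1 Data Collection and Preprocessing for the E-commerce Case
    4.1.1 Overview of the Case Data
    4.1.2 Preprocessing of the Case Data
  4.2 Model Parameter Settings
    4.2.1 Sequential Pattern Mining Parameters
    4.2.2 Fuzzy Clustering Parameters
    4.2.3 XGBoost Parameters
  4.3 Experimental Results and Analysis
    4.3.1 Sequential Pattern Mining Results for User Browsing Sequences
    4.3.2 Fuzzy Clustering Results for Product Subcategories
    4.3.3 XGBoost Prediction and Analysis
  4.4 Sensitivity Analysis
    4.4.1 Effect of Increasing or Decreasing the Number of Fuzzy Clusters on Sequence Prediction
    4.4.2 Effect of the Length of the Predicted Browsing Sequence
  4.5 Method Comparison
Chapter 5 Conclusions and Recommendations
  5.1 Conclusions
  5.2 Research Limitations and Future Work
References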

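Full-text release date: 2025/08/11 (on-campus network)
Full-text release date: 2025/08/11 (off-campus network)
Full-text release date: 2025/08/11 (National Central Library: Taiwan Theses and Dissertations system)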