Author: 柯名鴻
Ming-Hong Ke
Thesis Title: 基於機器學習模型於新進玩家流失預測
New Player Churn Prediction Based On Machine Learning Models
Advisor: 戴文凱
Wen-Kai Tai
Committee: 黃元欣
Yuan-Shin Hwang
Tung-Ju Hsieh
Degree: Master's (碩士)
Department: 電資學院 - 資訊工程系
College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2022
Graduation Academic Year: 110 (ROC calendar, i.e., 2021–2022)
Language: Chinese
Pages: 64
Keywords (in Chinese): 新進玩家 (new player), 流失預測 (churn prediction), 免費遊玩遊戲 (free-to-play games), 機器學習 (machine learning)
Keywords (in other languages): New Player, Churn Prediction, Free-to-Play, Machine Learning
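  • In recent years, most mobile game companies have focused on free-to-play games. However, it is extremely difficult to define whether a player in a free-to-play game has churned, so game companies easily miss the chance to win players back. In addition, new players have a higher churn rate. Therefore, if the new players who are likely to churn can be predicted accurately, retention strategies can be applied in time, with the aim of raising the retention rate of new players and, in turn, effectively increasing revenue.
    For this problem, this thesis proposes a big data mining framework consisting of five stages: (1) data pre-processing, (2) data analysis, (3) machine learning, (4) prediction result analysis, and (5) industry application analysis. The data are first pre-processed and analyzed before prediction, machine learning models are then trained, and the prediction results are fed into a feature importance analysis. Finally, a surrogate model is used to find the rules behind player churn, completing the overall prediction and analysis.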
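As a rough illustration of the machine learning and prediction result analysis stages, the sketch below uses scikit-learn, which the thesis cites [42]. It is not the author's code: synthetic data from `make_classification` stands in for the player features, and the random forest, the parameter grid, and F1 scoring are illustrative assumptions. `class_weight="balanced"` reweights the minority (churn) class, `GridSearchCV` performs grid search with cross-validation, and `feature_importances_` gives impurity-based (Gini) feature importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for player features: about 20% of samples fall in
# class 1 ("churned"), mimicking an imbalanced churn label.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# A stratified split keeps the churn ratio equal in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# class_weight="balanced" weights each class inversely to its frequency.
model = RandomForestClassifier(class_weight="balanced", random_state=0)

# Illustrative grid; the thesis's actual search space is not given here.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8, None]}
search = GridSearchCV(model, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
best = search.best_estimator_
test_f1 = f1_score(y_test, best.predict(X_test))

# Impurity-based (Gini) importances, highest first.
ranking = sorted(
    enumerate(best.feature_importances_), key=lambda p: p[1], reverse=True
)
print(search.best_params_, round(test_f1, 3))
for idx, score in ranking[:5]:
    print(f"feature_{idx}: {score:.3f}")
```

F1 is used here rather than accuracy because, with roughly 80% non-churners, a model that never predicts churn would still score high accuracy.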

    In recent years, most mobile game developers have focused on free-to-play (F2P) games. However, in F2P games it is extremely difficult to define whether a player has churned, which makes it hard for game operators to retain players. In addition, new players have a higher churn rate. Therefore, if operators can accurately predict which new players are likely to churn, they can immediately apply retention strategies to those players, with the aim of increasing the retention rate of new players and thus effectively increasing revenue.
    This thesis proposes a machine learning framework for this problem, composed of five major stages: (1) data pre-processing, (2) data analysis, (3) machine learning, (4) prediction result analysis, and (5) business application analysis. Specifically, the framework first pre-processes and analyzes the data before prediction, then trains models with machine learning algorithms, and then analyzes the important features based on the prediction results. Finally, by using a surrogate model, the framework identifies the patterns of player churn.
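The surrogate-model step can be illustrated with a global surrogate: a shallow decision tree trained to mimic a more complex model's predictions, whose branches read as if/then rules. This is a minimal sketch under assumptions: the record does not say which surrogate the thesis uses (its references to decision tree learning suggest a tree), and the synthetic data and feature names such as `login_days` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; these feature names are hypothetical examples.
feature_names = ["login_days", "sessions", "play_minutes", "levels_cleared", "purchases"]
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# 1. The "black-box" model whose behaviour we want to explain.
black_box = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
).fit(X, y)
bb_pred = black_box.predict(X)

# 2. Global surrogate: a shallow tree trained on the black box's
#    predictions (not the true labels), so it mimics the model.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bb_pred)

# 3. Fidelity: how often the surrogate agrees with the black box.
fidelity = accuracy_score(bb_pred, surrogate.predict(X))

# 4. Human-readable if/then rules describing the predicted classes.
rules = export_text(surrogate, feature_names=feature_names)
print(f"fidelity = {fidelity:.3f}")
print(rules)
```

Fidelity should be checked before trusting the extracted rules; a deeper tree usually agrees more closely with the black box but yields longer rules that are harder to act on.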

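    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Notation
    1. Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Overview of the Research Method
      1.4 Research Contributions
      1.5 Organization of This Thesis
    2. Literature Review
      2.1 The Rise of Free-to-Play Games
      2.2 Data Pre-processing
      2.3 Learning Model Selection
      2.4 Handling Imbalanced Data and Its Evaluation
    3. Research Method
      3.1 Data Pre-processing Stage
        3.1.1 Data Integration
        3.1.2 Data Filtering
          Removing Missing Values
          Handling Data of Non-valuable Players
        3.1.3 Target Value Preparation
        3.1.4 Feature Construction
      3.2 Data Analysis Stage
        3.2.1 Exploratory Data Analysis
          High-Information Features
      3.3 Machine Learning Stage
        3.3.1 Splitting Training and Test Sets
        3.3.2 Learning Model Selection
        3.3.3 Weight Adjustment for Imbalanced Data
        3.3.4 Searching for Optimal Parameters
          Model Parameters
          Grid Search and Cross-Validation
        3.3.5 Evaluating and Validating the Best Model
      3.4 Prediction Result Analysis Stage
        3.4.1 Feature Importance Analysis
      3.5 Industry Application Analysis Stage
        3.5.1 Surrogate Model
    4. Experimental Results and Analysis
      4.1 Experimental System Architecture
      4.2 Evaluation of Data Pre-processing
        4.2.1 Evaluation of the Prediction Audience
        4.2.2 Evaluation of Prediction Features
      4.3 Evaluation of Data Analysis
        4.3.1 Evaluation of Exploratory Data Analysis
      4.4 Evaluation of Machine Learning
        4.4.1 Evaluation of the Training/Test Split
        4.4.2 Evaluation of Imbalanced Data Handling
        4.4.3 Evaluation of the Best Model
        4.4.4 Evaluation of Time Frames
      4.5 Evaluation of Prediction Result Analysis
        4.5.1 Evaluation of Feature Importance
      4.6 Evaluation of Industry Application Analysis
        4.6.1 Evaluation of the Surrogate Model
    5. Conclusions and Future Research
      5.1 Conclusions
      5.2 Future Research
    References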

    [1] P. Miller, “GDC 2012: How Valve made Team Fortress 2 free-to-play,” Gamasutra, 2012.
    [2] F. Reichheld and W. Sasser, “Zero defections: quality comes to services,” Harvard Business Review, Sep.–Oct., pp. 105–111, 1990.
    [3] The Team at Swrve, “The April 2014 New Players Report,” 2014. [Online; accessed 30-June-2022].
    [4] K. Mustač, K. Bačić, L. Skorin-Kapov, and M. Sužnjević, “Predicting Player Churn of a Free-to-Play Mobile Video Game Using Supervised Machine Learning,” MDPI, 2022.
    [5] J. W. Tukey, Exploratory data analysis, vol. 2. Reading, MA, 1977.
    [6] Wikipedia contributors, “Hyperparameter optimization — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].
    [7] Wikipedia contributors, “Cross-validation — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].
    [8] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. CRC Press, 1984.
    [9] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
    [10] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.
    [11] S. Nembrini, I. R. König, and M. N. Wright, “The revival of the Gini importance?,” Bioinformatics, vol. 34, pp. 3711–3718, 2018.
    [12] Wikipedia contributors, “Surrogate model — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].
    [13] T. Fawcett, “An introduction to ROC analysis,” Pattern recognition letters, vol. 27, no. 8, pp. 861–874, 2006.
    [14] D. Powers, “Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation,” Mach. Learn. Technol., vol. 2, Jan. 2008.
    [15] C. Goutte and É. Gaussier, “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation,” in ECIR, 2005.
    [16] Wikipedia contributors, “Free-to-play — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].
    [17] E. Lee, Y. Jang, D. M. Yoon, J. Jeon, S. I. Yang, S. K. Lee, D. W. Kim, P. P. Chen, A. Guitart, P. Bertens, Á. Periáñez, F. Hadiji, M. Müller, Y. Joo, J. Lee, I. Hwang, and K. J. Kim, “Game data mining competition on churn prediction and survival analysis using commercial game log data,” IEEE Transactions on Games, vol. 11, no. 3, pp. 215–226, 2018.
    [18] R. Flunger, A. Mladenow, and C. Strauss, “Game Analytics on Free to Play,” in Big Data Innovations and Applications (M. Younas, I. Awan, and S. Benbernou, eds.), (Cham), pp. 133–141, Springer International Publishing, 2019.
    [19] B. Gregory, “Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data,” arXiv preprint arXiv: 1802.03396, 2018.
    [20] M. Tamassia, W. Raffe, R. Sifa, A. Drachen, F. Zambetta, and M. Hitchens, “Predicting player churn in destiny: A Hidden Markov models approach to predicting player departure in a major online game,” in 2016 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8, IEEE, 2016.
    [21] Á. Periáñez, A. Saas, A. Guitart, and C. Magne, “Churn prediction in mobile social games: Towards a complete assessment using survival ensembles,” in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 564–573, IEEE, 2016.
    [22] J. Runge, P. Gao, F. Garcin, and B. Faltings, “Churn prediction for high-value players in casual social games,” in 2014 IEEE conference on Computational Intelligence and Games, pp. 1–8, IEEE, 2014.
    [23] R. Sifa, F. Hadiji, J. Runge, A. Drachen, K. Kersting, and C. Bauckhage, “Predicting purchase decisions in mobile free-to-play games,” in Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference, 2015.
    [24] H. Xie, S. Devlin, D. Kudenko, and P. Cowling, “Predicting player disengagement and first purchase with event-frequency based data representation,” in 2015 IEEE conference on Computational Intelligence and Games, pp. 230–237, IEEE, 2015.
    [25] S. K. Lee, S. J. Hong, S. I. Yang, and H. Lee, “Predicting churn in mobile free-to-play games,” in 2016 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1046–1048, IEEE, 2016.
    [26] F. Hadiji, R. Sifa, A. Drachen, C. Thurau, K. Kersting, and C. Bauckhage, “Predicting player churn in the wild,” in 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–8, 2014.
    [27] L. Breiman, “Bagging predictors,” Machine learning, vol. 24, no. 2, pp. 123–140, 1996.
    [28] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771–780, 1999.
    [29] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002.
    [30] N. Chinchor and B. M. Sundheim, “MUC-5 evaluation metrics,” in Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993, 1993.
    [31] M. Kubat, R. Holte, and S. Matwin, “Learning when negative examples abound,” in European Conference on Machine Learning, pp. 146–153, Springer, 1997.
    [32] A. Martínez, C. Schmuck, S. Pereverzyev Jr, C. Pirker, and M. Haltmeier, “A machine learning framework for customer purchase prediction in the non-contractual setting,” European Journal of Operational Research, vol. 281, no. 3, pp. 588–596,
    [33] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
    [34] J. Brownlee, Imbalanced Classification with Python: Better Metrics, Balance Skewed Classes, Cost-Sensitive Learning. Machine Learning Mastery, 2020.
    [35] J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” in Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240, 2006.
    [36] Wikipedia contributors, “Decision tree learning — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].
    [37] M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia, “Spark sql: Relational data processing in spark,” in Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp. 1383–1394, 2015.
    [38] A. Bilogur, “Missingno: a missing data visualization suite,” Journal of Open Source Software, vol. 3, no. 22, p. 547, 2018.
    [39] M. Waskom, O. Botvinnik, J. Ostblom, M. Gelbart, S. Lukauskas, P. Hobson, D. C. Gemperline, T. Augspurger, Y. Halchenko, J. B. Cole, J. Warmenhoven, J. de Ruiter, C. Pye, S. Hoyer, J. Vanderplas, S. Villalba, G. Kunter, E. Quintero, P. Bachant, M. Martin, K. Meyer, C. Swain, A. Miles, T. Brunner, D. O’Kane, T. Yarkoni, M. L. Williams, C. Evans, C. Fitzgerald, and Brian, “mwaskom/seaborn: v0.10.1 (April 2020),” Apr. 2020.
    [40] J. Reback, W. McKinney, jbrockmendel, J. V. den Bossche, T. Augspurger, P. Cloud, gfyoung, Sinhrks, A. Klein, M. Roeschke, S. Hawkins, J. Tratner, C. She, W. Ayd, T. Petersen, M. Garcia, J. Schendel, A. Hayden, MomIsBestFriend, V. Jancauskas, P. Battiston, S. Seabold, chris b1, h vetinari, S. Hoyer, W. Overmeire, alimcmaster1, K. Dong, C. Whelan, and M. Mehyar, “pandas-dev/pandas: Pandas 1.0.3,” Mar. 2020.
    [41] W. McKinney, “Data Structures for Statistical Computing in Python,” in Proceedings of the 9th Python in Science Conference (S. van der Walt and J. Millman, eds.), pp. 56–61, 2010.
    [42] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
    [43] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122, 2013.
    [44] Wikipedia contributors, “One-hot — Wikipedia, The Free Encyclopedia.” [Online; accessed 30-June-2022].