Graduate Student: 廖宣瑋 Hsuan-Wei Liao
Thesis Title: 巨量資料探勘框架:基於極限梯度提升之預測免費手機遊戲中潛在新進付費玩家 (Big Data Mining Framework: Predicting Potential New Paying Player in Mobile Free-to-Play Games Based on Extreme Gradient Boosting)
Advisor: 戴文凱 Wen-Kai Tai
Committee Members: 張國清, 鮑興國
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2020
Academic Year: 108
Language: Chinese
Pages: 72
Keywords: Purchase Prediction, Free-to-Play, Big Data, Data Mining, Machine Learning, Extreme Gradient Boosting
Most mobile games on the market today adopt the Free-to-Play (F2P) business model, which makes In-App Purchases (IAP) increasingly important and a central concern of game developers' operations. To launch precise marketing that successfully attracts all kinds of players, a data analysis team must study paying players and, within the group of new players, predict potential payers so as to raise their willingness to make in-app purchases. How to effectively mine features from paying-player data and make predictions through machine learning is therefore the goal of this research.

This thesis proposes a big data mining framework for this problem. The data are first preprocessed and analyzed before prediction; a machine-learning model is then trained and optimized; finally, the prediction results feed into feature-importance analysis, completing the overall prediction and analysis work. The framework consists of four major stages: (1) the data preprocessing stage, (2) the data analysis stage, (3) the machine learning stage, and (4) the prediction-result analysis stage.

According to the experimental results, with the proposed big data mining framework, valueless data were cleaned out using a valueless-player observation period, payer and non-payer target values were prepared through a payer definition period, and valuable traces of player behavior were extracted during a feature-mining period. Exploratory Data Analysis (EDA) identified unreasonable features and high-information features, from which valuable data features were inferred. The trained model can predict potential new paying players, and feature importance can be analyzed from its predictions to understand why players spend and how spending connects to the game. Overall, the framework effectively reduces the time and labor cost of predicting paying players and yields information useful for marketing.
In recent years, most mobile games on the market have adopted the Free-to-Play (F2P) business model, which makes In-App Purchases (IAP) increasingly important and a focus of game developers' operations. To launch precise marketing that successfully attracts all types of players, the data analysis team must study paying players and, within the group of new players, predict potential payers so as to raise their willingness to make in-app purchases. Effectively mining features from paying-player data and making predictions through machine learning are the goals of this research.
This thesis proposes a big data mining framework for this problem. The data are first preprocessed and analyzed before prediction; a model is then trained and optimized with a machine-learning algorithm; finally, feature importance is analyzed based on the prediction results.
The framework is composed of four major stages: (1) the data preprocessing stage, (2) the data analysis stage, (3) the machine learning stage, and (4) the feature importance analysis stage.
According to the experimental results, with the proposed big data mining framework, valueless data were cleaned out during the valueless-player observation period, payer and non-payer target values were prepared through the payer definition period, and valuable player gameplay records were extracted during the feature-mining period. Through Exploratory Data Analysis (EDA), unreasonable data features and high-information data features were identified, allowing valuable data features to be inferred. The trained model can predict potential new paying players, and feature importance can be analyzed from its predictions to understand why players spend and how spending connects to the game. Overall, the framework effectively reduces the time and labor cost of predicting paying players and yields information useful for marketing.
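The machine learning and feature importance analysis stages described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the thesis's actual pipeline: the player-behavior feature names are invented for illustration, the imbalanced payer label is simulated, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost model the thesis uses.

```python
# Hypothetical sketch: train a gradient-boosted tree classifier on
# synthetic "new player" behavior features and rank feature importance.
# GradientBoostingClassifier is a stand-in for the thesis's XGBoost model;
# all feature names and data below are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 2000
# Invented behavioral features observed during a new player's first days.
features = ["sessions", "playtime_min", "levels_cleared", "friends_added"]
X = rng.poisson(lam=[5, 60, 3, 1], size=(n, 4)).astype(float)

# Simulated imbalanced label: only a small fraction of new players become
# payers, correlated with engagement (sessions and playtime).
score = 0.1 * X[:, 0] + 0.02 * X[:, 1]
y = (score + rng.normal(0, 0.2, n) > np.quantile(score, 0.95)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("F1:", round(f1_score(y_te, pred), 3))

# Feature importance analysis stage: rank features by learned importance.
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

In a real deployment, the payer class is far rarer than simulated here, so the data analysis stage would also have to address class imbalance (for example by resampling) before training.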