Graduate Student: 廖宣瑋 Hsuan-Wei Liao
Thesis Title: 巨量資料探勘框架:基於極限梯度提升之預測免費手機遊戲中潛在新進付費玩家 (Big Data Mining Framework: Predicting Potential New Paying Player in Mobile Free-to-Play Games Based on Extreme Gradient Boosting)
Advisor: 戴文凱 Wen-Kai Tai
Committee Members: 張國清, 鮑興國
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2020
Academic Year: 108
Language: Chinese
Pages: 72
Keywords: Purchase Prediction, Free-to-Play, Big Data, Data Mining, Machine Learning, Extreme Gradient Boosting
Most mobile games on the market today adopt the Free-to-Play (F2P) business model, which makes In-App Purchases (IAP) increasingly important and a central concern of game developers' operations. To launch precise marketing that successfully attracts all kinds of players, a data analysis team must study paying players and, within the group of new players, predict potential payers so as to raise their willingness to make in-app purchases. How to effectively mine features from paying-player data and make predictions through machine learning is therefore the goal of this research.

This thesis proposes a big data mining framework for this problem. The data are first preprocessed and analyzed before prediction; a machine-learning model is then trained and optimized; finally, the prediction results feed into feature-importance analysis, completing the overall prediction and analysis work. The framework consists of four major stages: (1) the data preprocessing stage, (2) the data analysis stage, (3) the machine learning stage, and (4) the prediction-result analysis stage.

According to the experimental results, with the proposed big data mining framework, valueless data were cleaned out using a valueless-player observation period, payer and non-payer target values were prepared through a payer definition period, and valuable traces of player behavior were extracted during a feature-mining period. Exploratory Data Analysis (EDA) identified unreasonable features and high-information features, from which valuable data features were inferred. The trained model can predict potential new paying players, and feature importance can be analyzed from its predictions to understand why players spend and how spending connects to the game. Overall, the framework effectively reduces the time and labor cost of predicting paying players and yields information useful for marketing.
In recent years, most mobile games on the market have adopted the Free-to-Play (F2P) business model, which makes In-App Purchases (IAP) increasingly important and a focus of game developers' operations. To launch precise marketing that successfully attracts all types of players, the data analysis team must study paying players and, within the group of new players, predict potential payers so as to raise their willingness to make in-app purchases. Effectively mining features from paying-player data and making predictions through machine learning are the goals of this research.
This thesis proposes a big data mining framework for this problem. The data are first preprocessed and analyzed before prediction; a model is then trained and optimized with a machine-learning algorithm; finally, feature importance is analyzed based on the prediction results.
The framework is composed of four major stages: (1) the data preprocessing stage, (2) the data analysis stage, (3) the machine learning stage, and (4) the feature importance analysis stage.
According to the experimental results, with the proposed big data mining framework, valueless data were cleaned out during the valueless-player observation period, payer and non-payer target values were prepared through the payer definition period, and valuable player gameplay records were extracted during the feature-mining period. Through Exploratory Data Analysis (EDA), unreasonable data features and high-information data features were identified, allowing valuable data features to be inferred. The trained model can predict potential new paying players, and feature importance can be analyzed from its predictions to understand why players spend and how spending connects to the game. Overall, the framework effectively reduces the time and labor cost of predicting paying players and yields information useful for marketing.
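The machine learning and feature importance analysis stages described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the thesis's actual pipeline: the player-behavior feature names are invented for illustration, the imbalanced payer label is simulated, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost model the thesis uses.

```python
# Hypothetical sketch: train a gradient-boosted tree classifier on
# synthetic "new player" behavior features and rank feature importance.
# GradientBoostingClassifier is a stand-in for the thesis's XGBoost model;
# all feature names and data below are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 2000
# Invented behavioral features observed during a new player's first days.
features = ["sessions", "playtime_min", "levels_cleared", "friends_added"]
X = rng.poisson(lam=[5, 60, 3, 1], size=(n, 4)).astype(float)

# Simulated imbalanced label: only a small fraction of new players become
# payers, correlated with engagement (sessions and playtime).
score = 0.1 * X[:, 0] + 0.02 * X[:, 1]
y = (score + rng.normal(0, 0.2, n) > np.quantile(score, 0.95)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("F1:", round(f1_score(y_te, pred), 3))

# Feature importance analysis stage: rank features by learned importance.
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

In a real deployment, the payer class is far rarer than simulated here, so the data analysis stage would also have to address class imbalance (for example by resampling) before training.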