| Field | Value |
|---|---|
| Author | 王竣生 Jyun-Sheng Wang |
| Thesis Title | 基於深度模仿學習之擬人撞球AI研究 (A Study on Human-like Billiards AI Bot Based on Deep Imitation Learning) |
| Advisor | 戴文凱 Wen-Kai Tai |
| Committee Members | 張國清, 陳怡伶 |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2019 |
| Graduation Academic Year | 107 |
| Language | Chinese |
| Pages | 80 |
| Keywords (Chinese) | 撞球、人工智慧、類神經網路、行為複製 |
| Keywords (English) | Billiards, AI, Neural network, Behavior cloning |
Current game AI research, whether based on machine learning or rule-based methods, aims primarily at increasing the strength of the AI. With careful algorithm design and powerful computation, an AI can find optimal moves that humans cannot, but from the perspective of the player's actual game experience, strength is not the only concern. Especially in competitive games such as billiards, an opponent that keeps producing superhuman optimal moves leaves the player unable to fight back and feeling discouraged, which easily degrades the play experience; at the same time, the player perceives that the opponent is an AI rather than a real person, which reduces immersion. A competitive game AI therefore needs not only the basic ability to play, but also an appropriate level of strength that avoids conspicuously superhuman best moves.
This study explores how to apply behavior cloning, a form of imitation learning, to learn decision behavior that matches ordinary players from their replay data. We collect replay data, clean and classify it, extract appropriate features, and train predictive models with deep neural networks. At each stage of the game, the models give the AI action suggestions that imitate the players' reasoning, yielding decisions that are reasonable but not overly perfect and making it harder for players to recognize the opponent as an AI.
Using a pool game as the platform, we mine players' play logs and apply deep imitation learning to build an AI whose target-ball, target-pocket, and shot-power/angle decisions resemble those of ordinary players. Its decisions agree with the player data more than 80% of the time, and the power model sinks the target ball with 75% probability without using physical simulation. We also examine how different feature-set designs affect the human-likeness of the predictive models, and design a user study suited to evaluating billiards AI. Questionnaire results confirm that our AI has a clear degree of similarity to real players, and that it improves on board states where the original rule-based AI behaved poorly.
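The abstract mentions extracting "appropriate features" from replay data before training. A minimal sketch of what geometric shot features for one candidate (ball, pocket) pair might look like; the feature names and formulas here are illustrative assumptions, not the thesis's actual feature sets:

```python
import math

def shot_features(cue, ball, pocket):
    """Illustrative geometric features for one candidate (ball, pocket) pair.

    cue, ball, pocket are (x, y) table coordinates. Returns the
    cue-to-ball distance, ball-to-pocket distance, and cut angle
    (radians) between the aiming line and the pocketing line.
    """
    d_cb = math.dist(cue, ball)     # cue ball -> object ball distance
    d_bp = math.dist(ball, pocket)  # object ball -> pocket distance
    # Cut angle: angle between cue->ball and ball->pocket directions.
    v1 = (ball[0] - cue[0], ball[1] - cue[1])
    v2 = (pocket[0] - ball[0], pocket[1] - ball[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cut = math.acos(max(-1.0, min(1.0, dot / (d_cb * d_bp))))
    return (d_cb, d_bp, cut)

# A straight-in shot (cue, ball, pocket collinear) has a cut angle of 0.
print(shot_features((0, 0), (1, 0), (2, 0)))  # (1.0, 1.0, 0.0)
```

Feature vectors of this kind, one per candidate shot, would then be fed to the deep neural network described above.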
Keywords: Billiards, AI, Neural network, Behavior cloning.
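The decision similarity reported in the abstract (agreement of more than 80% with player data) can be read as simple per-shot agreement between the model's predicted choices and the player's logged choices. A minimal sketch, with a hypothetical function name and made-up decision indices:

```python
def decision_agreement(predicted, logged):
    """Fraction of shots where the model's predicted decision (e.g. a
    target-ball or target-pocket index) matches the player's logged one."""
    if len(predicted) != len(logged):
        raise ValueError("sequences must be the same length")
    matches = sum(p == q for p, q in zip(predicted, logged))
    return matches / len(predicted)

# Four of five predicted target balls match the replay data -> 0.8 agreement.
print(decision_agreement([3, 7, 1, 5, 2], [3, 7, 1, 5, 9]))  # 0.8
```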