
Graduate student: Jyun-Sheng Wang (王竣生)
Thesis title: A Study on Human-like Billiards AI Bot Based on Deep Imitation Learning (基於深度模仿學習之擬人撞球AI研究)
Advisor: Wen-Kai Tai (戴文凱)
Oral defense committee: 張國清, 陳怡伶
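Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: Chinese
Number of pages: 80
Keywords: Billiards, Artificial intelligence (AI), Neural network, Behavior cloning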
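    Current research on game AI, whether based on machine learning or on rule-based methods, mainly aims to increase the AI's strength. With careful design and powerful computation, an AI can find optimal moves that humans can hardly produce, but from the standpoint of the actual player experience, strength is not the only consideration. In competitive games such as billiards in particular, an opponent that keeps making superhuman optimal moves often leaves the player unable to fight back and discouraged, which makes for a poor game experience; the player also notices that the opponent is an AI rather than a real person, which lowers engagement. A competitive game AI therefore needs not only the basic ability to play but also an appropriate level of strength, with fewer implausibly perfect moves. This study explores how to apply behavior cloning, a form of imitation learning, to learn the decision behavior of ordinary players from their replay data. After collecting, cleaning, and classifying the replay data, we extract suitable features and train predictive models with a deep neural network. At different stages of the game, these models give the AI behavior suggestions that imitate the players' reasoning, producing a strategy that is reasonable but not overly perfect and making it harder for players to tell that the opponent is an AI.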
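Feature extraction is easiest to picture with one of the geometric features the outline lists, the cut angle (Section 3.2.3). The following is a minimal sketch, assuming 2D table coordinates and the standard ghost-ball model of a cut shot; the function name `cut_angle_deg` and its inputs are illustrative and not taken from the thesis.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) on the table plane; units assumed to be metres


def cut_angle_deg(cue: Point, obj: Point, pocket: Point, ball_radius: float) -> float:
    """Cut angle (degrees) between the cue ball's path and the object ball's path.

    Ghost-ball model (an assumption of this sketch): at impact, the cue ball's
    centre sits 2 * ball_radius behind the object ball on the object-to-pocket line.
    """
    # Direction the object ball must travel: from its centre towards the pocket.
    dx, dy = pocket[0] - obj[0], pocket[1] - obj[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist

    # Ghost-ball position: where the cue ball's centre must be at contact.
    gx, gy = obj[0] - 2 * ball_radius * ux, obj[1] - 2 * ball_radius * uy

    # Direction the cue ball travels: from its centre to the ghost ball.
    vx, vy = gx - cue[0], gy - cue[1]
    cos_theta = (vx * ux + vy * uy) / math.hypot(vx, vy)

    # Clamp against floating-point drift before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For a straight-in shot such as `cut_angle_deg((0, 0), (1, 0), (2, 0), 0.0286)` (0.0286 m is roughly the radius of a standard pool ball) the result is 0.0; larger values mean thinner, harder cuts, which is one plausible signal a model could use to rank candidate balls.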

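    Using a billiards game as the platform, this study combines data mining of player play logs with deep imitation learning to build an AI whose target-ball selection, pocket selection, and shot-power decisions resemble those of ordinary players. Its decisions agree with the player data more than 80% of the time, and the power model reaches a 75% pot rate without using physics simulation. We also examine how different feature designs affect the human-likeness of the predictive models, and design a user study suited to evaluating billiards AI. The questionnaire results confirm that our AI has a certain degree of similarity to real players and that it improves on the table states where the original rule-based AI performed poorly.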
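The power model predicts a shot without running a physics simulation, and the outline (Sections 3.3.3 to 3.3.6) lists separate X/Z force components, a resultant-force model, an angle model, and a choice between MSE and MAE losses. The sketch below shows how those quantities relate, assuming X and Z are the two horizontal axes of the table plane (an assumption; the record does not define them). The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def resultant_and_heading(fx: float, fz: float) -> Tuple[float, float]:
    """Resultant magnitude and heading (degrees from +X towards +Z) of a planar force.

    Assumes fx and fz are the shot force components along the table's X and Z axes.
    """
    return math.hypot(fx, fz), math.degrees(math.atan2(fz, fx))


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error: large errors are penalised quadratically."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)
```

With one badly mispredicted shot out of four, for example `pred = [1, 1, 1, 1]` against `true = [1, 1, 1, 5]`, MSE is 4.0 while MAE is 1.0. This gap is why the loss choice matters when the recorded player shots contain outliers: MSE pulls the model towards rare extreme shots, MAE is less sensitive to them.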


    Current game AI research, whether based on machine learning or on rule-based methods, mainly aims to improve the strength of the AI. With a well-designed algorithm and ample computing power, an AI can find optimal moves that humans cannot. In terms of the actual player's game experience, however, strength is not the only concern. Especially in competitive games such as billiards, if the opponent keeps choosing optimal moves that ordinary people would not make, the player may be unable to fight back, become discouraged, and have a poor game experience; at the same time, the player will perceive that the opponent is an AI rather than a real person, which reduces immersion. A competitive game AI therefore needs not only the basic ability to play but also an appropriate level of strength, avoiding implausibly perfect moves.

    This study explores how to apply behavior cloning, a form of imitation learning, to learn from players' replay data so that the AI matches the decision strategy of average players. We collect the replay data, clean and classify it, extract appropriate features, and train predictive models with a deep neural network. At different stages of the game, these models give the AI behavior suggestions that imitate the players' reasoning, yielding decisions that are reasonable but not perfect. This strategy makes it harder for players to realize that the opponent is an AI.
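Behavior cloning treats each recorded shot as a supervised example: the table state is the input and the player's choice is the label. A minimal sketch for the target-ball decision follows, assuming each candidate ball is described by a small hypothetical feature vector (for example normalised cue-to-ball distance and cut angle). To stay dependency-free it swaps the deep neural network described above for a linear softmax (conditional-logit) policy trained by gradient descent on cross-entropy; `train_policy`, `predict`, and `agreement` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple

# One demonstration: a feature vector per candidate ball, plus the index the player chose.
Shot = Tuple[List[List[float]], int]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(w: Sequence[float], feats: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, feats))


def train_policy(shots: Sequence[Shot], n_features: int,
                 lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Full-batch gradient descent on the cross-entropy of the player's choices."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for candidates, chosen in shots:
            probs = softmax([score(w, f) for f in candidates])
            # d(-log p_chosen)/dw = sum_k p_k * f_k - f_chosen
            for k, f in enumerate(candidates):
                coeff = probs[k] - (1.0 if k == chosen else 0.0)
                for j in range(n_features):
                    grad[j] += coeff * f[j]
        w = [wi - lr * g / len(shots) for wi, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the candidate ball the learned policy would choose."""
    scores = [score(w, f) for f in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def agreement(w: Sequence[float], shots: Sequence[Shot]) -> float:
    """Fraction of shots where the policy picks the same ball as the player did."""
    return sum(predict(w, c) == chosen for c, chosen in shots) / len(shots)


if __name__ == "__main__":
    # Hypothetical demonstrations: features = [distance, cut angle], both normalised;
    # this toy player always picks the closest, straightest ball.
    demos: List[Shot] = [
        ([[0.2, 0.1], [0.8, 0.6], [0.5, 0.9]], 0),
        ([[0.9, 0.7], [0.3, 0.2]], 1),
        ([[0.6, 0.5], [0.2, 0.3], [0.9, 0.4]], 1),
    ]
    weights = train_policy(demos, n_features=2)
    print(weights, agreement(weights, demos))
```

Here `agreement` is only one possible reading of a "decision similarity" score; the record does not define that metric. A real pipeline would measure it on held-out replays rather than on the training demonstrations.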

    Using a pool game as the platform, this study makes the AI imitate the average player's target-ball selection, target-pocket selection, and power/angle decisions through data mining of player play logs and deep imitation learning. Its decisions are more than 80% similar to the player data. In addition, the power model achieves a 75% probability of sinking the target ball without using physics simulation. We also explore how different feature-set designs affect the human-likeness of the predictive models. Furthermore, we design a user study to examine human-likeness. The questionnaire results confirm that our AI has a certain degree of similarity to real players, and that its behavior has improved in table states where the original rule-based AI acted unnaturally.
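The user study's questionnaire answers are categorical, and the outline lists a significance-testing step (Section 4.2.6). One standard option for a 2x2 table is the two-sided Fisher's exact test; the stdlib sketch below assumes answers are tallied as "judged human" versus "judged AI" for two opponent types. The function name and any counts are illustrative, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is no
    more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For instance, with hypothetical rows "rule-based AI" and "imitation AI" and columns "judged human" and "judged AI", `fisher_exact_two_sided(3, 1, 1, 3)` gives 34/70, about 0.486, so four participants per group would be far too few to claim a difference. The exact test is preferred over a chi-squared test precisely when cell counts are this small.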

    Keywords: Billiards, AI, Neural network, Behavior cloning.

    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives and Questions
      1.3 Method Overview
      1.4 Contributions
      1.5 Thesis Organization
    2 Related Work
      2.1 Billiards-related Research
        2.1.1 Definitions of Billiards Terminology
        2.1.2 Billiards AI Design
      2.2 Learning from Replays
      2.3 Deep Learning AI in Games
      2.4 Human-likeness
    3 Research Methods
      3.1 Preprocessing
        3.1.1 Replay System
        3.1.2 Data Filtering
        3.1.3 Data Augmentation
      3.2 Data Analysis and Feature Engineering
        3.2.1 Target Ball Distribution
        3.2.2 Pocket Distribution
        3.2.3 Cut Angle Distribution
        3.2.4 Distribution of the Offset Angle Relative to the Pocket
      3.3 Model Architecture and Design
        3.3.1 Target Ball Model
        3.3.2 Target Pocket Model
        3.3.3 Target Power Model
        3.3.4 Target Resultant Force Model
        3.3.5 Target Striking Angle Model
        3.3.6 Loss Function Decision: MSE or MAE
        3.3.7 Target Ball High-level Model
      3.4 How the AI Operates in Practice
    4 Experimental Results and Analysis
      4.1 Experiments on the Effect of Different Features on the Model's Learning of Human-likeness
        4.1.1 Target Ball Prediction Model
        4.1.2 Target Pocket Prediction Model
        4.1.3 Predicting the X and Z Components of the Target Power
        4.1.4 Predicting the Target Resultant Force
        4.1.5 Predicting the Target Angle
        4.1.6 Decision and Shot Similarity
        4.1.7 Overall Effect of Data Volume
      4.2 Human-likeness User Study
        4.2.1 Experimental Setup
        4.2.2 Experimental Design 1
        4.2.3 Experimental Design 2
        4.2.4 Questionnaire Design
        4.2.5 Experimental Results
        4.2.6 Significance Testing
      4.3 Runtime Performance Test in the Actual Game
    5 Conclusions and Future Work
      5.1 Contributions and Conclusions
      5.2 Limitations and Future Directions
    Appendix 1: Player Log Types
    Appendix 2: Replay Data Items Used in This Study
    Authorization Letter


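    Full-text release date: 2024/08/20 (campus network)
    Full-text release date: 2024/08/20 (off-campus network)
    Full-text release date: 2024/08/20 (National Central Library: Taiwan Theses and Dissertations System)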