
Graduate Student: Pei-Ru Wang (王珮如)
Thesis Title: A Study on AI Bot for Ludo Game Based on Reinforcement Learning Model (基於強化學習模型之 Ludo 遊戲 AI Bot 之研究)
Advisor: Wen-Kai Tai (戴文凱)
Committee Members: Chin-Shyurng Fahn (范欽雄), Hsueh-Wu Wang (王學武)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2022
Graduating Academic Year: 110 (2021-2022)
Language: Chinese
Pages: 40
Keywords: Reinforcement Learning; Ludo; Game AI; Machine Learning; Comparison of Reinforcement Learning Methods; Epsilon Greedy; Low Resource Requirements

Chinese Abstract (translated):
In the game domain, AI has long been a key research topic, evolving from early rule-based AI through reinforcement learning AI to today's deep reinforcement learning AI, which incorporates deep learning techniques. Traditional rule-based AI requires developers to be familiar with the game's mechanics and with a wide range of algorithms and data structures, and almost no two games can share the same architecture. Reinforcement learning AI, in contrast, can be applied to most environments with little modification, eliminating the per-game design that traditional AI demands. Deep reinforcement learning goes a step further by greatly reducing the computing power and memory that reinforcement learning requires, allowing reinforcement learning AI to combine with deep learning techniques and tackle more complex games. Although the core of reinforcement learning applies to nearly all games, not every reinforcement learning method is suitable when training must finish quickly without powerful computing hardware. The main goal of this thesis is therefore to find reinforcement learning methods that can be trained to completion in a short time under limited computing power.
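The value-based learning that this line of work builds on can be sketched as a single tabular Q-learning update. This is only a minimal illustration of the general technique, not the thesis's actual model; the state and action encoding here is an assumption for the toy example.

```python
def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])            # greedy bootstrap from the next state
    td_target = reward + gamma * best_next    # one-step TD target
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Toy example: two states, two actions, Q stored as a dict of per-state action lists.
Q = {0: [0.0, 0.0], 1: [1.0, 0.0]}
q_learning_update(Q, state=0, action=0, reward=1.0, next_state=1)
```

For large state spaces this table becomes infeasible, which is exactly the memory problem that deep Q-networks address by approximating Q with a neural network.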
In this thesis, we experiment with a variety of reinforcement learning methods and parameter settings on the Ludo game. We propose an exploration method that makes AI training more stable, together with a stronger rule-based AI for Ludo that helps the reinforcement learning AI learn and be evaluated. Every experimental model is restricted to 48 hours of training without a dedicated compute card. We compare two reinforcement learning approaches, the state-action value function and the state value function, test multi-step reinforcement learning under different step settings, and measure the proposed, more stable exploration method in terms of training time and win rate, arriving at a fully trained, parameter-tuned reinforcement learning AI.
Comparing our parameter-tuned reinforcement learning AI against previous AI for the same game shows that our AI can indeed complete training in a short time under hardware constraints, with a win rate 5-6% higher than that of previous AI.
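The abstract does not spell out the proposed exploration method, so as a hedged baseline sketch, a standard linearly annealed epsilon-greedy policy is shown below; the schedule parameters (`eps_start`, `eps_end`, `decay_steps`) are illustrative assumptions, and the thesis's own schedule may differ.

```python
import random

def epsilon_greedy(q_values, step, eps_start=1.0, eps_end=0.05,
                   decay_steps=10_000, rng=random):
    """Pick an action index with a linearly annealed exploration rate.
    Decaying epsilon over training is one common way to trade early
    exploration for late-stage stability."""
    frac = min(step / decay_steps, 1.0)
    eps = eps_start + frac * (eps_end - eps_start)   # eps_start -> eps_end over decay_steps
    if rng.random() < eps:
        return rng.randrange(len(q_values))          # explore: uniform random move
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: greedy move
```

In a Ludo setting, `q_values` would hold one estimate per legal move for the current dice roll, and `step` the number of training steps completed so far.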


Abstract:
Game AI is one of the most important parts of game development. Traditional rule-based AI requires developers to design many rules and algorithms, whereas reinforcement-learning-based AI does not need to be redesigned for each game. Deep reinforcement learning, which combines reinforcement learning with deep learning, can resolve problems such as the out-of-memory issue of the tabular Q-learning algorithm.
Although deep reinforcement learning is powerful, it takes the machine a long time to learn. This thesis therefore aims to find a model that can be trained within limited time and computing resources.
In this thesis, we study an AI bot for the Ludo game based on a reinforcement learning model. We propose a reinforcement learning exploration method that makes AI training more stable. Furthermore, we propose a rule-based AI that speeds up training and serves as a benchmark for testing how strong the trained AI is. We compare the state-action value function and the state value function under limited hardware and 48 hours of training time. In addition, we experiment with multi-step reinforcement learning models under different settings.
Comparing our reinforcement learning AI with previous AI for the Ludo game, the experimental results show that our AI outperforms the previous AI by at least 5% in win rate.
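The multi-step experiments vary how many future rewards are accumulated before bootstrapping from a value estimate. A minimal sketch of the n-step return that multi-step reinforcement learning is built on is shown below; this is the textbook quantity, not the thesis's specific training loop.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return:
    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n).
    Folding from the last reward backwards applies each discount exactly once."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# n = 2 steps of observed reward, then bootstrap from the value of the state reached.
g = n_step_return([1.0, 1.0], bootstrap_value=0.5, gamma=0.5)
```

Larger n propagates real rewards further per update but increases variance, which is the trade-off behind testing different step settings.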

Table of Contents:
Chinese Abstract I
Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
1.1 Motivation and Goals 1
1.2 Research Methods 1
1.3 Contributions 2
1.4 Thesis Organization 2
Chapter 2 Related Work 3
2.1 Ludo 3
2.2 Ludo AI 4
2.3 Reinforcement Learning 5
2.4 Q-Learning 6
2.5 Epsilon Greedy 7
2.6 Deep Q-Learning Network 8
2.7 Multi-Step Reinforcement Learning 10
Chapter 3 Research Methods 11
3.1 System Architecture 11
3.2 Experimental Model Design 14
3.2.1 SCF AI (Rule-Based AI) 14
3.2.2 RL AI (Reinforcement Learning AI) 14
3.2.3 Epsilon Greedy 18
3.3 Experiment Design 18
Chapter 4 Experimental Results and Discussion 19
4.1 Experimental Environment 19
4.2 Win-Rate Differences across Epsilon Greedy Training Schemes 20
4.3 Win-Rate Differences across Multi-Step Settings 21
4.4 Win-Rate Differences across Value Functions 22
4.5 Match Results of SCF AI 23
4.6 Match Results of RL AI 24
4.7 Match Results of SCF AI vs. RL AI 25
4.8 Match Results of AI vs. Human Players 26
Chapter 5 Conclusions and Future Work 27
5.1 Conclusions 27
5.2 Future Work 28
References 29

[1] "Ludo (board game) - Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/Ludo_(board_game).
[2] F. Alvi and M. Ahmed, "Complexity Analysis and Playing Strategies for Ludo and its Variant Race Games," in IEEE Conference on Computational Intelligence and Games (CIG), pp. 134-141, 2011.
[3] M. Alhajry, F. Alvi, and M. Ahmed, "TD(λ) and Q-learning based Ludo players," in IEEE Conference on Computational Intelligence and Games (CIG), 2012.
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[5] C. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, no. 3-4, 1992.
[6] R. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[7] C. Watkins, "Learning from Delayed Rewards," Ph.D. thesis, University of Cambridge, England, 1989.
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[9] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, "The arcade learning environment: An evaluation platform for general agents," Journal of Artificial Intelligence Research, vol. 47, pp. 253-279, 2013.
[10] M. Bellemare, J. Veness, and M. Bowling, "Investigating contingency awareness using Atari 2600 games," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 26, 2012.
[11] M. Hausknecht, J. Lehman, R. Miikkulainen, and P. Stone, "A neuro-evolution approach to general Atari game playing," IEEE Transactions on Computational Intelligence and AI in Games, vol. 6, no. 4, pp. 355-366, 2014.
[12] K. De Asis, J. F. Hernandez-Garcia, G. Zacharias Holland, and R. S. Sutton, "Multi-step Reinforcement Learning: A Unifying Algorithm," in AAAI Conference on Artificial Intelligence (AAAI), 2017.