最適化蒙地卡羅樹搜尋於麻將之應用｜國立臺灣科技大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	馮國軒 Kuo-Hsuan Feng
論文名稱：	最適化蒙地卡羅樹搜尋於麻將之應用 Optimized Monte Carlo Tree Search for Mahjong Agent
指導教授：	戴文凱 Wen-Kai Tai
口試委員:	陳俊成陳冠宇
學位類別：	碩士 Master
系所名稱：	電資學院 - 資訊工程系 Department of Computer Science and Information Engineering
論文出版年：	2019
畢業學年度：	107
語文別：	英文
論文頁數：	62
中文關鍵詞：	蒙地卡羅樹搜尋、麻將、不完全信息博弈
外文關鍵詞：	MCTS, games of imperfect information
相關次數：	點閱：274 下載：12
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

在人工智慧應用於遊戲代理人之相關領域，截至現今已有多種方法被提出，其中蒙地卡羅樹搜尋的應用造就了許多輝煌的成果，在棋牌類遊戲中的表現更是亮眼。然而，蒙地卡羅樹搜尋的使用仍受限制，至今未有適用於不完全信息博弈的通用解。

麻將是亞洲地區受歡迎的一種牌類遊戲，其中包含了大量隱藏資訊與隨機事件。此不確定性使麻將的遊戲樹分支迅速，於此龐大空間內尋找最佳路徑因而顯得困難。我們提出一個基於蒙地卡羅樹搜尋的框架，妥善對應隨機性並結合機率模型，於簡化版麻將遊戲中模擬對手的行動。此外，使用特化型制約極大的引導蒙地卡羅樹搜尋，於擴展及模擬時限制其合法行動的選擇，以減輕評估結果的不穩定性與收斂的遲緩。我們也為樹平行化設計了無鎖架構以極大化資源利用並擴展實戰的可行性。

實驗部分不僅止於證明本論文方法之有效性，也提出一個特徵的集合，自各種觀點適切衡量不同玩家的強度與打牌風格。我們透過遊戲紀錄的分析強調快速胡牌的影響力，並提供一可變參數調節代理人之行為模式。與其他方法相對比，蒙地卡羅樹搜尋藉由對大局觀的掌控，不需保有複雜的領域知識便能達到更高水平。

In the field of artificial intelligence for game agents, a variety of methods have been proposed so far, in which the adoption of Monte Carlo tree search (MCTS) results in notable success especially in chess and card games. However, the use of MCTS is still limited, and there exists no general solution for games of imperfect information.

Mahjong is a popular tile-based game in Asia, which contains lots of hidden information and stochastic events. The uncertainty makes the game tree of Mahjong branches quickly, and therefore searching optimal paths in such a huge dimension is challenging. We propose a framework for MCTS which handles the randomness and combines probability models to simulate opponents' moves in a simplified version of Mahjong. In addition, specialized restrictions are applied as strong guidance for MCTS, limiting the candidates of valid actions in both expansion and simulation in order to alleviate the unstable evaluation and the slow convergence. A lock-free architecture for tree parallelization is also designed to make use of available resources and stretch the practical use in real matches.

In the part of experiment, aside from proving the effectiveness of our methods, a set of features are suggested for Mahjong to properly measure the strength along with playing styles of different players from multiple points of view. Through analyses of gaming records, the effect of quick wins is emphasized, and an adjustable parameter is provided to manually control the agent's behavior. In comparison with other methods, MCTS takes advantage of the overall view to reach a higher level without complicated domain knowledge.

Abstract in Chinese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract in English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Objectives and Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . 1
3 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
5 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1 Mahjong Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Monte Carlo Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.1 States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Stop Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Upper Confidence Bounds Applied to Trees . . . . . . . . . . . . 17
2.3 Uniform Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 High-Speed Calculation . . . . . . . . . . . . . . . . . . . . . . 20
3 Extended One-Player Mahjong . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Quick-Win-Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Virtual Discarding . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Probability Model of Winning . . . . . . . . . . . . . . . . . . . 32
4 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 Lock-Free Architecture . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Virtual Visits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Experimental Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . 39
1 Preliminary Arrangements . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.1 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.2 Expert Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.3 Rule-Based Agent . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.4 Headless Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.5 Settings of Parameters . . . . . . . . . . . . . . . . . . . . . . . 42
2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1 The Effectiveness of Quick-Win-Rules . . . . . . . . . . . . . . 43
2.2 Comparison with Expert Players . . . . . . . . . . . . . . . . . . 45
2.3 Advantages over Rule-Based Methods . . . . . . . . . . . . . . . 45
2.4 Multi-Thread Efficiency . . . . . . . . . . . . . . . . . . . . . . 48
2.5 The Effect of Loss Penalty . . . . . . . . . . . . . . . . . . . . . 48
Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2 Limitation and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 51
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Appendix A: Rules of Taiwanese Mahjong . . . . . . . . . . . . . . . . . . . . . . 55
Appendix B: Mahjong Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix C: Scoring and Winning Patterns . . . . . . . . . . . . . . . . . . . . . 59
Appendix D: Mahjong Terms in Chinese . . . . . . . . . . . . . . . . . . . . . . . 61
                                

[1] H. Handa, “Evolution of the weight vectors in mahjong non-player characters,” in Nature and Biologically
Inspired Computing (NaBIC), 2013 World Congress on, pp. 147–152, IEEE, 2013.
[2] N. Mizukami and Y. Tsuruoka, “Building a computer mahjong player based on monte carlo simulation
and opponent models,” in Computational Intelligence and Games (CIG), 2015 IEEE Conference on,
pp. 275–283, IEEE, 2015.
[3] N. Mizukami, R. Nakahari, A. Ura, M. Miwa, Y. Tsuruoka, and T. Chikayama, “Realizing a four-player
computer mahjong program by supervised learning with isolated multi-player aspects,” Transactions
of Information Processing Society of Japan, vol. 55, no. 11, pp. 2410–2420, 2014.
[4] D. Xu, Mahjong AI/analyzer. PhD thesis, California State University, Northridge, 2015.
[5] K. Ihara and S. Kato, “Neuro-evolutionary approach to multi-objective optimization in one-player
mahjong,” in International Conference on Network-Based Information Systems, pp. 492–503,
Springer, 2017.
[6] Y. Liang, X. Jin, and Z. Zhao, “Training with enlightening model for games with difficult-starting
problem,” in 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), pp. 97–
100, IEEE, 2018.
[7] J. Zhang and H. Liu, “Reinforcement learning with monte carlo sampling in imperfect information
problems,” in International Conference on Cognitive Computing, pp. 55–67, Springer, 2018.
[8] M.-J. Kim and C. W. Ahn, “Hybrid fighting game ai using a genetic algorithm and monte carlo
tree search,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion,
pp. 129–130, ACM, 2018.
[9] D. J. Soemers, C. F. Sironi, T. Schuster, and M. H. Winands, “Enhancements for real-time montecarlo
tree search in general video game playing,” in Computational Intelligence and Games (CIG),
2016 IEEE Conference on, pp. 1–8, IEEE, 2016.
[10] B. Brügmann, “Monte carlo go,” tech. rep., Citeseer, 1993.
[11] S. Gelly and D. Silver, “Combining online and offline knowledge in uct,” in Proceedings of the 24th
international conference on Machine learning, pp. 273–280, ACM, 2007.
[12] J. P. A. Nijssen and M. H. Winands, “Enhancements for multi-player monte-carlo tree search,” in
International Conference on Computers and Games, pp. 238–249, Springer, 2010.
[13] M. J. Tak, M. H. Winands, and Y. Bjornsson, “N-grams and the last-good-reply policy applied in
general game playing,” IEEE Transactions on Computational Intelligence and AI in games, no. 2,
pp. 73–83, 2012.
[14] G. M.-B. Chaslot, M. H. Winands, and H. J. van Den Herik, “Parallel monte-carlo tree search,” in
International Conference on Computers and Games, pp. 60–71, Springer, 2008.
[15] M. Enzenberger and M. Müller, “A lock-free multithreaded monte-carlo tree search algorithm,” in
Advances in Computer Games, pp. 14–20, Springer, 2009.
[16] P. I. Cowling, E. J. Powley, and D. Whitehouse, “Information set monte carlo tree search,” IEEE
Transactions on Computational Intelligence and AI in Games, vol. 4, no. 2, pp. 120–143, 2012.
[17] L. Kocsis and C. Szepesvári, “Bandit based monte-carlo planning,” in European conference on machine
learning, pp. 282–293, Springer, 2006.
[18] 段昊, “人工智能在麻将领域能够战胜人类吗？.” https://www.zhihu.com/question/
40171482, 2016. Accessed: 2019-07-13.

簡易檢索 / 詳目顯示

相關論文