Winning Rate Prediction Model Based on Monte Carlo Tree Search for Computer Dou Dizhu
IEEE Transactions on Games (IF 2.3), Pub Date: 2019-09-10, DOI: 10.1109/tg.2019.2940261
Guangyun Tan, Yongyi He, Huahu Xu, Peipei Wei, Ping Yi, Xinxin Shi

Poker is a typical game of incomplete information and remains a longstanding challenge problem in artificial intelligence (AI). The poker game Dou Dizhu has been viewed as a particularly thorny problem in AI because of its characteristics. This article introduces a Monte Carlo tree search (MCTS) method developed for Dou Dizhu that makes decisions effectively. We built a winning rate prediction model (WRPM) that predicts the winning rate of candidate moves as an initial estimate of the situation, and we improved the model to make it more applicable to the different player roles. The WRPM is then embedded into MCTS as the core algorithm for expansion and simulation; we call the resulting method WRPM-MCTS. In addition, we train a card distribution prediction model that predicts the cards held by opponents, further improving the performance of the WRPM-MCTS-based Dou Dizhu agent. Experiments show that WRPM-MCTS performs statistically significantly better than both pure MCTS and pure WRPM. In games against human players on an online game platform, the WRPM-MCTS-based agent achieved a winning rate of 52.86% over 4,000,000 games and ranked in the top 1.22% among 500,000 human players, indicating that the agent had reached the level of human experts.
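
The abstract describes plugging a learned winning-rate predictor into MCTS for the expansion and simulation steps. The following is a minimal Python sketch of that general idea only, not the authors' implementation. Several assumptions apply. A toy game (Nim: remove 1 to 3 stones, and whoever takes the last stone wins) stands in for Dou Dizhu. `predict_win_rate` is a hypothetical hand-written heuristic standing in for the trained WRPM. The predictor orders which untried move to expand first and scores each new leaf in place of a random rollout. Player roles and the card distribution prediction model are omitted.

```python
import math

# Toy stand-in game (Nim), used only so the sketch runs; it is NOT Dou Dizhu.
MOVES = (1, 2, 3)


def legal_moves(pile):
    return [m for m in MOVES if m <= pile]


def predict_win_rate(pile_after_move):
    """Hypothetical stand-in for a learned winning-rate model (WRPM).

    Returns an estimated probability that the player who just moved wins.
    This heuristic knows that leaving a multiple of 4 is winning in Nim;
    a real model would be trained on game features.
    """
    if pile_after_move == 0:
        return 1.0
    return 0.9 if pile_after_move % 4 == 0 else 0.2


class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = legal_moves(pile)
        self.visits = 0
        # Sum of rewards from the view of the player who made self.move.
        self.value = 0.0

    def ucb_child(self, c=1.4):
        # UCB1 over children; every child has at least one visit here.
        return max(
            self.children,
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def search(root_pile, iterations=500):
    """Return the most-visited root move, or None if there are no moves."""
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion guided by the predictor: try the most promising move first.
        if node.untried:
            move = max(node.untried, key=lambda m: predict_win_rate(node.pile - m))
            node.untried.remove(move)
            child = Node(node.pile - move, parent=node, move=move)
            node.children.append(child)
            node = child
        if node.move is None:  # the root itself is terminal
            break
        # 3. Evaluation: the predictor replaces a random playout.
        reward = predict_win_rate(node.pile)
        # 4. Backpropagation, flipping perspective at each level.
        while node is not None:
            node.visits += 1
            node.value += reward
            reward = 1.0 - reward
            node = node.parent
    if not root.children:
        return None
    return max(root.children, key=lambda ch: ch.visits).move


print(search(10))  # taking 2 leaves 8, a multiple of 4
```

Replacing rollouts with a predictor trades simulation noise for model bias. In a real Dou Dizhu agent, the hidden opponent hands would also have to be handled, for example by sampling hands from a predicted card distribution before each search. That step is not shown here.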

Updated: 2019-09-10