A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games
Frontiers of Computer Science (IF 3.4), Pub Date: 2021-07-16, DOI: 10.1007/s11704-020-9307-6
Li Zhang 1 , Yuxuan Chen 1 , Wei Wang 1 , Ziliang Han 1 , Shijian Li 1 , Zhijie Pan 1 , Gang Pan 1

Solving the optimization problem of approaching a Nash Equilibrium point plays an important role in imperfect-information games such as StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm, called MC-NFSP, that combines Monte Carlo tree search with NFSP to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth, while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous and parallel architecture to collect game experience, improving both training efficiency and policy quality. Experiments on a game with hidden state information (Texas Hold'em) and on first-person shooter (FPS) games demonstrate the effectiveness of our algorithms.
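As a rough illustration of the idea that NFSP builds on, the sketch below runs classical tabular fictitious play on rock-paper-scissors, a small zero-sum matrix game. It is not the paper's MC-NFSP or ANFSP method: there are no neural networks, no Monte Carlo tree search, and no hidden information. The names `RPS_PAYOFF`, `fictitious_play`, and `exploitability` are hypothetical and chosen for this example only. Each player repeatedly best-responds to the opponent's empirical action counts, and in two-player zero-sum games the resulting average strategies approach a Nash Equilibrium.

```python
# Illustrative sketch only: classical tabular fictitious play, not MC-NFSP.
# Row player's payoffs for rock-paper-scissors (rows/columns: rock, paper, scissors).
# The column player receives the negative of each entry (zero-sum).
RPS_PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_play(payoff, iterations=20000):
    """Return the average (empirical) strategies of both players."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions so that the empirical counts are non-empty.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        # Value of each action against the opponent's empirical action counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            sum(-payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response values; zero at a Nash Equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row_value = max(
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    best_col_value = max(
        sum(-payoff[i][j] * row_strategy[i] for i in range(n_rows))
        for j in range(n_cols)
    )
    return best_row_value + best_col_value


row_avg, col_avg = fictitious_play(RPS_PAYOFF)
print("average row strategy:", row_avg)
print("exploitability:", exploitability(RPS_PAYOFF, row_avg, col_avg))
```

In this toy game the average strategies should drift toward the uniform mixture and the exploitability toward zero as the iteration count grows. NFSP, as the abstract notes, replaces the exact best response with action values approximated by a neural network trained off-policy, which is the step that MC-NFSP is proposed to improve.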




Updated: 2021-07-16