Value functions for depth-limited solving in zero-sum imperfect-information games
Artificial Intelligence ( IF 14.4 ) Pub Date : 2022-10-19 , DOI: 10.1016/j.artint.2022.103805
Vojtěch Kovařík , Dominik Seitz , Viliam Lisý , Jan Rudolf , Shuo Sun , Karel Ha

We provide a formal definition of depth-limited games together with an accessible and rigorous explanation of the underlying concepts, both of which were previously missing in imperfect-information games. The definition works for an arbitrary (perfect recall) extensive-form game and is not tied to any specific game-solving algorithm. Moreover, this framework unifies and significantly extends three approaches to depth-limited solving that previously existed in extensive-form games and multiagent reinforcement learning but were not known to be compatible. A key ingredient of these depth-limited games is value functions. Focusing on two-player zero-sum imperfect-information games, we show how to obtain optimal value functions and prove that public information provides both necessary and sufficient context for computing them. We provide a domain-independent encoding of the domains that allows for approximating value functions even by simple feed-forward neural networks, which are then able to generalize to unseen parts of the game. We use the resulting value network to implement a depth-limited version of counterfactual regret minimization. In three distinct domains, we show that the algorithm's exploitability is roughly linearly dependent on the value network's quality and that it is not difficult to train a value network with which depth-limited CFR's performance is as good as that of CFR with access to the full game.
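As a hedged illustration of the algorithmic ingredient the abstract describes (not the paper's implementation), the sketch below runs the regret-matching update at the heart of counterfactual regret minimization in self-play on rock-paper-scissors. In a depth-limited solver, the `payoff` lookup at the leaves would be replaced by the value network's estimate at the depth limit; all names here are illustrative.

```python
# Self-play regret matching, the core update rule of CFR, on
# rock-paper-scissors. In depth-limited CFR the terminal utilities
# returned by payoff() would instead come from a learned value
# function evaluated at the depth limit. Illustrative sketch only.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """Row player's utility for action a against action b (zero-sum)."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def regret_matching(regrets):
    """Turn cumulative positive regrets into a mixed strategy."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations=20000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [regret_matching(r) for r in regrets]
        for p in range(2):
            for a in range(ACTIONS):
                strategy_sum[p][a] += strats[p][a]
        # accumulate each player's regret against the opponent's mix
        for p in range(2):
            opp = strats[1 - p]
            util = [sum(opp[b] * (payoff(a, b) if p == 0 else -payoff(b, a))
                        for b in range(ACTIONS)) for a in range(ACTIONS)]
            ev = sum(strats[p][a] * util[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regrets[p][a] += util[a] - ev
    totals = [sum(strategy_sum[p]) for p in range(2)]
    return [[strategy_sum[p][a] / totals[p] for a in range(ACTIONS)]
            for p in range(2)]

avg = train()  # average strategies approach the uniform equilibrium
```

The average strategy (not the final iterate) is what converges to equilibrium, which is why `strategy_sum` is accumulated separately from the per-iteration strategies.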



Updated: 2022-10-19