Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games, arXiv - CS - Computer Science and Game Theory



Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-09-15, DOI: arxiv-2009.07384
Brian Hu Zhang, Tuomas Sandholm

Often, for example in war games, strategy video games, and financial simulations, the game is given to us only as a black-box simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work \cite{Zhang20:Small} introduced a notion of certificate for extensive-form games that allows exploitability guarantees without expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) could be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with an $\tilde O(\sqrt{T})$ regret bound in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree.
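
To make "regret minimizer as a subroutine" and "exploitability guarantee" concrete, here is a minimal sketch on a tiny, fully known zero-sum matrix game. It is not the paper's algorithm: there is no black box, no sampling, and no game tree. It only shows regret matching in self-play and the Nash gap of the resulting average strategies, which the players' regrets bound. The function names (`regret_matching_strategy`, `solve_matrix_game`, `nash_gap`) and the 2x2 payoff matrix are assumptions made for illustration.

```python
from typing import List, Tuple


def regret_matching_strategy(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_matrix_game(
    payoff: List[List[float]], iterations: int
) -> Tuple[List[float], List[float]]:
    """Self-play with regret matching; payoff[i][j] is the row player's
    payoff and the column player receives its negation (zero-sum).
    Returns the average strategies of both players."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)

        # Expected payoff of each pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(x[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(y[j] * col_values[j] for j in range(n_cols))

        # Accumulate regret for not having played each pure action.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += y[j]

    avg_x = [s / iterations for s in row_sum]
    avg_y = [s / iterations for s in col_sum]
    return avg_x, avg_y


def nash_gap(payoff: List[List[float]], x: List[float], y: List[float]) -> float:
    """Sum of both players' best-response gains against (x, y).
    It is zero exactly at an equilibrium of the zero-sum game."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    worst_for_row = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                        for j in range(n_cols))
    return best_row - worst_for_row


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative payoffs, not from the paper
    x, y = solve_matrix_game(game, 10_000)
    print("row strategy:", x)
    print("column strategy:", y)
    print("Nash gap:", nash_gap(game, x, y))
```

In this toy setting the gap is computed exactly because the whole payoff matrix is known. The setting described in the abstract is harder: nature's distributions are only sampled and most of the tree is never expanded, so the guarantee there is a high-probability bound rather than an exact number.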


Updated: 2020-09-17
