Small Nash Equilibrium Certificates in Very Large Games



arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-06-29, DOI: arxiv-2006.16387
Brian Hu Zhang and Tuomas Sandholm

In many game settings, the game is not explicitly given but is only accessible by playing it. While there have been impressive demonstrations in such settings, prior techniques have not offered safety guarantees, that is, guarantees on the game-theoretic exploitability of the computed strategies. In this paper we introduce an approach that shows that it is possible to provide exploitability guarantees in such settings without ever exploring the entire game. We introduce a notion of a certificate of an extensive-form approximate Nash equilibrium. For verifying a certificate, we give an algorithm that runs in time linear in the size of the certificate rather than the size of the whole game. In zero-sum games, we further show that an optimal certificate (given the exploration so far) can be computed with any standard game-solving algorithm (e.g., using a linear program or counterfactual regret minimization). However, unlike in the cases of normal form or perfect information, we show that certain families of extensive-form games do not have small approximate certificates, even after making extremely nice assumptions on the structure of the game. Despite this difficulty, we find experimentally that very small certificates, even exact ones, often exist in large and even in infinite games. Overall, our approach enables one to try one's favorite exploration strategies while offering exploitability guarantees, thereby decoupling the exploration strategy from the equilibrium-finding process.
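As a rough illustration of the quantity these guarantees bound, the sketch below computes exploitability (the Nash gap) of a strategy pair in a small zero-sum matrix game. It is a normal-form toy under stated assumptions, not the paper's extensive-form certificate algorithm; the function name `exploitability` and the rock-paper-scissors payoff matrix are illustrative choices, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def exploitability(A: Matrix, x: List[float], y: List[float]) -> float:
    """Nash gap of (x, y) in a zero-sum matrix game (illustrative helper).

    Assumption: the row player maximizes x^T A y and the column player
    minimizes it. A gap of 0 means (x, y) is an exact Nash equilibrium;
    a gap of at most eps means it is an eps-approximate equilibrium.
    """
    rows, cols = len(x), len(y)
    # Best payoff the row player can reach against the fixed y.
    row_best = max(sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows))
    # Lowest payoff the column player can force against the fixed x.
    col_best = min(sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols))
    return row_best - col_best


# Rock-paper-scissors payoffs for the row player (hypothetical example game).
RPS: Matrix = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]
UNIFORM = [1.0 / 3.0] * 3
ALWAYS_ROCK = [1.0, 0.0, 0.0]

gap_uniform = exploitability(RPS, UNIFORM, UNIFORM)
gap_rock = exploitability(RPS, ALWAYS_ROCK, UNIFORM)
```

Here uniform play against uniform play has a gap of zero, while always playing rock can be exploited for a full unit of payoff. Computing this gap directly requires the whole payoff matrix; the abstract's point is that a certificate can bound it after exploring only part of the game.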


Updated: 2020-07-01
