Subgame solving without common knowledge
arXiv - CS - Computer Science and Game Theory, Pub Date: 2021-06-10, DOI: arxiv-2106.06068
Brian Hu Zhang, Tuomas Sandholm

In imperfect-information games, subgame solving is significantly more challenging than in perfect-information games, but in the last few years, such techniques have been developed. They were the key ingredient to the milestone of superhuman play in no-limit Texas hold'em poker. Current subgame-solving techniques analyze the entire common-knowledge closure of the player's current information set, that is, the smallest set of nodes within which it is common knowledge that the current node lies. However, this set is too large to handle in many games. We introduce an approach that overcomes this obstacle, by instead working with only low-order knowledge. Our approach allows an agent, upon arriving at an infoset, to basically prune any node that is no longer reachable, thereby massively reducing the game tree size relative to the common-knowledge subgame. We prove that, as is, our approach can increase exploitability compared to the blueprint strategy. However, we develop three avenues by which safety can be guaranteed. First, safety is guaranteed if the results of subgame solves are incorporated back into the blueprint. Second, we provide a method where safety is achieved by limiting the infosets at which subgame solving is performed. Third, we prove that our approach, when applied at every infoset reached during play, achieves a weaker notion of equilibrium, which we coin affine equilibrium, and which may be of independent interest. We show that affine equilibria cannot be exploited by any Nash strategy of the opponent, so an opponent who wishes to exploit must open herself to counter-exploitation. Even without the safety-guaranteeing additions, experiments on medium-sized games show that our approach always reduced exploitability even when applied at every infoset, and a depth-limited version of it led to--to our knowledge--the first strong AI for the massive challenge problem dark chess.
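
The gap between the common-knowledge closure and a low-order knowledge set can be illustrated with a toy sketch. This is not the authors' algorithm. The function name `knowledge_set`, the `order` parameter, and the six-node example are hypothetical and chosen only for illustration. It assumes that a set of nodes is expanded by repeatedly adding every node that shares an infoset, of either player, with a node already in the set. Running the expansion to a fixed point gives the common-knowledge closure, and stopping after a few rounds gives a smaller, low-order set.

```python
def knowledge_set(start_nodes, infosets_by_player, order=None):
    """Expand start_nodes through shared infosets.

    order=None runs to a fixed point (the common-knowledge closure in this
    toy model); order=k stops after k expansion rounds. The round-counting
    convention is an assumption made for this sketch.
    """
    # Map each node to every infoset (of any player) that contains it.
    containing = {}
    for infosets in infosets_by_player.values():
        for infoset in infosets:
            members = frozenset(infoset)
            for node in members:
                containing.setdefault(node, []).append(members)

    reached = set(start_nodes)
    frontier = set(start_nodes)
    rounds = 0
    while frontier and (order is None or rounds < order):
        new_nodes = set()
        for node in frontier:
            for members in containing.get(node, []):
                new_nodes |= members - reached
        reached |= new_nodes
        frontier = new_nodes
        rounds += 1
    return reached


# Hypothetical toy game: six nodes whose infosets overlap like a chain.
toy_infosets = {
    "player1": [{0, 1}, {2, 3}, {4, 5}],
    "player2": [{0}, {1, 2}, {3, 4}, {5}],
}
current_infoset = {2, 3}  # player1's current information set

closure = knowledge_set(current_infoset, toy_infosets)
one_round = knowledge_set(current_infoset, toy_infosets, order=1)
print(sorted(closure))    # all six nodes
print(sorted(one_round))  # only the nodes adjacent to the current infoset
```

In this toy chain the closure spreads to every node, while a single expansion round stays local. The same effect at scale is what makes the full common-knowledge subgame too large to handle in many games.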

Updated: 2021-06-14