No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium,arXiv - CS - Computer Science and Game Theory



No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-04-01, DOI: arxiv-2004.00603
Andrea Celli, Alberto Marchesi, Gabriele Farina, Nicola Gatti

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature of the game and the presence of partial information, extensive-form correlation has significantly different properties from its normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.
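To make the normal-form notion of internal regret mentioned in the abstract concrete, here is a minimal Python sketch. It uses the standard textbook definition (the largest cumulative gain a player could have obtained by replacing every play of one action with another fixed action), which is an assumption on our part since the abstract does not spell it out. The function name `internal_regret`, its arguments, and the toy data are hypothetical; this is not the paper's trigger-regret algorithm.

```python
from itertools import permutations


def internal_regret(played, utilities, actions):
    """Largest cumulative gain from swapping one action for another.

    played[t]:    the action the player chose at round t
    utilities[t]: dict mapping each action to the payoff it would have
                  earned at round t against the others' actual play
    """
    best = 0.0
    for a, b in permutations(actions, 2):
        # Gain from playing b at every round where a was actually played.
        gain = sum(u[b] - u[a] for act, u in zip(played, utilities) if act == a)
        best = max(best, gain)
    return best


# Toy history (hypothetical): three rounds, two actions.
played = ["L", "L", "R"]
utilities = [
    {"L": 0.0, "R": 1.0},
    {"L": 0.0, "R": 1.0},
    {"L": 1.0, "R": 0.0},
]
regret = internal_regret(played, utilities, ["L", "R"])
average_regret = regret / len(played)
```

In this toy history, swapping "L" for "R" on the two rounds where "L" was played gains 2, while swapping "R" for "L" gains 1, so the internal regret is 2. A no-internal-regret learner is one whose `average_regret` goes to zero as the number of rounds grows.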


Updated: 2020-06-23
