Efficient Decentralized Learning Dynamics for Extensive-Form Coarse Correlated Equilibrium: No Expensive Computation of Stationary Distributions Required
arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-09-16, DOI: arxiv-2109.08138
Gabriele Farina, Andrea Celli, Tuomas Sandholm

While in two-player zero-sum games the Nash equilibrium is a well-established prescriptive notion of optimal play, its applicability as a prescriptive tool beyond that setting is limited. Consequently, the study of decentralized learning dynamics that guarantee convergence to correlated solution concepts in multiplayer, general-sum extensive-form (i.e., tree-form) games has become an important topic of active research. The per-iteration complexity of the currently known learning dynamics depends on the specific correlated solution concept considered. For example, in the case of extensive-form correlated equilibrium (EFCE), all known dynamics require computing, as an intermediate step at each iteration, the stationary distribution of multiple Markov chains, an expensive operation in practice. Conversely, in the case of normal-form coarse correlated equilibrium (NFCCE), simple no-external-regret learning dynamics, which amount to a linear-time traversal of the tree-form decision space of each agent, suffice to guarantee convergence. This paper focuses on extensive-form coarse correlated equilibrium (EFCCE), an intermediate solution concept that is a subset of NFCCE and a superset of EFCE. Since EFCCE is a superset of EFCE, any learning dynamics for EFCE automatically guarantee convergence to EFCCE. However, since EFCCE is a simpler solution concept, this raises the question: do there exist learning dynamics for EFCCE that avoid the expensive computation of stationary distributions? This paper answers that question in the affirmative. Our learning dynamics require only the orchestration of no-external-regret minimizers, showing that EFCCE is more akin to NFCCE than to EFCE from a learning perspective. Our dynamics guarantee that the empirical frequency of play after $T$ iterations is an $O(1/\sqrt{T})$-approximate EFCCE with high probability, and an EFCCE almost surely in the limit.
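As a point of reference for the no-external-regret machinery mentioned in the abstract, the following is a minimal sketch of one standard no-external-regret minimizer (regret matching) of the kind such dynamics orchestrate at each of an agent's decision points. It illustrates the general ingredient only, not the paper's specific construction; the names RegretMatching, next_strategy, and observe_utilities, and the random placeholder utilities, are hypothetical.

import numpy as np

class RegretMatching:
    # Generic no-external-regret minimizer over the probability simplex of
    # `num_actions` actions (regret matching). Illustrative sketch only.
    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.cum_regret = np.zeros(num_actions)

    def next_strategy(self):
        # Play proportionally to positive cumulative regrets; uniform if none.
        positive = np.maximum(self.cum_regret, 0.0)
        total = positive.sum()
        if total > 0.0:
            return positive / total
        return np.full(self.num_actions, 1.0 / self.num_actions)

    def observe_utilities(self, utilities):
        # utilities[a] = utility the agent would have received by playing action a.
        strategy = self.next_strategy()
        self.cum_regret += utilities - strategy @ utilities

# Hypothetical usage at a single decision point with 3 actions; in decentralized
# dynamics of this kind, one such minimizer would sit at each decision point of
# each agent's tree-form decision space.
rm = RegretMatching(3)
for t in range(1000):
    x = rm.next_strategy()           # strategy played at iteration t
    utilities = np.random.rand(3)    # placeholder feedback from a game traversal
    rm.observe_utilities(utilities)

Bounding and averaging the external regret of such minimizers over $T$ iterations is the standard route to an $O(1/\sqrt{T})$ approximation guarantee for the empirical frequency of play, of the kind stated above.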

Updated: 2021-09-17