Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures, arXiv - CS - Computer Science and Game Theory

arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-06-04, DOI: arxiv-2106.02615
Nelson Vadori, Rahul Savani, Thomas Spooner, Sumitra Ganesh

Recently, Optimistic Multiplicative Weights Update (OMWU) was proven to be the first constant step-size algorithm in the online no-regret framework to enjoy last-iterate convergence to Nash equilibria in the constrained zero-sum bimatrix case, where weights represent the probabilities of playing pure strategies. We introduce the second such algorithm, Consensus MWU (CMWU), for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU. Our algorithm shows the importance of a new object, the simplex Hessian, as well as of the interaction of the game with the (eigen)space of vectors summing to zero, which we believe future research can build on. Like OMWU, CMWU has convergence guarantees only in the zero-sum case, but Cheung and Piliouras (2020) recently showed that OMWU and MWU display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and by the recent literature on learning to optimize single functions, we extend CMWU to non-zero-sum games by introducing a new framework for online learning in games, in which the gradient and Hessian coefficients of the update rule along a trajectory are learnt by a reinforcement learning policy conditioned on the nature of the game: the game signature. We construct the game signature using a new canonical decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We show empirically that our new learning policy is able to exploit the game signature across a wide range of game types.
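To make the MWU/OMWU contrast concrete, here is a minimal sketch. It is not CMWU, because the abstract does not spell out the CMWU update. Instead it implements the standard OMWU update for a zero-sum bimatrix game, x_{t+1} ∝ x_t · exp(η(2Ay_t − Ay_{t−1})) for the maximizing row player and the mirrored update for the minimizing column player, with plain MWU as the `optimistic=False` case. The helper name `run_mwu`, the step size η = 0.1, the starting points and the matching-pennies example are illustrative assumptions. On matching pennies one would expect the OMWU last iterate to approach the uniform equilibrium while the MWU iterate drifts toward the boundary of the simplex.

```python
import math

def _matvec(A, v):
    # (A v)_i = sum_j A[i][j] * v[j]: payoff of each row strategy against mixed column play v
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def _matTvec(A, u):
    # (A^T u)_j = sum_i A[i][j] * u[i]: loss of each column strategy against mixed row play u
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def _normalize(w):
    s = sum(w)
    return [wi / s for wi in w]

def run_mwu(A, x0, y0, eta=0.1, steps=20000, optimistic=True):
    """Hypothetical helper: simultaneous (O)MWU where the row player maximizes
    x^T A y and the column player minimizes it.
    optimistic=True uses the OMWU step 2*g_t - g_{t-1}; False gives plain MWU."""
    x, y = list(x0), list(y0)
    gx_prev, gy_prev = _matvec(A, y), _matTvec(A, x)  # so the first step is a plain MWU step
    for _ in range(steps):
        gx, gy = _matvec(A, y), _matTvec(A, x)  # both gradients use the current iterates
        sx = [2 * g - gp if optimistic else g for g, gp in zip(gx, gx_prev)]
        sy = [2 * g - gp if optimistic else g for g, gp in zip(gy, gy_prev)]
        x = _normalize([xi * math.exp(eta * s) for xi, s in zip(x, sx)])   # ascent for the maximizer
        y = _normalize([yj * math.exp(-eta * s) for yj, s in zip(y, sy)])  # descent for the minimizer
        gx_prev, gy_prev = gx, gy
    return x, y

# Matching pennies: the unique Nash equilibrium is x* = y* = (0.5, 0.5).
A = [[1.0, -1.0], [-1.0, 1.0]]
x_o, y_o = run_mwu(A, [0.7, 0.3], [0.4, 0.6], optimistic=True)
x_m, y_m = run_mwu(A, [0.7, 0.3], [0.4, 0.6], optimistic=False)
print("OMWU last iterate:", x_o, y_o)
print("MWU last iterate: ", x_m, y_m)
```

The only difference between the two runs is the extrapolated step 2g_t − g_{t−1}. That single "optimistic" correction is what turns the outward spiral of MWU into a contracting one.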

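The abstract does not list the eight components of the game decomposition, so the following sketch only illustrates how commuting projection operators can produce eight pieces. The construction is assumed here and is not taken from the paper. It first splits a game (A, B) into a cooperative part ((A+B)/2, (A+B)/2) and a zero-sum part ((A−B)/2, −(A−B)/2). It then splits each matrix with the projector onto vectors summing to zero, C = I − J/n (J is the all-ones matrix), and its complement J/n, applied on the left and on the right. Left and right multiplications commute with each other and with the cooperative/zero-sum split, so the 2 × 2 × 2 = 8 products are commuting projections whose images add back to the original game. The helpers `centering_parts` and `decompose_game` and the component labels are hypothetical.

```python
def centering_parts(M):
    """Split M (n x m) using C = I - J/n on the left and C' = I - J/m on the right."""
    n, m = len(M), len(M[0])
    col_avg = [sum(M[i][j] for i in range(n)) / n for j in range(m)]  # mean of column j
    row_avg = [sum(M[i]) / m for i in range(n)]                       # mean of row i
    grand = sum(row_avg) / n
    return {
        "const": [[grand for _ in range(m)] for _ in range(n)],                   # (J/n) M (J/m)
        "col_only": [[col_avg[j] - grand for j in range(m)] for _ in range(n)],   # (J/n) M C'
        "row_only": [[row_avg[i] - grand for _ in range(m)] for i in range(n)],   # C M (J/m)
        "interaction": [[M[i][j] - col_avg[j] - row_avg[i] + grand                # C M C'
                         for j in range(m)] for i in range(n)],
    }

def decompose_game(A, B):
    """Hypothetical 8-way split of the bimatrix game (A, B) into commuting projections."""
    n, m = len(A), len(A[0])
    coop = [[(A[i][j] + B[i][j]) / 2 for j in range(m)] for i in range(n)]  # common-payoff part
    zsum = [[(A[i][j] - B[i][j]) / 2 for j in range(m)] for i in range(n)]  # zero-sum part
    parts = {}
    for kind, M, sign in (("coop", coop, 1.0), ("zerosum", zsum, -1.0)):
        for key, P in centering_parts(M).items():
            # Player 1 receives P; player 2 receives P (cooperative) or -P (zero-sum).
            parts[(kind, key)] = (P, [[sign * v for v in row] for row in P])
    return parts

A = [[3, 0, 1], [5, 1, 2]]
B = [[3, 5, 0], [0, 1, 4]]
for key, (P1, P2) in decompose_game(A, B).items():
    print(key, P1, P2)
```

Because each map is a projection, decomposing any single component again returns that component in its own slot and zeros everywhere else.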

Updated: 2021-06-07