Policy Regret in Repeated Games, arXiv - CS - Computer Science and Game Theory

Policy Regret in Repeated Games
arXiv - CS - Computer Science and Game Theory, Pub Date: 2018-11-09, DOI: arxiv-1811.04127
Raman Arora, Michael Dinitz, Teodor V. Marinov, Mehryar Mohri

The notion of policy regret in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy-regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external-regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.
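To make the incompatibility result concrete, here is a minimal Python sketch under stated assumptions. It assumes an adaptive adversary with memory one, so the loss at each round depends on the learner's previous and current actions. External regret compares the realized loss with the best fixed action evaluated on the realized loss functions (the previous action is taken from actual play), while policy regret compares it with the loss a fixed action would have earned had it been played from the first round, so the adversary reacts to that counterfactual sequence. The loss function `toy_loss`, the starting state `INITIAL_PREV`, and all helper names are hypothetical and chosen for illustration; this is a toy instance, not the construction used in the paper.

```python
from typing import Callable, Sequence

Loss = Callable[[int, int], float]

ACTIONS = (0, 1)
INITIAL_PREV = 0  # hypothetical: state assumed before round 1


def toy_loss(prev: int, cur: int) -> float:
    """Hypothetical memory-one loss: depends on previous and current action."""
    if prev == 0:
        return 0.5 if cur == 0 else 0.0
    return 1.0  # after the learner plays 1, every action costs 1


def played_losses(seq: Sequence[int], loss: Loss) -> float:
    """Total loss of an action sequence, with the adversary reacting to it."""
    prev, total = INITIAL_PREV, 0.0
    for a in seq:
        total += loss(prev, a)
        prev = a
    return total


def external_regret(seq: Sequence[int], loss: Loss) -> float:
    """Realized loss minus the best fixed action on the realized losses."""
    realized = played_losses(seq, loss)
    prevs = [INITIAL_PREV, *seq[:-1]]  # previous actions from actual play
    best = min(sum(loss(p, b) for p in prevs) for b in ACTIONS)
    return realized - best


def policy_regret(seq: Sequence[int], loss: Loss) -> float:
    """Realized loss minus the best fixed action played from round 1."""
    realized = played_losses(seq, loss)
    best = min(played_losses([b] * len(seq), loss) for b in ACTIONS)
    return realized - best


if __name__ == "__main__":
    for a in ACTIONS:
        seq = [a] * 10
        print(a, external_regret(seq, toy_loss), policy_regret(seq, toy_loss))
```

With 10 rounds, always playing 0 has external regret 5.0 but policy regret 0.0, while always playing 1 has external regret 0.0 but policy regret 4.0. In this toy setting each sequence looks good under one definition and poor under the other, which is the flavor of tension the abstract describes.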

Updated: 2020-03-24