Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information, arXiv - CS - Computer Science and Game Theory

arXiv - CS - Computer Science and Game Theory, Pub Date: 2020-03-26, DOI: arxiv-2003.11727
Le Cong Dinh, Long Tran-Thanh, Tri-Dung Nguyen, Alain B. Zemkoho

This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo) regret. We develop a no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large set of popular no-regret algorithms of the row player, including the multiplicative weight update algorithm, the online mirror descent method/follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update.
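To make the setting concrete, here is a minimal Python sketch of the row player's side only: a multiplicative weights update run against a column player who knows A. It is not the paper's column-player algorithm, which the abstract does not spell out. The sign convention (the row player minimizes x^T A y), the feedback model (she observes the loss vector A y each round), the step size `eta`, and all function names are assumptions made for illustration.

```python
import math

def row_losses(A, y):
    """Expected loss of each row action when the column player mixes with y."""
    return [sum(a * p for a, p in zip(row, y)) for row in A]

def mwu_step(x, losses, eta):
    """Multiplicative weights update: reweight by exp(-eta * loss), then renormalize."""
    scaled = [p * math.exp(-eta * l) for p, l in zip(x, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def play(A, column_strategy, rounds, eta=0.1):
    """Run the repeated game; return the row player's final strategy and average regret."""
    n = len(A)
    x = [1.0 / n] * n            # the row player starts from the uniform strategy
    incurred = 0.0               # total expected loss actually suffered
    cumulative = [0.0] * n       # total loss each fixed row action would have suffered
    for t in range(rounds):
        y = column_strategy(t)   # only the column player's choice may depend on A
        losses = row_losses(A, y)  # assumed feedback: the row player sees A y, never A
        incurred += sum(p * l for p, l in zip(x, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
        x = mwu_step(x, losses, eta)
    return x, (incurred - min(cumulative)) / rounds

# Matching pennies, used here purely as an illustrative payoff matrix.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]

# Column player repeats one pure action: the row player learns the best response.
x_fixed, regret_fixed = play(MATCHING_PENNIES, lambda t: [1.0, 0.0], rounds=1000)

# Column player repeats her minimax strategy: every row action has equal loss,
# so the update leaves the row player's strategy where it started.
x_minimax, regret_minimax = play(MATCHING_PENNIES, lambda t: [0.5, 0.5], rounds=1000)
```

In the first run the row player's average regret shrinks as the number of rounds grows, which is the no-regret property the abstract assumes of her. In the second run her strategy does not move, because the column player's minimax strategy gives her no signal to react to.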

Updated: 2020-03-27
