No-regret Learning in Cournot Games, arXiv - CS - Systems and Control


arXiv - CS - Systems and Control. Pub Date: 2019-06-15, DOI: arxiv-1906.06612
Yuanyuan Shi, Baosen Zhang

This paper examines the convergence of no-regret learning in Cournot games with continuous actions. Cournot games are the essential model for many socio-economic systems in which players compete by strategically setting their output quantities. We assume that players do not have full information about the game and thus cannot pre-compute a Nash equilibrium. Two types of feedback are considered: bandit feedback and gradient feedback. To study the convergence of the induced sequence of play, we introduce the notion of convergence in measure and show that the players' actual sequence of actions converges to the unique Nash equilibrium. In addition, our results naturally extend the time-average regret bounds of no-regret learning algorithms to final-iteration convergence rates. Together, our work presents significantly sharper convergence results for learning in games without strong assumptions on game properties (e.g., monotonicity) and shows how exploiting the game information feedback can influence the convergence rates.
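
As a rough illustration of learning with gradient feedback, the sketch below runs simultaneous projected gradient ascent in a small Cournot game. It is not the algorithm analyzed in the paper: the linear inverse demand `p = a - b * sum(x)`, the identical constant marginal cost `c`, the capacity bound `x_max`, the step-size schedule `eta0 / sqrt(t)`, and all function names are assumptions made for this example. This linear case is also easier than the general setting of the abstract, because its Nash equilibrium has the closed form `(a - c) / (b * (n + 1))`, which makes the final iterate easy to check.

```python
import math


def cournot_gradient(x, i, a, b, c):
    """Gradient of player i's payoff x_i * (a - b * sum(x)) - c * x_i w.r.t. x_i.

    Assumes linear inverse demand and constant marginal cost (illustrative only).
    """
    return a - b * sum(x) - b * x[i] - c


def run_gradient_play(x0, a=100.0, b=1.0, c=10.0, x_max=100.0, eta0=0.1, steps=5000):
    """Each player follows its own payoff gradient, then projects onto [0, x_max]."""
    x = list(x0)
    for t in range(1, steps + 1):
        eta = eta0 / math.sqrt(t)  # decaying step size (an assumed schedule)
        # All gradients are evaluated at the same joint action (simultaneous play).
        grads = [cournot_gradient(x, i, a, b, c) for i in range(len(x))]
        x = [min(max(xi + eta * g, 0.0), x_max) for xi, g in zip(x, grads)]
    return x


def nash_quantity(n, a=100.0, b=1.0, c=10.0):
    """Closed-form symmetric Nash quantity of the linear Cournot game."""
    return (a - c) / (b * (n + 1))


if __name__ == "__main__":
    final = run_gradient_play([1.0, 50.0, 10.0])
    print("final quantities:", final)
    print("Nash quantity:   ", nash_quantity(3))
```

Each player here only uses the derivative of its own payoff at the current joint action, which is the information a gradient-feedback oracle would supply. A bandit-feedback variant would have to estimate that derivative from observed payoffs alone.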


Updated: 2020-02-12
