No-regret Learning in Cournot Games
arXiv - CS - Systems and Control, Pub Date: 2019-06-15, DOI: arxiv-1906.06612, Authors: Yuanyuan Shi, Baosen Zhang
This paper examines the convergence of no-regret learning in Cournot games
with continuous actions. Cournot games are the essential model for many
socio-economic systems, where players compete by strategically setting their
output quantities. We assume that players do not have full information about the
game and thus cannot pre-compute a Nash equilibrium. Two types of feedback are
considered: one is bandit feedback and the other is gradient feedback. To study
the convergence of the induced sequence of play, we introduce the notion of
convergence in measure, and show that the players' actual sequence of actions
converges to the unique Nash equilibrium. In addition, our results naturally
extend the no-regret learning algorithms' time-average regret bounds to obtain
the final-iteration convergence rates. Together, our work presents
significantly sharper convergence results for learning in games without strong
assumptions on game properties (e.g., monotonicity) and shows how exploiting the
game information feedback can influence the convergence rates.
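
As a rough illustration of the gradient-feedback setting described in the abstract, the sketch below runs simultaneous projected gradient ascent in a small Cournot game and compares the last iterate with the Nash equilibrium. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear inverse demand p(Q) = a - b*Q, the constant marginal costs, the step size eta0 / sqrt(t), the parameter values, and all function names. It is not the authors' algorithm, and it does not cover the bandit-feedback case.

```python
import random


def payoff_gradients(x, a, b, costs):
    """Each player's payoff gradient with respect to its own quantity.

    Assumed payoff: u_i(x) = x_i * (a - b * sum(x)) - costs[i] * x_i,
    so du_i/dx_i = a - b * sum(x) - b * x_i - costs[i].
    """
    total = sum(x)
    return [a - b * total - b * xi - ci for xi, ci in zip(x, costs)]


def linear_cournot_nash(a, b, costs):
    """Closed-form Nash equilibrium of the assumed linear game.

    Valid only when every resulting quantity is positive (interior equilibrium).
    """
    n = len(costs)
    total_cost = sum(costs)
    return [(a - (n + 1) * ci + total_cost) / (b * (n + 1)) for ci in costs]


def run_gradient_play(a, b, costs, x_max, steps, eta0=0.2, seed=0):
    """Simultaneous projected gradient ascent with step size eta0 / sqrt(t)."""
    rng = random.Random(seed)
    # Hypothetical random starting quantities in [0, x_max].
    x = [rng.uniform(0.0, x_max) for _ in costs]
    for t in range(1, steps + 1):
        eta = eta0 / t ** 0.5
        grads = payoff_gradients(x, a, b, costs)
        # Gradient step followed by projection onto the interval [0, x_max].
        x = [min(max(xi + eta * gi, 0.0), x_max) for xi, gi in zip(x, grads)]
    return x


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]
    final = run_gradient_play(a=10.0, b=1.0, costs=costs, x_max=10.0, steps=2000)
    print("last iterate:", [round(v, 4) for v in final])
    print("closed form: ", linear_cournot_nash(10.0, 1.0, costs))
```

In this toy game each player only needs the gradient of its own payoff, which matches the limited-information premise of the abstract. The closed-form equilibrium is computed solely as a reference point for checking the last iterate. The default eta0 is chosen so that eta0 * b * (n + 1) stays below 2 for three players, which keeps the simultaneous updates stable in this linear example.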
Updated: 2020-02-12