Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization
arXiv - CS - Computer Science and Game Theory; Pub Date: 2020-09-01; DOI: arxiv-2009.00578; René Carmona, Kenza Hamidouche, Mathieu Laurière, and Zongjun Tan
This paper studies zero-sum mean-field type games (ZSMFTG) with linear dynamics
and quadratic cost under an infinite-horizon discounted utility function.
ZSMFTG are a class of games in which two decision makers, whose utilities sum
to zero, compete to influence a large population of indistinguishable agents.
In particular, the paper investigates the case in which the transition and
utility functions depend on the state, the actions of the controllers, and the
means of the state and the actions. The optimality conditions of the game are
analysed for both open-loop and closed-loop controls, and explicit expressions
for the Nash equilibrium strategies are derived. Moreover, two policy
optimization methods that rely on policy gradient are proposed, one for a
model-based framework and one for a sample-based framework. In the model-based
case, the gradients are computed exactly using the model, whereas in the
sample-based case they are estimated using Monte Carlo simulations. Numerical
experiments are conducted to show the convergence of the utility function as
well as of the two players' controls.
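To make the distinction between exact and sampled policy gradients concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it assumes a scalar state, drops the mean-field terms entirely, uses linear feedback controls u1 = -k1 x and u2 = -k2 x, and replaces the paper's sample-based estimator with a plain two-point finite difference over simulated rollouts. All parameter values and function names (exact_gradient, sampled_gradient, descent_ascent) are hypothetical choices made for illustration.

```python
import random

# Hypothetical scalar parameters (illustrative only, not taken from the paper).
A, B1, B2 = 0.8, 1.0, 0.5          # dynamics: x' = A x + B1 u1 + B2 u2 + noise
Q, R1, R2 = 1.0, 1.0, 2.0          # stage cost: Q x^2 + R1 u1^2 - R2 u2^2
GAMMA, SIGMA, S0 = 0.9, 0.1, 1.0   # discount, noise std, E[x0^2]


def _closed_loop(k1, k2):
    """Return (a_cl, d, n) for the linear policies u1 = -k1 x, u2 = -k2 x."""
    a_cl = A - B1 * k1 - B2 * k2
    d = 1.0 - GAMMA * a_cl ** 2
    if d <= 0.0:
        raise ValueError("discounted cost is infinite for these gains")
    n = S0 + GAMMA * SIGMA ** 2 / (1.0 - GAMMA)
    return a_cl, d, n


def exact_cost(k1, k2):
    """Expected discounted cost J(k1, k2), in closed form from the model."""
    _, d, n = _closed_loop(k1, k2)
    return (Q + R1 * k1 ** 2 - R2 * k2 ** 2) * n / d


def exact_gradient(k1, k2):
    """Model-based gradient (dJ/dk1, dJ/dk2), differentiated by hand."""
    a_cl, d, n = _closed_loop(k1, k2)
    c = Q + R1 * k1 ** 2 - R2 * k2 ** 2
    s = n / d                                   # sum_t gamma^t E[x_t^2]
    ds_da = 2.0 * GAMMA * a_cl * n / d ** 2     # derivative of s w.r.t. a_cl
    g1 = 2.0 * R1 * k1 * s - B1 * c * ds_da
    g2 = -2.0 * R2 * k2 * s - B2 * c * ds_da
    return g1, g2


def rollout_cost(k1, k2, x0, noise):
    """Discounted cost of one simulated trajectory, truncated at len(noise)."""
    x, total, disc = x0, 0.0, 1.0
    for w in noise:
        u1, u2 = -k1 * x, -k2 * x
        total += disc * (Q * x * x + R1 * u1 * u1 - R2 * u2 * u2)
        x = A * x + B1 * u1 + B2 * u2 + w
        disc *= GAMMA
    return total


def sampled_gradient(k1, k2, n_rollouts=4000, horizon=60, eps=1e-2, seed=0):
    """Monte Carlo gradient estimate using central differences on rollouts.

    The same x0 and noise are reused for the +eps and -eps evaluations
    (common random numbers) to reduce the variance of the difference.
    """
    rng = random.Random(seed)
    g1 = g2 = 0.0
    for _ in range(n_rollouts):
        x0 = rng.gauss(0.0, S0 ** 0.5)
        noise = [rng.gauss(0.0, SIGMA) for _ in range(horizon)]
        g1 += rollout_cost(k1 + eps, k2, x0, noise) - rollout_cost(k1 - eps, k2, x0, noise)
        g2 += rollout_cost(k1, k2 + eps, x0, noise) - rollout_cost(k1, k2 - eps, x0, noise)
    scale = 2.0 * eps * n_rollouts
    return g1 / scale, g2 / scale


def descent_ascent(k1, k2, grad_fn, step=0.05, iters=200):
    """Simultaneous updates: player 1 descends on J, player 2 ascends on J."""
    for _ in range(iters):
        g1, g2 = grad_fn(k1, k2)
        k1, k2 = k1 - step * g1, k2 + step * g2
    return k1, k2


if __name__ == "__main__":
    print("exact gradient  :", exact_gradient(0.3, 0.1))
    print("sampled gradient:", sampled_gradient(0.3, 0.1))
    print("gains after descent-ascent:", descent_ascent(0.0, 0.0, exact_gradient))
```

In this toy setting the exact gradient comes from differentiating the closed-form cost, while the sampled version only needs simulated trajectories, which mirrors the model-based versus sample-based split described in the abstract. The step size and iteration count are arbitrary choices that happen to behave well for these particular parameters; they are not tuned values from the paper.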
Updated: 2020-09-02