Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization
arXiv - Computer Science and Game Theory. Pub date: 2020-09-01. DOI: arXiv:2009.00578
René Carmona, Kenza Hamidouche, Mathieu Laurière, and Zongjun Tan

In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic costs are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods relying on policy gradient are proposed, one for a model-based and one for a sample-based framework. In the model-based case, the gradients are computed exactly using the model, whereas in the sample-based case they are estimated using Monte-Carlo simulations. Numerical experiments are conducted to show the convergence of the utility function as well as of the two players' controls.
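The ingredients described above (linear dynamics coupled through the state mean, a quadratic zero-sum cost, and a sample-based policy-gradient scheme) can be illustrated with a toy sketch. Everything below is a hypothetical scalar instance with illustrative coefficients, a linear feedback parameterization, and a finite-difference gradient estimator standing in for the Monte-Carlo policy gradient; none of it is taken from the paper's exact model or algorithm.

```python
import numpy as np

# Illustrative scalar mean-field dynamics (coefficients are assumptions):
#   x_{t+1} = A x_t + Abar xbar_t + B1 u_t + B2 v_t + noise,
# with quadratic stage cost (player 1 minimizes, player 2 maximizes):
#   c_t = Q x_t^2 + Qbar xbar_t^2 + R1 u_t^2 - R2 v_t^2.
A, Abar, B1, B2 = 0.9, 0.1, 1.0, 1.0
Q, Qbar, R1, R2 = 1.0, 0.5, 1.0, 2.0
gamma = 0.9   # discount factor
N = 1000      # population size approximating the mean-field term
T = 60        # truncation horizon for the discounted sum

def discounted_cost(theta1, theta2, seed=0):
    """Monte-Carlo estimate of the discounted cost under linear feedback
    u = -k1 x - l1 xbar and v = -k2 x - l2 xbar. A fixed seed gives
    common random numbers across evaluations, which keeps the
    finite-difference gradient estimate stable."""
    rng = np.random.default_rng(seed)
    k1, l1 = theta1
    k2, l2 = theta2
    x = rng.normal(size=N)            # initial population states
    total = 0.0
    for t in range(T):
        xbar = x.mean()               # empirical mean of the population
        u = -k1 * x - l1 * xbar
        v = -k2 * x - l2 * xbar
        cost = Q * x**2 + Qbar * xbar**2 + R1 * u**2 - R2 * v**2
        total += gamma**t * cost.mean()
        x = A * x + Abar * xbar + B1 * u + B2 * v + 0.1 * rng.normal(size=N)
    return total

def grad_player1(theta1, theta2, eps=1e-3):
    """Sample-based gradient of the cost in player 1's feedback gains,
    via central finite differences (a zeroth-order stand-in for the
    Monte-Carlo policy gradient)."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        g[i] = (discounted_cost(theta1 + e, theta2)
                - discounted_cost(theta1 - e, theta2)) / (2 * eps)
    return g

theta1 = np.array([0.3, 0.0])   # player 1 (minimizer) feedback gains
theta2 = np.array([0.1, 0.0])   # player 2 (maximizer), held fixed here
initial = discounted_cost(theta1, theta2)
for _ in range(20):
    theta1 -= 0.02 * grad_player1(theta1, theta2)   # gradient descent
print("cost before/after descent:", initial, discounted_cost(theta1, theta2))
```

This one-sided sketch only descends player 1's gains against a frozen opponent; the paper's methods update both players so that the pair approaches the Nash equilibrium of the zero-sum game.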

Updated: 2020-09-02