Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-09-02, DOI: arXiv-2009.02146
René Carmona, Kenza Hamidouche, Mathieu Laurière and Zongjun Tan

In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradients are proposed for both model-based and sample-based frameworks. In the first case, the gradients are computed exactly using the model, whereas in the second they are estimated using Monte-Carlo simulations. Numerical experiments show the convergence of the two players' controls as well as of the utility function when the two algorithms are used in different scenarios.
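The model-based versus sample-based distinction in the abstract can be illustrated on a much simpler problem than the paper's two-player mean-field-type game. The sketch below is a hypothetical, heavily simplified single-controller scalar LQ problem (the constants `A`, `B`, `Q`, `R` and the zeroth-order estimator are illustrative assumptions, not the paper's method): a linear policy u = -k*x is improved by gradient descent, once with gradients computed from the known model and once with Monte-Carlo estimates built from rollouts of randomly perturbed policies.

```python
import random

random.seed(0)

# Hypothetical scalar LQ problem (illustrative constants, not from the paper):
# dynamics x' = A*x + B*u, stage cost Q*x^2 + R*u^2, discount GAMMA.
A, B, Q, R, GAMMA, HORIZON = 0.9, 0.5, 1.0, 0.1, 0.95, 200

def rollout_cost(k, x0=1.0):
    """Discounted cost of the linear policy u = -k*x (deterministic model)."""
    x, cost, disc = x0, 0.0, 1.0
    for _ in range(HORIZON):
        u = -k * x
        cost += disc * (Q * x * x + R * u * u)
        x = A * x + B * u
        disc *= GAMMA
    return cost

def model_gradient(k, eps=1e-5):
    """Model-based flavor: the gradient is computed directly from the known
    model (here via central differences on the exact cost)."""
    return (rollout_cost(k + eps) - rollout_cost(k - eps)) / (2 * eps)

def sampled_gradient(k, sigma=0.05, n=32):
    """Sample-based flavor: a zeroth-order Monte-Carlo estimate built from
    rollouts of randomly perturbed policies."""
    total = 0.0
    for _ in range(n):
        d = random.gauss(0.0, 1.0)
        total += d * (rollout_cost(k + sigma * d)
                      - rollout_cost(k - sigma * d)) / (2 * sigma)
    return total / n

def policy_gradient_descent(grad_fn, k=0.0, lr=0.05, steps=400):
    """Plain gradient descent on the policy parameter k."""
    for _ in range(steps):
        k -= lr * grad_fn(k)
    return k

k_model = policy_gradient_descent(model_gradient)    # exact gradients
k_sample = policy_gradient_descent(sampled_gradient)  # Monte-Carlo gradients
```

Because the rollouts are deterministic given k, both runs drive the policy toward the same optimal gain; the sample-based run merely follows a noisier path, mirroring the convergence behavior the abstract reports for the two algorithms.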

Updated: 2020-09-07