Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games
arXiv - CS - Computer Science and Game Theory Pub Date: 2020-09-02, DOI: arxiv-2009.02146 René Carmona, Kenza Hamidouche, Mathieu Laurière, and Zongjun Tan
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and of the actions is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy-optimization methods relying on policy gradients are proposed, one for the model-based setting and one for the sample-based setting. In the first case, the gradients are computed exactly using the model, whereas in the second they are estimated using Monte Carlo simulations. Numerical experiments show the convergence of the two players' controls, as well as of the utility function, when the two algorithms are applied in different scenarios.
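The policy-gradient approach described in the abstract can be illustrated with a minimal sketch. The toy example below is not the paper's algorithm: it uses a scalar zero-sum linear-quadratic game with the mean-field terms omitted, all constants (A, B1, B2, Q, R1, R2, gamma, T) are illustrative assumptions, and finite-difference gradients stand in for the exact model-based or Monte-Carlo sample-based gradients the paper develops.

```python
# Minimal sketch, not the paper's algorithm: gradient descent-ascent
# on a scalar zero-sum linear-quadratic game. All constants here are
# illustrative assumptions; mean-field terms are omitted for brevity.
A, B1, B2 = 0.9, 0.5, 0.3       # dynamics: x' = A*x + B1*u1 + B2*u2
Q, R1, R2 = 1.0, 1.0, 10.0      # stage utility: Q*x^2 + R1*u1^2 - R2*u2^2
gamma, T = 0.9, 50              # discount factor and rollout horizon

def cost(K1, K2, x0=1.0):
    """Discounted cost of the linear policies u1 = -K1*x, u2 = -K2*x.
    Player 1 (gain K1) minimizes this; player 2 (gain K2) maximizes it."""
    x, J, disc = x0, 0.0, 1.0
    for _ in range(T):
        u1, u2 = -K1 * x, -K2 * x
        J += disc * (Q * x * x + R1 * u1 * u1 - R2 * u2 * u2)
        x = A * x + B1 * u1 + B2 * u2
        disc *= gamma
    return J

def grad(f, k, eps=1e-5):
    """Central finite-difference derivative, standing in for the exact
    (model-based) or Monte-Carlo (sample-based) gradients of the paper."""
    return (f(k + eps) - f(k - eps)) / (2 * eps)

K1, K2, lr = 0.0, 0.0, 0.01
for _ in range(300):
    K1 -= lr * grad(lambda k: cost(k, K2), K1)   # minimizing player
    K2 += lr * grad(lambda k: cost(K1, k), K2)   # maximizing player
```

In this simplified game the minimizing player learns a stabilizing gain (K1 > 0) while the maximizing player, penalized through R2, adopts only a mildly destabilizing one; the abstract reports analogous convergence of both players' controls in numerical experiments for the full mean-field setting.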
Updated: 2020-09-07