Unified Reinforcement Q-Learning for Mean Field Game and Control Problems
arXiv - CS - Multiagent Systems · Pub Date: 2020-06-24 · arXiv: 2006.13912
Andrea Angiuli, Jean-Pierre Fouque, and Mathieu Laurière

We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning a parameter. The algorithm operates in discrete time and space, where the agent provides the environment not only with an action but also with a state distribution, in order to account for the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are presented in continuous time and space, and compared with classical (non-asymptotic, or stationary) MFG and MFC problems. They admit explicit solutions in the linear-quadratic (LQ) case, which are used as benchmarks for the results of our algorithm.
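To make the two-timescale idea concrete, below is a minimal tabular Python sketch on a toy congestion-type environment. The environment, reward shape, learning rates, and the mapping of rate orderings to the MFG and MFC regimes noted in the comments are illustrative assumptions based on the abstract, not the authors' exact algorithm or benchmark.

import numpy as np

n_states, n_actions = 5, 3
gamma = 0.9
rng = np.random.default_rng(0)

def step(x, a, mu):
    # Toy dynamics: the action biases a random walk on {0, ..., n_states-1};
    # the reward includes a congestion-type mean-field term through mu (assumed form).
    drift = a - 1                                   # actions {0,1,2} -> moves {-1,0,+1}
    x_next = int(np.clip(x + drift + rng.integers(-1, 2), 0, n_states - 1))
    reward = -abs(x - n_states // 2) - 2.0 * mu[x]
    return x_next, reward

def run(rho_Q, rho_mu, n_steps=50_000, eps=0.1):
    # Two-timescale loop: the Q-table and the (unobserved) state distribution mu
    # are both learned from samples, each with its own learning rate.
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)          # model-free estimate of the distribution
    x = int(rng.integers(n_states))
    for _ in range(n_steps):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[x].argmax())
        x_next, r = step(x, a, mu)
        e = np.zeros(n_states)
        e[x_next] = 1.0
        mu += rho_mu * (e - mu)                     # distribution update on its own timescale
        Q[x, a] += rho_Q * (r + gamma * Q[x_next].max() - Q[x, a])   # standard Q-learning update
        x = x_next
    return Q, mu

# Tuning the ratio of the two rates selects which problem is solved; the mapping
# below (slow distribution update -> MFG, fast distribution update -> MFC) follows
# the usual two-timescale intuition and is stated here as an assumption.
Q_mfg, _ = run(rho_Q=0.1, rho_mu=0.001)
Q_mfc, _ = run(rho_Q=0.001, rho_mu=0.1)
print("greedy policy (MFG regime):", Q_mfg.argmax(axis=1))
print("greedy policy (MFC regime):", Q_mfc.argmax(axis=1))

In this sketch the "parameter" of the abstract is the ratio of the two learning rates: the agent never observes the population's distribution directly and instead tracks an empirical estimate mu from its own visited states, which is the model-free estimation mentioned above.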

Updated: 2020-06-25