Learning Trembling Hand Perfect Mean Field Equilibrium for Dynamic Mean Field Games
arXiv - CS - Multiagent Systems, Pub Date: 2020-06-21, DOI: arxiv-2006.11683
Kiyeob Lee, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

Mean Field Games (MFG) are those in which each agent assumes that the states of all others are drawn in an i.i.d. manner from a common belief distribution, and optimizes accordingly. The equilibrium concept here is a Mean Field Equilibrium (MFE), and algorithms for learning MFE in dynamic MFGs are unknown in general due to the non-stationary evolution of the belief distribution. Our focus is on an important subclass that possesses a monotonicity property called Strategic Complementarities (MFG-SC). We introduce a natural refinement of the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE), which allows agents to employ a measure of randomization while accounting for the impact of such randomization on their payoffs. We propose a simple algorithm for computing T-MFE under a known model. We introduce both a model-free and a model-based approach to learning T-MFE under unknown transition probabilities, using the trembling-hand idea to enable exploration. We analyze the sample complexity of both algorithms. We also develop a scheme for concurrently sampling the system with a large number of agents that negates the need for a simulator, even though the model is non-stationary. Finally, we empirically evaluate the performance of the proposed algorithms via examples motivated by real-world applications.

Updated: 2020-06-23