Deep fictitious play for stochastic differential games
Communications in Mathematical Sciences ( IF 1.2 ) Pub Date : 2021-01-01 , DOI: 10.4310/cms.2021.v19.n2.a2
Ruimeng Hu
In this paper, we apply the idea of fictitious play to design deep neural networks (DNNs), and develop deep learning theory and algorithms for computing the Nash equilibrium of asymmetric $N$-player non-zero-sum stochastic differential games, which we refer to as deep fictitious play, a multi-stage learning process. Specifically, at each stage we let each individual player optimize her own payoff subject to the other players' previous actions, which is equivalent to solving $N$ decoupled stochastic control optimization problems, each approximated by a DNN. The fictitious play strategy therefore leads to a structure consisting of $N$ DNNs that communicate only at the end of each stage. The resulting deep learning algorithm based on fictitious play is scalable, parallel, and model-free: using GPU parallelization, it can be applied to any $N$-player stochastic differential game with various symmetries and heterogeneities (e.g., the existence of major players). We illustrate the performance of the deep learning algorithm by comparing it to the closed-form solution of the linear-quadratic game. Moreover, we prove the convergence of fictitious play under appropriate assumptions, and verify that the convergent limit forms an open-loop Nash equilibrium. Finally, we discuss extensions to other strategies designed upon fictitious play and to closed-loop Nash equilibria.
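The staged structure described above can be illustrated with a minimal toy sketch. This is not the paper's algorithm: the paper solves $N$ stochastic control problems with DNNs at each stage, whereas below each player's "solve" step is a closed-form best response in a hypothetical static quadratic game with cost $J_i(a) = (a_i - b_i)^2 + c\, a_i \cdot \text{mean}_{j \ne i} a_j$, chosen only to show how the $N$ decoupled problems communicate at the end of each stage.

```python
# Toy fictitious play for a static N-player quadratic game (illustration only).
# Hypothetical cost of player i:
#   J_i(a) = (a_i - b_i)^2 + c * a_i * mean_{j != i} a_j
# Best response to the other players' previous actions (set dJ_i/da_i = 0):
#   a_i = b_i - (c / 2) * mean_{j != i} a_j

def fictitious_play(b, c=0.5, stages=200):
    n = len(b)
    a = [0.0] * n  # initial strategy profile
    for _ in range(stages):
        # Each player observes the others' actions from the previous stage.
        others_mean = [
            sum(a[j] for j in range(n) if j != i) / (n - 1) for i in range(n)
        ]
        # N decoupled best-response problems, solvable in parallel; in the
        # paper, these are N stochastic control problems approximated by DNNs.
        a = [b[i] - 0.5 * c * others_mean[i] for i in range(n)]
    return a

a_star = fictitious_play([1.0, 2.0, 3.0])
```

For a small interaction coefficient $c$, the best-response map is a contraction, so the iterates converge to the fixed point where every player best-responds to the others, i.e., a Nash equilibrium of the toy game.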
