Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics
IEEE Transactions on Cybernetics (IF 9.4) Pub Date: 2018-05-16, DOI: 10.1109/tcyb.2018.2830820
Qichao Zhang , Dongbin Zhao

This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation, a single-critic neural network structure for the NZS games is given. To enhance the applicability of the data-based IRL method, we design updating laws for the critic weights based on offline and online iterative learning, respectively. Note that the experience replay technique is introduced in the online iterative learning, which improves the convergence rate of the critic weights during the learning process. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
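The experience-replay idea mentioned above can be illustrated with a minimal sketch (not the paper's exact algorithm): a critic approximated as V(x) ≈ Wᵀφ(x), whose weight update averages the gradient over a buffer of replayed past samples rather than using only the current sample. The feature map `phi`, the known target weights `W_true` (standing in for the true value function, purely for illustration), the buffer size, and the learning rate are all assumptions of this sketch.

```python
import numpy as np

def phi(x):
    """Illustrative polynomial basis for a scalar state."""
    return np.array([1.0, x, x**2])

def replay_update(W, buffer, lr=0.5):
    """One gradient step on the squared residual, averaged over the replay buffer."""
    grad = np.zeros_like(W)
    for x, target in buffer:
        f = phi(x)
        grad += (W @ f - target) * f
    return W - lr * grad / len(buffer)

rng = np.random.default_rng(0)
W_true = np.array([1.0, -0.5, 0.2])   # stand-in "true" critic weights
W = np.zeros(3)                       # critic weights to be learned
buffer = []
for step in range(500):
    x = rng.uniform(-1.0, 1.0)        # sample a state
    buffer.append((x, W_true @ phi(x)))
    buffer = buffer[-64:]             # keep a finite replay window
    W = replay_update(W, buffer)
# W now approximates W_true; replaying past samples enriches each
# gradient step, which is the mechanism credited with faster convergence.
print(np.round(W, 2))
```

Because each step reuses up to 64 past samples, the empirical regression problem seen by the critic is better conditioned than a single-sample update, mirroring the convergence-rate benefit described in the abstract.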

Updated: 2024-08-22