Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system
Journal of the Franklin Institute (IF 3.7). Pub Date: 2020-06-01. DOI: 10.1016/j.jfranklin.2020.05.038
Yinlei Wen, Huaguang Zhang, He Ren, Kun Zhang

In this paper, a novel off-policy, model-free reinforcement learning method is introduced to solve nonzero-sum games of discrete-time linear systems. Unlike the traditional policy iteration (PI) method, which requires knowledge of the system dynamics, the proposed method can be trained directly from state data. Moreover, the traditional PI method is shown to be affected by probing noise. In the analysis of the proposed method, probing noise is explicitly accounted for and proved to have no influence on convergence. The solution of the optimal Nash equilibrium is derived, and the algorithm is proved to be applicable in both an online and an offline manner. A simulation of a nonzero-sum game control problem on an F-16 aircraft discrete-time system is presented, and the results verify the effectiveness of the proposed algorithm.
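
To make the setting concrete, below is a minimal sketch of the model-based PI baseline the abstract contrasts against: policy iteration for a two-player nonzero-sum LQ game on a discrete-time linear system x_{k+1} = A x_k + B1 u_k + B2 w_k, with per-player costs J_i = sum_k (x_k' Q_i x_k + u_{i,k}' R_i u_{i,k}). It needs the dynamics (A, B1, B2), which is exactly what the paper's off-policy method avoids. All matrices, function names, and the sequential (Gauss-Seidel) update order here are illustrative assumptions, not taken from the paper.

import numpy as np

def dlyap(Ac, W, iters=2000):
    # Fixed-point solve of P = Ac' P Ac + W; requires Ac Schur-stable.
    P = np.zeros_like(W)
    for _ in range(iters):
        P = Ac.T @ P @ Ac + W
    return P

def nonzero_sum_pi(A, B1, B2, Q1, Q2, R1, R2, n_iter=30):
    n = A.shape[0]
    K1 = np.zeros((B1.shape[1], n))  # zero initial gains: assumes A itself is stable
    K2 = np.zeros((B2.shape[1], n))
    for _ in range(n_iter):
        Ac = A - B1 @ K1 - B2 @ K2   # joint closed loop under both policies
        # Policy evaluation: one coupled Lyapunov equation per player
        P1 = dlyap(Ac, Q1 + K1.T @ R1 @ K1)
        P2 = dlyap(Ac, Q2 + K2.T @ R2 @ K2)
        # Policy improvement: each player best-responds to the other's gain
        K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
        K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    return K1, K2, P1, P2

With stabilizing initial gains, the pair (K1, K2) iterates toward a feedback Nash equilibrium of the game; the hand-rolled fixed-point Lyapunov solver is only there to keep the sketch dependency-free (scipy.linalg.solve_discrete_lyapunov would serve the same purpose).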
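The off-policy, data-driven ingredient can be sketched in its simplest single-controller form, Bradtke-style LQ Q-learning, offered as an analogue of the idea rather than the paper's multi-player algorithm. A quadratic Q-function Q(x,u) = [x;u]' H [x;u] is fitted by least squares from transitions gathered under a behavior policy with probing noise; because the target-policy action is recomputed at the next state x', the noise enters only through the data and not through the Bellman identity, which mirrors the abstract's claim that probing noise does not bias convergence. All names below are assumptions for illustration.

import numpy as np

def svec(z):
    # Features for Q(z) = z' H z with symmetric H: upper-triangular monomials.
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * z[i] * z[j]

def unsvec(h, d):
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = h
    return H + np.triu(H, 1).T  # fill in the lower triangle

def offpolicy_q_eval(X, U, Xn, K, Q, R):
    # Least-squares solve of Q_K(x,u) = x'Qx + u'Ru + Q_K(x', -K x')
    # from transitions (x, u, x'); u may contain arbitrary probing noise.
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        un = -K @ xn  # target-policy action at x', recomputed off-policy
        Phi.append(svec(np.r_[x, u]) - svec(np.r_[xn, un]))
        y.append(x @ Q @ x + u @ R @ u)
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    return unsvec(h, len(X[0]) + len(U[0]))

def improve_gain(H, n):
    # Greedy policy from the fitted Q-function: K_new = H_uu^{-1} H_ux.
    return np.linalg.solve(H[n:, n:], H[n:, :n])

Alternating offpolicy_q_eval and improve_gain on one fixed batch of noisy transitions is the offline mode; re-collecting data between iterations gives the online mode, matching the abstract's claim that the algorithm runs either way.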



Updated: 2020-07-29