Optimal tracking control for non‐zero‐sum games of linear discrete‐time systems via off‐policy reinforcement learning
Optimal Control Applications and Methods (IF 2.0). Pub Date: 2020-04-16. DOI: 10.1002/oca.2597
Yinlei Wen, Huaguang Zhang, Hanguang Su, He Ren
In this article, a model-free off-policy reinforcement learning algorithm is applied to the optimal tracking problem, formulated as a multiplayer non-zero-sum game, for discrete-time linear systems. In contrast to the traditional method and the policy iteration method for solving optimal tracking problems, the proposed algorithm operates on measured system data rather than requiring knowledge of the system dynamics. To implement the algorithm, an auxiliary augmented system is constructed by assembling the original system with the reference trajectory, and a discount factor is introduced into the performance indexes. The analysis shows that the solutions of the proposed algorithm converge to the Nash equilibrium and that the result is not biased by probing noise. Two simulations verify the feasibility and effectiveness of the proposed algorithm.
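The augmented-system construction mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumed dynamics, not the paper's actual system: the state, input, and reference-generator matrices (`A`, `B`, `F`), the error weight, and the discount factor are all hypothetical, chosen only to show how the original system and the reference trajectory are assembled into one augmented state with a discounted quadratic cost.

```python
import numpy as np

# Illustrative (assumed) original system x_{k+1} = A x_k + B u_k
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
# Assumed reference generator r_{k+1} = F r_k (here: a constant reference)
F = np.array([[1.0]])

n, m, p = A.shape[0], B.shape[1], F.shape[0]

# Augmented state X = [x; r]: the reference evolves autonomously,
# so the augmented dynamics are block-diagonal in (A, F).
T = np.block([[A,                np.zeros((n, p))],
              [np.zeros((p, n)), F               ]])
B1 = np.vstack([B, np.zeros((p, m))])

# Tracking error e_k = C X_k (here: first state should follow r).
C = np.array([[1.0, 0.0, -1.0]])
Q = C.T @ C          # quadratic tracking-error weight (illustrative)
R = np.eye(m)        # input weight
gamma = 0.8          # discount factor: keeps the cost finite even
                     # though the reference trajectory never decays

# Discounted cost of the uncontrolled policy u_k = 0 over a finite horizon.
X = np.array([[1.0], [0.0], [1.0]])   # x1 = 1, x2 = 0, r = 1
J = 0.0
for k in range(50):
    u = np.zeros((m, 1))
    J += gamma**k * float(X.T @ Q @ X + u.T @ R @ u)
    X = T @ X + B1 @ u
```

With the discounted cost defined on the augmented state, the tracking problem takes the form of a regulation problem, which is what allows data-driven (off-policy) policy evaluation to proceed without a model of the dynamics.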

Updated: 2020-04-16