Neural Network-Based Optimal Tracking Control of Continuous-Time Uncertain Nonlinear System via Reinforcement Learning
Neural Processing Letters ( IF 2.6 ) Pub Date : 2020-02-28 , DOI: 10.1007/s11063-020-10220-z
Jingang Zhao

In this note, optimal tracking control for uncertain continuous-time nonlinear systems is investigated using a novel reinforcement learning (RL) scheme. The uncertainty here refers to unknown system drift dynamics. Based on the nonlinear system and the reference signal, we first formulate the tracking problem by constructing an augmented system. The optimal tracking control problem for the original nonlinear system is thus transformed into solving the Hamilton–Jacobi–Bellman (HJB) equation of the augmented system. A new single neural network (NN)-based online RL method is proposed to learn the solution of the tracking HJB equation, while the corresponding optimal control input that minimizes the tracking Hamiltonian is calculated in a forward-in-time manner, without requiring value iterations, policy iterations, or knowledge of the system drift dynamics. To relax the dependence of the RL method on the traditional persistence of excitation (PE) condition, a concurrent learning technique is adopted to design the NN tuning laws. The uniform ultimate boundedness of the NN weight errors and the closed-loop augmented system states is rigorously proved. Three numerical simulation examples are given to demonstrate the effectiveness of the proposed scheme.
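As a rough illustration of the concurrent-learning idea (a minimal sketch, not the paper's exact tuning law), the snippet below trains a single critic weight for the scalar problem ẋ = u with cost ∫(x² + u²)dt, whose value function V(x) = x² is exactly representable by the basis φ(x) = x² with ideal weight W* = 1. Instead of relying on a persistently exciting online signal, the tuning law replays a fixed stack of recorded states; the gains, step size, and memory stack are all illustrative assumptions.

```python
# Hedged sketch of a concurrent-learning critic update (illustrative only).
# Problem: x_dot = u, cost = ∫ (x^2 + u^2) dt, basis phi(x) = x^2,
# so the ideal critic weight is W* = 1 (since V(x) = x^2).

alpha = 5.0                      # adaptation gain (assumed)
dt = 0.01                        # Euler step for the weight dynamics
W = 0.2                          # initial critic weight guess
memory = [0.5, 1.0, 1.5, 2.0]    # recorded states standing in for the PE condition

for _ in range(2000):
    dW = 0.0
    for xj in memory:                       # replay the recorded data stack
        uj = -W * xj                        # greedy control u = -0.5 * dV/dx
        sigma = 2.0 * xj * uj               # regressor: dphi/dx * x_dot
        delta = W * sigma + xj**2 + uj**2   # Bellman (HJB) residual at xj
        dW -= alpha * sigma * delta / (1.0 + sigma**2) ** 2  # normalized gradient
    W += dt * dW

print(W)   # W converges close to the ideal weight W* = 1
```

The normalized-gradient form and the summation over stored points mirror standard concurrent-learning tuning laws: convergence requires only that the recorded regressors are sufficiently rich, not that the online trajectory remain exciting.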

Updated: 2020-02-28