Online optimal and adaptive integral tracking control for varying discrete‐time systems using reinforcement learning
International Journal of Adaptive Control and Signal Processing (IF 3.9), Pub Date: 2020-04-16, DOI: 10.1002/acs.3115
Ibrahim Sanusi, Andrew Mills, Tony Dodd, George Konstantopoulos

A conventional closed-form solution to the optimal control problem via optimal control theory is available only under the assumption that the system dynamics/model is known and described by differential equations. Without such models, reinforcement learning (RL) has been successfully applied as a candidate technique to solve the optimal control problem iteratively for unknown or varying systems. For the optimal tracking control problem, however, existing RL techniques in the literature rely on a predetermined feedforward input for the tracking control, on restrictive assumptions about the reference model dynamics, or on discounted tracking costs. Furthermore, because they use discounted tracking costs, the existing RL methods cannot guarantee zero steady-state error. This article therefore presents an optimal online RL tracking control framework for discrete-time (DT) systems that does not impose the restrictive assumptions of the existing methods and also guarantees zero steady-state tracking error. This is achieved by augmenting the original system dynamics with the integral of the error between the reference inputs and the tracked outputs for use in the online RL framework. It is further shown that the resulting value function for the DT linear quadratic tracker using the augmented formulation with integral control is also quadratic. This enables the development of Bellman equations that use only the system measurements to solve the corresponding DT algebraic Riccati equation and obtain the optimal tracking control inputs online. Two RL strategies, based on value function approximation and on Q-learning, are then proposed, along with bounds on the excitation required for convergence of the parameter estimates. Simulation case studies show the effectiveness of the proposed approach.
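For concreteness, one way to write the integral augmentation described above is sketched below; the notation is assumed here for illustration and is not taken from the article. The plant state is extended with a summed tracking error, so that tracking becomes regulation of the augmented state, and both the value function and the associated Q-function remain quadratic:

```latex
% Assumed notation: plant x_{k+1} = A x_k + B u_k, output y_k = C x_k, reference r_k.
% Integral-of-error state z_{k+1} = z_k + (r_k - y_k), augmented state X_k = [x_k; z_k].
\begin{aligned}
X_{k+1} &= \begin{bmatrix} A & 0 \\ -C & I \end{bmatrix} X_k
         + \begin{bmatrix} B \\ 0 \end{bmatrix} u_k
         + \begin{bmatrix} 0 \\ I \end{bmatrix} r_k, \\[4pt]
V(X_k) &= X_k^{\top} P X_k, \qquad
Q(X_k, u_k) = \begin{bmatrix} X_k \\ u_k \end{bmatrix}^{\top} H
              \begin{bmatrix} X_k \\ u_k \end{bmatrix}, \qquad
u_k^{*} = -H_{uu}^{-1} H_{uX}\, X_k .
\end{aligned}
```

Writing the undiscounted Bellman equation V(X_k) = c(X_k, u_k) + V(X_{k+1}) along measured trajectories is what allows the kernel P (or H) to be estimated online without a model, which is the mechanism behind both of the proposed RL strategies.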
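The following is a minimal numerical sketch of the Q-learning route, assuming an incremental (velocity-form) realisation of the integral action and an illustrative second-order plant; the plant, weights, initial gain, and excitation level are all assumptions chosen for illustration, and the article's exact augmented state and algorithm may differ:

```python
import numpy as np

# Minimal sketch (assumed formulation, not the article's exact one): data-driven
# Q-learning for tracking with integral action, realised in incremental
# (velocity) form so that the undiscounted stage cost vanishes at steady state.
np.random.seed(0)

# Plant used only to generate measurements (treated as unknown by the learner):
# x_{k+1} = A x_k + B u_k,  y_k = C x_k,  constant reference r.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])
r = 1.0

n, m = 2, 1
na = n + 1                                # augmented state xi_k = [x_k - x_{k-1}; r - y_k]
Qe, R = 1.0, 0.1                          # weights on tracking error and input increment

def features(xi, du):
    """Quadratic basis: upper triangle of [xi; du][xi; du]^T."""
    v = np.concatenate([xi, du])
    return np.outer(v, v)[np.triu_indices(v.size)]

def kernel_from(theta, d):
    """Rebuild the symmetric Q-kernel H from its upper-triangular parameters."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return 0.5 * (H + H.T)

K = np.array([[0.0, 0.0, -0.3]])          # assumed stabilising initial gain (du = 0.3*e)
for it in range(8):                       # policy iteration on the Q-function
    Phi, tgt = [], []
    x = np.zeros(n); x_prev = x.copy(); u_prev = np.zeros(m)
    for k in range(400):                  # collect data under the current policy
        xi = np.concatenate([x - x_prev, [r - (C @ x).item()]])
        du = -K @ xi + 0.1 * np.random.randn(m)   # probing noise for excitation
        u = u_prev + du                   # integral action: accumulate increments
        cost = Qe * xi[-1] ** 2 + R * (du @ du).item()
        x_prev, u_prev = x, u
        x = A @ x + B @ u                 # measured plant response
        xin = np.concatenate([x - x_prev, [r - (C @ x).item()]])
        dun = -K @ xin                    # next increment under the current policy
        # Undiscounted Bellman identity: Q(xi, du) - Q(xi', du') = stage cost.
        Phi.append(features(xi, du) - features(xin, dun))
        tgt.append(cost)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(tgt), rcond=None)
    H = kernel_from(theta, na + m)
    Huu, Hux = H[na:, na:], H[na:, :na]
    K = np.linalg.solve(Huu, Hux)         # greedy improvement: du = -K xi

print("learned incremental gain K =", K)
```

Each outer iteration evaluates the quadratic Q-function of the current policy from measured data by least squares and then performs a greedy policy improvement; the probing noise supplies the excitation needed for the least-squares problem to be well posed, echoing the excitation bounds discussed in the article.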

Updated: 2020-04-16