Automatica (IF 4.8). Pub Date: 2022-11-29. DOI: 10.1016/j.automatica.2022.110761. Jianguo Zhao, Chunyu Yang, Weinan Gao, Hamidreza Modares, Xinkai Chen, Wei Dai.
This paper considers the problem of linear quadratic tracking control (LQTC) with a discounted cost function for unknown systems. Existing design methods often require the discount factor to be small enough to guarantee closed-loop stability. However, solving the discounted algebraic Riccati equation (ARE) may suffer from numerical ill-conditioning if the discount factor is too small. Using singular perturbation theory, we decompose the full-order discounted ARE into a reduced-order ARE and a Sylvester equation, which facilitates designing the feedback and feedforward control gains. The obtained controller is proved to be a stabilizing and near-optimal solution to the original LQTC problem. In the framework of reinforcement learning, both on-policy and off-policy two-phase learning algorithms are derived to design the near-optimal tracking control policy without knowing the discount factor. Comparative simulation results illustrate the advantages of the developed approach.
Title: Linear quadratic tracking control of unknown systems: A two-phase reinforcement learning method
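The decomposition described in the abstract (full-order discounted ARE split into a reduced-order ARE for the feedback gain plus a Sylvester equation for the feedforward gain) can be sketched numerically. The snippet below is a minimal illustration under a standard discounted LQT formulation with known model matrices; all plant, reference, and weight matrices are hypothetical, and it does not reproduce the paper's singular-perturbation analysis or the model-free two-phase learning algorithms — it only shows the ARE-plus-Sylvester structure of the resulting controller.

```python
# Sketch: discounted LQT gains via a reduced-order ARE + Sylvester equation.
# Assumed setup (not from the paper): plant x' = Ax + Bu, y = Cx,
# reference r' = Fr, cost integral of exp(-gamma*t) [(y-r)' Qe (y-r) + u' R u].
import numpy as np
from scipy.linalg import solve_continuous_are, solve_sylvester

A = np.array([[0.0, 1.0], [0.0, -1.0]])   # illustrative plant dynamics
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.0]])                     # constant reference generator
Qe = np.array([[1.0]])                    # tracking-error weight
R = np.array([[1.0]])                     # control weight
gamma = 0.1                               # discount factor

n = A.shape[0]
A_shift = A - 0.5 * gamma * np.eye(n)     # discount absorbed as a state-matrix shift

# Reduced-order ARE for the feedback part:
#   A_shift' P11 + P11 A_shift - P11 B R^{-1} B' P11 + C' Qe C = 0
P11 = solve_continuous_are(A_shift, B, C.T @ Qe @ C, R)
K = np.linalg.solve(R, B.T @ P11)         # feedback gain

# Sylvester equation for the cross term P12 (feedforward part):
#   (A_shift - B K)' P12 + P12 (F - (gamma/2) I) = C' Qe
A_cl = A_shift - B @ K
P12 = solve_sylvester(A_cl.T, F - 0.5 * gamma * np.eye(F.shape[0]), C.T @ Qe)
Kr = np.linalg.solve(R, B.T @ P12)        # feedforward gain

# Tracking controller: u = -K x - Kr r   (from u = -R^{-1} B' (P11 x + P12 r))
print("K =", K)
print("Kr =", Kr)
```

Note that only a plant-sized ARE and a linear Sylvester equation are solved here, rather than the full augmented ARE of dimension (plant + reference), which is the computational point of the decomposition.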