Linear quadratic tracking control of unknown systems: A two-phase reinforcement learning method
Automatica (IF 4.8) Pub Date: 2022-11-29, DOI: 10.1016/j.automatica.2022.110761
Jianguo Zhao, Chunyu Yang, Weinan Gao, Hamidreza Modares, Xinkai Chen, Wei Dai

This paper considers the linear quadratic tracking control (LQTC) problem with a discounted cost function for unknown systems. Existing design methods often require the discount factor to be small enough to guarantee closed-loop stability. However, solving the discounted algebraic Riccati equation (ARE) may become numerically ill-conditioned when the discount factor is too small. Using singular perturbation theory, we decompose the full-order discounted ARE into a reduced-order ARE and a Sylvester equation, which facilitates the design of the feedback and feedforward control gains. The resulting controller is proved to be a stabilizing and near-optimal solution to the original LQTC problem. Within the reinforcement learning framework, both on-policy and off-policy two-phase learning algorithms are derived to design the near-optimal tracking control policy without knowing the discount factor. The advantages of the developed results are illustrated by comparative simulation studies.
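To make the decomposition concrete, here is a minimal model-based sketch in Python (NumPy/SciPy) of the feedback/feedforward structure the abstract describes: the discount is absorbed by shifting the dynamics, a reduced-order ARE then yields the feedback gain, and a Sylvester equation yields the feedforward gain. All numerical values, the reference model dr/dt = F r with tracked signal y_d = H r, and the variable names are illustrative assumptions, not taken from the paper; the paper's actual contribution is obtaining these gains from data, in two phases, without knowledge of the model.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_sylvester

# Illustrative data (hypothetical; in the paper, A and B are unknown).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # plant:     dx/dt = A x + B u
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])                 # output:    y = C x
F = np.array([[0.0, 1.0], [-1.0, 0.0]])    # reference: dr/dt = F r (oscillator)
H = np.array([[1.0, 0.0]])                 # tracked signal: y_d = H r
Q = np.eye(1)                              # weight on tracking error y - y_d
R = np.eye(1)                              # weight on control effort
gamma = 0.1                                # discount rate in exp(-gamma * t)

# The discount exp(-gamma*t) is absorbed by shifting both dynamics
# by -(gamma/2) I, turning the discounted ARE into a standard one.
As = A - 0.5 * gamma * np.eye(2)
Fs = F - 0.5 * gamma * np.eye(2)

# Phase 1 (feedback): reduced-order ARE, independent of the reference:
#   As' P11 + P11 As - P11 B R^{-1} B' P11 + C' Q C = 0
P11 = solve_continuous_are(As, B, C.T @ Q @ C, R)
K1 = np.linalg.solve(R, B.T @ P11)

# Phase 2 (feedforward): Sylvester equation using the Phase-1 result:
#   (As - B K1)' P12 + P12 Fs = C' Q H
P12 = solve_sylvester((As - B @ K1).T, Fs, C.T @ Q @ H)
K2 = np.linalg.solve(R, B.T @ P12)

# Tracking controller: u = -K1 x - K2 r.
# Sanity check: (P11, P12) match the corresponding blocks of the
# full-order (shifted) ARE on the augmented state z = [x; r].
M = np.block([[As, np.zeros((2, 2))], [np.zeros((2, 2)), Fs]])
B1 = np.vstack([B, np.zeros((2, 1))])
E = np.hstack([C, -H])                     # augmented error map: C x - H r
P = solve_continuous_are(M, B1, E.T @ Q @ E, R)
assert np.allclose(P[:2, :2], P11) and np.allclose(P[:2, 2:], P12)
```

In this sketch the closed-loop augmented dynamics are block upper-triangular, so the feedback part can be solved without reference to the exosystem; this is what makes a two-phase order (feedback first, feedforward second) natural.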

Updated: 2022-11-29