Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee
Automatica (IF 6.4), Pub Date: 2021-05-08, DOI: 10.1016/j.automatica.2021.109689
Minghao Han, Yuan Tian, Lixian Zhang, Jun Wang, Wei Pan

Reinforcement learning (RL) is promising for complicated stochastic nonlinear control problems. Without using a mathematical model, an optimal controller can be learned through trial and error from data evaluated against certain performance criteria. However, data-based learning is notorious for not guaranteeing stability, the most fundamental property of any control system. In this paper, the classic Lyapunov method is explored to analyze uniform ultimate boundedness (UUB) stability solely from data, without using a mathematical model. It is further shown how RL with a UUB guarantee can be applied to control dynamic systems with safety constraints. Based on these theoretical results, both an off-policy and an on-policy learning algorithm are proposed. As a result, optimal controllers can be learned that guarantee UUB of the closed-loop system both at convergence and during learning. The proposed algorithms are evaluated on a series of robotic continuous control tasks with safety constraints. Compared with existing RL algorithms, the proposed method achieves superior performance in maintaining safety. As a qualitative evaluation of stability, our method shows impressive resilience even in the presence of external disturbances.
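The core idea of the abstract, certifying stability from data alone through a Lyapunov argument, can be illustrated with a sample-based check of a drift inequality of the form E[L(s') - L(s)] <= -alpha * E[c(s)] + beta, where L is a candidate Lyapunov function, c is a non-negative cost, and beta > 0 lets the state settle into a bounded region (ultimate boundedness) rather than converge to a point. This is a minimal sketch under stated assumptions, not the authors' algorithm: the inequality is a generic Foster-Lyapunov-style drift condition rather than the paper's exact theorem, and `drift_condition_holds`, `simulate`, the toy closed loop x' = a*x + w, and all constants are hypothetical choices for illustration.

```python
import random
from statistics import fmean


def drift_condition_holds(transitions, lyapunov, cost, alpha, beta, tol=0.0):
    """Check E[L(s') - L(s)] <= -alpha * E[c(s)] + beta on sampled transitions.

    Hypothetical helper (not from the paper).
    transitions: list of (s, s_next) pairs collected from the closed-loop system.
    lyapunov, cost: callables mapping a state to a non-negative float.
    tol: slack that absorbs Monte Carlo error in the empirical means.
    """
    # Empirical mean of the one-step change in the candidate Lyapunov function.
    drift = fmean(lyapunov(s_next) - lyapunov(s) for s, s_next in transitions)
    # Empirical mean of the cost over the sampled states.
    avg_cost = fmean(cost(s) for s, _ in transitions)
    return drift <= -alpha * avg_cost + beta + tol


def simulate(a, sigma, n, seed=0):
    """Hypothetical toy closed loop x' = a*x + w with w ~ Uniform(-sigma, sigma)."""
    rng = random.Random(seed)
    transitions = []
    for _ in range(n):
        x = rng.uniform(-5.0, 5.0)  # states sampled from a region of interest
        x_next = a * x + rng.uniform(-sigma, sigma)
        transitions.append((x, x_next))
    return transitions
```

For the toy loop with a = 0.5 and noise uniform on [-1, 1], taking L(x) = c(x) = x^2 gives E[L(x') | x] - L(x) = (a^2 - 1) x^2 + E[w^2], so alpha = 1 - a^2 = 0.75 and beta = 1/3; the check then passes up to sampling error, while the same constants fail for the unstable a = 1.2. Averaging over sampled states, as done here, is weaker than a pointwise condition, which is one reason a data-based guarantee needs careful theoretical backing.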

Updated: 2021-05-08