MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty
arXiv - CS - Robotics. Pub Date: 2020-11-20. DOI: arXiv:2011.10562. Anubhav Guha, Anuradha Annaswamy
Reinforcement learning (RL) algorithms have been used successfully to develop
control policies for dynamical systems. For many such systems, these policies
are trained in a simulated environment. Due to discrepancies between the
simulated model and the true system dynamics, RL-trained policies often fail to
generalize and adapt appropriately when deployed in the real-world environment.
Current research on bridging this sim-to-real gap has largely focused on
improvements in simulation design and on the development of improved,
specialized RL algorithms for robust control policy generation. In this paper,
we apply principles from adaptive control and system identification to develop
the model-reference adaptive control & reinforcement learning (MRAC-RL)
framework. We propose a set of novel MRAC algorithms applicable to a broad
range of linear and nonlinear systems, and derive the associated control laws.
The MRAC-RL framework uses an inner-loop adaptive controller that allows a
simulation-trained outer-loop policy to adapt and operate effectively in a test
environment, even when parametric model uncertainty exists. We demonstrate that
the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing
control policies that can be applied to systems with modeling errors.
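To make the inner-loop/outer-loop idea concrete, here is a minimal sketch of a classical model-reference adaptive controller for a scalar linear plant. This is not the paper's algorithm, only a textbook MRAC loop illustrating the abstract's architecture: the reference model encodes the closed-loop behavior the simulation-trained outer-loop policy expects, the command signal r stands in for that policy's output, and the plant parameter a_true is the parametric uncertainty the inner loop adapts to. All names and numerical values are illustrative assumptions.

```python
import numpy as np

def simulate(a_true=2.0, a_m=-3.0, gamma=5.0, dt=1e-3, T=20.0):
    """MRAC for the scalar plant x' = a_true*x + u, with a_true unknown.

    Reference model:  x_m' = a_m*x_m + r   (a_m < 0, the desired dynamics)
    Control law:      u = -theta_hat*x + r (certainty equivalence)
    Adaptive law:     theta_hat' = gamma*e*x, where e = x - x_m
    The ideal gain is theta* = a_true - a_m, which cancels the model mismatch.
    """
    x, x_m, theta_hat = 0.0, 0.0, 0.0   # plant state, model state, estimate
    errs = []
    for k in range(int(T / dt)):
        t = k * dt
        r = np.sin(t)                    # stand-in for the outer-loop command
        e = x - x_m                      # tracking error drives adaptation
        u = -theta_hat * x + r           # inner-loop adaptive control
        x += dt * (a_true * x + u)       # Euler step of the unknown plant
        x_m += dt * (a_m * x_m + r)      # Euler step of the reference model
        theta_hat += dt * gamma * e * x  # gradient adaptive law
        errs.append(abs(e))
    return errs, theta_hat

errs, theta_hat = simulate()
```

With these dynamics the error obeys e' = a_m*e + (theta* - theta_hat)*x, so the Lyapunov function V = e**2/2 + (theta* - theta_hat)**2/(2*gamma) is nonincreasing and the tracking error decays; the sinusoidal command is persistently exciting, so theta_hat also approaches theta* = a_true - a_m = 5 here.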
Updated: 2020-11-23