MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty
arXiv - CS - Robotics. Pub Date: 2020-11-20. DOI: arXiv:2011.10562. Anubhav Guha, Anuradha Annaswamy
Reinforcement learning (RL) algorithms have been used successfully to develop
control policies for dynamical systems. For many such systems, these policies
are trained in a simulated environment. Due to discrepancies between the
simulated model and the true system dynamics, RL-trained policies often fail to
generalize and adapt appropriately when deployed in the real-world environment.
Current research on bridging this sim-to-real gap has largely focused on
improvements in simulation design and on the development of improved,
specialized RL algorithms for robust control policy generation. In this paper,
we apply principles from adaptive control and system identification to develop
the model-reference adaptive control & reinforcement learning (MRAC-RL)
framework. We propose a set of novel MRAC algorithms applicable to a broad
range of linear and nonlinear systems, and derive the associated control laws.
The MRAC-RL framework uses an inner-loop adaptive controller that allows a
simulation-trained outer-loop policy to adapt and operate effectively in a test
environment, even when parametric model uncertainty exists. We demonstrate that
the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing
control policies that can be applied to systems with modeling errors.
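To make the inner-loop/outer-loop idea concrete, here is a minimal sketch of a classical model-reference adaptive controller for a scalar linear plant. This is not the paper's algorithm, only a textbook MRAC loop illustrating the abstract's architecture: the reference model encodes the closed-loop behavior the simulation-trained outer-loop policy expects, the command signal r stands in for that policy's output, and the plant parameter a_true is the parametric uncertainty the inner loop adapts to. All names and numerical values are illustrative assumptions.

```python
import numpy as np

def simulate(a_true=2.0, a_m=-3.0, gamma=5.0, dt=1e-3, T=20.0):
    """MRAC for the scalar plant x' = a_true*x + u, with a_true unknown.

    Reference model:  x_m' = a_m*x_m + r   (a_m < 0, the desired dynamics)
    Control law:      u = -theta_hat*x + r (certainty equivalence)
    Adaptive law:     theta_hat' = gamma*e*x, where e = x - x_m
    The ideal gain is theta* = a_true - a_m, which cancels the model mismatch.
    """
    x, x_m, theta_hat = 0.0, 0.0, 0.0   # plant state, model state, estimate
    errs = []
    for k in range(int(T / dt)):
        t = k * dt
        r = np.sin(t)                    # stand-in for the outer-loop command
        e = x - x_m                      # tracking error drives adaptation
        u = -theta_hat * x + r           # inner-loop adaptive control
        x += dt * (a_true * x + u)       # Euler step of the unknown plant
        x_m += dt * (a_m * x_m + r)      # Euler step of the reference model
        theta_hat += dt * gamma * e * x  # gradient adaptive law
        errs.append(abs(e))
    return errs, theta_hat

errs, theta_hat = simulate()
```

With these dynamics the error obeys e' = a_m*e + (theta* - theta_hat)*x, so the Lyapunov function V = e**2/2 + (theta* - theta_hat)**2/(2*gamma) is nonincreasing and the tracking error decays; the sinusoidal command is persistently exciting, so theta_hat also approaches theta* = a_true - a_m = 5 here.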
Updated: 2020-11-23