A General Framework to Increase Safety of Learning Algorithms for Dynamical Systems Based on Region of Attraction Estimation
IEEE Transactions on Robotics (IF 9.4), Pub Date: 2020-10-01, DOI: 10.1109/tro.2020.2992981
Zhehua Zhou , Ozgur S. Oguz , Marion Leibold , Martin Buss

Although state-of-the-art learning approaches exhibit impressive results for dynamical systems, only a few applications on real physical systems have been presented. One major impediment is that the intermediate policy during training may produce behaviors that are harmful not only to the system itself but also to its environment. In essence, imposing safety guarantees on learning algorithms is vital for autonomous systems acting in the real world. In this article, we propose a computationally effective and general safe learning framework, specifically for complex dynamical systems. Based on a proper definition of the safe region, we give a supervisory control strategy that switches the actions applied to the system between the learning-based controller and a predefined corrective controller. A simplified system facilitates the estimation of the safe region for the high-dimensional dynamical system. During the learning phase, the belief of the safe region is updated with the actual execution results of the corrective controller, which in turn gives the learning-based controller more freedom in choosing its actions. Two examples demonstrate the performance of the proposed framework: a simple inverted pendulum that illustrates the online adaptation method, and a quadcopter control task that shows the overall performance.
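The abstract describes the supervisory switching and the online update of the safe region only in words. The following is a minimal sketch of that general idea on a toy inverted pendulum, under clearly stated assumptions that do not come from the paper: the believed safe region is taken to be a sublevel set {x : V(x) <= c} of a quadratic function V, the corrective controller is a saturated linear state feedback, and the "belief update" simply moves the level c according to whether a corrective execution actually recovered the system. All parameters and names (`run_episode`, `update_level`, `recover`, the gains `K`, the matrix `P`) are hypothetical; this is not the authors' algorithm, and it omits the simplified-system estimation the paper uses for high-dimensional systems.

```python
import math
import random

# Hypothetical inverted-pendulum model (unit mass and length; not the paper's parameters).
G, DT, U_MAX = 9.81, 0.01, 10.0
K = (20.0, 6.0)  # hypothetical gains of the corrective controller u = -(K0*theta + K1*omega)
# P solves A_cl^T P + P A_cl = -I for the linearized closed loop (values rounded).
P = ((1.2269, 0.0491), (0.0491, 0.0915))


def step(x, u):
    """One explicit-Euler step of theta'' = G*sin(theta) + u (theta = 0 is upright)."""
    theta, omega = x
    return (theta + DT * omega, omega + DT * (G * math.sin(theta) + u))


def V(x):
    """Quadratic function x^T P x; its sublevel set {V <= c} is the believed safe region."""
    t, w = x
    return P[0][0] * t * t + 2 * P[0][1] * t * w + P[1][1] * w * w


def corrective(x):
    """Predefined corrective controller: saturated linear state feedback."""
    u = -(K[0] * x[0] + K[1] * x[1])
    return max(-U_MAX, min(U_MAX, u))


def recover(x, steps=500, tol=1e-2):
    """Execute the corrective controller; report success and the final state."""
    for _ in range(steps):
        x = step(x, corrective(x))
        if V(x) < tol:
            return True, x
    return False, x


def update_level(c, x, recovered, shrink=0.9):
    """Hypothetical belief update of the safe level c from one corrective execution."""
    if recovered:
        return max(c, V(x))  # recovery observed from x: include x in the safe region
    return min(c, shrink * V(x))  # recovery failed: pull the boundary inside x


def run_episode(c, policy, x=(0.0, 0.0), steps=300):
    """Supervisor: the learning policy acts inside {V <= c}; outside, switch controllers."""
    interventions = 0
    for _ in range(steps):
        if V(x) <= c:
            x = step(x, policy(x))  # learning-based controller acts
            continue
        interventions += 1
        ok, x_end = recover(x)  # corrective controller takes over from x
        c = update_level(c, x, ok)
        if not ok:
            break  # unsafe outcome; a real system would stop here
        x = x_end
    return c, interventions


if __name__ == "__main__":
    rng = random.Random(0)

    def explore(x):
        # Stand-in for an untrained learning-based policy (random exploration).
        return rng.uniform(-5.0, 5.0)

    c, n = run_episode(0.02, explore)
    print(f"safe level moved from 0.02 to {c:.4f} after {n} corrective interventions")
```

Starting from a deliberately conservative level c, each successful corrective execution from a state just outside the current region enlarges the region slightly, which mirrors, in a much simplified form, how the abstract says execution results give the learning-based controller more freedom over time.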

Updated: 2020-10-01