Constrained stochastic optimal control with learned importance sampling: A path integral approach
The International Journal of Robotics Research (IF 7.5), Pub Date: 2021-10-12, DOI: 10.1177/02783649211047890
Jan Carius 1, René Ranftl 2, Farbod Farshidian 1, Marco Hutter 1

Modern robotic systems are expected to operate robustly in partially unknown environments. This article proposes an algorithm capable of controlling a wide range of high-dimensional robotic systems in such challenging scenarios. Our method is based on the path integral formulation of stochastic optimal control, which we extend with constraint-handling capabilities. Under our control law, the optimal input is inferred from a set of stochastic rollouts of the system dynamics. These rollouts are simulated by a physics engine, placing minimal restrictions on the types of systems and environments that can be modeled. Although sampling-based algorithms are typically not suitable for online control, we demonstrate in this work how importance sampling and constraints can be used to effectively curb the sampling complexity and enable real-time control applications. Furthermore, the path integral framework provides a natural way of incorporating existing control architectures as ancillary controllers for shaping the sampling distribution. Our results reveal that even in cases where the ancillary controller would fail, our stochastic control algorithm provides an additional safety and robustness layer. Moreover, in the absence of an existing ancillary controller, our method can be used to train a parametrized importance sampling policy using data from the stochastic rollouts. The algorithm may thereby bootstrap itself by learning an importance sampling policy offline and then refining it to unseen environments during online control. We validate our results on three robotic systems, including hardware experiments on a quadrupedal robot.
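As an illustration of how an input can be inferred from a set of stochastic rollouts, the following is a minimal sketch of a softmin-weighted rollout average in the spirit of path integral control. It is not the authors' algorithm: the toy single-integrator dynamics, the proportional `ancillary_controller`, the quadratic state cost, and all parameter values are assumptions chosen for illustration. Constraint handling, the physics engine, and the learned sampling policy are omitted, and the weights use only the state cost, dropping the control-cost and importance-sampling correction terms that a full derivation would include.

```python
import math
import random


def ancillary_controller(x, gain=1.0):
    """Hypothetical proportional controller that shapes the sampling distribution."""
    return -gain * x


def rollout(x0, horizon, dt, sigma, rng):
    """Simulate one noisy rollout and return (path cost, first applied input)."""
    x, cost, first_u = x0, 0.0, None
    for _ in range(horizon):
        # Sample an input around the ancillary controller's suggestion.
        u = ancillary_controller(x) + rng.gauss(0.0, sigma)
        if first_u is None:
            first_u = u
        x = x + u * dt      # toy single-integrator dynamics (assumption)
        cost += x * x * dt  # quadratic state cost (assumption)
    return cost, first_u


def path_integral_input(x0, n_rollouts=256, horizon=20, dt=0.05,
                        sigma=1.0, lam=0.1, seed=0):
    """Estimate the input at x0 as a softmin-weighted average over rollouts."""
    rng = random.Random(seed)
    samples = [rollout(x0, horizon, dt, sigma, rng) for _ in range(n_rollouts)]
    # Subtract the minimum cost before exponentiating for numerical stability.
    min_cost = min(cost for cost, _ in samples)
    weights = [math.exp(-(cost - min_cost) / lam) for cost, _ in samples]
    total = sum(weights)
    return sum(w * u for w, (_, u) in zip(weights, samples)) / total


if __name__ == "__main__":
    print(path_integral_input(2.0))
```

In this sketch the temperature `lam` controls how strongly low-cost rollouts dominate the average: small values approach picking the single best rollout, while large values approach a plain mean of the sampled inputs.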




Updated: 2021-10-12