A formal methods approach to interpretable reinforcement learning for robotic planning
Science Robotics (IF 26.1), Pub Date: 2019-12-18, DOI: 10.1126/scirobotics.aay6276
Xiao Li, Zachary Serlin, Guang Yang, Calin Belta

A formal methods approach to reinforcement learning generates rewards from a formal language and guarantees safety. Growing interest in reinforcement learning approaches to robotic planning and control raises concerns of predictability and safety of robot behaviors realized solely through learned control policies. In addition, formally defining reward functions for complex tasks is challenging, and faulty rewards are prone to exploitation by the learning agent. Here, we propose a formal methods approach to reinforcement learning that (i) provides a formal specification language that integrates high-level, rich, task specifications with a priori, domain-specific knowledge; (ii) makes the reward generation process easily interpretable; (iii) guides the policy generation process according to the specification; and (iv) guarantees the satisfaction of the (critical) safety component of the specification. The main ingredients of our computational framework are a predicate temporal logic specifically tailored for robotic tasks and an automaton-guided, safe reinforcement learning algorithm based on control barrier functions. Although the proposed framework is quite general, we motivate it and illustrate it experimentally for a robotic cooking task, in which two manipulators worked together to make hot dogs.
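
Two of the ingredients named in the abstract, a reward derived from an automaton over predicates and a control barrier function that enforces safety, can be illustrated with a toy sketch. This is not the authors' algorithm or code: the 1-D dynamics, the `near` predicate, the `TaskAutomaton` class, the `cbf_filter` function, and the hand-written `demo_policy` (standing in for a learned policy) are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical 1-D task: a gripper at position x must reach a pickup zone
# near x = 1.0, then a serving zone near x = 3.0, and never pass X_MAX.
X_MAX = 4.0

Predicate = Callable[[float], float]


def near(target: float, tol: float = 0.1) -> Predicate:
    """Predicate with quantitative semantics: positive iff |x - target| < tol."""
    return lambda x: tol - abs(x - target)


@dataclass
class TaskAutomaton:
    """Finite-state automaton; each state has one outgoing guarded edge."""
    guards: Dict[int, Tuple[Predicate, int]]
    state: int = 0
    accepting: int = 2

    def reward(self, x: float) -> float:
        """Robustness of the current outgoing guard; 0 once the task is done."""
        if self.state == self.accepting:
            return 0.0
        guard, _ = self.guards[self.state]
        return guard(x)

    def step(self, x: float) -> None:
        """Move to the next automaton state when the guard is satisfied."""
        if self.state != self.accepting:
            guard, nxt = self.guards[self.state]
            if guard(x) > 0:
                self.state = nxt


def cbf_filter(x: float, u_nominal: float, alpha: float = 1.0) -> float:
    """Safety filter for x' = u with barrier h(x) = X_MAX - x.

    The condition dh/dt + alpha * h >= 0 reduces to u <= alpha * (X_MAX - x).
    """
    return min(u_nominal, alpha * (X_MAX - x))


def demo_policy(x: float, q: int) -> float:
    """Hand-written stand-in for a learned policy, indexed by automaton state."""
    targets = {0: 1.0, 1: 3.0}
    if q in targets:
        return 2.0 * (targets[q] - x)
    return 5.0  # deliberately unsafe push toward the wall after the task ends


def rollout(policy: Callable[[float, int], float], steps: int = 200,
            dt: float = 0.05) -> Tuple[int, float, float]:
    """Simulate with Euler steps; alpha * dt <= 1 keeps h(x) > 0 in discrete time."""
    fsa = TaskAutomaton(guards={0: (near(1.0), 1), 1: (near(3.0), 2)})
    x, total_reward, max_x = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = cbf_filter(x, policy(x, fsa.state))
        x += dt * u
        fsa.step(x)
        total_reward += fsa.reward(x)
        max_x = max(max_x, x)
    return fsa.state, max_x, total_reward


final_state, max_x, total_reward = rollout(demo_policy)
print(final_state, max_x < X_MAX)
```

In this sketch the reward at each step can be read directly off the specification (how close the state is to satisfying the next guard), and the filter overrides the policy only when its action would violate the barrier condition, so the unsafe push in the final automaton state never carries the gripper past `X_MAX`.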

Updated: 2019-12-18