Verifiable and Compositional Reinforcement Learning Systems
arXiv - CS - Machine Learning Pub Date : 2021-06-07 , DOI: arxiv-2106.05864
Cyrus Neary, Christos Verginis, Murat Cubuktepe, Ufuk Topcu

We propose a novel framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL sub-systems, each of which learns to accomplish a separate sub-task, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP), which is used to plan and to analyze compositions of sub-systems, and of the collection of low-level sub-systems themselves. By defining interfaces between the sub-systems, the framework enables automatic decompositions of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual sub-task specifications, i.e., achieve the sub-system's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the sub-systems; if they each learn a policy satisfying the appropriate sub-task specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the sub-task specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the sub-task specifications to account for the observed shortcomings. The result is an iterative procedure for defining sub-task specifications, and for training the sub-systems to meet them. As an additional benefit, this procedure allows for particularly challenging or important components of an overall task to be determined automatically, and focused on, during training. Experimental results demonstrate the presented framework's novel capabilities.
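To make the compositional guarantee concrete, the following is a minimal sketch, not the authors' implementation: for a sequential composition of sub-systems, if each sub-system achieves its exit conditions with at least its specified minimum probability (given its entry conditions), then the overall task is satisfied with at least the product of those probabilities. The class and function names here are illustrative assumptions.

```python
# Illustrative sketch (hypothetical names, not the paper's code):
# checking whether a chain of sub-task guarantees satisfies an
# overall reachability specification such as "P >= 0.95".

from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    # Minimum probability of reaching this sub-system's exit
    # conditions, given that its entry conditions are met.
    min_success_prob: float

def composition_satisfies(subtasks, overall_target):
    """For a sequential composition, the guaranteed probability of the
    overall task is at least the product of the sub-task guarantees."""
    guaranteed = 1.0
    for st in subtasks:
        guaranteed *= st.min_success_prob
    return guaranteed >= overall_target, guaranteed

# Example: three sub-tasks composed in sequence.
chain = [
    SubTask("navigate_hall", 0.99),
    SubTask("open_door", 0.98),
    SubTask("reach_goal", 0.99),
]
ok, p = composition_satisfies(chain, overall_target=0.95)
# 0.99 * 0.98 * 0.99 ≈ 0.9605 >= 0.95, so the composition is verified.
```

If the check fails, the paper's iterative procedure would instead search for new per-sub-task probability parameters in the pMDP; the product bound above is only the simplest (chain-structured) case of that analysis.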

Updated: 2021-06-11