PAC-Bayes control: learning policies that provably generalize to novel environments
The International Journal of Robotics Research (IF 7.5), Pub Date: 2020-10-03, DOI: 10.1177/0278364920959444
Anirudha Majumdar, Alec Farid, Anoopkumar Sonar

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
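To make the bound-minimization step concrete, here is a minimal numerical sketch for the finite-policy-space setting, assuming costs scaled to [0, 1] and a uniform prior over candidate policies. It minimizes the looser McAllester form of the PAC-Bayes bound by generic gradient descent over a softmax parameterization; the paper itself works with a tighter kl-form bound and solves the finite-space problem exactly via Relative Entropy Programming. All function names and the toy cost values below are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def pac_bayes_bound(p, c_hat, p0, N, delta):
    """Upper bound (holds with probability >= 1 - delta) on the expected
    cost over novel environments for a stochastic policy that samples
    from distribution p over a finite policy set. c_hat[i] is policy i's
    average cost on the N training environments, scaled to [0, 1]."""
    kl = float(np.sum(p * np.log(p / p0)))  # KL(p || prior)
    return p @ c_hat + np.sqrt((kl + np.log(2.0 * np.sqrt(N) / delta)) / (2.0 * N))

def optimize_posterior(c_hat, N, delta=0.01):
    """Choose the distribution p over policies that minimizes the bound."""
    K = len(c_hat)
    p0 = np.full(K, 1.0 / K)  # uniform prior over the K candidate policies

    def softmax(theta):
        z = np.exp(theta - theta.max())
        return z / z.sum()

    # The softmax parameterization keeps p on the probability simplex;
    # scipy estimates gradients numerically, so no hand-derived
    # derivative of the bound is needed for this sketch.
    objective = lambda theta: pac_bayes_bound(softmax(theta), c_hat, p0, N, delta)
    res = minimize(objective, np.zeros(K))
    p = softmax(res.x)
    return p, pac_bayes_bound(p, c_hat, p0, N, delta)

# Hypothetical example: 20 candidate policies evaluated on 500 environments.
rng = np.random.default_rng(0)
c_hat = rng.uniform(0.1, 0.9, size=20)
p, bound = optimize_posterior(c_hat, N=500)
print(f"certified expected cost on novel environments <= {bound:.3f}")
```

In the continuously parameterized setting described in the abstract, the same objective is instead minimized with stochastic gradient descent over the parameters of a distribution (e.g., a Gaussian) on neural network weights, with the empirical cost term estimated from sampled rollouts on the training environments.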
