Safe Zero-Shot Model-Based Learning and Control: A Wasserstein Distributionally Robust Approach, arXiv - CS - Systems and Control



arXiv - CS - Systems and Control, Pub Date: 2020-04-02, DOI: arxiv-2004.00759
Aaron Kandel and Scott J. Moura

This paper explores distributionally robust zero-shot model-based learning and control using Wasserstein ambiguity sets. Conventional model-based reinforcement learning algorithms struggle to guarantee feasibility throughout the online learning process. We address this open challenge with the following approach. Using a stochastic model-predictive control (MPC) strategy, we augment safety constraints with affine random variables corresponding to the instantaneous empirical distributions of modeling error. We obtain these distributions by evaluating model residuals in real time throughout the online learning process. By optimizing over the worst case modeling error distribution defined within a Wasserstein ambiguity set centered about our empirical distributions, we can approach the nominal constraint boundary in a provably safe way. We validate the performance of our approach using a case study of lithium-ion battery fast charging, a relevant and safety-critical energy systems control application. Our results demonstrate marked improvements in safety compared to a basic learning model-predictive controller, with constraints satisfied at every instance during online learning and control.
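The abstract does not give the exact constraint reformulation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a scalar safety constraint of the form g(x) + w <= limit, where w is the modeling error, and replaces the paper's reformulation with a standard conservative bound: over a type-1 Wasserstein ball of radius `eps` around the empirical residual distribution, the worst-case CVaR of w at tail probability `alpha` is at most the empirical CVaR plus `eps / alpha`. The names `empirical_cvar`, `dr_tightening`, `eps`, `alpha`, and the numeric values are all hypothetical.

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Empirical CVaR at tail probability alpha.

    Uses the Rockafellar-Uryasev form min_t { t + E[(w - t)^+] / alpha }.
    The objective is piecewise linear and convex in t with breakpoints at
    the samples, so scanning the samples finds the minimum.
    """
    w = np.asarray(samples, dtype=float)
    return min(t + np.mean(np.maximum(w - t, 0.0)) / alpha for t in w)

def dr_tightening(residuals, alpha, eps):
    """Conservative back-off for the constraint g(x) + w <= limit.

    Assumption: (w - t)^+ / alpha is (1/alpha)-Lipschitz in w, so its
    expectation can grow by at most eps / alpha inside a type-1
    Wasserstein ball of radius eps around the empirical distribution.
    """
    return empirical_cvar(residuals, alpha) + eps / alpha

# Hypothetical model residuals collected online (same units as the constraint).
residuals = [0.0, 1.0, 2.0, 3.0]
backoff = dr_tightening(residuals, alpha=0.5, eps=0.1)

# The controller would then enforce g(x) <= nominal_limit - backoff.
nominal_limit = 10.0  # hypothetical constraint limit
tightened_limit = nominal_limit - backoff
```

As more residuals are collected and the radius `eps` is reduced, the back-off shrinks, which is one way to read the abstract's claim of approaching the nominal constraint boundary safely.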


Updated: 2020-04-03
