Reinforcement Learning of the Prediction Horizon in Model Predictive Control
arXiv - CS - Systems and Control | Pub Date: 2021-02-22 | DOI: arxiv-2102.11122
Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

Model predictive control (MPC) is a powerful trajectory optimization control technique capable of controlling complex nonlinear systems while respecting system constraints and ensuring safe operation. The capabilities of MPC come at the cost of high online computational complexity, the requirement of an accurate model of the system dynamics, and the need to tune its parameters to the specific control application. The main tunable parameter affecting the computational complexity is the prediction horizon length, which controls how far into the future the MPC predicts the system response and thus evaluates the optimality of its computed trajectory. A longer horizon generally improves control performance but requires an increasingly powerful computing platform, which excludes certain control applications. The sensitivity of performance to the prediction horizon length varies over the state space, and this has motivated adaptive horizon model predictive control (AHMPC), which adapts the prediction horizon according to some criteria. In this paper, we propose to learn the optimal prediction horizon as a function of the state using reinforcement learning (RL). We show how the RL learning problem can be formulated, and test our method on two control tasks, showing clear improvements over the fixed-horizon MPC scheme while requiring only minutes of learning.
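To make the general idea concrete, the following is a minimal, self-contained Python sketch, not the authors' implementation: an RL agent (here, simple tabular Q-learning over a coarse state discretization) selects the prediction horizon from a small candidate set before each MPC solve, and is rewarded for control performance minus a penalty proportional to the horizon length, standing in for online computation time. The double-integrator dynamics, the candidate set HORIZONS, the penalty weight COMP_WEIGHT, and the brute-force mpc_action solver are all illustrative assumptions and do not come from the paper.

```python
# Illustrative sketch: RL (tabular Q-learning) picks the MPC prediction horizon
# as a function of the state. All models, sets, and weights below are assumed
# for the example and are not taken from the paper.
import numpy as np

HORIZONS = [2, 5, 10, 20]       # candidate prediction horizons (assumed set)
DT = 0.1                        # integration step of the toy model
COMP_WEIGHT = 0.01              # reward penalty per horizon step (compute proxy)

def mpc_action(x, N):
    """Toy stand-in for an MPC solve: brute-force search over constant inputs
    held for N steps of a double-integrator model; return the best input."""
    best_u, best_cost = 0.0, np.inf
    for u in np.linspace(-1.0, 1.0, 21):
        pos, vel, cost = x[0], x[1], 0.0
        for _ in range(N):
            vel += DT * u
            pos += DT * vel
            cost += pos ** 2 + 0.1 * vel ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

def discretize(x):
    """Coarse state bins so a tabular Q-function can be used."""
    return (int(np.clip(x[0] // 0.5, -4, 4)), int(np.clip(x[1] // 0.5, -4, 4)))

Q = {}                                          # Q[(state_bin, horizon_idx)]
def q(s, a):
    return Q.get((s, a), 0.0)

rng = np.random.default_rng(0)
alpha, gamma, eps = 0.2, 0.95, 0.2              # learning rate, discount, exploration

for episode in range(200):
    x = rng.uniform(-2.0, 2.0, size=2)          # random initial position/velocity
    for t in range(50):
        s = discretize(x)
        # epsilon-greedy choice of the prediction horizon for this MPC solve
        if rng.random() < eps:
            a = int(rng.integers(len(HORIZONS)))
        else:
            a = int(np.argmax([q(s, i) for i in range(len(HORIZONS))]))
        N = HORIZONS[a]
        u = mpc_action(x, N)
        # apply the first MPC input to the "true" system for one step
        vel = x[1] + DT * u
        x = np.array([x[0] + DT * vel, vel])
        # reward: control performance minus a penalty proportional to the
        # horizon length, acting as a proxy for online computation time
        r = -(x[0] ** 2 + 0.1 * x[1] ** 2) - COMP_WEIGHT * N
        s2 = discretize(x)
        target = r + gamma * max(q(s2, i) for i in range(len(HORIZONS)))
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

s0 = discretize(np.zeros(2))
print("Learned horizon near the origin:",
      HORIZONS[int(np.argmax([q(s0, i) for i in range(len(HORIZONS))]))])
```

In a real application the inner solve would be a proper constrained MPC and the horizon policy could be obtained with any RL method; the sketch only illustrates state-dependent horizon selection and the performance-versus-computation trade-off expressed through the reward.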

Updated: 2021-02-23