Effects of Sampling and Prediction Horizon in Reinforcement Learning
IEEE Access (IF 3.9) Pub Date: 2021-09-13, DOI: 10.1109/access.2021.3112498
Pavel Osinenko , Dmitrii Dobriborsci

Plain reinforcement learning (RL) may be prone to failure to converge, constraint violation, unpredictable performance, and the like. Commonly, RL agents must undergo extensive learning stages to achieve proper functionality. This is in contrast to classical control algorithms, which are typically model-based. A direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually concerned with Markov decision processes, yet most real environments are not time-discrete. The actual physical setting of RL consists of a digital agent and a time-continuous dynamical system. There is thus yet another hyper-parameter: the agent sampling time. In this paper, we investigate the effects of the prediction horizon and the sampling time of two hybrid RL-MPC agents in a case study of mobile robot parking, which is a canonical control problem. We benchmark the agents against a simple variant of MPC. The sampling time showed a "sweet spot" behavior, whereas the RL agents demonstrated merits at shorter horizons.
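The following is a minimal sketch (not the paper's implementation) of how the two hyper-parameters discussed in the abstract, the agent sampling time and the prediction horizon, enter a receding-horizon controller acting on a time-continuous mobile-robot (unicycle) parking task. All function names, cost weights, and numerical values are illustrative assumptions; the predictive optimization is done here by simple random shooting rather than the agents studied in the paper.

```python
import numpy as np

def unicycle_step(state, action, dt):
    """Euler-discretize the continuous unicycle kinematics with sampling time dt."""
    x, y, theta = state
    v, omega = action
    return np.array([x + dt * v * np.cos(theta),
                     y + dt * v * np.sin(theta),
                     theta + dt * omega])

def stage_cost(state, action):
    """Quadratic parking cost: drive the robot pose to the origin with small control effort."""
    return state @ np.diag([10.0, 10.0, 1.0]) @ state + 0.1 * action @ action

def mpc_action(state, dt, horizon, n_samples=500, rng=np.random.default_rng(0)):
    """Return the first action of the best sampled open-loop sequence (random shooting)."""
    best_cost, best_first_action = np.inf, np.zeros(2)
    for _ in range(n_samples):
        # Sample a candidate action sequence over the prediction horizon.
        actions = rng.uniform([-1.0, -1.0], [1.0, 1.0], size=(horizon, 2))
        s, cost = state.copy(), 0.0
        for a in actions:
            cost += stage_cost(s, a)
            s = unicycle_step(s, a, dt)
        if cost < best_cost:
            best_cost, best_first_action = cost, actions[0]
    return best_first_action

# Closed loop: the digital agent acts every dt seconds on the (simulated) continuous plant.
dt, horizon = 0.1, 5                       # the two hyper-parameters under study
state = np.array([2.0, 2.0, np.pi / 2])    # initial pose (x, y, heading)
for _ in range(100):
    action = mpc_action(state, dt, horizon)
    state = unicycle_step(state, action, dt)
print("final pose:", state)
```

Varying `dt` and `horizon` in such a loop is what the paper's study amounts to conceptually: a too-small or too-large sampling time degrades closed-loop cost (the "sweet spot"), while the horizon trades computation against foresight.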

Updated: 2021-09-24