Stability-constrained Markov Decision Processes using MPC
Automatica (IF 6.4), Pub Date: 2022-06-08, DOI: 10.1016/j.automatica.2022.110399
Mario Zanon, Sébastien Gros, Michele Palladino

In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved using some form of policy approximation. We leverage recent results proposing Model Predictive Control (MPC) as a structured function approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature in the undiscounted case, we first show that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the undiscounted MPC-based policy with stability guarantees produces the optimal policy of the discounted MDP if that optimal policy is stabilizing, and the best stabilizing policy otherwise.
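To make the setting concrete, the following is a minimal sketch, not taken from the paper, of the two objects the abstract refers to: the discounted MDP objective and an MPC-based policy parameterization. The symbols (stage cost ell, discount gamma, terminal cost lambda_theta, model f_theta, constraints h_theta, horizon N, parameters theta) are assumed notation for illustration only.

% Minimal sketch (notation assumed for illustration, not taken verbatim from the paper).
% Discounted MDP: minimize the expected discounted closed-loop cost over policies \pi.
\[
  V^{\pi}(s_0) \;=\; \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\,
    \ell\bigl(s_k, \pi(s_k)\bigr)\right], \qquad \gamma \in (0,1).
\]
% MPC-based policy: apply the first input of an (undiscounted) finite-horizon
% optimal control problem; the terminal cost and constraints can be chosen to
% encode stability requirements, so every policy in this class is stabilizing
% by construction.
\[
  \pi_{\theta}(s) \;=\; u_0^{\star}(s), \qquad
  u^{\star}(s) \in \arg\min_{x,\,u}\;
    \lambda_{\theta}(x_N) + \sum_{k=0}^{N-1} \ell_{\theta}(x_k, u_k)
  \quad \text{s.t.}\;\;
    x_0 = s, \;\; x_{k+1} = f_{\theta}(x_k, u_k), \;\; h_{\theta}(x_k, u_k) \le 0 .
\]

Under this kind of parameterization, tuning theta (e.g., via Reinforcement Learning) searches only over stabilizing policies, which is the restriction the paper formalizes for discounted MDPs.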



