Economic MPC of Markov Decision Processes: Dissipativity in undiscounted infinite-horizon optimal control
Automatica ( IF 6.4 ) Pub Date : 2022-09-23 , DOI: 10.1016/j.automatica.2022.110602
Sebastien Gros, Mario Zanon

Economic Model Predictive Control (MPC) dissipativity theory is central to discussing the stability of policies resulting from minimizing economic stage costs. In its current form, the dissipativity theory for economic MPC applies to problems based on deterministic dynamics or to very specific classes of stochastic problems, and does not readily extend to generic Markov decision processes. In this paper, we clarify the core reason for this difficulty, and propose a generalization of the economic MPC dissipativity theory that circumvents it. This generalization focuses on undiscounted infinite-horizon problems and is based on nonlinear stage cost functionals, allowing one to discuss the Lyapunov asymptotic stability of policies for Markov decision processes in terms of the probability measures underlying their stochastic dynamics. This theory is illustrated for the stochastic linear quadratic regulator with Gaussian process noise, for which a storage functional can be provided explicitly. For the sake of brevity, we limit our discussion to undiscounted Markov decision processes.
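The stochastic linear quadratic regulator mentioned in the abstract can be illustrated with a minimal scalar sketch. It does not reproduce the paper's storage functional or its dissipativity argument. It only shows, under assumed parameter values, how the Gaussian state distribution under the optimal undiscounted (average-cost) policy converges to a stationary measure, which is the kind of stability "in terms of probability measures" the abstract refers to. The system x+ = a*x + b*u + w with w ~ N(0, noise_var), the stage cost q*x^2 + r*u^2, all numerical values, and the helper names are illustrative assumptions.

```python
# Scalar stochastic LQR sketch (all values below are illustrative assumptions).

def solve_scalar_riccati(a, b, q, r, iters=10_000, tol=1e-12):
    """Fixed-point iteration on the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def propagate_gaussian(mean, var, a_cl, noise_var, steps):
    """Push the Gaussian state distribution N(mean, var) through the closed loop."""
    for _ in range(steps):
        mean = a_cl * mean                    # the mean contracts toward zero
        var = a_cl * a_cl * var + noise_var   # the variance approaches a fixed point
    return mean, var


# Assumed open-loop-unstable system with Gaussian process noise.
a, b, q, r, noise_var = 1.2, 1.0, 1.0, 0.5, 0.1

p = solve_scalar_riccati(a, b, q, r)
k = a * b * p / (r + b * b * p)        # optimal feedback gain, u = -k * x
a_cl = a - b * k                       # closed-loop dynamics

# Stationary variance of the closed-loop state, valid because |a_cl| < 1.
stationary_var = noise_var / (1.0 - a_cl ** 2)

# Start far from stationarity and watch the state measure converge.
mean, var = propagate_gaussian(5.0, 4.0, a_cl, noise_var, steps=200)

# Long-run average stage cost under the stationary measure.
average_cost = (q + r * k * k) * stationary_var

print(f"gain k = {k:.4f}, closed loop a_cl = {a_cl:.4f}")
print(f"mean -> {mean:.2e}, variance -> {var:.6f} (stationary {stationary_var:.6f})")
print(f"average cost = {average_cost:.6f}, p * noise_var = {p * noise_var:.6f}")
```

In this sketch the state itself never settles at a point because of the noise, but its distribution does: the mean decays to zero and the variance converges to the stationary value, and the long-run average cost matches p * noise_var.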




Updated: 2022-09-23