Discrete-time control with non-constant discount factor
Mathematical Methods of Operations Research (IF 1.2), Pub Date: 2020-06-27, DOI: 10.1007/s00186-020-00716-8
Héctor Jasso-Fuentes, José-Luis Menaldi, Tomás Prieto-Rumeau

This paper deals with discrete-time Markov decision processes (MDPs) with Borel state and action spaces, under the total expected discounted cost optimality criterion. We assume that the discount factor is not constant: it may depend on the state and action; moreover, it can even take the extreme values zero or one. We propose sufficient conditions on the data of the model that ensure the existence of optimal control policies and allow the optimal value function to be characterized as a solution to the dynamic programming equation. As a particular case of these MDPs with varying discount factor, we study MDPs with stopping, as well as the corresponding optimal stopping times and contact set. We show applications to switching MDP models and, in particular, we study a pollution accumulation problem.
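
As a rough illustration of the dynamic programming equation mentioned in the abstract, the following is a minimal sketch for a finite MDP. It is not the paper's Borel-space setting or its method. It assumes finite state and action sets and iterates v(x) = min_a { c(x,a) + α(x,a) Σ_y p(y|x,a) v(y) }, where α(x,a) is the state-action-dependent discount factor. The function name `value_iteration`, the array names, and the toy numbers are all hypothetical. Stopping is modeled here by an action whose discount factor is zero. This naive iteration is not guaranteed to converge when α equals one; the paper's sufficient conditions are not reproduced here.

```python
import numpy as np


def value_iteration(cost, trans, alpha, tol=1e-10, max_iter=10_000):
    """Value iteration with a state-action-dependent discount factor.

    cost[x, a]     : one-step cost (assumed finite)
    trans[x, a, y] : transition probability from x to y under action a
    alpha[x, a]    : discount factor in [0, 1] (hypothetical toy data below)
    """
    n_states, _ = cost.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # trans @ v has shape (n_states, n_actions): expected next value.
        q = cost + alpha * (trans @ v)
        v_new = q.min(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    q = cost + alpha * (trans @ v)
    return v, q.argmin(axis=1)


# Toy instance (all numbers are assumptions for illustration only).
# Action 0 = continue (discount 0.9), action 1 = stop (discount 0).
cost = np.array([[1.0, 5.0],
                 [1.0, 0.5]])
alpha = np.array([[0.9, 0.0],
                  [0.9, 0.0]])
trans = np.array([[[0.0, 1.0], [1.0, 0.0]],   # from state 0
                  [[0.0, 1.0], [0.0, 1.0]]])  # from state 1

v, policy = value_iteration(cost, trans, alpha)
print(v, policy)
```

In this toy instance the iteration settles at v = [1.45, 0.5] with policy [0, 1]: it is cheaper to continue from state 0 and to stop in state 1. The states where the stop action is chosen play the role of a stopping region in this finite sketch.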




Updated: 2020-06-27