Finite horizon continuous-time Markov decision processes with mean and variance criteria
Discrete Event Dynamic Systems, Pub Date: 2018-09-29, DOI: 10.1007/s10626-018-0273-1
Yonghui Huang

This paper studies mean maximization and variance minimization problems in finite horizon continuous-time Markov decision processes. The state and action spaces are assumed to be Borel spaces, while the reward functions and transition rates are allowed to be unbounded. For the mean problem, we design a successive approximation method that allows us to prove, under certain growth and compactness-continuity conditions, the existence of a solution to the Hamilton-Jacobi-Bellman (HJB) equation and then the existence of a mean-optimal policy. For the variance problem, using a first-jump analysis, we convert the second moment of the finite horizon reward into the mean of a finite horizon reward with new reward functions under suitable conditions; on this basis, the associated HJB equation for the variance problem and the existence of variance-optimal policies are established. Value iteration algorithms for computing mean- and variance-optimal policies are proposed.
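To illustrate the kind of computation the abstract's value iteration refers to, the sketch below solves the finite-horizon HJB equation for the mean problem by backward Euler time-stepping. This is a simplified, hypothetical illustration, not the paper's algorithm: it uses finite state and action spaces with bounded rates and rewards (the paper treats Borel spaces and unbounded data), and the toy model (`q`, `r`, horizon `T`) is invented for demonstration.

```python
import numpy as np

# Hypothetical toy CTMDP (not from the paper): finite state/action spaces,
# bounded transition rates and reward rates, finite horizon T.
n_states, n_actions = 3, 2
T, dt = 1.0, 0.001                      # horizon and Euler time step
rng = np.random.default_rng(0)

# q[a, i, j]: transition rate from state i to j under action a;
# each row of the generator sums to zero (diagonal = -off-diagonal sum).
q = rng.uniform(0.0, 1.0, (n_actions, n_states, n_states))
for a in range(n_actions):
    np.fill_diagonal(q[a], 0.0)
    np.fill_diagonal(q[a], -q[a].sum(axis=1))

r = rng.uniform(0.0, 1.0, (n_actions, n_states))   # reward rates r(a, i)

# Backward Euler discretization of the HJB equation
#   -dV/dt(t, i) = max_a { r(a, i) + sum_j q(a, i, j) V(t, j) },  V(T, .) = 0,
# stepped backward from t = T to t = 0.
V = np.zeros(n_states)
policy = np.zeros(n_states, dtype=int)
for _ in range(int(T / dt)):
    Q = r + np.einsum('aij,j->ai', q, V)   # shape (n_actions, n_states)
    policy = Q.argmax(axis=0)              # greedy action per state
    V = V + dt * Q.max(axis=0)             # Euler step of the HJB equation

print(V, policy)
```

The step size must satisfy `dt * max_i |q(a, i, i)| < 1` for the update to behave like a convex combination of values; here the rates are at most 1, so `dt = 0.001` is comfortably small. The variance problem would reuse the same backward recursion with the converted reward functions the paper derives.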
