Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints
Applied Mathematics and Optimization (IF 1.6), Pub Date: 2020-08-04, DOI: 10.1007/s00245-020-09707-x
Naoyuki Ichihara

This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed in order to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors depending on the sign of the critical value \(\lambda_*\), the so-called generalized principal eigenvalue of the associated ergodic problem. Specifically, we prove that (i) if \(\lambda_*<0\), then the value function converges to a solution to the corresponding stationary equation; (ii) if \(\lambda_*>0\), then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if \(\lambda_*=0\), then it diverges to infinity at a rate that is at most logarithmic. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
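To make the value iteration concrete, the following is a minimal sketch on a toy finite chain of my own construction, not the paper's countable-state model: state 0 plays the role of the absorbing constraint set, the constraint is imposed through a large surrogate value BIG, and the per-step growth of the value function approximates the optimal average cost, a finite-state analogue of the ergodic constant \(\lambda_*\). All states, transition probabilities, and costs are illustrative assumptions.

```python
# A minimal sketch, assuming a toy model of my own (NOT the paper's
# countable-state setting): finite-horizon value iteration on a chain
# of states 0..N where state 0 is an absorbing "forbidden" set. The
# constraint is imposed by pinning state 0 at a huge surrogate value;
# every probability and cost below is an illustrative assumption.
import numpy as np

N = 10                 # states 0..N; state 0 is the absorbing set
BIG = 1e8              # surrogate for the +infinity constraint value

# p[a, x, y]: probability of moving from x to y under action a
# action 0: cheap but may step into the absorbing set
# action 1: more expensive but never reaches state 0
p = np.zeros((2, N + 1, N + 1))
for x in range(1, N + 1):
    p[0, x, x - 1] += 0.5
    p[0, x, min(x + 1, N)] += 0.5
    p[1, x, x] += 0.4
    p[1, x, min(x + 1, N)] += 0.6
p[:, 0, 0] = 1.0       # state 0 is absorbing

c = np.zeros((2, N + 1))
c[0, 1:] = 1.0         # running cost of the risky action
c[1, 1:] = 1.2         # running cost of the safe action
# c[:, 0] = 0, so the value at state 0 stays pinned at BIG below

# Value iteration: v_{T+1}(x) = min_a [ c(x,a) + sum_y p(y|x,a) v_T(y) ]
v = np.zeros(N + 1)
v[0] = BIG             # terminal condition: forbidden on the absorbing set
for t in range(500):
    q = c + np.einsum('axy,y->ax', p, v)   # Q-value of each (action, state)
    v_next = q.min(axis=0)
    growth = (v_next - v)[1:]              # per-step increments off state 0
    v = v_next

# Off the absorbing set the increments stabilize at the optimal average
# cost, a finite-state analogue of the ergodic constant lambda_*.
print("per-step growth of v_T:", growth.min(), growth.max())
```

Because the running cost is strictly positive here, the estimated growth is positive and \(v_T\) grows linearly in \(T\), with \(v_T\) minus its linear part stabilizing, which loosely mirrors regime (ii) of the abstract; the convergent (\(\lambda_*<0\)) and logarithmic (\(\lambda_*=0\)) regimes depend on the paper's constrained countable-state structure and are not reproduced by this finite toy.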


