Abstract
This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value \(\lambda _*\), the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if \(\lambda _*<0\), then the value function converges to a solution to the corresponding stationary equation; (ii) if \(\lambda _*>0\), then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if \(\lambda _*=0\), then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
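To give a concrete feel for the trichotomy, the following is a minimal numerical sketch for the uncontrolled case, assuming (for illustration only) that the generalized principal eigenvalue can be identified with the log-spectral radius of the twisted kernel \(\mathrm{diag}(e^r)P_D\) on a finite non-absorbing set; the matrix and rewards below are hypothetical and not taken from the paper.

```python
import numpy as np

def principal_eigenvalue(P_D, r):
    # Log of the spectral radius of the "twisted" kernel diag(e^r) P_D,
    # where P_D is the transition matrix restricted to the non-absorbing
    # set D; a stand-in for lambda_* in the uncontrolled case.
    M = np.diag(np.exp(r)) @ P_D
    return float(np.log(np.max(np.abs(np.linalg.eigvals(M)))))

# Hypothetical substochastic restriction of P to D (mass leaks into the
# absorbing set), so with zero running reward the chain is subcritical:
P_D = np.array([[0.4, 0.3],
                [0.2, 0.5]])
lam_sub = principal_eigenvalue(P_D, np.zeros(2))                  # lambda_* < 0
lam_super = principal_eigenvalue(P_D, np.log(2.0) * np.ones(2))   # lambda_* > 0
```

In the subcritical regime the value function stays bounded as the horizon grows, while in the supercritical regime it grows linearly at rate \(\lambda_*\) after normalization, matching cases (i) and (ii) of the trichotomy.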
References
Arapostathis, A., Borkar, V.S., Fernández-Gaucherand, E., Ghosh, M.K., Marcus, S.I.: Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31, 282–344 (1993)
Balaji, S., Meyn, S.P.: Multiplicative ergodicity and large deviations for an irreducible Markov chain. Stoch. Process. Appl. 90, 123–144 (2000)
Barles, G., Porretta, A., Tchamba, T.: On the large time behavior of solutions of the Dirichlet problem for subquadratic viscous Hamilton–Jacobi equations. J. Math. Pures Appl. 94, 497–519 (2010)
Borkar, V.S., Meyn, S.P.: Risk-sensitive optimal control for Markov decision processes with monotone cost. Math. Oper. Res. 27, 192–209 (2002)
Brémaud, P.: Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Texts in Applied Mathematics, vol. 31. Springer, New York (1999)
Cavazos-Cadena, R., Hernández-Hernández, D.: A system of Poisson equations for a nonconstant Varadhan functional on a finite state space. Appl. Math. Optim. 53, 101–119 (2006)
Cavazos-Cadena, R., Hernández-Hernández, D.: Necessary and sufficient conditions for a solution to the risk-sensitive Poisson equation on a finite state space. Syst. Control Lett. 58, 254–258 (2009)
Cavazos-Cadena, R., Hernández-Hernández, D.: Local Poisson equations associated with the Varadhan functional. Asymptot. Anal. 96, 23–50 (2015)
Cranston, M., Molchanov, S.: On phase transitions and limit theorems for homopolymers. In: Probability and Mathematical Physics. CRM Proceedings & Lecture Notes, vol. 42, pp. 97–112. American Mathematical Society, Providence, RI (2007)
Fleming, W.H., Hernández-Hernández, D.: Risk-sensitive control of finite state machines on an infinite horizon I. SIAM J. Control Optim. 35, 1790–1810 (1997)
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)
Hinderer, K., Waldmann, K.-H.: The critical discount factor for finite Markovian decision processes with an absorbing set. Math. Methods Oper. Res. 57, 1–19 (2003)
Hinderer, K., Waldmann, K.-H.: Algorithms for countable state Markov decision models with an absorbing set. SIAM J. Control Optim. 43, 2109–2131 (2005)
Ichihara, N.: Phase transitions for controlled Markov chains on infinite graphs. SIAM J. Control Optim. 54, 450–474 (2016)
Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87, 159–182 (2012)
Kontoyiannis, I., Meyn, S.P.: Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13, 304–362 (2003)
Patek, S.: On terminating Markov decision processes with a risk-averse objective function. Automatica 37, 1379–1386 (2001)
Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
Seneta, E.: Non-negative Matrices and Markov Chains. Springer Series in Statistics, 2nd edn. Springer, Berlin (1980)
Tchamba, T.: Large time behavior of solutions of viscous Hamilton–Jacobi equations with superquadratic Hamiltonian. Asymptot. Anal. 66, 161–186 (2010)
Todorov, E.: Linearly-solvable Markov decision problems. Adv. Neural Inf. Process. Syst. 19, 1369–1376 (2006)
Acknowledgements
The author would like to thank two anonymous referees for their valuable comments and criticisms that allowed him to improve not only the results but also the presentation of this paper.
Supported in part by JSPS KAKENHI Grant No. 18K03343.
Appendices
Appendix A: Verification Theorem
In this appendix, we verify (2.4) under our standing assumptions (A1)-(A4). Since this kind of verification argument is quite standard (e.g., [18]), we give only a sketch of the proof.
We first consider the case where the initial function f is real-valued.
Proposition A.1
Suppose that (A1)-(A3) hold. Let \(\{v_n\}\) be defined by (2.2) with the initial function f real-valued and bounded above in D. Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).
Proof
Fix any \(\varvec{p}=\{p_j\}\in {{\mathcal {A}}}\), \(n\ge 1\), and \(x\in D\). We may assume without loss of generality that the support of \(p_j(x,\,\cdot \,)\) is included in the support of \(P(x,\,\cdot \,)\) for all \( j\ge 0\) and \(x\in D\). Then, since \(\{y\in V\,|\, P_D^{(n)}(x,y)>0\}\) is finite by (A1), we see from Dynkin’s formula that
Furthermore, since \(v_{n-j}=g\) in G for all \(0\le j\le n\), \(v_0=f\) in D, and
for all \(y\in D\) and \(0\le j\le n-1\), we have
Hence, \(v_n(x)\ge \sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). We also notice that the previous inequality holds with equality if we choose a \({\hat{\varvec{p}}}=\{\hat{p}_j \}\in {{\mathcal {A}}}\) such that
Note that such a \({\hat{\varvec{p}}}\) exists by virtue of Lemma 3.1. Indeed, each \({\hat{p}}_j\), with \(0\le j\le n-1\), is uniquely determined by
for \(x,y\in V\). Hence, we obtain (2.4). \(\square \)
We now consider the case where the initial function satisfies (A4).
Proposition A.2
Suppose that (A1)-(A4) hold. Let \(\{v_n\}\) be defined by (2.2). Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).
Proof
Let \(\{f^{(k)}\}\) be a sequence of initial functions such that \(f^{(k)}\) is real-valued and bounded above in D for all \(k\ge 1\), that \(\{f^{(k)}(x)\}\) is non-increasing in k for all \(x\in D\), and that \(\{f^{(k)}\}\) converges to f in D as \(k\rightarrow \infty \). For each \(k\ge 1\), let \(\{v^{(k)}_n\}\) be the sequence of functions defined by (2.2) with \(v_0=f^{(k)}\). It is clear that \(v^{(k)}_n\ge v^{(k+1)}_n\) in D for all \(n\ge 1\) and \(k\ge 1\). Moreover, for any \(n\ge 1\), \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \). Indeed, if we set \(v^{(\infty )}_n:=\lim _{k\rightarrow \infty }v^{(k)}_n\), then we observe from Lemma 3.1 and the local finiteness of P that, for all \(x\in D\) and \(n\ge 0\),
where equalities may hold with \(-\infty =-\infty \). Thus, passing to the limit as \(k\rightarrow \infty \) in (2.2) with \(v^{(k)}_n\) in place of \(v_n\), we see that \(\{v^{(\infty )}_n\}\) satisfies (2.2) with \(v_0=f\). This implies that \(v^{(\infty )}_n=v_n\) in D for all \(n\ge 1 \). Hence, \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \).
Now, let \(J^{(k)}_n(x;\varvec{p})\) be the reward criterion defined by (2.3) with \(f^{(k)}\) instead of f. Then, by Proposition A.1, we have \(v^{(k)}_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J^{(k)}_n(x;\varvec{p})\) for all \(x\in D\), \(n\ge 1\), and \(k\ge 1\). Noting that \(J^{(k)}_n(x;\varvec{p})\ge J_n(x;\varvec{p})\), we have
In order to verify the opposite inequality, fix any \(x\in D\) and \(n\ge 1\). Note that \(v_n(x)>-\infty \) if and only if either there exist some \(x_1,\ldots ,x_{n-1},x_n\in D\) with \(x_n\in F\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le n\), or there exist some \(1\le m\le n\), \(x_1,\ldots ,x_{m-1}\in D\), and \(x_m\in G\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le m\), where \(x_0:=x\). In either case, one can find a \({\hat{\varvec{p}}}=\{\hat{p}_j\}\in {{\mathcal {A}}}\) such that
for all \(0\le j\le n-1\) and \( y\in D\) along a suitable path from x to a point in \(F\cup G\). This implies, as in the proof of Proposition A.1, that \(v_n(x)=J_n(x;{\hat{\varvec{p}}})\). On the other hand, if \(v_n(x)=-\infty \), then \(J_n(x;\varvec{p})=-\infty \) for all \(\varvec{p}\in {{\mathcal {A}}}\). Hence, we obtain \(v_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). \(\square \)
Appendix B: Solvability of (3.2)
In this appendix we prove the existence and uniqueness of a solution \((\lambda _n,w_n)\) to (3.2). We follow the argument of [14, Appendix]. Hereafter, we fix an arbitrary \(n\ge 1\). Recall that \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\).
We first consider the following equation with discount factor \(\alpha \in (0,1]\):
As usual, we say that w is a supersolution (resp. subsolution) of (B.1) if \(w\ge H_n[\alpha w]+r\) in \(D_n\) (resp. \(w\le H_n[\alpha w]+r\) in \(D_n\)), and that w is a solution of (B.1) if it is both a subsolution and a supersolution of (B.1).
We first observe that the following comparison principle holds.
Theorem B.1
(i) Assume \(\alpha \in (0,1)\). Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of (B.1), respectively. Then \(w_1\le w_2\) in \(D_n\).
(ii) Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of \(w=H_n[w]+r\) in a subset \(A\subset D_n\). Then, \(w_1\le w_2\) in \(D_n\setminus A\) implies \(w_1\le w_2\) in A.
Proof
We first prove (i). Set \(w:=w_2-w_1\), and let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[\alpha w_1]\). Then, we see that \(\alpha (\bar{q} w)\le w\) in \(D_n\). We now claim that \(w\ge 0\) in \(D_n\). To verify this, suppose to the contrary that \(\min _{D_n}w=w({\bar{x}})<0\) for some \({\bar{x}}\in D_n\). Then, since \(\alpha (\bar{q} w)({\bar{x}})\le w({\bar{x}})<\alpha w({\bar{x}})\), we see that
which is a contradiction. Hence, \(w\ge 0\) in \(D_n\), and we have proved that \(w_1\le w_2\) in \(D_n\).
We next show (ii). We may assume without loss of generality that \(A\ne \emptyset \); otherwise, there is nothing to prove. Set \(w:=w_2-w_1\) and, as above, assume that \(\min _Aw=w({\bar{x}})<0\) for some \({\bar{x}}\in A\). Let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[w_1]\). Note that \(\bar{q}\) is irreducible on \(D_n\). Then, since \(\bar{q}w\le w\) in A, we have
This implies that \(w(y)=w({\bar{x}})\) for any \(y\in D_n\) such that \(\bar{q}({\bar{x}},y)>0\). Repeating this procedure and noting that \(\bar{q}\) is irreducible on \(D_n\), one can find a pair \(({\bar{y}},{\bar{z}})\) such that \({\bar{y}}\in A\), \({\bar{z}}\in D_n\setminus A\), \(w({\bar{y}})=\min _A w<0\), and \(\bar{q}({\bar{y}},{\bar{z}})>0\). Since \(w\ge w({\bar{y}})\) in \(D_n\), this implies that
a contradiction. Hence, \(w\ge 0\) in A, and the proof is complete. \(\square \)
Now, we discuss the solvability of (B.1) for \(\alpha \in (0,1)\). To this end, we set
for \(x,y\in D_n\). Note that, since \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\), there exists an \(M>0\) such that \(x,y\in D_n\) and \(P_n(x,y)>0\) imply \(d_n(y,x)\le M\).
Theorem B.2
Assume that \(\alpha \in (0,1)\). Then there exists a unique solution \(w_\alpha \) of (B.1). Furthermore, for any \(x,y\in D_n\),
where K is some constant not depending on \(\alpha \).
Proof
The existence part can be shown by the standard value iteration argument, so we omit it. Uniqueness follows from Theorem B.1 (i). In order to obtain the estimates (B.2) for the solution \(w_\alpha \) of (B.1), we set \(w_1:=\underline{r}/(1-\alpha )\) and \(w_2:={\overline{r}}/(1-\alpha )\), where \(\underline{r}:=\min _{D_n}r\), \({\overline{r}}:=\max _{D_n}r\). Then, for any \(x\in D_n\),
This implies that \(w_1\) is a subsolution of (B.1). Similarly, one can verify that \(w_2\) is a supersolution of (B.1). Applying Theorem B.1 (i), we have \(w_1\le w_\alpha \le w_2\) in \(D_n\). Hence, the first estimate in (B.2) is valid.
To prove the second estimate, fix any \(x,y\in D_n\) such that \(P_n(x,y)>0\). Then we see from (B.1) that \(w_\alpha (x)\ge \alpha w_\alpha (y)-c_n(x,{{\mathbf {1}}}_{\{y\}})+r(x)\). This and the previous estimate imply that
On the other hand, since \(d_n(y,x)\le M\), we can find some \(y_0:=y\), \(y_1,\ldots , y_{m-1}, y_m=:x\in D_n\) with \(m\le M\) such that \(P_n(y_{i-1},y_{i})>0\) for all \(1\le i\le m\). In particular, by using the above estimate repeatedly, we have
Combining the last two estimates and setting \(y_{m+1}:=x\), we obtain
Hence, the second estimate is valid if \(x,y\in D_n\) satisfy \(P_n(x,y)>0\).
For general \(x,y\in D_n\), we choose a finite sequence \(z_0,z_1,\ldots ,z_N\in D_n\) with \(N:=d_n(x,y)\) such that \(z_0=x\) and \(z_N=y\). Then, applying the previous estimate repeatedly, we obtain
Since K clearly does not depend on \(\alpha \), we have completed the proof. \(\square \)
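The value iteration argument mentioned in the proof can be sketched numerically. The toy instance below assumes, purely for illustration, that \(H_n\) is a pointwise maximum over finitely many transition kernels Q with attached costs c (the kernels, costs, and rewards are hypothetical); for \(\alpha <1\) the Bellman map is an \(\alpha \)-contraction in the sup norm, and the fixed point obeys the sub/supersolution bounds \(\underline{r}/(1-\alpha )\le w_\alpha \le {\overline{r}}/(1-\alpha )\) established in the proof.

```python
import numpy as np

def solve_discounted(kernels, alpha, r, tol=1e-12):
    # Value iteration for w = max_Q { alpha * Q w - c } + r, a toy
    # stand-in for the discounted equation (B.1).  For alpha < 1 the
    # map is an alpha-contraction in the sup norm, so the iteration
    # converges geometrically to the unique fixed point.
    w = np.zeros_like(r)
    while True:
        w_new = r + np.max(
            np.stack([alpha * Q @ w - c for Q, c in kernels]), axis=0)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new

# Hypothetical two-state instance: stay put for free, or swap states
# at cost 0.1; reward 1 in state 0 and 0 in state 1.
kernels = [(np.eye(2), np.zeros(2)),
           (np.array([[0.0, 1.0], [1.0, 0.0]]), np.full(2, 0.1))]
r = np.array([1.0, 0.0])
alpha = 0.9
w = solve_discounted(kernels, alpha, r)
```

One can check that the computed w satisfies the equation up to the tolerance and lies between \(\min r/(1-\alpha )\) and \(\max r/(1-\alpha )\), mirroring the comparison argument above.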
We are now in a position to prove the solvability of (3.2).
Theorem B.3
There exists a unique solution \((\lambda ,w)\) of (3.2). Moreover, there exists a \(K>0\) such that, for all \(x,y\in D_n\),
Proof
We first prove the existence part. Fix any \(\alpha \in (0,1)\). Let \(w_\alpha \) be the unique solution of (B.1). We set \(\lambda _\alpha :=(1-\alpha )w_\alpha (x_0)\) and \({\hat{w}}_\alpha :=w_\alpha -w_\alpha (x_0)\). Then \((\lambda _\alpha ,{\hat{w}}_\alpha )\) enjoys
In view of (B.2), one can find an increasing sequence \(\{\alpha _j\}\) converging to 1 as \(j\rightarrow \infty \) such that \(\lambda _j:=\lambda _{\alpha _j}\rightarrow \lambda \) and \({\hat{w}}_j:={\hat{w}}_{\alpha _j}\rightarrow w\) in \(D_n\) as \(j\rightarrow \infty \) for some \(\lambda \in \mathbf {R}\) and \(w:D_n\rightarrow \mathbf {R}\). It is obvious that \((\lambda ,w)\) enjoys the estimates (B.2) with w in place of \(w_\alpha \). It is also easy to check that \((\lambda ,w)\) solves (3.2) by letting \(j\rightarrow \infty \) in (B.3) with \(\alpha =\alpha _j\). Hence, the existence part has been proved.
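The vanishing-discount passage \(\alpha \rightarrow 1\) used above can be illustrated in the uncontrolled linear case, where \(w_\alpha =(I-\alpha Q)^{-1}r\) solves \(w=\alpha Qw+r\) and \(\lambda _\alpha =(1-\alpha )w_\alpha (x_0)\) converges to the average reward \(\pi r\); the chain Q and reward r below are hypothetical.

```python
import numpy as np

def lam_alpha(Q, r, x0, alpha):
    # Uncontrolled analogue of the construction in the proof:
    # w_alpha = (I - alpha Q)^{-1} r solves w = alpha Q w + r, and
    # lambda_alpha = (1 - alpha) w_alpha(x0).
    n = len(r)
    w = np.linalg.solve(np.eye(n) - alpha * Q, r)
    return (1.0 - alpha) * w[x0]

# Hypothetical irreducible chain with invariant distribution (1/2, 1/2):
Q = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
lams = [lam_alpha(Q, r, 0, a) for a in (0.9, 0.99, 0.999)]
# As alpha -> 1, lambda_alpha approaches the average reward pi r = 0.5.
```

Here the convergence is monotone along the chosen discount factors, in line with the compactness argument that extracts a convergent subsequence \(\{\alpha _j\}\).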
We next show the uniqueness part. Let \((\lambda _1,w_1)\) and \((\lambda _2,w_2)\) be a subsolution and a supersolution of (3.2), respectively. We first show that \(\lambda _1\le \lambda _2\), which leads to the uniqueness of \(\lambda \). Set \(w:=w_2-w_1\), and let \(\bar{q}\) be the maximizer of \(H_n[w_1]\). Note that \({\bar{q}}\) is irreducible on \(D_n\). Then we see that, for all \(x\in D_n\),
Since \(D_n\) is finite and \(\bar{q}\) is irreducible on \(D_n\), there exists a unique invariant distribution \(\pi \) on \(D_n\) associated with \(\bar{q}\). Multiplying both sides of (B.4) by \(\pi(x)\) and summing over all \(x\in D_n\), we have
where \(\pi w:=\sum _{x\in D_n }\pi (x)w(x)\). Since \(\pi w\) is finite, we obtain \(\lambda _2\ge \lambda _1\).
We next show that \(w_1=w_2\) in \(D_n\). By setting \(\lambda _1=\lambda _2\) in (B.4), we observe that \(\bar{q} w\le w\) in \(D_n\). We now choose an \({\bar{x}}\in D_n\) such that \(w({\bar{x}})=\min _{D_n}w\). Then,
This implies as in the proof of Theorem B.1 (ii) that w is constant in \(D_n\). Since \(w(x_0)=0\), we obtain \(w=0\) in \(D_n\), that is, \(w_1=w_2\) in \(D_n\). Hence, we have completed the proof. \(\square \)
Cite this article
Ichihara, N. Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints. Appl Math Optim 84, 2177–2220 (2021). https://doi.org/10.1007/s00245-020-09707-x
Keywords
- Stochastic control
- Markov decision process
- Value function
- Generalized principal eigenvalue
- Bellman equation