
Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Published in Applied Mathematics & Optimization.

Abstract

This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value \(\lambda _*\), the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if \(\lambda _*<0\), then the value function converges to a solution to the corresponding stationary equation; (ii) if \(\lambda _*>0\), then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if \(\lambda _*=0\), then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.


References

  1. Arapostathis, A., Borkar, V.S., Fernández-Gaucherand, E., Ghosh, M.K., Marcus, S.I.: Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31, 282–344 (1993)

  2. Balaji, S., Meyn, S.P.: Multiplicative ergodicity and large deviations for an irreducible Markov chain. Stoch. Process. Appl. 90, 123–144 (2000)

  3. Barles, G., Porretta, A., Tchamba, T.: On the large time behavior of solutions of the Dirichlet problem for subquadratic viscous Hamilton–Jacobi equations. J. Math. Pures Appl. 94, 497–519 (2010)

  4. Borkar, V.S., Meyn, S.P.: Risk-sensitive optimal control for Markov decision processes with monotone cost. Math. Oper. Res. 27, 192–209 (2002)

  5. Brémaud, P.: Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Texts in Applied Mathematics, vol. 31. Springer, New York (1999)

  6. Cavazos-Cadena, R., Hernández-Hernández, D.: A system of Poisson equations for a nonconstant Varadhan functional on a finite state space. Appl. Math. Optim. 53, 101–119 (2006)

  7. Cavazos-Cadena, R., Hernández-Hernández, D.: Necessary and sufficient conditions for a solution to the risk-sensitive Poisson equation on a finite state space. Syst. Control Lett. 58, 254–258 (2009)

  8. Cavazos-Cadena, R., Hernández-Hernández, D.: Local Poisson equations associated with the Varadhan functional. Asymptot. Anal. 96, 23–50 (2015)

  9. Cranston, M., Molchanov, S.: On phase transitions and limit theorems for homopolymers. In: Probability and Mathematical Physics, CRM Proceedings and Lecture Notes, vol. 42, pp. 97–112. American Mathematical Society, Providence, RI (2007)

  10. Fleming, W.H., Hernández-Hernández, D.: Risk-sensitive control of finite state machines on an infinite horizon I. SIAM J. Control Optim. 35, 1790–1810 (1997)

  11. Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)

  12. Hinderer, K., Waldmann, K.-H.: The critical discount factor for finite Markovian decision processes with an absorbing set. Math. Methods Oper. Res. 57, 1–19 (2003)

  13. Hinderer, K., Waldmann, K.-H.: Algorithms for countable state Markov decision models with an absorbing set. SIAM J. Control Optim. 43, 2109–2131 (2005)

  14. Ichihara, N.: Phase transitions for controlled Markov chains on infinite graphs. SIAM J. Control Optim. 54, 450–474 (2016)

  15. Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87, 159–182 (2012)

  16. Kontoyiannis, I., Meyn, S.P.: Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13, 304–362 (2003)

  17. Patek, S.: On terminating Markov decision processes with a risk-averse objective function. Automatica 37, 1379–1386 (2001)

  18. Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)

  19. Seneta, E.: Non-negative Matrices and Markov Chains. Springer Series in Statistics, 2nd edn. Springer, Berlin (1980)

  20. Tchamba, T.: Large time behavior of solutions of viscous Hamilton–Jacobi equations with superquadratic Hamiltonian. Asymptot. Anal. 66, 161–186 (2010)

  21. Todorov, E.: Linearly-solvable Markov decision problems. Adv. Neural Inf. Proc. Syst. 19, 1369–1376 (2006)


Acknowledgements

The author would like to thank two anonymous referees for their valuable comments and criticisms that allowed him to improve not only the results but also the presentation of this paper.

Author information

Correspondence to Naoyuki Ichihara.

Additional information


Supported in part by JSPS KAKENHI Grant No. 18K03343.

Appendices

Appendix A: Verification Theorem

In this appendix, we verify (2.4) under our standing assumptions (A1)-(A4). Since this kind of verification argument is quite standard (e.g., [18]), we give only a sketch of the proof.

We first consider the case where the initial function f is real-valued.

Proposition A.1

Suppose that (A1)-(A3) hold. Let \(\{v_n\}\) be defined by (2.2) with the initial function f real-valued and bounded above in D. Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).

Proof

Fix any \(\varvec{p}=\{p_j\}\in {{\mathcal {A}}}\), \(n\ge 1\), and \(x\in D\). We may assume without loss of generality that the support of \(p_j(x,\,\cdot \,)\) is included in the support of \(P(x,\,\cdot \,)\) for all \( j\ge 0\) and \(x\in D\). Then, since \(\{y\in V\,|\, P_D^{(n)}(x,y)>0\}\) is finite by (A1), we see from Dynkin’s formula that

$$\begin{aligned} v_n(x)=\mathbf {E}^{\varvec{p}}_x\left[ \sum _{j=0}^{n-1}(v_{n-j}-p_jv_{n-j-1})(X_j){{\mathbf {1}}}_{\{\tau _G>j\}}+v_{n-n\wedge \tau _G}(X_{n\wedge \tau _G})\right] . \end{aligned}$$

Furthermore, since \(v_{n-j}=g\) in G for all \(0\le j\le n\), \(v_0=f\) in D, and

$$\begin{aligned} v_{n-j}(y)\ge p_jv_{n-j-1}(y)-c(y,p_j(y,\,\cdot \,))+r(y) \end{aligned}$$

for all \(y\in D\) and \(0\le j\le n-1\), we have

$$\begin{aligned} v_n(x)&\ge \mathbf {E}^{\varvec{p}}_x\left[ \sum _{j=0}^{n-1}(r(X_j)-c(X_j,p_j(X_j,\,\cdot \,))){{\mathbf {1}}}_{\{\tau _G>j\}}+f(X_n){{\mathbf {1}}}_{\{n<\tau _G\}}+g(X_{\tau _G}){{\mathbf {1}}}_{\{n\ge \tau _G\}}\right] \\&=J_n(x;\varvec{p}). \end{aligned}$$

Hence, \(v_n(x)\ge \sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). We also notice that the previous inequality holds with equality if we choose a \({\hat{\varvec{p}}}=\{\hat{p}_j \}\in {{\mathcal {A}}}\) such that

$$\begin{aligned} H[v_{n-j-1}](y)= \hat{p}_jv_{n-j-1}(y)-c(y,\hat{p}_j(y,\,\cdot \,)),\quad y\in D,\quad 0\le j\le n-1. \end{aligned}$$

Note that such a \({\hat{\varvec{p}}}\) exists by virtue of Lemma 3.1. Indeed, each \({\hat{p}}_j\), with \(0\le j\le n-1\), is uniquely determined by

$$\begin{aligned} {\hat{p}}_j(x,y):=\frac{P(x,y)e^{\kappa (x)v_{n-j-1}(y)}}{\sum _{z\in V}P(x,z)e^{\kappa (x)v_{n-j-1}(z)}} \end{aligned}$$

for \(x,y\in V\). Hence, we obtain (2.4). \(\square \)
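The maximizer \({\hat{p}}_j\) above is an exponential tilting (a softmax reweighting) of the reference kernel P by the current value function. As a purely numerical illustration, the following Python sketch computes this tilted kernel on a finite state set standing in for the countable space; the names `gibbs_policy`, `P`, `v`, `kappa` are ours, not the paper's.

```python
import numpy as np

def gibbs_policy(P, v, kappa):
    """Exponential tilting of a row-stochastic matrix P:
    p_hat(x, y) = P(x, y) * exp(kappa(x) * v(y)) / Z(x),
    where Z(x) normalizes each row (cf. the formula for hat{p}_j)."""
    weights = P * np.exp(kappa[:, None] * v[None, :])
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example: 3 states, row-stochastic reference kernel P.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 1.0, 0.0]])
v = np.array([1.0, 0.0, -1.0])      # stands in for v_{n-j-1}
kappa = np.array([1.0, 2.0, 0.5])   # state-dependent parameter kappa(x)
p_hat = gibbs_policy(P, v, kappa)
```

Note that the support of each row of `p_hat` coincides with that of `P`, matching the support condition assumed at the start of the proof, and each row shifts probability mass toward states where v is larger.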

We now consider the case where the initial function satisfies (A4).

Proposition A.2

Suppose that (A1)-(A4) hold. Let \(\{v_n\}\) be defined by (2.2). Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).

Proof

Let \(\{f^{(k)}\}\) be a sequence of initial functions such that \(f^{(k)}\) is real-valued and bounded above in D for all \(k\ge 1\), \(\{f^{(k)}(x)\}\) is non-increasing in k for all \(x\in D\), and \(\{f^{(k)}\}\) converges to f in D as \(k\rightarrow \infty \). For each \(k\ge 1\), let \(\{v^{(k)}_n\}\) be the sequence of functions defined by (2.2) with \(v_0=f^{(k)}\). It is clear that \(v^{(k)}_n\ge v^{(k+1)}_n\) in D for all \(n\ge 1\) and \(k\ge 1\). Moreover, for any \(n\ge 1\), \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \). Indeed, if we set \(v^{(\infty )}_n:=\lim _{k\rightarrow \infty }v^{(k)}_n\), then we observe from Lemma 3.1 and the local finiteness of P that, for all \(x\in D\) and \(n\ge 0\),

$$\begin{aligned} \lim _{k\rightarrow \infty } H[v^{(k)}_n](x)&=\lim _{k\rightarrow \infty }\frac{1}{\kappa (x)}\log \sum _{y\in V}P(x,y)e^{\kappa (x)v^{(k)}_n(y)}\\&=\frac{1}{\kappa (x)}\log \sum _{y\in V}P(x,y)e^{\kappa (x)v^{(\infty )}_n(y)}=H[v^{(\infty )}_n](x), \end{aligned}$$

where equalities may hold with \(-\infty =-\infty \). Thus, passing to the limit as \(k\rightarrow \infty \) in (2.2) with \(v^{(k)}_n\) in place of \(v_n\), we see that \(\{v^{(\infty )}_n\}\) satisfies (2.2) with \(v_0=f\). This implies that \(v^{(\infty )}_n=v_n\) in D for all \(n\ge 1 \). Hence, \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \).

Now, let \(J^{(k)}_n(x;\varvec{p})\) be the reward criterion defined by (2.3) with \(f^{(k)}\) instead of f. Then, by Proposition A.1, we have \(v^{(k)}_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J^{(k)}_n(x;\varvec{p})\) for all \(x\in D\), \(n\ge 1\), and \(k\ge 1\). Noting that \(J^{(k)}_n(x;\varvec{p})\ge J_n(x;\varvec{p})\), we have

$$\begin{aligned} v_n(x)=\lim _{k\rightarrow \infty }v^{(k)}_n(x)=\lim _{k\rightarrow \infty }\sup _{\varvec{p}\in {{\mathcal {A}}}}J^{(k)}_n(x;\varvec{p})\ge \sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p}). \end{aligned}$$

In order to verify the opposite inequality, fix any \(x\in D\) and \(n\ge 1\). Note that \(v_n(x)>-\infty \) if and only if either there exist some \(x_1,\ldots ,x_{n-1},x_n\in D\) with \(x_n\in F\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le n\), or there exist some \(1\le m\le n\), \(x_1,\ldots ,x_{m-1}\in D\), and \(x_m\in G\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le m\), where \(x_0:=x\). In either case, one can find a \({\hat{\varvec{p}}}=\{\hat{p}_j\}\in {{\mathcal {A}}}\) such that

$$\begin{aligned} H[v_{n-j-1}](y)=\hat{p}_jv_{n-j-1}(y)-c(y,\hat{p}_j(y,\,\cdot \,))>-\infty \end{aligned}$$

for all \(0\le j\le n-1\) and \( y\in D\) along a suitable path from x to a point in \(F\cup G\). This implies, as in the proof of Proposition A.1, that \(v_n(x)=J_n(x;{\hat{\varvec{p}}})\). On the other hand, if \(v_n(x)=-\infty \), then \(J_n(x;\varvec{p})=-\infty \) for all \(\varvec{p}\in {{\mathcal {A}}}\). Hence, we obtain \(v_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). \(\square \)

Appendix B: Solvability of (3.2)

In this appendix we prove the existence and uniqueness of a solution \((\lambda _n,w_n)\) to (3.2). We follow the argument of [14, Appendix]. Hereafter, we fix an arbitrary \(n\ge 1\). Recall that \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\).

We first consider the following equation with discount factor \(\alpha \in (0,1]\):

$$\begin{aligned} w=H_n[\alpha w]+r\quad \text {in } D_n. \end{aligned}$$
(B.1)

As usual, we say that w is a supersolution (resp. subsolution) of (B.1) if \(w\ge H_n[\alpha w]+r\) in \(D_n\) (resp. \(w\le H_n[\alpha w]+r\) in \(D_n\)), and that w is a solution of (B.1) if it is both a subsolution and a supersolution of (B.1).

We first observe that the following comparison principle holds.

Theorem B.1

(i) Assume \(\alpha \in (0,1)\). Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of (B.1), respectively. Then \(w_1\le w_2\) in \(D_n\).

(ii) Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of \(w=H_n[w]+r\) in a subset \(A\subset D_n\). Then, \(w_1\le w_2\) in \(D_n\setminus A\) implies \(w_1\le w_2\) in A.

Proof

We first prove (i). Set \(w:=w_2-w_1\), and let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[\alpha w_1]\). Then, we see that \(\alpha (\bar{q} w)\le w\) in \(D_n\). We now claim that \(w\ge 0\) in \(D_n\). To verify this, suppose to the contrary that \(\min _{D_n}w=w({\bar{x}})<0\) for some \({\bar{x}}\in D_n\). Then, since \(\alpha (\bar{q} w)({\bar{x}})\le w({\bar{x}})<\alpha w({\bar{x}})\), we see that

$$\begin{aligned} 0\le \alpha \sum _{y\in D_n}\bar{q}({\bar{x}},y)(w(y)-w({\bar{x}}))<0, \end{aligned}$$

which is a contradiction. Hence, \(w\ge 0\) in \(D_n\), and we have proved that \(w_1\le w_2\) in \(D_n\).

We next show (ii). We may assume without loss of generality that \(A\ne \emptyset \); otherwise, there is nothing to prove. Set \(w:=w_2-w_1\) and, as above, suppose to the contrary that \(\min _Aw=w({\bar{x}})<0\) for some \({\bar{x}}\in A\). Let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[w_1]\). Note that \(\bar{q}\) is irreducible on \(D_n\). Then, since \(\bar{q}w\le w\) in A, we have

$$\begin{aligned} 0\le \sum _{y\in D_n}\bar{q}({\bar{x}},y)(w(y)-w({\bar{x}}))\le 0. \end{aligned}$$

This implies that \(w(y)=w({\bar{x}})\) for any \(y\in D_n\) such that \(\bar{q}({\bar{x}},y)>0\). Repeating this procedure and noting that \(\bar{q}\) is irreducible on \(D_n\), one can find a pair \(({\bar{y}},{\bar{z}})\) such that \({\bar{y}}\in A\), \({\bar{z}}\in D_n\setminus A\), \(w({\bar{y}})=\min _A w<0\), and \(\bar{q}({\bar{y}},{\bar{z}})>0\). Since \(w\ge w({\bar{y}})\) in \(D_n\), this implies that

$$\begin{aligned} 0<\bar{q}({\bar{y}},{\bar{z}})(w({\bar{z}})-w({\bar{y}}))\le \sum _{z\in D_n}\bar{q}({\bar{y}}, z)(w(z)-w({\bar{y}}))=(\bar{q}w)({\bar{y}})-w({\bar{y}})\le 0, \end{aligned}$$

a contradiction. Hence, \(w\ge 0\) in A, and the proof is complete. \(\square \)

Now, we discuss the solvability of (B.1) for \(\alpha \in (0,1)\). To this end, we set

$$\begin{aligned} d_n(x,y):=\inf \{j\ge 0\,|\, P_n^{(j)}(x,y)>0\} \end{aligned}$$

for \(x,y\in D_n\). Note that, since \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\), there exists an \(M>0\) such that \(x,y\in D_n\) and \(P_n(x,y)>0\) imply \(d_n(y,x)\le M\).
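Since \(D_n\) is finite, \(d_n\) is the shortest-path distance on the directed support graph of \(P_n\) and can be computed by breadth-first search. The sketch below (our illustrative names, with a finite matrix standing in for \(P_n\)) does exactly this and recovers the constant M from the pairs with \(P_n(x,y)>0\).

```python
from collections import deque
import numpy as np

def support_distance(P):
    """All-pairs d(x, y) = min{ j >= 0 : P^(j)(x, y) > 0 }, computed by
    breadth-first search on the directed support graph of P."""
    n = P.shape[0]
    d = np.full((n, n), np.inf)
    for x in range(n):
        d[x, x] = 0          # P^(0) is the identity
        queue = deque([x])
        while queue:
            z = queue.popleft()
            for y in range(n):
                if P[z, y] > 0 and np.isinf(d[x, y]):
                    d[x, y] = d[x, z] + 1
                    queue.append(y)
    return d

# Irreducible 3-cycle 0 -> 1 -> 2 -> 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
d = support_distance(P)
# M = max of d(y, x) over pairs with P(x, y) > 0, as in the text.
M = int(max(d[y, x] for x in range(3) for y in range(3) if P[x, y] > 0))
```

Irreducibility of `P` guarantees every entry of `d` is finite, which is what makes M well defined.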

Theorem B.2

Assume that \(\alpha \in (0,1)\). Then there exists a unique solution \(w_\alpha \) of (B.1). Furthermore, for any \(x,y\in D_n\),

$$\begin{aligned}&\min _{D_n}r\le (1-\alpha )w_\alpha (x)\le \max _{D_n}r,\qquad \nonumber \\&\quad \alpha |w_\alpha (y)-w_\alpha (x)|\le Kd_n(x,y), \end{aligned}$$
(B.2)

where K is some constant not depending on \(\alpha \).

Proof

The existence part can be shown by the standard value iteration argument, so we omit it. Uniqueness follows from Theorem B.1 (i). In order to obtain the estimates (B.2) for the solution \(w_\alpha \) of (B.1), we set \(w_1:=\underline{r}/(1-\alpha )\) and \(w_2:={\overline{r}}/(1-\alpha )\), where \(\underline{r}:=\min _{D_n}r\) and \({\overline{r}}:=\max _{D_n}r\). Then, for any \(x\in D_n\),

$$\begin{aligned} w_1(x)-H_n[\alpha w_1](x)-r(x)= & {} \frac{{\underline{r}}}{1-\alpha }-\alpha \frac{{\underline{r}}}{1-\alpha }+\inf _{p\in {{\mathcal {P}}}(D_n)} c_n(x,p)-r(x)\\= & {} {\underline{r}}-r(x)\le 0. \end{aligned}$$

This implies that \(w_1\) is a subsolution of (B.1). Similarly, one can verify that \(w_2\) is a supersolution of (B.1). Applying Theorem B.1 (i), we have \(w_1\le w_\alpha \le w_2\) in \(D_n\). Hence, the first estimate in (B.2) is valid.

To prove the second estimate, fix any \(x,y\in D_n\) such that \(P_n(x,y)>0\). Then we see from (B.1) that \(w_\alpha (x)\ge \alpha w_\alpha (y)-c_n(x,{{\mathbf {1}}}_{\{y\}})+r(x)\). This and the previous estimate imply that

$$\begin{aligned} \alpha (w_\alpha (y)-w_\alpha (x))\le & {} (1-\alpha ) w_\alpha (x)+c_n(x,{{\mathbf {1}}}_{\{y\}})-r(x)\\\le & {} \frac{1}{\kappa (x)}\log \frac{1}{P_n(x,y)}+{{\overline{r}}}-r(x). \end{aligned}$$

On the other hand, since \(d_n(y,x)\le M\), we can find some \(y_0:=y\), \(y_1,\ldots , y_{m-1}, y_m=:x\in D_n\) with \(m\le M\) such that \(P_n(y_{i-1},y_{i})>0\) for all \(1\le i\le m\). In particular, by using the above estimate repeatedly, we have

$$\begin{aligned} \alpha (w_\alpha (x)-w_\alpha (y))&=\sum _{i=1}^m \alpha (w_\alpha (y_i)-w_\alpha (y_{i-1}))\\&\le \sum _{i=1}^m\left\{ \frac{1}{\kappa (y_{i-1})}\log \frac{1}{P_n(y_{i-1},y_i)}+{{\overline{r}}}-r(y_{i-1})\right\} . \end{aligned}$$

Combining the last two estimates and setting \(y_{m+1}:=y\), we obtain

$$\begin{aligned} \alpha |w_\alpha (x)-w_\alpha (y)|\le \sum _{i=1}^{m+1}\left\{ \frac{1}{\kappa (y_{i-1})}\log \frac{1}{P_n(y_{i-1},y_i)}+{{\overline{r}}}-r(y_{i-1})\right\} =:K. \end{aligned}$$

Hence, the second estimate is valid if \(x,y\in D_n\) satisfy \(P_n(x,y)>0\).

For general \(x,y\in D_n\), we choose a finite sequence \(z_0,z_1,\ldots ,z_N\in D_n\) with \(N:=d_n(x,y)\) such that \(z_0=x\), \(z_N=y\), and \(P_n(z_{j-1},z_j)>0\) for all \(1\le j\le N\). Then, applying the previous estimate repeatedly, we obtain

$$\begin{aligned} \alpha |w_\alpha (y)-w_\alpha (x)|\le \sum _{j=1}^N \alpha |w_\alpha (z_j)-w_\alpha (z_{j-1})|\le \sum _{j=1}^NK=NK=Kd_n(x,y). \end{aligned}$$

Since K clearly does not depend on \(\alpha \), we have completed the proof. \(\square \)
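The value iteration invoked in the existence part is a contraction argument: since \(H_n\) is 1-Lipschitz in the sup norm, the map \(w\mapsto H_n[\alpha w]+r\) is an \(\alpha \)-contraction for \(\alpha \in (0,1)\). A minimal numerical sketch, using the log-sum-exp form of the operator from Appendix A (names `H`, `solve_discounted`, and the toy data are ours):

```python
import numpy as np

def H(P, kappa, w):
    """H[w](x) = (1/kappa(x)) * log sum_y P(x, y) * exp(kappa(x) * w(y))."""
    return np.log((P * np.exp(kappa[:, None] * w[None, :])).sum(axis=1)) / kappa

def solve_discounted(P, kappa, r, alpha, tol=1e-12, max_iter=10**6):
    """Value iteration for w = H[alpha * w] + r; the map is an
    alpha-contraction, so the iterates converge from any starting point."""
    w = np.zeros_like(r)
    for _ in range(max_iter):
        w_next = H(P, kappa, alpha * w) + r
        if np.max(np.abs(w_next - w)) < tol:
            return w_next
        w = w_next
    return w

P = np.array([[0.5, 0.5], [0.3, 0.7]])
kappa = np.array([1.0, 2.0])
r = np.array([1.0, 0.5])
alpha = 0.9
w_alpha = solve_discounted(P, kappa, r, alpha)
```

The computed solution can be checked against the first estimate in (B.2): \((1-\alpha )w_\alpha \) lies between \(\min r\) and \(\max r\) componentwise.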

We are now in a position to prove the solvability of (3.2).

Theorem B.3

There exists a unique solution \((\lambda ,w)\) of (3.2). Moreover, there exists a \(K>0\) such that, for all \(x,y\in D_n\),

$$\begin{aligned} \min _{D_n}r\le \lambda \le \max _{D_n}r,\qquad |w(y)-w(x)|\le Kd_n(x,y). \end{aligned}$$

Proof

We first prove the existence part. Fix any \(\alpha \in (0,1)\). Let \(w_\alpha \) be the unique solution of (B.1). We set \(\lambda _\alpha :=(1-\alpha )w_\alpha (x_0)\) and \({\hat{w}}_\alpha :=w_\alpha -w_\alpha (x_0)\). Then \((\lambda _\alpha ,{\hat{w}}_\alpha )\) satisfies

$$\begin{aligned} \lambda _\alpha +{\hat{w}}_\alpha =H_n[\alpha {\hat{w}}_\alpha ]+r\quad \text {in } D_n,\quad {\hat{w}}_\alpha (x_0)=0. \end{aligned}$$
(B.3)

In view of (B.2), one can find an increasing sequence \(\{\alpha _j\}\) converging to 1 as \(j\rightarrow \infty \) such that \(\lambda _j:=\lambda _{\alpha _j}\rightarrow \lambda \) and \({\hat{w}}_j:={\hat{w}}_{\alpha _j}\rightarrow w\) in \(D_n\) as \(j\rightarrow \infty \) for some \(\lambda \in \mathbf {R}\) and \(w:D_n\rightarrow \mathbf {R}\). It is clear that \((\lambda ,w)\) inherits the estimates (B.2), namely \(\min _{D_n}r\le \lambda \le \max _{D_n}r\) and \(|w(y)-w(x)|\le Kd_n(x,y)\). It is also easy to check that \((\lambda ,w)\) solves (3.2) by letting \(j\rightarrow \infty \) in (B.3) with \(\alpha =\alpha _j\). Hence, the existence part has been proved.

We next show the uniqueness part. Let \((\lambda _1,w_1)\) and \((\lambda _2,w_2)\) be a subsolution and a supersolution of (3.2), respectively. We first show that \(\lambda _1\le \lambda _2\), which leads to the uniqueness of \(\lambda \). Set \(w:=w_2-w_1\), and let \(\bar{q}\) be the maximizer of \(H_n[w_1]\). Note that \({\bar{q}}\) is irreducible on \(D_n\). Then we see that, for all \(x\in D_n\),

$$\begin{aligned} \lambda _2-\lambda _1+w(x)\ge (\bar{q} w)(x). \end{aligned}$$
(B.4)

Since \(D_n\) is finite and \(\bar{q}\) is irreducible on \(D_n\), there exists a unique invariant distribution \(\pi \) on \(D_n\) associated with \(\bar{q}\). Multiplying both sides of (B.4) by \(\pi \) and summing over all \(x\in D_n\), we have

$$\begin{aligned} \lambda _2-\lambda _1+\pi w\ge \pi \bar{q} w=\pi w, \end{aligned}$$

where \(\pi w:=\sum _{x\in D_n }\pi (x)w(x)\). Since \(\pi w\) is finite, we obtain \(\lambda _2\ge \lambda _1\).

We next show that \(w_1=w_2\) in \(D_n\). By setting \(\lambda _1=\lambda _2\) in (B.4), we observe that \(\bar{q} w\le w\) in \(D_n\). We now choose an \({\bar{x}}\in D_n\) such that \(w({\bar{x}})=\min _{D_n}w\). Then,

$$\begin{aligned} 0\le \sum _{y\in D_n}{\bar{q}}({\bar{x}},y)(w(y)-w({\bar{x}}))=\bar{q} w({\bar{x}}) -w({\bar{x}})\le 0. \end{aligned}$$

This implies, as in the proof of Theorem B.1 (ii), that w is constant in \(D_n\). Since \(w(x_0)=0\), we obtain \(w=0\) in \(D_n\), that is, \(w_1=w_2\) in \(D_n\). Hence, we have completed the proof. \(\square \)
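The vanishing-discount construction in the existence proof can be mirrored numerically: solve (B.1) for \(\alpha \) close to 1, then form \(\lambda _\alpha =(1-\alpha )w_\alpha (x_0)\) and \({\hat{w}}_\alpha =w_\alpha -w_\alpha (x_0)\). In the sketch below (our names and toy data), the operator H is evaluated with a max-shift, using the shift invariance \(H[w+c]=H[w]+c\), because \(w_\alpha \sim r/(1-\alpha )\) would otherwise overflow the exponential.

```python
import numpy as np

def H(P, kappa, w):
    # H[w](x) = (1/kappa(x)) log sum_y P(x,y) e^{kappa(x) w(y)},
    # evaluated with a max-shift for numerical stability.
    m = w.max()
    s = (P * np.exp(kappa[:, None] * (w - m)[None, :])).sum(axis=1)
    return m + np.log(s) / kappa

def ergodic_pair(P, kappa, r, x0=0, alpha=0.999, tol=1e-12):
    # Solve w = H[alpha w] + r by value iteration, then normalize
    # as in the proof of Theorem B.3.
    w = np.zeros_like(r)
    for _ in range(10**6):
        w_next = H(P, kappa, alpha * w) + r
        if np.max(np.abs(w_next - w)) < tol:
            break
        w = w_next
    lam = (1.0 - alpha) * w[x0]   # approximates lambda
    w_hat = w - w[x0]             # approximates w, with w_hat(x0) = 0
    return lam, w_hat

P = np.array([[0.5, 0.5], [0.3, 0.7]])
kappa = np.array([1.0, 2.0])
r = np.array([1.0, 0.5])
lam, w_hat = ergodic_pair(P, kappa, r)
```

For \(\alpha =0.999\) the pair already satisfies the ergodic equation \(\lambda +w=H[w]+r\) up to an error of order \((1-\alpha )\Vert {\hat{w}}_\alpha \Vert _\infty \), and \(\lambda \) falls inside \([\min r,\max r]\) as Theorem B.3 predicts.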


Cite this article

Ichihara, N. Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints. Appl Math Optim 84, 2177–2220 (2021). https://doi.org/10.1007/s00245-020-09707-x
