Abstract
This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value \(\lambda _*\), the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if \(\lambda _*<0\), then the value function converges to a solution to the corresponding stationary equation; (ii) if \(\lambda _*>0\), then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if \(\lambda _*=0\), then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
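To give a concrete feel for the trichotomy, the following is a minimal numerical sketch for the uncontrolled case, assuming (for illustration only) that the generalized principal eigenvalue can be identified with the log-spectral radius of the twisted kernel \(\mathrm{diag}(e^r)P_D\) on a finite non-absorbing set; the matrix and rewards below are hypothetical and not taken from the paper.

```python
import numpy as np

def principal_eigenvalue(P_D, r):
    # Log of the spectral radius of the "twisted" kernel diag(e^r) P_D,
    # where P_D is the transition matrix restricted to the non-absorbing
    # set D; a stand-in for lambda_* in the uncontrolled case.
    M = np.diag(np.exp(r)) @ P_D
    return float(np.log(np.max(np.abs(np.linalg.eigvals(M)))))

# Hypothetical substochastic restriction of P to D (mass leaks into the
# absorbing set), so with zero running reward the chain is subcritical:
P_D = np.array([[0.4, 0.3],
                [0.2, 0.5]])
lam_sub = principal_eigenvalue(P_D, np.zeros(2))                  # lambda_* < 0
lam_super = principal_eigenvalue(P_D, np.log(2.0) * np.ones(2))   # lambda_* > 0
```

In the subcritical regime the value function stays bounded as the horizon grows, while in the supercritical regime it grows linearly at rate \(\lambda_*\) after normalization, matching cases (i) and (ii) of the trichotomy.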
References
Arapostathis, A., Borkar, V.S., Fernández-Gaucherand, E., Ghosh, M.K., Marcus, S.I.: Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31, 282–344 (1993)
Balaji, S., Meyn, S.P.: Multiplicative ergodicity and large deviations for an irreducible Markov chain. Stoch. Process. Appl. 90, 123–144 (2000)
Barles, G., Porretta, A., Tchamba, T.: On the large time behavior of solutions of the Dirichlet problem for subquadratic viscous Hamilton–Jacobi equations. J. Math. Pures Appl. 94, 497–519 (2010)
Borkar, V.S., Meyn, S.P.: Risk-sensitive optimal control for Markov decision processes with monotone cost. Math. Oper. Res. 27, 192–209 (2002)
Brémaud, P.: Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Texts in Applied Mathematics, vol. 31. Springer, New York (1999)
Cavazos-Cadena, R., Hernández-Hernández, D.: A system of Poisson equations for a nonconstant Varadhan functional on a finite state space. Appl. Math. Optim. 53, 101–119 (2006)
Cavazos-Cadena, R., Hernández-Hernández, D.: Necessary and sufficient conditions for a solution to the risk-sensitive Poisson equation on a finite state space. Syst. Control Lett. 58, 254–258 (2009)
Cavazos-Cadena, R., Hernández-Hernández, D.: Local Poisson equations associated with the Varadhan functional. Asymptot. Anal. 96, 23–50 (2015)
Cranston, M., Molchanov, S.: On phase transitions and limit theorems for homopolymers. In: Probability and Mathematical Physics. CRM Proceedings & Lecture Notes, vol. 42, pp. 97–112. American Mathematical Society, Providence, RI (2007)
Fleming, W.H., Hernández-Hernández, D.: Risk-sensitive control of finite state machines on an infinite horizon I. SIAM J. Control Optim. 35, 1790–1810 (1997)
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)
Hinderer, K., Waldmann, K.-H.: The critical discount factor for finite Markovian decision processes with an absorbing set. Math. Methods Oper. Res. 57, 1–19 (2003)
Hinderer, K., Waldmann, K.-H.: Algorithms for countable state Markov decision models with an absorbing set. SIAM J. Control Optim. 43, 2109–2131 (2005)
Ichihara, N.: Phase transitions for controlled Markov chains on infinite graphs. SIAM J. Control Optim. 54, 450–474 (2016)
Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87, 159–182 (2012)
Kontoyiannis, I., Meyn, S.P.: Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13, 304–362 (2003)
Patek, S.: On terminating Markov decision processes with a risk-averse objective function. Automatica 37, 1379–1386 (2001)
Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
Seneta, E.: Non-negative Matrices and Markov Chains. Springer Series in Statistics, 2nd edn. Springer, Berlin (1980)
Tchamba, T.: Large time behavior of solutions of viscous Hamilton–Jacobi equations with superquadratic Hamiltonian. Asymptot. Anal. 66, 161–186 (2010)
Todorov, E.: Linearly-solvable Markov decision problems. Adv. Neural Inf. Process. Syst. 19, 1369–1376 (2006)
Acknowledgements
The author would like to thank two anonymous referees for their valuable comments and criticisms that allowed him to improve not only the results but also the presentation of this paper.
Supported in part by JSPS KAKENHI Grant No. 18K03343.
Appendices
Appendix A: Verification Theorem
In this appendix, we verify (2.4) under our standing assumptions (A1)-(A4). Since this kind of verification argument is quite standard (e.g., [18]), we give only a sketch of the proof.
We first consider the case where the initial function f is real-valued.
Proposition A.1
Suppose that (A1)-(A3) hold. Let \(\{v_n\}\) be defined by (2.2) with the initial function f real-valued and bounded above in D. Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).
Proof
Fix any \(\varvec{p}=\{p_j\}\in {{\mathcal {A}}}\), \(n\ge 1\), and \(x\in D\). We may assume without loss of generality that the support of \(p_j(x,\,\cdot \,)\) is included in the support of \(P(x,\,\cdot \,)\) for all \( j\ge 0\) and \(x\in D\). Then, since \(\{y\in V\,|\, P_D^{(n)}(x,y)>0\}\) is finite by (A1), we see from Dynkin’s formula that
Furthermore, since \(v_{n-j}=g\) in G for all \(0\le j\le n\), \(v_0=f\) in D, and
for all \(y\in D\) and \(0\le j\le n-1\), we have
Hence, \(v_n(x)\ge \sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). We also notice that the previous inequality holds with equality if we choose a \({\hat{\varvec{p}}}=\{\hat{p}_j \}\in {{\mathcal {A}}}\) such that
Note that such a \({\hat{\varvec{p}}}\) exists by virtue of Lemma 3.1. Indeed, each \({\hat{p}}_j\), with \(0\le j\le n-1\), is uniquely determined by
for \(x,y\in V\). Hence, we obtain (2.4). \(\square \)
We now consider the case where the initial function satisfies (A4).
Proposition A.2
Suppose that (A1)-(A4) hold. Let \(\{v_n\}\) be defined by (2.2). Then, (2.4) holds for all \(n\ge 1\) and \(x\in D\).
Proof
Let \(\{f^{(k)}\}\) be a sequence of initial functions such that \(f^{(k)}\) is real-valued and bounded above in D for all \(k\ge 1\), that \(\{f^{(k)}(x)\}\) is non-increasing in k for all \(x\in D\), and that \(\{f^{(k)}\}\) converges to f in D as \(k\rightarrow \infty \). For each \(k\ge 1\), let \(\{v^{(k)}_n\}\) be the sequence of functions defined by (2.2) with \(v_0=f^{(k)}\). It is clear that \(v^{(k)}_n\ge v^{(k+1)}_n\) in D for all \(n\ge 1\) and \(k\ge 1\). Moreover, for any \(n\ge 1\), \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \). Indeed, if we set \(v^{(\infty )}_n:=\lim _{k\rightarrow \infty }v^{(k)}_n\), then we observe from Lemma 3.1 and the local finiteness of P that, for all \(x\in D\) and \(n\ge 0\),
where equalities may hold with \(-\infty =-\infty \). Thus, passing to the limit as \(k\rightarrow \infty \) in (2.2) with \(v^{(k)}_n\) in place of \(v_n\), we see that \(\{v^{(\infty )}_n\}\) satisfies (2.2) with \(v_0=f\). This implies that \(v^{(\infty )}_n=v_n\) in D for all \(n\ge 1 \). Hence, \(\{v^{(k)}_n\}\) converges to \(v_n\) in D as \(k\rightarrow \infty \).
Now, let \(J^{(k)}_n(x;\varvec{p})\) be the reward criterion defined by (2.3) with \(f^{(k)}\) instead of f. Then, by Proposition A.1, we have \(v^{(k)}_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J^{(k)}_n(x;\varvec{p})\) for all \(x\in D\), \(n\ge 1\), and \(k\ge 1\). Noting that \(J^{(k)}_n(x;\varvec{p})\ge J_n(x;\varvec{p})\), we have
In order to verify the opposite inequality, fix any \(x\in D\) and \(n\ge 1\). Note that \(v_n(x)>-\infty \) if and only if either there exist some \(x_1,\ldots ,x_{n-1},x_n\in D\) with \(x_n\in F\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le n\), or there exist some \(1\le m\le n\), \(x_1,\ldots ,x_{m-1}\in D\), and \(x_m\in G\) such that \(P(x_{j-1},x_j)>0\) for all \(1\le j\le m\), where \(x_0:=x\). In either case, one can find a \({\hat{\varvec{p}}}=\{\hat{p}_j\}\in {{\mathcal {A}}}\) such that
for all \(0\le j\le n-1\) and \( y\in D\) along a suitable path from x to a point in \(F\cup G\). This implies, as in the proof of Proposition A.1, that \(v_n(x)=J_n(x;{\hat{\varvec{p}}})\). On the other hand, if \(v_n(x)=-\infty \), then \(J_n(x;\varvec{p})=-\infty \) for all \(\varvec{p}\in {{\mathcal {A}}}\). Hence, we obtain \(v_n(x)=\sup _{\varvec{p}\in {{\mathcal {A}}}}J_n(x;\varvec{p})\). \(\square \)
Appendix B: Solvability of (3.2)
In this appendix we prove the existence and uniqueness of a solution \((\lambda _n,w_n)\) to (3.2). We follow the argument of [14, Appendix]. Hereafter, we fix an arbitrary \(n\ge 1\). Recall that \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\).
We first consider the following equation with discount factor \(\alpha \in (0,1]\):
As usual, we say that w is a supersolution (resp. subsolution) of (B.1) if \(w\ge H_n[\alpha w]+r\) in \(D_n\) (resp. \(w\le H_n[\alpha w]+r\) in \(D_n\)), and that w is a solution of (B.1) if it is both a subsolution and a supersolution of (B.1).
We first observe that the following comparison principle holds.
Theorem B.1
(i) Assume \(\alpha \in (0,1)\). Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of (B.1), respectively. Then \(w_1\le w_2\) in \(D_n\).
(ii) Let \(w_1\) and \(w_2\) be a subsolution and a supersolution of \(w=H_n[w]+r\) in a subset \(A\subset D_n\). Then, \(w_1\le w_2\) in \(D_n\setminus A\) implies \(w_1\le w_2\) in A.
Proof
We first prove (i). Set \(w:=w_2-w_1\), and let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[\alpha w_1]\). Then, we see that \(\alpha (\bar{q} w)\le w\) in \(D_n\). We now claim that \(w\ge 0\) in \(D_n\). To verify this, suppose to the contrary that \(\min _{D_n}w=w({\bar{x}})<0\) for some \({\bar{x}}\in D_n\). Then, since \(\alpha (\bar{q} w)({\bar{x}})\le w({\bar{x}})<\alpha w({\bar{x}})\), we see that
which is a contradiction. Hence, \(w\ge 0\) in \(D_n\), and we have proved that \(w_1\le w_2\) in \(D_n\).
We next show (ii). We may assume without loss of generality that \(A\ne \emptyset \); otherwise, there is nothing to prove. Set \(w:=w_2-w_1\) and, as above, assume that \(\min _Aw=w({\bar{x}})<0\) for some \({\bar{x}}\in A\). Let \(\bar{q}={\bar{q}} (x,y)\) be the maximizer of \(H_n[w_1]\). Note that \(\bar{q}\) is irreducible on \(D_n\). Then, since \(\bar{q}w\le w\) in A, we have
This implies that \(w(y)=w({\bar{x}})\) for any \(y\in D_n\) such that \(\bar{q}({\bar{x}},y)>0\). Repeating this procedure and noting that \(\bar{q}\) is irreducible on \(D_n\), one can find a pair \(({\bar{y}},{\bar{z}})\) such that \({\bar{y}}\in A\), \({\bar{z}}\in D_n\setminus A\), \(w({\bar{y}})=\min _A w<0\), and \(\bar{q}({\bar{y}},{\bar{z}})>0\). Since \(w\ge w({\bar{y}})\) in \(D_n\), this implies that
a contradiction. Hence, \(w\ge 0\) in A, and the proof is complete. \(\square \)
Now, we discuss the solvability of (B.1) for \(\alpha \in (0,1)\). To this end, we set
for \(x,y\in D_n\). Note that, since \(D_n\) is finite and \(P_n\) is irreducible on \(D_n\), there exists an \(M>0\) such that \(x,y\in D_n\) and \(P_n(x,y)>0\) imply \(d_n(y,x)\le M\).
Theorem B.2
Assume that \(\alpha \in (0,1)\). Then there exists a unique solution \(w_\alpha \) of (B.1). Furthermore, for any \(x,y\in D_n\),
where K is some constant not depending on \(\alpha \).
Proof
The existence part can be shown by the standard value iteration argument, so we omit it. Uniqueness follows from Theorem B.1 (i). In order to obtain the estimates (B.2) for the solution \(w_\alpha \) of (B.1), we set \(w_1:=\underline{r}/(1-\alpha )\) and \(w_2:={\overline{r}}/(1-\alpha )\), where \(\underline{r}:=\min _{D_n}r\), \({\overline{r}}:=\max _{D_n}r\). Then, for any \(x\in D_n\),
This implies that \(w_1\) is a subsolution of (B.1). Similarly, one can verify that \(w_2\) is a supersolution of (B.1). Applying Theorem B.1 (i), we have \(w_1\le w_\alpha \le w_2\) in \(D_n\). Hence, the first estimate in (B.2) is valid.
To prove the second estimate, fix any \(x,y\in D_n\) such that \(P_n(x,y)>0\). Then we see from (B.1) that \(w_\alpha (x)\ge \alpha w_\alpha (y)-c_n(x,{{\mathbf {1}}}_{\{y\}})+r(x)\). This and the previous estimate imply that
On the other hand, since \(d_n(y,x)\le M\), we can find some \(y_0:=y\), \(y_1,\ldots , y_{m-1}, y_m=:x\in D_n\) with \(m\le M\) such that \(P_n(y_{i-1},y_{i})>0\) for all \(1\le i\le m\). In particular, by using the above estimate repeatedly, we have
Combining the last two estimates and setting \(y_{m+1}:=x\), we obtain
Hence, the second estimate is valid if \(x,y\in D_n\) satisfy \(P_n(x,y)>0\).
For general \(x,y\in D_n\), we choose a finite sequence \(z_0,z_1,\ldots ,z_N\in D_n\) with \(N:=d_n(x,y)\) such that \(z_0=x\) and \(z_N=y\). Then, applying the previous estimate repeatedly, we obtain
Since K clearly does not depend on \(\alpha \), we have completed the proof. \(\square \)
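The value iteration argument mentioned in the proof can be sketched numerically. The toy instance below assumes, purely for illustration, that \(H_n\) is a pointwise maximum over finitely many transition kernels Q with attached costs c (the kernels, costs, and rewards are hypothetical); for \(\alpha <1\) the Bellman map is an \(\alpha \)-contraction in the sup norm, and the fixed point obeys the sub/supersolution bounds \(\underline{r}/(1-\alpha )\le w_\alpha \le {\overline{r}}/(1-\alpha )\) established in the proof.

```python
import numpy as np

def solve_discounted(kernels, alpha, r, tol=1e-12):
    # Value iteration for w = max_Q { alpha * Q w - c } + r, a toy
    # stand-in for the discounted equation (B.1).  For alpha < 1 the
    # map is an alpha-contraction in the sup norm, so the iteration
    # converges geometrically to the unique fixed point.
    w = np.zeros_like(r)
    while True:
        w_new = r + np.max(
            np.stack([alpha * Q @ w - c for Q, c in kernels]), axis=0)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new

# Hypothetical two-state instance: stay put for free, or swap states
# at cost 0.1; reward 1 in state 0 and 0 in state 1.
kernels = [(np.eye(2), np.zeros(2)),
           (np.array([[0.0, 1.0], [1.0, 0.0]]), np.full(2, 0.1))]
r = np.array([1.0, 0.0])
alpha = 0.9
w = solve_discounted(kernels, alpha, r)
```

One can check that the computed w satisfies the equation up to the tolerance and lies between \(\min r/(1-\alpha )\) and \(\max r/(1-\alpha )\), mirroring the comparison argument above.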
We are now in a position to prove the solvability of (3.2).
Theorem B.3
There exists a unique solution \((\lambda ,w)\) of (3.2). Moreover, there exists a \(K>0\) such that, for all \(x,y\in D_n\),
Proof
We first prove the existence part. Fix any \(\alpha \in (0,1)\). Let \(w_\alpha \) be the unique solution of (B.1). We set \(\lambda _\alpha :=(1-\alpha )w_\alpha (x_0)\) and \({\hat{w}}_\alpha :=w_\alpha -w_\alpha (x_0)\). Then \((\lambda _\alpha ,{\hat{w}}_\alpha )\) enjoys
In view of (B.2), one can find an increasing sequence \(\{\alpha _j\}\) converging to 1 as \(j\rightarrow \infty \) such that \(\lambda _j:=\lambda _{\alpha _j}\rightarrow \lambda \) and \({\hat{w}}_j:={\hat{w}}_{\alpha _j}\rightarrow w\) in \(D_n\) as \(j\rightarrow \infty \) for some \(\lambda \in \mathbf {R}\) and \(w:D_n\rightarrow \mathbf {R}\). It is obvious that \((\lambda ,w)\) enjoys the estimates (B.2) with w in place of \(w_\alpha \). It is also easy to check that \((\lambda ,w)\) solves (3.2) by letting \(j\rightarrow \infty \) in (B.3) with \(\alpha =\alpha _j\). Hence, the existence part has been proved.
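The vanishing-discount passage \(\alpha \rightarrow 1\) used above can be illustrated in the uncontrolled linear case, where \(w_\alpha =(I-\alpha Q)^{-1}r\) solves \(w=\alpha Qw+r\) and \(\lambda _\alpha =(1-\alpha )w_\alpha (x_0)\) converges to the average reward \(\pi r\); the chain Q and reward r below are hypothetical.

```python
import numpy as np

def lam_alpha(Q, r, x0, alpha):
    # Uncontrolled analogue of the construction in the proof:
    # w_alpha = (I - alpha Q)^{-1} r solves w = alpha Q w + r, and
    # lambda_alpha = (1 - alpha) w_alpha(x0).
    n = len(r)
    w = np.linalg.solve(np.eye(n) - alpha * Q, r)
    return (1.0 - alpha) * w[x0]

# Hypothetical irreducible chain with invariant distribution (1/2, 1/2):
Q = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
lams = [lam_alpha(Q, r, 0, a) for a in (0.9, 0.99, 0.999)]
# As alpha -> 1, lambda_alpha approaches the average reward pi r = 0.5.
```

Here the convergence is monotone along the chosen discount factors, in line with the compactness argument that extracts a convergent subsequence \(\{\alpha _j\}\).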
We next show the uniqueness part. Let \((\lambda _1,w_1)\) and \((\lambda _2,w_2)\) be a subsolution and a supersolution of (3.2), respectively. We first show that \(\lambda _1\le \lambda _2\), which leads to the uniqueness of \(\lambda \). Set \(w:=w_2-w_1\), and let \(\bar{q}\) be the maximizer of \(H_n[w_1]\). Note that \({\bar{q}}\) is irreducible on \(D_n\). Then we see that, for all \(x\in D_n\),
Since \(D_n\) is finite and \(\bar{q}\) is irreducible on \(D_n\), there exists a unique invariant distribution \(\pi \) on \(D_n\) associated with \(\bar{q}\). Multiplying both sides of (B.4) by \(\pi(x)\) and summing over all \(x\in D_n\), we have
where \(\pi w:=\sum _{x\in D_n }\pi (x)w(x)\). Since \(\pi w\) is finite, we obtain \(\lambda _2\ge \lambda _1\).
We next show that \(w_1=w_2\) in \(D_n\). By setting \(\lambda _1=\lambda _2\) in (B.4), we observe that \(\bar{q} w\le w\) in \(D_n\). We now choose an \({\bar{x}}\in D_n\) such that \(w({\bar{x}})=\min _{D_n}w\). Then,
This implies as in the proof of Theorem B.1 (ii) that w is constant in \(D_n\). Since \(w(x_0)=0\), we obtain \(w=0\) in \(D_n\), that is, \(w_1=w_2\) in \(D_n\). Hence, we have completed the proof. \(\square \)
Cite this article
Ichihara, N. Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints. Appl Math Optim 84, 2177–2220 (2021). https://doi.org/10.1007/s00245-020-09707-x
Keywords
- Stochastic control
- Markov decision process
- Value function
- Generalized principal eigenvalue
- Bellman equation