Abstract
In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm to evaluate the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy, whose performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be reformulated as a nonconvex bilevel program, in which the inner problem is a linear program, without losing the uniform convergence properties.
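The grid-based value iteration summarized above can be sketched in a few lines. The following is an illustrative reconstruction, not the paper's implementation: it assumes a one-dimensional state and action, linear (hence control-affine) dynamics, a convex stage cost, and piecewise-linear interpolation of the value function on the grid. Under these assumptions the inner minimization over the action is convex, so a simple ternary search stands in for a generic convex solver; all function names are ours.

```python
# Illustrative sketch (not the paper's code): approximate value iteration on a
# 1-D state grid. Each Bellman update solves a one-dimensional convex
# minimization over the action; ternary search is valid because the inner
# objective is convex in u for control-affine dynamics with convex costs and
# a convex (piecewise-linear interpolated) value function.

def ternary_min(f, lo, hi, tol=1e-6):
    """Minimize a convex scalar function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def interp(grid, vals, x):
    """Piecewise-linear interpolation of vals on a sorted 1-D grid (clamped)."""
    if x <= grid[0]:
        return vals[0]
    if x >= grid[-1]:
        return vals[-1]
    for i in range(len(grid) - 1):
        if grid[i] <= x <= grid[i + 1]:
            t = (x - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - t) * vals[i] + t * vals[i + 1]

def value_iteration(grid, dynamics, cost, gamma=0.9, u_bounds=(-1.0, 1.0), iters=50):
    """Evaluate the approximate value function at the grid points."""
    v = [0.0] * len(grid)
    for _ in range(iters):
        v_new = []
        for x in grid:
            # Inner problem: convex in u (convex cost plus a convex value
            # function composed with dynamics that are affine in u).
            obj = lambda u, x=x: cost(x, u) + gamma * interp(grid, v, dynamics(x, u))
            u_star = ternary_min(obj, *u_bounds)
            v_new.append(obj(u_star))
        v = v_new
    return v
```

For example, with dynamics `x_next = 0.5 * x + u` and cost `x**2 + u**2`, the computed values are smallest at the origin and grow toward the boundary of the grid, as expected for this regulation problem.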
Notes
However, our method is suitable for problems with high-dimensional action spaces.
More precisely, the set \({\mathcal {U}}({\varvec{x}})\) needs to be represented by convex inequalities, i.e., there exist functions \(a_k: {\mathcal {X}} \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) and \(b_k: {\mathcal {X}} \rightarrow {\mathbb {R}}\) such that
$$\begin{aligned} {\mathcal {U}}({\varvec{x}}) := \{ {\varvec{u}} \in {\mathbb {R}}^m : a_k ({\varvec{x}}, {\varvec{u}}) \le b_k({\varvec{x}}), k=1, \ldots , N_{ineq}\}, \end{aligned}$$where \({\varvec{u}} \mapsto a_k ({\varvec{x}}, {\varvec{u}})\) is a convex function for each fixed \({\varvec{x}} \in {\mathcal {X}}\) and each k.
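As a concrete instance of this representation (an illustrative assumption, not an example from the paper), a state-dependent interval constraint already has the required form, with each \(a_k\) linear, hence convex, in \({\varvec{u}}\):

```python
# Illustrative example of the convex-inequality representation of U(x):
# a state-dependent interval U(x) = { u : -u <= b1(x), u <= b2(x) },
# i.e., a_1(x, u) = -u and a_2(x, u) = u, both convex (linear) in u.
# The bound functions b1, b2 are hypothetical choices for illustration.

def in_U(x, u):
    a = [lambda x, u: -u,          # a_1: convex (linear) in u
         lambda x, u: u]           # a_2: convex (linear) in u
    b = [lambda x: 1.0 + abs(x),   # b_1(x): lower-bound magnitude
         lambda x: 1.0 + abs(x)]   # b_2(x): upper-bound magnitude
    return all(ak(x, u) <= bk(x) for ak, bk in zip(a, b))
```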
Note that the convexity of v is not used in the second part of the proof of Proposition 3.1; thus, that part remains valid in the nonconvex case.
The matrix B used in our experiments can be downloaded from the following link: http://coregroup.snu.ac.kr/DB/B1000.mat.
The CPU time increases superlinearly with the number of grid points because the size of the optimization problem (5) also grows with the grid size. Note that the problem size is invariant when using the bilevel method in Sect. 4.2; thus, in that case the CPU time scales linearly, as shown in Table 3.
The observed second-order empirical convergence rate is consistent with our theoretical result, since Theorem 3.1 only guarantees that the suboptimality gap decreases at a first-order rate; the actual convergence rate can therefore be higher than the guaranteed rate.
To compute the optimal value function, we used the method in Sect. 4.2, discretizing the action space with 1001 equally spaced grid points.
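The action-space discretization used here can be sketched as a brute-force minimization over a uniform action grid. This is an illustrative stand-in (with 11 points instead of 1001, for brevity); `q` plays the role of the one-stage cost plus the discounted value of the successor state, and all names are ours.

```python
# Sketch of a Bellman update with a uniformly discretized action space.
# q(u) is a stand-in for cost(x, u) + gamma * v(f(x, u)); the minimum is
# taken over n equally spaced grid points in [u_min, u_max].

def bellman_min_discretized(q, u_min=-1.0, u_max=1.0, n=11):
    grid = [u_min + i * (u_max - u_min) / (n - 1) for i in range(n)]
    return min(q(u) for u in grid)
```

With a step of 0.2 between action grid points, a quadratic `q` whose minimizer lies on the grid is minimized exactly; otherwise, the discretization error is bounded by the grid resolution.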
Acknowledgements
This work was supported in part by the Creative-Pioneering Researchers Program through SNU, the National Research Foundation of Korea funded by the MSIT (2020R1C1C1009766), and Samsung Electronics.
Communicated by Lars Grüne.
Appendix: State Space Discretization Using a Rectilinear Grid
In this appendix, we provide a concrete way to discretize the state space using a rectilinear grid. The construction below satisfies all the conditions in Sect. 2.3.
1. Choose a convex compact set \({\mathcal {Z}}_0 := [\underline{{\varvec{x}}}_{0, 1}, \overline{{\varvec{x}}}_{0, 1}] \times [\underline{{\varvec{x}}}_{0, 2}, \overline{{\varvec{x}}}_{0, 2}] \times \cdots \times [\underline{{\varvec{x}}}_{0, n}, \overline{{\varvec{x}}}_{0, n}]\), and discretize it using an n-dimensional rectilinear grid. Set \(t \leftarrow 0\).
2. Compute (or over-approximate) the forward reachable set (Footnote 8)
$$\begin{aligned} R_{t} := \big \{ f({\varvec{x}}, {\varvec{u}}, {\varvec{\xi }}) : {\varvec{x}} \in {\mathcal {Z}}_{t}, {\varvec{u}} \in {\mathcal {U}}({\varvec{x}}), {\varvec{\xi }} \in \varXi \big \}. \end{aligned}$$
3. Choose a convex compact set \({\mathcal {Z}}_{t+1} := [\underline{{\varvec{x}}}_{t+1, 1}, \overline{{\varvec{x}}}_{t+1, 1}] \times [\underline{{\varvec{x}}}_{t+1, 2}, \overline{{\varvec{x}}}_{t+1, 2}] \times \cdots \times [\underline{{\varvec{x}}}_{t+1, n}, \overline{{\varvec{x}}}_{t+1, n}]\) such that \(R_t \subseteq {\mathcal {Z}}_{t+1}\).
4. Expand the rectilinear grid to fit \({\mathcal {Z}}_{t+1}\).
5. Stop if \(t+1 = K\); otherwise, set \(t \leftarrow t+1\) and go to Step 2.
We can then choose \({\mathcal {C}}_i\) as each grid cell. We label \({\mathcal {C}}_i\) so that \(\bigcup _{i=1}^{N_{{\mathcal {C}}, t}} {\mathcal {C}}_i = {\mathcal {Z}}_t\) for all t. A two-dimensional example is shown in Fig. 1. This construction approach was used in Sects. 5.1 and 5.3.
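The expansion loop above can be illustrated for a one-dimensional system. The dynamics \(x_+ = a x + u + \xi\) with \(|u| \le u_{\max }\), \(|\xi | \le \xi _{\max }\) are an assumed example, not taken from the paper; for an interval \({\mathcal {Z}}_t\) under such monotone linear dynamics, the reachable set is computed exactly and \({\mathcal {Z}}_{t+1}\) is chosen as the smallest enclosing interval.

```python
# Illustrative 1-D sketch of the appendix procedure: expand Z_t so that it
# contains the forward reachable set R_t of x_+ = a*x + u + xi with
# |u| <= u_max and |xi| <= xi_max (assumed dynamics), over K steps.

def expand_grid(z0, a=0.5, u_max=1.0, xi_max=0.1, K=3):
    lo, hi = z0
    intervals = [(lo, hi)]
    for _ in range(K):
        # Exact reachable set of an interval under linear scalar dynamics:
        # image of [lo, hi] under x -> a*x, inflated by the input and
        # disturbance bounds.
        r_lo = min(a * lo, a * hi) - u_max - xi_max
        r_hi = max(a * lo, a * hi) + u_max + xi_max
        lo, hi = r_lo, r_hi  # choose Z_{t+1} = R_t (smallest enclosing box)
        intervals.append((lo, hi))
    return intervals
```

With a stable system (\(|a| < 1\)), the intervals converge to a bounded set; the rectilinear grid only needs to be extended to cover each new \({\mathcal {Z}}_{t+1}\).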
Cite this article
Yang, I. A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces. J Optim Theory Appl 187, 133–157 (2020). https://doi.org/10.1007/s10957-020-01747-1