
TT-QI: Faster Value Iteration in Tensor Train Format for Stochastic Optimal Control

  • OPTIMAL CONTROL
  • Published in Computational Mathematics and Mathematical Physics

Abstract

The problem of general nonlinear stochastic optimal control with small Wiener noise is studied. The problem is approximated by a Markov decision process, and the Bellman equation is solved with the Value Iteration (VI) algorithm in the low-rank Tensor Train format (TT-VI). In this paper we propose a modification of the TT-VI algorithm called TT-Q-Iteration (TT-QI), in which the nonlinear Bellman optimality operator is applied iteratively to the solution as a composition of internal Tensor Train algebraic operations and the TT-CROSS algorithm. We show that TT-QI has lower asymptotic complexity per iteration than the existing method in the literature, provided that the TT-ranks of the transition probabilities are small. On test examples of an underpowered inverted pendulum and Dubins cars, our method converges up to 3–10 times faster in terms of wall-clock time than the original method.
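To make the setting concrete, the sketch below shows plain dense value iteration on a finite MDP. It is only an illustration of the Bellman optimality operator, not the authors' TT-based implementation: TT-VI and TT-QI replace the dense arrays and the minimization below with low-rank Tensor Train operations and TT-CROSS. The array shapes and function name are our own assumptions.

```python
import numpy as np

def value_iteration(P, c, gamma=0.99, tol=1e-8, max_iter=10_000):
    """Dense value iteration for a finite MDP (illustration only).

    P : (A, S, S) array, P[a, s, t] = probability of moving from
        state s to state t under action a.
    c : (A, S) array, c[a, s] = immediate cost of action a in state s.
    Returns the optimal value function V and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Bellman update: Q[a, s] = c[a, s] + gamma * sum_t P[a, s, t] * V[t]
        Q = c + gamma * (P @ V)
        V_new = Q.min(axis=0)  # Bellman optimality operator: minimize over actions
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=0)

# Tiny 2-state, 2-action example
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 0.0], [0.5, 0.5]])
V, policy = value_iteration(P, c)
```

Each dense update costs O(AS²); the point of the TT-format methods is to avoid ever forming these arrays when S grows exponentially with the state dimension.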



ACKNOWLEDGMENTS

The authors are grateful to Dr. S. Dolgov (University of Bath), Dr. S. Matveev (INM RAS, Skoltech), and Dr. G. Ovchinnikov (Skoltech) for valuable discussions on the topic.

Funding

This research was partially supported by a grant from the Ministry of Education and Science of the Russian Federation (14.756.31.0001).

Author information

Correspondence to A. I. Boyko or I. V. Oseledets.

About this article

Cite this article

Boyko, A.I., Oseledets, I.V. & Ferrer, G. TT-QI: Faster Value Iteration in Tensor Train Format for Stochastic Optimal Control. Comput. Math. and Math. Phys. 61, 836–846 (2021). https://doi.org/10.1134/S0965542521050043
