Abstract
We study the problem of general nonlinear stochastic optimal control with small Wiener noise. The problem is approximated by a Markov decision process, and the Bellman equation is solved with the value iteration (VI) algorithm in the low-rank tensor train format (TT-VI). We propose a modification of the TT-VI algorithm called TT-Q-Iteration (TT-QI), in which the nonlinear Bellman optimality operator is applied iteratively to the solution as a composition of internal tensor train algebraic operations and the TT-CROSS algorithm. We show that, provided the TT-ranks of the transition probabilities are small, TT-QI has lower asymptotic complexity per iteration than the existing TT-VI method. In test examples of an underpowered inverted pendulum and a Dubins car, our method converges 3–10 times faster in terms of wall-clock time than the original method.
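The Bellman optimality backup that TT-QI applies in compressed tensor train format can be illustrated, in its uncompressed tabular form, by a plain Q-iteration loop. The sketch below is illustrative only: the toy 2-state, 2-action MDP and all variable names are assumptions, and no tensor train compression is shown.

```python
import numpy as np

def q_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=1000):
    """Tabular Q-iteration for a finite MDP.

    P[a, s, s'] -- transition probabilities under action a
    R[a, s]     -- expected immediate reward for action a in state s
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_actions, n_states))
    for _ in range(max_iter):
        V = Q.max(axis=0)                    # greedy value V(s) = max_a Q(a, s)
        Q_new = R + gamma * (P @ V)          # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:  # sup-norm stopping criterion
            return Q_new
        Q = Q_new
    return Q

# Toy MDP: action 0 tends to stay in the current state, action 1 is random.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = q_iteration(P, R)
policy = Q.argmax(axis=0)  # greedy policy per state
```

In TT-QI, the same backup is evaluated without ever materializing the full Q-tensor: the maximization and expectation are carried out through TT algebra and TT-CROSS sampling, which is where the per-iteration savings come from.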
ACKNOWLEDGMENTS
The authors are grateful to Dr. S. Dolgov (University of Bath), Dr. S. Matveev (INM RAS, Skoltech), and Dr. G. Ovchinnikov (Skoltech) for valuable discussions on the topic.
Funding
This research was partially supported by a grant from the Ministry of Education and Science of the Russian Federation (14.756.31.0001).
Boyko, A.I., Oseledets, I.V. & Ferrer, G. TT-QI: Faster Value Iteration in Tensor Train Format for Stochastic Optimal Control. Comput. Math. and Math. Phys. 61, 836–846 (2021). https://doi.org/10.1134/S0965542521050043