Abstract
We study the problem of general nonlinear stochastic optimal control with small Wiener noise. The problem is approximated by a Markov decision process, and the Bellman equation is solved with the value iteration (VI) algorithm in the low-rank tensor train format (TT-VI). We propose a modification of the TT-VI algorithm called TT-Q-Iteration (TT-QI), in which the nonlinear Bellman optimality operator is applied iteratively to the solution as a composition of internal tensor train algebraic operations and the TT-CROSS algorithm. We show that, provided the TT-ranks of the transition probabilities are small, TT-QI has lower asymptotic complexity per iteration than the existing TT-VI method. In test examples of an underpowered inverted pendulum and a Dubins car, our method converges 3–10 times faster in terms of wall-clock time than the original method.
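The Bellman optimality backup that TT-QI applies in compressed tensor train format can be illustrated, in its uncompressed tabular form, by a plain Q-iteration loop. The sketch below is illustrative only: the toy 2-state, 2-action MDP and all variable names are assumptions, and no tensor train compression is shown.

```python
import numpy as np

def q_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=1000):
    """Tabular Q-iteration for a finite MDP.

    P[a, s, s'] -- transition probabilities under action a
    R[a, s]     -- expected immediate reward for action a in state s
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_actions, n_states))
    for _ in range(max_iter):
        V = Q.max(axis=0)                    # greedy value V(s) = max_a Q(a, s)
        Q_new = R + gamma * (P @ V)          # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:  # sup-norm stopping criterion
            return Q_new
        Q = Q_new
    return Q

# Toy MDP: action 0 tends to stay in the current state, action 1 is random.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = q_iteration(P, R)
policy = Q.argmax(axis=0)  # greedy policy per state
```

In TT-QI, the same backup is evaluated without ever materializing the full Q-tensor: the maximization and expectation are carried out through TT algebra and TT-CROSS sampling, which is where the per-iteration savings come from.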
ACKNOWLEDGMENTS
The authors are grateful to Dr. S. Dolgov (University of Bath), Dr. S. Matveev (INM RAS, Skoltech), and Dr. G. Ovchinnikov (Skoltech) for valuable discussions on the topic.
Funding
This research was partially supported by a grant from the Ministry of Education and Science of the Russian Federation (14.756.31.0001).
Boyko, A.I., Oseledets, I.V. & Ferrer, G. TT-QI: Faster Value Iteration in Tensor Train Format for Stochastic Optimal Control. Comput. Math. and Math. Phys. 61, 836–846 (2021). https://doi.org/10.1134/S0965542521050043