TT-QI: Faster Value Iteration in Tensor Train Format for Stochastic Optimal Control, Computational Mathematics and Mathematical Physics

Computational Mathematics and Mathematical Physics (IF 0.7), Pub Date: 2021-07-01, DOI: 10.1134/s0965542521050043
A. I. Boyko, I. V. Oseledets, G. Ferrer

Abstract

We study the general nonlinear stochastic optimal control problem with small Wiener noise. The problem is approximated by a Markov decision process, and the Bellman equation is solved with the Value Iteration (VI) algorithm in the low-rank Tensor Train format (TT-VI). We propose a modification of the TT-VI algorithm called TT-Q-Iteration (TT-QI), in which the nonlinear Bellman optimality operator is applied iteratively to the solution as a composition of internal Tensor Train algebraic operations and the TT-CROSS algorithm. We show that TT-QI has lower asymptotic complexity per iteration than the existing method from the literature, provided that the TT-ranks of the transition probabilities are small. In test examples with an underpowered inverted pendulum and Dubins cars, our method converges up to 3–10 times faster in wall-clock time than the original method.
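
For orientation, the iteration that the abstract refers to can be sketched on a small finite MDP. This is only a dense, tabular illustration of Q-iteration and not the Tensor Train algorithm from the paper: no TT decomposition or TT-CROSS step is used. The function name `q_iteration`, the array layout `P[s, a, s']`, the reward-maximization convention, and the toy two-state problem are all assumptions made for this example.

```python
import numpy as np

def q_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Dense Q-iteration for a finite MDP (illustrative, not the TT version).

    P: array of shape (S, A, S), P[s, a, s2] = probability of s -> s2 under a.
    R: array of shape (S, A), expected immediate reward (assumed convention).
    Returns the Q-table of shape (S, A) and the number of sweeps performed.
    """
    Q = np.zeros_like(R, dtype=float)
    for k in range(1, max_iter + 1):
        V = Q.max(axis=1)             # greedy value of each next state
        Q_new = R + gamma * (P @ V)   # Bellman optimality backup, shape (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new, k
        Q = Q_new
    return Q, max_iter

# Hypothetical toy problem: state 1 is absorbing and pays reward 1 per step;
# in state 0, action 0 stays put and action 1 moves to state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

Q, sweeps = q_iteration(P, R, gamma=0.5)
policy = Q.argmax(axis=1)  # greedy action per state
```

In this dense form the Q-table has one entry per state-action pair, so storage grows exponentially with the dimension of the discretized state space. That growth is the motivation for keeping the iterate in a low-rank format, as the abstract describes.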

Updated: 2021-07-02
