Exponential Convergence and Stability of Howard's Policy Improvement Algorithm for Controlled Diffusions
SIAM Journal on Control and Optimization ( IF 2.2 ) Pub Date : 2020-05-11 , DOI: 10.1137/19m1236758
Bekzhan Kerimkulov , David Šiška , Lukasz Szpruch

SIAM Journal on Control and Optimization, Volume 58, Issue 3, Page 1314-1340, January 2020.
Optimal control problems are inherently hard to solve as the optimization must be performed simultaneously with updating the underlying system. Starting from an initial guess, Howard's policy improvement algorithm separates the step of updating the trajectory of the dynamical system from the optimization, and iterations of this procedure should converge to the optimal control. In the discrete space-time setting this is often the case and even rates of convergence are known. In the continuous space-time setting of controlled diffusions the algorithm consists of solving a linear PDE followed by a maximization problem. This has been shown to converge; in some situations, however, no global rate is known. The first main contribution of this paper is to establish a global rate of convergence for the policy improvement algorithm and for a variant, called here the gradient iteration algorithm. The second main contribution is the proof of stability of the algorithms under perturbations to both the accuracy of the linear PDE solution and the accuracy of the maximization step. The proof technique is new in this context as it uses the theory of backward stochastic differential equations.
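
The two alternating steps described in the abstract can be illustrated in the discrete space-time setting it mentions. The following is a minimal sketch of policy improvement for a small finite, discounted Markov decision problem, not the paper's continuous-time algorithm: the linear PDE solve is replaced by a linear policy-evaluation sweep, and the maximization is taken over a finite action set. The two-state problem, the discount factor `gamma`, and all function names are assumptions made purely for illustration.

```python
def evaluate_policy(policy, P, r, gamma, tol=1e-10, max_sweeps=10_000):
    """Approximate the value of a fixed policy (the linear step).

    P[s][a] is the list of transition probabilities from state s under
    action a, and r[s][a] is the immediate reward. Both are hypothetical
    inputs for this sketch.
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(max_sweeps):
        new_v = [
            r[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
        # Stop once successive sweeps agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def improve_policy(v, P, r, gamma):
    """Pick, in every state, an action maximizing the one-step lookahead."""
    def q(s, a):
        return r[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))

    return [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(len(P))]


def policy_iteration(P, r, gamma, max_iters=100):
    """Alternate evaluation and maximization until the policy stops changing."""
    policy = [0] * len(P)  # initial guess: action 0 in every state
    v = evaluate_policy(policy, P, r, gamma)
    for _ in range(max_iters):
        new_policy = improve_policy(v, P, r, gamma)
        if new_policy == policy:
            break
        policy = new_policy
        v = evaluate_policy(policy, P, r, gamma)
    return policy, v


# Hypothetical two-state, two-action example (deterministic transitions).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
r = [
    [0.0, 1.0],  # rewards in state 0 for actions 0 and 1
    [2.0, 0.0],  # rewards in state 1 for actions 0 and 1
]

policy, values = policy_iteration(P, r, gamma=0.9)
print(policy, values)
```

In this toy problem the iteration stops after one improvement step. The sketch says nothing about convergence rates or stability under inexact evaluation and maximization, which are the questions the paper addresses for controlled diffusions.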


Updated: 2020-05-11