A Neural Network-Based Policy Iteration Algorithm with Global \(H^2\)-Superlinear Convergence for Stochastic Games on Domains
Foundations of Computational Mathematics (IF 2.5), Pub Date: 2020-05-18, DOI: 10.1007/s10208-020-09460-1
Kazufumi Ito, Christoph Reisinger, Yufei Zhang

In this work, we propose a class of numerical schemes for solving semilinear Hamilton–Jacobi–Bellman–Isaacs (HJBI) boundary value problems which arise naturally from exit time problems of diffusion processes with controlled drift. We exploit policy iteration to reduce the semilinear problem to a sequence of linear Dirichlet problems, which are subsequently approximated by a multilayer feedforward neural network ansatz. We establish that the numerical solutions converge globally in the \(H^2\)-norm and further demonstrate that this convergence is superlinear, by interpreting the algorithm as an inexact Newton iteration for the HJBI equation. Moreover, we construct the optimal feedback controls from the numerical value functions and deduce their convergence. The numerical schemes and convergence results are then extended to oblique derivative boundary conditions. Numerical experiments on the stochastic Zermelo navigation problem are presented to illustrate the theoretical results and to demonstrate the effectiveness of the method.
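
To make the policy iteration step concrete, the following is a minimal sketch on a toy problem, not the authors' method. It assumes a single controller (an HJB equation rather than the full Isaacs game), a one-dimensional domain (0, 1), a unit running cost (so the value is the minimal expected exit time), a drift control a in [-1, 1], and an upwind finite-difference solver in place of the neural network ansatz. All function names and parameters are hypothetical. Each iteration freezes the feedback control, solves the resulting linear Dirichlet problem (sigma^2/2) u'' + a(x) u' + 1 = 0 with u(0) = u(1) = 0, and then updates the control pointwise.

```python
import numpy as np


def solve_linear_dirichlet(a, sigma, h):
    """Solve (sigma^2/2) u'' + a(x) u' + 1 = 0 on the interior nodes,
    with u = 0 at both endpoints, using an upwind difference for u'."""
    c = sigma**2 / (2.0 * h**2)
    ap = np.maximum(a, 0.0)   # positive part of drift -> forward difference
    am = np.minimum(a, 0.0)   # negative part of drift -> backward difference
    diag = -2.0 * c - ap / h + am / h
    upper = c + ap / h        # coefficient of u_{i+1}
    lower = c - am / h        # coefficient of u_{i-1}
    A = np.diag(diag) + np.diag(upper[:-1], 1) + np.diag(lower[1:], -1)
    return np.linalg.solve(A, -np.ones_like(a))


def improve_policy(u):
    """Pointwise minimisation of a * u' over a in [-1, 1]:
    drift toward whichever neighbour has the smaller value."""
    u_full = np.concatenate(([0.0], u, [0.0]))
    return np.where(u_full[2:] < u_full[:-2], 1.0, -1.0)


def policy_iteration_exit_time(sigma=0.5, n_intervals=101, tol=1e-10, max_iter=50):
    """Toy policy iteration for the minimal expected exit time from (0, 1).
    An odd n_intervals keeps the midpoint off the grid and avoids ties."""
    h = 1.0 / n_intervals
    x = np.linspace(0.0, 1.0, n_intervals + 1)
    a = np.ones(n_intervals - 1)          # initial guess: always drift right
    u_prev = None
    iters = max_iter
    for k in range(max_iter):
        u = solve_linear_dirichlet(a, sigma, h)   # policy evaluation
        if u_prev is not None and np.max(np.abs(u - u_prev)) < tol:
            iters = k + 1
            break
        a = improve_policy(u)                     # policy improvement
        u_prev = u
    return x, np.concatenate(([0.0], u, [0.0])), a, iters
```

Calling `policy_iteration_exit_time()` returns the grid, the approximate value function, the feedback control on the interior nodes, and the number of linear solves performed. In this symmetric toy setting one would expect the computed control to steer toward the nearer endpoint. The upwind discretisation is a design choice made here so that each linear system stays monotone; it is only first-order accurate and says nothing about the \(H^2\) convergence analysed in the paper.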




Updated: 2020-05-18