Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks
arXiv - CS - Numerical Analysis. Pub Date: 2021-02-22, DOI: arxiv-2102.11379
Mo Zhou, Jiequn Han, Jianfeng Lu

We propose a novel numerical method for high dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as optimal control problems, are tackled by an actor-critic framework inspired by reinforcement learning, based on neural network parametrization of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function, we derive a variance-reduced least-squares temporal difference (VR-LSTD) method using stochastic calculus. To numerically discretize the stochastic control problem, we employ an adaptive stepsize scheme to improve the accuracy near the domain boundary. Numerical examples in up to $20$ spatial dimensions, including linear quadratic regulators, stochastic Van der Pol oscillators, and diffusive Eikonal equations, are presented to validate the effectiveness of our proposed method.
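
The abstract describes the actor-critic alternation only at a high level. As a rough, self-contained sketch of that structure (not the authors' implementation: the paper parametrizes the value and control with neural networks, uses a variance-reduced LSTD critic, and applies an adaptive stepsize near the boundary), the toy NumPy example below runs the same loop on a one-dimensional discounted linear-quadratic problem with a known exact solution. The critic is fit by a least-squares temporal-difference regression, and the actor is then improved by reading the Hamiltonian-minimizing control off the critic's gradient. The problem setup, parametrization, and all names are illustrative assumptions.

```python
import numpy as np

# Toy 1D discounted LQR: dX_t = u dt + sigma dW_t, with cost
#   E \int_0^infty e^{-beta t} (X_t^2 + u_t^2) dt.
# HJB: beta V = min_u [ u V'(x) + x^2 + u^2 ] + (sigma^2 / 2) V''(x),
# minimized at u = -V'(x)/2. Exact solution: V(x) = a* x^2 + c* with
#   a* = (-beta + sqrt(beta^2 + 4)) / 2,  c* = sigma^2 a* / beta.

rng = np.random.default_rng(0)
sigma, beta, dt = 0.5, 1.0, 0.01
a, c, k = 0.0, 0.0, 0.0   # critic V(x) = a x^2 + c; actor u(x) = -k x

for it in range(200):
    x = rng.uniform(-2.0, 2.0, size=4096)          # sample states
    u = -k * x                                     # actor: current feedback control
    xi = rng.standard_normal(x.shape)
    x_next = x + u * dt + sigma * np.sqrt(dt) * xi # Euler--Maruyama step
    # Critic: least-squares TD fit of V against one-step Bellman targets
    y = (x**2 + u**2) * dt + np.exp(-beta * dt) * (a * x_next**2 + c)
    A = np.stack([x**2, np.ones_like(x)], axis=1)
    (a, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Actor: policy improvement u = -V'(x)/2 = -a x, i.e. gain k <- a
    k = a

a_star = (-beta + np.sqrt(beta**2 + 4.0)) / 2.0
print(f"learned a = {a:.4f} (exact {a_star:.4f}), gain k = {k:.4f}")
```

Over the iterations the critic coefficient converges toward the exact value a*; the paper's method follows the same alternation but replaces the quadratic ansatz with neural networks and adds variance reduction to the temporal-difference targets.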

Updated: 2021-02-24