Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks
arXiv - CS - Numerical Analysis. Pub Date: 2021-02-22, DOI: arxiv-2102.11379
Mo Zhou, Jiequn Han, Jianfeng Lu

We propose a novel numerical method for high dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as optimal control problems, are tackled by an actor-critic framework inspired by reinforcement learning, based on neural network parametrization of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function, we derive a variance-reduced least-squares temporal difference (VR-LSTD) method using stochastic calculus. To numerically discretize the stochastic control problem, we employ an adaptive stepsize scheme to improve the accuracy near the domain boundary. Numerical examples in up to $20$ spatial dimensions, including linear quadratic regulators, stochastic Van der Pol oscillators, and diffusive Eikonal equations, are presented to validate the effectiveness of our proposed method.
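
The abstract describes the actor-critic alternation only at a high level. As a rough, self-contained sketch of that structure (not the authors' implementation: the paper parametrizes the value and control with neural networks, uses a variance-reduced LSTD critic, and applies an adaptive stepsize near the boundary), the toy NumPy example below runs the same loop on a one-dimensional discounted linear-quadratic problem with a known exact solution. The critic is fit by a least-squares temporal-difference regression, and the actor is then improved by reading the Hamiltonian-minimizing control off the critic's gradient. The problem setup, parametrization, and all names are illustrative assumptions.

```python
import numpy as np

# Toy 1D discounted LQR: dX_t = u dt + sigma dW_t, with cost
#   E \int_0^infty e^{-beta t} (X_t^2 + u_t^2) dt.
# HJB: beta V = min_u [ u V'(x) + x^2 + u^2 ] + (sigma^2 / 2) V''(x),
# minimized at u = -V'(x)/2. Exact solution: V(x) = a* x^2 + c* with
#   a* = (-beta + sqrt(beta^2 + 4)) / 2,  c* = sigma^2 a* / beta.

rng = np.random.default_rng(0)
sigma, beta, dt = 0.5, 1.0, 0.01
a, c, k = 0.0, 0.0, 0.0   # critic V(x) = a x^2 + c; actor u(x) = -k x

for it in range(200):
    x = rng.uniform(-2.0, 2.0, size=4096)          # sample states
    u = -k * x                                     # actor: current feedback control
    xi = rng.standard_normal(x.shape)
    x_next = x + u * dt + sigma * np.sqrt(dt) * xi # Euler--Maruyama step
    # Critic: least-squares TD fit of V against one-step Bellman targets
    y = (x**2 + u**2) * dt + np.exp(-beta * dt) * (a * x_next**2 + c)
    A = np.stack([x**2, np.ones_like(x)], axis=1)
    (a, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Actor: policy improvement u = -V'(x)/2 = -a x, i.e. gain k <- a
    k = a

a_star = (-beta + np.sqrt(beta**2 + 4.0)) / 2.0
print(f"learned a = {a:.4f} (exact {a_star:.4f}), gain k = {k:.4f}")
```

Over the iterations the critic coefficient converges toward the exact value a*; the paper's method follows the same alternation but replaces the quadratic ansatz with neural networks and adds variance reduction to the temporal-difference targets.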

Updated: 2021-02-24