Ternary Policy Iteration Algorithm for Nonlinear Robust Control
arXiv - CS - Systems and Control
Pub Date: 2020-07-14, arXiv: 2007.06810
Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming Yin
Uncertainty in plant dynamics remains a challenge for nonlinear control
problems. This paper develops a ternary policy iteration (TPI) algorithm for
solving nonlinear robust control problems with bounded uncertainties. The
controller and the uncertainty of the system are treated as game players, and
the robust control problem is formulated as a two-player zero-sum differential
game. To solve this differential game, the corresponding
Hamilton-Jacobi-Isaacs (HJI) equation is derived. Three loss functions and
three update phases are designed to match the identity, minimization, and
maximization conditions of the HJI equation, respectively. Each loss function
is defined as the expectation of the approximate Hamiltonian over a generated
state set, so that the algorithm need not operate on every state in the state
space simultaneously. The parameters of the value function and the policies
are updated directly by minimizing the designed loss functions with gradient
descent. Moreover, zero-initialization can be applied to the parameters of the
control policy. The effectiveness of the proposed TPI algorithm is
demonstrated through two simulation studies. The simulation results show that
the TPI algorithm converges to the optimal solution for the linear plant and
exhibits strong disturbance rejection for the nonlinear plant.
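The three-phase structure described above can be sketched on a toy problem. The following is an illustrative sketch, not the authors' implementation: it uses a scalar plant x' = a·x + b·u + d·w with running cost q·x² + r·u² − γ²·w², a quadratic value function V(x) = p·x², and linear policies u = k_u·x (controller, the minimizer) and w = k_w·x (disturbance, the maximizer). All constants, learning rates, and the generated state set are assumptions chosen for the demo; the gradients are written out by hand for this specific parameterization.

```python
import numpy as np

# Toy plant and cost (assumed constants for illustration only):
#   x_dot = a*x + b*u + d*w,   l(x,u,w) = q*x^2 + r*u^2 - gamma^2*w^2
a, b, d = -1.0, 1.0, 1.0
q, r, gamma2 = 1.0, 1.0, 4.0          # gamma^2 = 4

xs = np.linspace(-2.0, 2.0, 101)      # the "generated state set"

def hamiltonian(p, k_u, k_w, x):
    """Approximate Hamiltonian H = V_x * f + l on the sampled states."""
    V_x = 2.0 * p * x                              # gradient of V(x) = p*x^2
    f = a * x + b * k_u * x + d * k_w * x          # closed-loop dynamics
    l = q * x**2 + r * (k_u * x)**2 - gamma2 * (k_w * x)**2
    return V_x * f + l

p, k_u, k_w = 0.0, 0.0, 0.0           # zero-initialization, as in the abstract
lr_v, lr_pi = 0.005, 0.05             # assumed learning rates

for _ in range(3000):
    H = hamiltonian(p, k_u, k_w, xs)
    # Phase 1 (identity): drive the HJI residual E[H^2] toward zero in p.
    dH_dp = 2.0 * xs**2 * (a + b * k_u + d * k_w)
    p -= lr_v * np.mean(2.0 * H * dH_dp)
    # Phase 2 (minimization): descend E[H] in the controller gain k_u.
    dH_dku = xs**2 * (2.0 * p * b + 2.0 * r * k_u)
    k_u -= lr_pi * np.mean(dH_dku)
    # Phase 3 (maximization): ascend E[H] in the disturbance gain k_w.
    dH_dkw = xs**2 * (2.0 * p * d - 2.0 * gamma2 * k_w)
    k_w += lr_pi * np.mean(dH_dkw)

# For this scalar linear-quadratic game the HJI equation reduces to the
# Riccati-type condition 3p^2 + 8p - 4 = 0, i.e. p = (-8 + sqrt(112)) / 6,
# with k_u = -p*b/r and k_w = p*d/gamma^2 at the saddle point.
print(p, k_u, k_w)
```

Because the plant here is linear-quadratic, the fixed point of the three alternating updates can be checked against the closed-form Riccati solution; for a general nonlinear plant the value function and policies would instead be parameterized by function approximators, which is where the expectation-based losses become essential.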
Updated: 2020-07-15