Ternary Policy Iteration Algorithm for Nonlinear Robust Control
arXiv - CS - Systems and Control
Pub Date: 2020-07-14, arXiv: 2007.06810
Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming Yin
Uncertainty in plant dynamics remains a challenge for nonlinear control
problems. This paper develops a ternary policy iteration (TPI) algorithm for
solving nonlinear robust control problems with bounded uncertainties. The
controller and the uncertainty of the system are treated as game players, and
the robust control problem is formulated as a two-player zero-sum differential
game. To solve this differential game, the corresponding
Hamilton-Jacobi-Isaacs (HJI) equation is derived. Three loss functions and
three update phases are designed to match the identity, minimization, and
maximization conditions of the HJI equation, respectively. Each loss function
is defined as the expectation of the approximate Hamiltonian over a generated
state set, so that the algorithm need not operate on every state in the state
space simultaneously. The parameters of the value function and the policies
are updated directly by minimizing the designed loss functions with gradient
descent. Moreover, zero-initialization can be applied to the parameters of the
control policy. The effectiveness of the proposed TPI algorithm is
demonstrated through two simulation studies. The simulation results show that
the TPI algorithm converges to the optimal solution for the linear plant and
exhibits strong disturbance rejection for the nonlinear plant.
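The three-phase structure described above can be sketched on a toy problem. The following is an illustrative sketch, not the authors' implementation: it uses a scalar plant x' = a·x + b·u + d·w with running cost q·x² + r·u² − γ²·w², a quadratic value function V(x) = p·x², and linear policies u = k_u·x (controller, the minimizer) and w = k_w·x (disturbance, the maximizer). All constants, learning rates, and the generated state set are assumptions chosen for the demo; the gradients are written out by hand for this specific parameterization.

```python
import numpy as np

# Toy plant and cost (assumed constants for illustration only):
#   x_dot = a*x + b*u + d*w,   l(x,u,w) = q*x^2 + r*u^2 - gamma^2*w^2
a, b, d = -1.0, 1.0, 1.0
q, r, gamma2 = 1.0, 1.0, 4.0          # gamma^2 = 4

xs = np.linspace(-2.0, 2.0, 101)      # the "generated state set"

def hamiltonian(p, k_u, k_w, x):
    """Approximate Hamiltonian H = V_x * f + l on the sampled states."""
    V_x = 2.0 * p * x                              # gradient of V(x) = p*x^2
    f = a * x + b * k_u * x + d * k_w * x          # closed-loop dynamics
    l = q * x**2 + r * (k_u * x)**2 - gamma2 * (k_w * x)**2
    return V_x * f + l

p, k_u, k_w = 0.0, 0.0, 0.0           # zero-initialization, as in the abstract
lr_v, lr_pi = 0.005, 0.05             # assumed learning rates

for _ in range(3000):
    H = hamiltonian(p, k_u, k_w, xs)
    # Phase 1 (identity): drive the HJI residual E[H^2] toward zero in p.
    dH_dp = 2.0 * xs**2 * (a + b * k_u + d * k_w)
    p -= lr_v * np.mean(2.0 * H * dH_dp)
    # Phase 2 (minimization): descend E[H] in the controller gain k_u.
    dH_dku = xs**2 * (2.0 * p * b + 2.0 * r * k_u)
    k_u -= lr_pi * np.mean(dH_dku)
    # Phase 3 (maximization): ascend E[H] in the disturbance gain k_w.
    dH_dkw = xs**2 * (2.0 * p * d - 2.0 * gamma2 * k_w)
    k_w += lr_pi * np.mean(dH_dkw)

# For this scalar linear-quadratic game the HJI equation reduces to the
# Riccati-type condition 3p^2 + 8p - 4 = 0, i.e. p = (-8 + sqrt(112)) / 6,
# with k_u = -p*b/r and k_w = p*d/gamma^2 at the saddle point.
print(p, k_u, k_w)
```

Because the plant here is linear-quadratic, the fixed point of the three alternating updates can be checked against the closed-form Riccati solution; for a general nonlinear plant the value function and policies would instead be parameterized by function approximators, which is where the expectation-based losses become essential.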
Updated: 2020-07-15