A novel adaptive dynamic programming based on tracking error for nonlinear discrete-time systems
Automatica (IF 4.8), Pub Date: 2021-05-08, DOI: 10.1016/j.automatica.2021.109687
Chun Li, Jinliang Ding, Frank L. Lewis, Tianyou Chai

To eliminate the tracking error when using adaptive dynamic programming (ADP) algorithms, this paper presents a novel formulation of the value function for the optimal tracking problem (TP) of nonlinear discrete-time systems. Unlike existing ADP methods, this formulation introduces the control input into the tracking error and drops the quadratic form of the control input altogether, which makes the boundedness and convergence of the value function independent of the discount factor. Based on the proposed value function, the optimal control policy can be deduced without considering the reference control input. Value iteration (VI) and policy iteration (PI) methods are applied to prove the optimality of the obtained control policy, and the monotonicity and convergence of the iterative value function are derived. Simulation examples realized with neural networks and the actor–critic structure are provided to verify the effectiveness of the proposed ADP algorithm.
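
The abstract does not state the value function explicitly, so the following is only a rough sketch of the idea under stated assumptions, not the authors' algorithm. It assumes the stage cost is the squared tracking error of the state reached under the chosen input, with no quadratic input penalty and no discount factor, and it runs value iteration on a grid instead of the neural-network actor-critic used in the paper. The plant `f`, the constant reference `R`, and both grids are hypothetical choices made for illustration.

```python
"""Tabular value iteration for a scalar tracking problem (illustrative sketch)."""

R = 1.0      # constant reference to track (assumed for illustration)
STEP = 0.1   # spacing of both grids
STATES = [-2.0 + STEP * i for i in range(41)]    # state grid on [-2, 2]
CONTROLS = [-1.0 + STEP * j for j in range(21)]  # admissible inputs on [-1, 1]


def f(x, u):
    """Hypothetical nonlinear plant: x_{k+1} = x_k / (1 + x_k^2) + u_k."""
    return x / (1.0 + x * x) + u


def utility(x, u):
    """Assumed stage cost: squared tracking error of the state reached under u.

    The input enters only through the next state; there is no u^2 penalty.
    """
    e = f(x, u) - R
    return e * e


def interp(values, x):
    """Linearly interpolate grid values at x, clamping x to the grid range."""
    pos = (min(max(x, STATES[0]), STATES[-1]) - STATES[0]) / STEP
    i = min(int(pos), len(STATES) - 2)
    w = pos - i
    return (1.0 - w) * values[i] + w * values[i + 1]


def bellman_backup(values):
    """One undiscounted sweep: V_{i+1}(x) = min_u [U(x, u) + V_i(f(x, u))]."""
    return [min(utility(x, u) + interp(values, f(x, u)) for u in CONTROLS)
            for x in STATES]


def value_iteration(tol=1e-9, max_iter=500):
    """Iterate from V_0 = 0 and return the list of all iterates."""
    history = [[0.0] * len(STATES)]
    for _ in range(max_iter):
        new = bellman_backup(history[-1])
        history.append(new)
        if max(abs(a - b) for a, b in zip(new, history[-2])) < tol:
            break
    return history


def greedy_control(values, x):
    """Input from the grid that minimizes the one-step lookahead cost."""
    return min(CONTROLS, key=lambda u: utility(x, u) + interp(values, f(x, u)))


def simulate(values, x0, steps=20):
    """Run the greedy policy in closed loop and return the final state."""
    x = x0
    for _ in range(steps):
        x = f(x, greedy_control(values, x))
    return x


if __name__ == "__main__":
    history = value_iteration()
    print("sweeps:", len(history) - 1)
    print("final state from x0 = -1.5:", simulate(history[-1], -1.5))
```

Starting from a zero initial value function, each sweep can only raise the iterates, and in this toy setup they stay bounded without any discounting because the reference can be held with an admissible input. Whether this matches the exact cost and the conditions analyzed in the paper cannot be confirmed from the abstract alone.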




Updated: 2021-05-08