Model-Free Event-Triggered Optimal Consensus Control of Multiple Euler-Lagrange Systems via Reinforcement Learning
IEEE Transactions on Network Science and Engineering (IF 6.7). Pub Date: 2020-11-09. DOI: 10.1109/tnse.2020.3036604
Saiwei Wang, Xin Jin, Shuai Mao, Athanasios V. Vasilakos, Yang Tang

This paper develops a model-free approach to the event-triggered optimal consensus problem for multiple Euler-Lagrange systems (MELSs) via reinforcement learning (RL). First, an augmented system is constructed by defining a pre-compensator to circumvent the dependence on system dynamics. Second, the Hamilton-Jacobi-Bellman (HJB) equations are used to derive the model-free event-triggered optimal controller. Third, we present a policy iteration (PI) algorithm derived from RL, which converges to the optimal policy. The value function of each agent is then represented by a neural network to realize the PI algorithm, and the network weights are updated via gradient descent only at a series of discrete event-triggered instants. An explicit form of the event-triggered condition is then proposed, under which the closed-loop augmented system is guaranteed to be uniformly ultimately bounded (UUB) and Zeno behavior is excluded. Finally, the validity of the approach is verified by a simulation example.
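As a point of reference, the continuous-time HJB formulation that such event-triggered optimal designs typically build on can be sketched as follows. The notation here (augmented consensus error e_i, cost weights Q_i and R_i, input matrix g_i, event instants t_k^i) is assumed for illustration and need not match the paper's exact symbols or its pre-compensator construction.

```latex
% Generic sketch of an event-triggered HJB formulation (assumed notation,
% not the paper's exact derivation).
\begin{aligned}
V_i\bigl(e_i(t)\bigr) &= \int_t^{\infty}
    \bigl( e_i^{\top} Q_i\, e_i + u_i^{\top} R_i\, u_i \bigr)\, d\tau,\\
0 &= \min_{u_i} \Bigl[\, e_i^{\top} Q_i\, e_i + u_i^{\top} R_i\, u_i
    + \nabla V_i^{*}(e_i)^{\top} \dot{e}_i \,\Bigr],\\
u_i^{*}(t) &= -\tfrac{1}{2}\, R_i^{-1} g_i^{\top}\,
    \nabla V_i^{*}\bigl(e_i(t_k^i)\bigr),
    \qquad t \in [t_k^i,\, t_{k+1}^i).
\end{aligned}
```

The last line captures the event-triggered aspect: between consecutive trigger instants the control is held at the value computed from the state sampled at t_k^i, so the gradient of the value function only needs to be evaluated at event times.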

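To make the update pattern concrete, below is a minimal, self-contained Python sketch of event-triggered policy iteration with a quadratic critic. It assumes a simple linear consensus-error model e_dot = A e + B u in place of the paper's augmented MELS dynamics; all matrices, gains, and the triggering threshold are illustrative placeholders, not the paper's design.

```python
import numpy as np

# Illustrative stand-in for the augmented consensus-error dynamics
# (assumed, not the paper's actual MELS model): e_dot = A e + B u.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # state cost weight (assumed)
R = np.array([[1.0]])    # control cost weight (assumed)

def phi(e):
    """Quadratic feature basis for the critic, V(e) ~= w . phi(e)."""
    x1, x2 = e
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def grad_phi(e):
    """Analytic Jacobian d(phi)/d(e): 3 features x 2 states."""
    x1, x2 = e
    return np.array([[2 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2 * x2]])

def policy(e_hat, w):
    """u = -(1/2) R^{-1} B^T grad(V), evaluated at the last event state."""
    grad_V = grad_phi(e_hat).T @ w
    return -0.5 * np.linalg.solve(R, B.T @ grad_V)

dt, alpha, sigma = 0.001, 0.01, 0.3   # step size, learning rate, trigger gain
e = np.array([1.0, -0.5])             # initial consensus error
e_hat = e.copy()                      # state broadcast at the last event
w = np.zeros(3)                       # critic weights
events = 0

for k in range(20000):
    # Event-triggered condition (illustrative): fire when the gap between
    # the current state and the last broadcast state grows too large.
    if np.linalg.norm(e - e_hat) > sigma * np.linalg.norm(e):
        e_hat = e.copy()
        events += 1
        # Gradient-descent critic update on the HJB (Bellman) residual,
        # performed only at event instants, as the abstract describes.
        u = policy(e_hat, w)
        e_dot = A @ e + B @ u
        residual = e @ Q @ e + u @ R @ u + (grad_phi(e).T @ w) @ e_dot
        w -= alpha * residual * (grad_phi(e) @ e_dot)

    u = policy(e_hat, w)              # control is held between events
    e = e + dt * (A @ e + B @ u)      # Euler integration of the error

print(f"final error norm {np.linalg.norm(e):.4f} after {events} events")
```

Note that the control is held between events and the critic weights change only when the trigger fires, mirroring the abstract's description; the UUB and Zeno-freeness guarantees come from the paper's analysis of the triggering condition, which this sketch does not reproduce.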