Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2021-04-21, DOI: 10.1109/tnnls.2021.3071545
Qingtao Zhao, Jian Sun, Gang Wang, Jie Chen

For nonzero-sum (NZS) games of nonlinear systems, reinforcement learning (RL) and adaptive dynamic programming (ADP) have shown their capability to iteratively approximate the desired performance index and the optimal input policy. In this article, an event-triggered ADP is proposed for NZS games of continuous-time nonlinear systems with completely unknown system dynamics. To approximate the Nash equilibrium solution, critic neural networks and actor neural networks are used to estimate the value functions and the control policies, respectively. Unlike the traditional time-triggered mechanism, the proposed algorithm updates the neural network weights and the players' inputs only when a state-based event-triggering condition is violated. It is shown that system stability and weight convergence are still guaranteed under mild assumptions, while the use of communication and computation resources is considerably reduced. In addition, the infamous Zeno behavior is excluded by proving the existence of a minimum inter-event time (MIET), which ensures the feasibility of the closed-loop event-triggered continuous-time system. Finally, a numerical example is simulated to illustrate the effectiveness of the proposed approach.
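
The event-triggered idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it has no critic or actor networks and no game between players. The scalar plant x' = a*x + u, the gain k, the threshold sigma, and the relative-threshold trigger rule are all assumptions chosen only to show how a state-based condition decides when the input is refreshed, and how the gaps between events can be inspected for a positive minimum inter-event time.

```python
def simulate(sigma=0.2, k=2.0, a=1.0, x0=1.0, dt=1e-4, t_end=5.0):
    """Event-triggered state feedback for the scalar plant x' = a*x + u.

    The input u = -k * x_hat is held constant between events, where x_hat
    is the state sampled at the most recent event. An event fires when the
    gap |x_hat - x| exceeds sigma * |x| (an illustrative trigger rule,
    not taken from the paper).
    """
    x, x_hat = x0, x0
    event_times = [0.0]          # the initial sample counts as an event
    steps = int(round(t_end / dt))
    for n in range(1, steps + 1):
        u = -k * x_hat           # input uses the last sampled state only
        x += dt * (a * x + u)    # forward-Euler integration step
        if abs(x_hat - x) > sigma * abs(x):
            x_hat = x            # triggering condition violated: resample
            event_times.append(n * dt)
    # Inter-event times; their minimum is an empirical stand-in for the MIET.
    gaps = [t1 - t0 for t0, t1 in zip(event_times, event_times[1:])]
    return x, event_times, gaps


x_final, events, gaps = simulate()
print("integration steps:", int(round(5.0 / 1e-4)))
print("input updates    :", len(events))
print("smallest gap     :", min(gaps))
print("final state      :", x_final)
```

In this toy setting the input is refreshed far less often than the integrator steps, and the smallest gap between events stays well above the step size, which is the numerical counterpart of excluding Zeno behavior. A time-triggered controller would instead recompute the input at every step.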

Updated: 2021-04-21