Event-Triggered Distributed Control of Nonlinear Interconnected Systems Using Online Reinforcement Learning With Exploration
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2017-09-07, DOI: 10.1109/tcyb.2017.2741342
Vignesh Narayanan, Sarangapani Jagannathan

This paper presents a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback, using a novel hybrid-learning-scheme-based approximate dynamic programming with online exploration. First, an approximate solution to the Hamilton-Jacobi-Bellman equation is generated with event-sampled neural network (NN) approximation, and subsequently a near-optimal control policy for each subsystem is derived. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifier and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of system dynamics and bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration in the online control framework, using identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. System states and the NN weight estimation errors are regulated, and local uniform ultimate boundedness results are achieved. The analytical results are substantiated using simulation studies.
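
To make the idea of event-triggered state feedback concrete, the following is a minimal illustrative sketch and not the scheme from the paper: it uses no neural networks, no Hamilton-Jacobi-Bellman approximation, and only a single scalar linear system rather than interconnected nonlinear subsystems. The plant x' = a*x + b*u, the gain k, the relative threshold sigma, and the function name simulate_event_triggered are all assumptions chosen for illustration. The controller sees only the last transmitted state, and a new transmission happens only when the gap between the held and current state exceeds sigma times the current state magnitude.

```python
def simulate_event_triggered(a=1.0, b=1.0, k=3.0, sigma=0.2,
                             x0=1.0, dt=0.001, steps=5000):
    """Forward-Euler simulation of x' = a*x + b*u with u = -k * x_held.

    All parameters are hypothetical illustration values, not taken
    from the paper.
    """
    x = x0
    x_held = x0   # last state transmitted to the controller
    events = 1    # the initial transmission counts as one event
    for _ in range(steps):
        # Relative-threshold trigger: transmit only when the gap between
        # the held state and the current state exceeds sigma * |x|.
        if abs(x - x_held) > sigma * abs(x):
            x_held = x
            events += 1
        # The control is held constant between events (zero-order hold).
        u = -k * x_held
        x += dt * (a * x + b * u)
    return x, events


if __name__ == "__main__":
    final_state, num_events = simulate_event_triggered()
    print(f"final state: {final_state:.2e}, transmissions: {num_events} of 5000 steps")
```

With these assumed values the closed loop stays stable because the held-state error is kept below sigma*|x|, so the state is regulated toward zero while the feedback is updated at only a small fraction of the simulation steps. That trade-off between regulation and communication is what the event-triggering condition in the abstract is designed to manage.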

Updated: 2017-09-07