Convergence of Recurrent Neuro-Fuzzy Value-Gradient Learning With and Without an Actor,IEEE Transactions on Fuzzy Systems


Convergence of Recurrent Neuro-Fuzzy Value-Gradient Learning With and Without an Actor
IEEE Transactions on Fuzzy Systems (IF 10.7), Pub Date: 4-25-2019, DOI: 10.1109/tfuzz.2019.2912349
Seaar Al-Dabooni, Donald Wunsch

In recent years, a gradient of the n-step temporal-difference [TD(λ)] learning has been developed to present an advanced adaptive dynamic programming (ADP) algorithm called value-gradient learning [VGL(λ)]. In this paper, we improve the VGL(λ) architecture. The improved version is called the “single adaptive actor network” [SNVGL(λ)] because it has only a single function-approximator network (the critic) instead of the dual networks (critic and actor) used in VGL(λ). SNVGL(λ) therefore has lower computational requirements than VGL(λ). Moreover, a recurrent hybrid neuro-fuzzy (RNF) network and a first-order Takagi-Sugeno RNF (TSRNF) network are derived and implemented to build the critic and actor networks. Furthermore, we present novel theoretical convergence proofs for both VGL(λ) and SNVGL(λ) under certain conditions. A model-based mobile robot simulation is used to solve the optimal control problem for affine nonlinear discrete-time systems. The mobile robot is exposed to various noise levels to verify the performance and to validate the theoretical analysis.
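For readers unfamiliar with the TD(λ) targets that the abstract builds on, the following is a minimal sketch of the standard λ-return recursion for one finite episode. It is not taken from the paper: the function name `lambda_returns`, its parameters, and the default values of `gamma` and `lam` are assumptions chosen for illustration. VGL(λ) as described above works with the gradient of such targets with respect to the state, and that step is not shown here.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.9):
    """Backward recursion for TD(lambda) targets over one finite episode.

    rewards[t]     : reward received after step t
    next_values[t] : critic estimate V(s_{t+1}); use 0.0 for a terminal state
    (Names and defaults are illustrative assumptions, not the paper's notation.)
    """
    assert len(rewards) == len(next_values)
    targets = [0.0] * len(rewards)
    # Past the end of the episode, bootstrap from the last critic estimate.
    g_next = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda-return.
        blended = (1.0 - lam) * next_values[t] + lam * g_next
        targets[t] = rewards[t] + gamma * blended
        g_next = targets[t]
    return targets
```

With `lam=0` the recursion reduces to the one-step TD target `r_t + gamma * V(s_{t+1})`, and with `lam=1` it becomes the discounted sum of rewards with a single bootstrap at the end of the episode.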


Updated: 2024-08-22
