Convergence of Recurrent Neuro-Fuzzy Value-Gradient Learning With and Without Actor
IEEE Transactions on Fuzzy Systems (IF 11.9), Pub Date: 2020-04-01, DOI: 10.1109/tfuzz.2019.2912349
Authors: Seaar Al-Dabooni, Donald Wunsch
In recent years, a gradient-based form of $n$-step temporal-difference learning [TD($\lambda$)] has been developed into an advanced adaptive dynamic programming (ADP) algorithm called value-gradient learning [VGL($\lambda$)]. In this paper, we improve the VGL($\lambda$) architecture; the improved architecture is called the “single adaptive actor network” [SNVGL($\lambda$)] because it uses only a single function-approximator network (critic) instead of the dual networks (critic and actor) used in VGL($\lambda$). SNVGL($\lambda$) therefore has lower computational requirements than VGL($\lambda$). Moreover, a recurrent hybrid neuro-fuzzy (RNF) model and a first-order Takagi–Sugeno RNF (TSRNF) model are derived and implemented to build the critic and actor networks. Furthermore, we develop theoretical convergence proofs for both VGL($\lambda$) and SNVGL($\lambda$) under certain conditions. A model-based mobile robot simulation is used to solve the optimal control problem for affine nonlinear discrete-time systems. The mobile robot is exposed to various noise levels to verify the performance and to validate the theoretical analysis.
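For readers unfamiliar with the "first-order Takagi–Sugeno" structure named in the abstract, the following is a minimal sketch of generic, non-recurrent first-order Takagi–Sugeno inference. It is not the paper's TSRNF network. The Gaussian membership functions, the product firing strength, and the names `gaussian`, `ts_first_order`, and `rules` are all assumptions made for illustration.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x for a fuzzy set (assumed shape)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def ts_first_order(x, rules):
    """Generic first-order Takagi-Sugeno inference for an input vector x.

    Each rule is a tuple (centers, widths, coeffs, bias): one Gaussian
    fuzzy set per input, plus a linear consequent y_i = coeffs . x + bias.
    This layout is hypothetical and only illustrates the technique.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for centers, widths, coeffs, bias in rules:
        # Firing strength: product of the per-input membership degrees.
        strength = 1.0
        for xj, c, w in zip(x, centers, widths):
            strength *= gaussian(xj, c, w)
        # First-order consequent: a linear function of the inputs.
        y_i = sum(a * xj for a, xj in zip(coeffs, x)) + bias
        weighted_sum += strength * y_i
        total_strength += strength
    # Output: firing-strength-weighted average of the rule consequents.
    return weighted_sum / total_strength


# Two rules over one input; at x = 1.0 both rules fire equally.
rules = [
    ([0.0], [1.0], [1.0], 0.0),   # near 0: y = x
    ([2.0], [1.0], [-1.0], 4.0),  # near 2: y = -x + 4
]
output = ts_first_order([1.0], rules)
```

"First-order" refers to the rule consequents being linear in the inputs; a zero-order system would use constant consequents instead. The sketch assumes at least one rule has a non-zero firing strength, so the final division is well defined.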
Updated: 2020-04-01