Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning, IEEE Transactions on Fuzzy Systems



IEEE Transactions on Fuzzy Systems (IF 11.9), Pub Date: 2022-05-03, DOI: 10.1109/tfuzz.2022.3171844
Haiyang Fang, Yidong Tu, Hai Wang, Shuping He, Fei Liu, Zhengtao Ding, Shing Shin Cheng


This article explores a novel adaptive optimal control strategy for a class of discrete-time nonlinear Markov jump systems (DTNMJSs), based on Takagi–Sugeno fuzzy models and reinforcement learning (RL) techniques. First, the original nonlinear system model is represented by a fuzzy approximation, so that the optimal control problem becomes equivalent to designing fuzzy controllers for linear fuzzy systems with Markov jumping parameters. Next, the fuzzy coupled algebraic Riccati equations for the fuzzy-based discrete-time linear Markov jump systems are derived using Hamiltonian–Bellman methods. An online fuzzy optimization algorithm for DTNMJSs is then given, together with the associated equivalence proof. A fully model-free off-policy fuzzy RL algorithm with proven convergence is then derived for the DTNMJSs; it requires no knowledge of the system dynamics or the transition probabilities. Finally, two simulation examples, a single-link robotic arm and a half-car active suspension, verify the effectiveness of the proposed approach.
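To make the idea of coupled algebraic Riccati equations concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a plain (non-fuzzy) scalar Markov jump linear system x(k+1) = a_i x(k) + b_i u(k) with known dynamics and known transition probabilities p[i][j], and quadratic stage cost q_i x² + r_i u². Under those assumptions, the textbook coupled equations are P_i = q_i + a_i² E_i − (a_i b_i E_i)² / (r_i + b_i² E_i) with E_i = Σ_j p[i][j] P_j, solved here by simple value iteration. The function name and all parameter names are hypothetical; the model-free off-policy method described in the abstract avoids exactly the model knowledge this sketch relies on.

```python
def solve_coupled_riccati(a, b, q, r, p, tol=1e-12, max_iter=10000):
    """Value iteration for the coupled Riccati equations of a scalar
    Markov jump linear system (illustrative, model-based sketch).

    a, b, q, r : per-mode scalars (lists of equal length)
    p          : row-stochastic transition matrix, p[i][j] = P(mode j | mode i)
    Returns (P, K): per-mode cost coefficients and gains, with u = -K[i] * x.
    """
    n = len(a)

    def coupled_mean(P):
        # E_i(P) = sum_j p[i][j] * P[j]: expected next-step cost coefficient
        return [sum(p[i][j] * P[j] for j in range(n)) for i in range(n)]

    P = [0.0] * n  # start from zero cost-to-go
    for _ in range(max_iter):
        E = coupled_mean(P)
        P_new = [
            q[i] + a[i] ** 2 * E[i]
            - (a[i] * b[i] * E[i]) ** 2 / (r[i] + b[i] ** 2 * E[i])
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(P_new, P)) < tol
        P = P_new
        if converged:
            break

    E = coupled_mean(P)
    K = [a[i] * b[i] * E[i] / (r[i] + b[i] ** 2 * E[i]) for i in range(n)]
    return P, K


if __name__ == "__main__":
    # Two hypothetical modes: mode 0 is open-loop unstable, mode 1 is stable.
    P, K = solve_coupled_riccati(
        a=[1.2, 0.8], b=[1.0, 0.5], q=[1.0, 1.0], r=[1.0, 1.0],
        p=[[0.9, 0.1], [0.3, 0.7]],
    )
    print("P =", P)
    print("K =", K)
```

With a single mode the coupling disappears and the iteration reduces to the ordinary discrete-time Riccati recursion, which is a convenient sanity check for the sketch.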


Updated: 2022-05-03
