Fuzzy H∞ Control of Discrete-Time Nonlinear Markov Jump Systems via a Novel Hybrid Reinforcement Q-Learning Method.
IEEE Transactions on Cybernetics (IF 11.8) Pub Date: 2023-10-17, DOI: 10.1109/tcyb.2022.3220537
Jing Wang, Jiacheng Wu, Hao Shen, Jinde Cao, Leszek Rutkowski

In this article, a novel hybrid reinforcement Q-learning control method is proposed to solve the adaptive fuzzy H∞ control problem of discrete-time nonlinear Markov jump systems based on the Takagi-Sugeno fuzzy model. First, the core problem of adaptive fuzzy H∞ control is converted into solving the fuzzy game-coupled algebraic Riccati equation, which can hardly be solved directly by analytical methods. To address this, an offline parallel hybrid learning algorithm is first designed, in which the system dynamics must be known a priori. Furthermore, an online parallel Q-learning hybrid learning algorithm is developed. The main characteristics of the proposed online hybrid learning algorithm are threefold: 1) knowledge of the system dynamics is not required during the learning process; 2) compared with the policy iteration method, the restriction of an initially stabilizing control policy is removed; and 3) compared with the value iteration method, a faster convergence rate is obtained. Finally, a tunnel diode circuit system model is provided to validate the effectiveness of the proposed learning algorithms.
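The paper's fuzzy, Markov-jump, model-free algorithms cannot be reproduced from this abstract alone, but the underlying hybrid idea can be illustrated on the standard single-mode, model-based zero-sum LQ problem x_{k+1} = A x_k + B u_k + D w_k with cost Σ (x_kᵀQ x_k + u_kᵀR u_k − γ² w_kᵀw_k), whose game algebraic Riccati equation is P = Q + AᵀPA − ΘᵀΛ⁻¹Θ, where Λ collects the blocks R + BᵀPB, BᵀPD, DᵀPB, DᵀPD − γ²I and Θ = [BᵀPA; DᵀPA]. The NumPy sketch below is an illustrative assumption, not the authors' method: value iteration warms up the value kernel from P = 0 with no stabilizing initial policy, then policy iteration (one Lyapunov solve per step) refines it at a faster rate. All matrices and function names (riccati_map, hybrid_iteration) are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def riccati_map(P, A, B, D, Qc, R, gamma):
    """One value-iteration sweep of the game algebraic Riccati equation."""
    m_u = B.shape[1]
    G = np.hstack([B, D])                    # stacked control/disturbance inputs
    Lam = G.T @ P @ G
    Lam[:m_u, :m_u] += R                                      # R + B'PB block
    Lam[m_u:, m_u:] -= gamma**2 * np.eye(D.shape[1])          # D'PD - g^2 I block
    Theta = G.T @ P @ A
    F = np.linalg.solve(Lam, Theta)          # joint saddle-point gain: [u; w] = -F x
    P_next = Qc + A.T @ P @ A - Theta.T @ F
    return P_next, F

def hybrid_iteration(A, B, D, Qc, R, gamma, n_vi=10, n_pi=20, tol=1e-10):
    """VI warm start (no stabilizing initial policy needed),
    then PI steps for faster terminal convergence."""
    P = np.zeros_like(A)
    for _ in range(n_vi):                    # value-iteration phase
        P, F = riccati_map(P, A, B, D, Qc, R, gamma)
    m_u = B.shape[1]
    G = np.hstack([B, D])
    for _ in range(n_pi):                    # policy-iteration phase
        _, F = riccati_map(P, A, B, D, Qc, R, gamma)   # policy improvement
        A_cl = A - G @ F                     # closed loop under both players
        # stage cost under current policies: Qc + Ku'R Ku - g^2 Kw'Kw
        M = Qc + F[:m_u].T @ R @ F[:m_u] - gamma**2 * F[m_u:].T @ F[m_u:]
        P_new = solve_discrete_lyapunov(A_cl.T, M)     # policy evaluation
        if np.max(np.abs(P_new - P)) < tol:
            return P_new, F
        P = P_new
    return P, F

if __name__ == "__main__":
    # Hypothetical single-mode example, not the paper's tunnel diode circuit.
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    D = np.array([[0.1], [0.0]])
    Qc, R, gamma = np.eye(2), np.eye(1), 5.0
    P, F = hybrid_iteration(A, B, D, Qc, R, gamma)
    print("Game Riccati solution P =\n", P)
```

This division of labor mirrors the abstract's claims: value iteration alone converges slowly but needs no stabilizing start, while policy iteration converges fast but requires a stabilizing policy, which the warm-start phase supplies.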

Updated: 2022-11-23