Adaptive $Q$-Learning for Data-Based Optimal Output Regulation With Experience Replay
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2018-04-27, DOI: 10.1109/tcyb.2018.2821369
Biao Luo , Yin Yang , Derong Liu

This paper investigates the data-based optimal output regulation problem for discrete-time systems. An off-policy adaptive Q-learning (QL) method is developed that uses real system data and requires neither knowledge of the system dynamics nor a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter $\alpha_i$ in the policy evaluation step is used to trade off the current and future Q-functions. The convergence of the adaptive QL algorithm is proved, and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed; a least-squares scheme and a batch gradient descent method are used to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which makes the implementation of the adaptive QL method simple and convenient. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
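The abstract describes the algorithm only at a high level, so the following is a minimal Python sketch of the kind of off-policy adaptive QL loop it outlines: experience replay over real transition data, a least-squares critic update, a batch gradient-descent actor update, and an adaptive parameter $\alpha_i$ blending the current and future Q-values. The system matrices, quadratic feature map, $\alpha_i$ schedule, and all variable names are illustrative assumptions, not the paper's equations.

```python
# Hedged sketch of an off-policy adaptive Q-learning loop with experience replay.
# Everything below (system, features, alpha_i schedule, step sizes) is assumed for
# illustration; the abstract does not give the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete-time linear system x_{k+1} = A x_k + B u_k
# (unknown to the learner; used here only to generate "real system data").
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
Qx, Ru = np.eye(2), np.array([[1.0]])   # stage-cost weights (quadratic utility assumed)
gamma = 0.95                            # discount factor (assumption)

def phi(x, u):
    """Quadratic feature vector for the Q-function: kron of the stacked (x, u) vector."""
    z = np.concatenate([x, u])
    return np.kron(z, z)

def utility(x, u):
    return float(x @ Qx @ x + u @ Ru @ u)

# Collect an off-policy replay buffer with an exploratory behavior policy.
buffer = []
x = rng.standard_normal(2)
for _ in range(400):
    u = -0.1 * x[:1] + 0.3 * rng.standard_normal(1)   # behavior policy with exploration noise
    x_next = A @ x + B @ u
    buffer.append((x.copy(), u.copy(), utility(x, u), x_next.copy()))
    x = x_next if np.linalg.norm(x_next) < 10 else rng.standard_normal(2)

w = np.zeros(9)          # critic weights for phi(x, u), dim = (2 + 1)^2
K = np.zeros((1, 2))     # actor: linear state feedback u = -K x
lr = 1e-3

for i in range(30):
    alpha_i = 1.0 / (i + 1)   # adaptive blending parameter (illustrative schedule)
    batch = [buffer[j] for j in rng.choice(len(buffer), size=128, replace=False)]

    # Critic: least-squares policy evaluation on replayed data, blending the current
    # Q-estimate and the one-step (future) target through alpha_i.
    Phi, y = [], []
    for (xb, ub, rb, xnb) in batch:
        u_next = -K @ xnb
        target = rb + gamma * float(phi(xnb, u_next) @ w)
        y.append((1 - alpha_i) * float(phi(xb, ub) @ w) + alpha_i * target)
        Phi.append(phi(xb, ub))
    Phi, y = np.array(Phi), np.array(y)
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]

    # Actor: batch gradient descent on the learned Q along the policy direction
    # (finite-difference gradient for brevity).
    grad = np.zeros_like(K)
    for (xb, _, _, _) in batch:
        u_pi = -K @ xb
        for m in range(K.shape[0]):
            for n in range(K.shape[1]):
                Kp = K.copy(); Kp[m, n] += 1e-4
                grad[m, n] += (float(phi(xb, -Kp @ xb) @ w) - float(phi(xb, u_pi) @ w)) / 1e-4
    K -= lr * grad / len(batch)

print("learned feedback gain K:", K)
```

In this sketch the critic weights are refit from scratch on each replayed batch, which mirrors a least-squares evaluation step, while the decaying $\alpha_i$ shifts weight from the current Q-estimate toward the one-step target as learning proceeds; the actual adaptive rule in the paper may differ.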

Updated: 2024-08-22