State-Following-Kernel-Based Online Reinforcement Learning Guidance Law Against Maneuvering Target
IEEE Transactions on Aerospace and Electronic Systems (IF 5.1) Pub Date: 2022-05-30, DOI: 10.1109/taes.2022.3178770
Chi Peng, Hanwen Zhang, Yongxiang He, Jianjun Ma

In this article, a state-following-kernel-based reinforcement learning method with an extended disturbance observer is proposed, and its application to a missile-target interception system is considered. First, the missile-target engagement is formulated as a vertical planar pursuit–evasion problem. The target maneuver is then estimated in real time by an extended disturbance observer, which recasts the guidance task as an infinite-horizon optimal regulation problem. Next, exploiting the local state approximation ability of state-following kernels, a critic neural network (NN) and an actor NN with synchronous iteration are constructed to compute the approximate optimal guidance policy. The states and NN weights are proven to be uniformly ultimately bounded using the Lyapunov method. Finally, the proposed method is tested in numerical simulations against several types of nonstationary targets, and the results highlight the role of state-following kernels in value-function and policy approximation.
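The core idea of state-following kernels is that the kernel centers travel with the current state, so the value function only needs to be approximated accurately in a neighborhood of where the system actually is. The following is a minimal illustrative sketch, not the authors' guidance law: it uses a hypothetical scalar regulation problem, Gaussian kernels with assumed offsets and widths, and a plain temporal-difference critic update in place of the paper's synchronous actor–critic iteration.

```python
import numpy as np

# Hypothetical scalar plant: x_{k+1} = x_k + dt * (a * x_k + u_k)
a, dt, gamma = 0.5, 0.05, 0.99


def staf_features(y, x, offsets, width=0.5):
    """State-following kernel features: Gaussian kernels whose centers
    x + d_i move with the current state x, evaluated at a query point y."""
    centers = x + offsets
    return np.exp(-((y - centers) ** 2) / (2.0 * width**2))


offsets = np.array([-0.4, 0.0, 0.4])  # kernel-center offsets (assumed)
w_c = np.zeros(offsets.size)          # critic NN weights
alpha_c = 0.1                         # critic learning rate (assumed)

x = 2.0
for _ in range(2000):
    u = -1.0 * x                      # fixed stabilizing policy, for illustration only
    cost = (x**2 + u**2) * dt         # quadratic running cost
    x_next = x + dt * (a * x + u)

    # Features share the centers anchored at the current state x,
    # so the critic only fits the value function locally around x.
    phi = staf_features(x, x, offsets)
    phi_next = staf_features(x_next, x, offsets)

    # Temporal-difference error and gradient-descent critic update
    delta = cost + gamma * w_c @ phi_next - w_c @ phi
    w_c += alpha_c * delta * phi

    x = x_next
```

Because the centers are re-anchored at every step, only a handful of kernels is ever needed, in contrast to a fixed grid of basis functions covering the whole state space.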

Updated: 2024-08-26