Proximal policy optimization guidance algorithm for intercepting near-space maneuvering targets
Aerospace Science and Technology (IF 5.0), Pub Date: 2022-11-25, DOI: 10.1016/j.ast.2022.108031
Wenxue Chen, Changsheng Gao, Wuxing Jing

This paper studies a novel guidance framework, based on deep reinforcement learning (DRL), for intercepting a high-speed maneuvering target while accounting for energy consumption, autopilot lag dynamics, and input saturation; the framework can effectively handle flight phases with large flight-path-angle errors as well as various uncertainties. It establishes an end-to-end mapping from the observation states measured by the seeker, namely the line-of-sight (LOS) angle, the relative distance, and their rates, to the guidance command. The observability of the LOS angle and relative distance is also incorporated into the construction of the reward function. In addition, the relative engagement kinematics between the interceptor and the target are modeled and, together with the proximal policy optimization (PPO) guidance algorithm, formulated as a Markov decision process (MDP). Notably, the guidance framework is optimized with an improved PPO algorithm and demonstrated in a simulated terminal phase in near space. Specifically, the PPO guidance algorithm comprises a policy (actor) neural network and a critic neural network, both standard fully connected networks. Observation states and rewards are collected and reused through an experience replay mechanism, and an exponentially decaying learning rate, mini-batch stochastic gradient ascent (SGA), zero-score standardization, and the Adam optimizer are employed to train the reinforcement learning algorithm more efficiently. Moreover, the proposed guidance framework exhibits strong generalization and handles both fixed and stochastic engagement scenarios, which means the interceptor can cope with practical combat scenarios not seen during training. Its robustness is demonstrated and validated through Monte Carlo simulations under various uncertainties. Finally, the DRL guidance framework satisfies onboard application requirements.
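For readers unfamiliar with the training setup the abstract describes, the following Python (PyTorch) fragment is a minimal sketch, not the authors' code, of a PPO actor-critic of this kind: fully connected policy and critic networks, zero-score standardization of the observations, a clipped surrogate objective updated by mini-batch stochastic gradient ascent with Adam, and an exponentially decaying learning rate. The 4-dimensional observation (LOS angle, LOS rate, relative distance, range rate), the network sizes, and all hyperparameters are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 4, 1  # assumed: [LOS angle, LOS rate, relative distance, range rate] -> acceleration command

class Actor(nn.Module):
    """Gaussian policy: a standard fully connected network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # tanh bounds the command mean, mimicking input saturation
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))

    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())

class Critic(nn.Module):
    """State-value network, also fully connected."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def ppo_update(actor, critic, optimizer, batch, clip_eps=0.2, epochs=10, mb_size=64):
    """One PPO update over a rollout drawn from the experience buffer."""
    obs, act, logp_old, ret, adv = batch
    obs = (obs - obs.mean(0)) / (obs.std(0) + 1e-8)  # zero-score (z-score) standardization
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    for _ in range(epochs):
        for idx in torch.randperm(obs.shape[0]).split(mb_size):  # mini-batch SGA
            ratio = (actor.dist(obs[idx]).log_prob(act[idx]).sum(-1) - logp_old[idx]).exp()
            clipped = torch.min(ratio * adv[idx],
                                ratio.clamp(1 - clip_eps, 1 + clip_eps) * adv[idx])
            value_loss = (critic(obs[idx]) - ret[idx]).pow(2).mean()
            loss = -clipped.mean() + 0.5 * value_loss  # ascend the clipped surrogate, fit the critic
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

actor, critic = Actor(), Critic()
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)  # exponential learning-rate decay
# Training loop (per iteration): roll out the engagement simulation, store transitions in the
# experience buffer, assemble `batch`, call ppo_update(...), then scheduler.step().

In the paper's setting, the reward would additionally encode the observability of the LOS angle and relative distance and penalize energy consumption; that shaping, and the interceptor-target engagement kinematics themselves, are omitted from this sketch.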


