A pretrained proximal policy optimization algorithm with reward shaping for aircraft guidance to a moving destination in three-dimensional continuous space
International Journal of Advanced Robotic Systems (IF 2.3), Pub Date: 2021-02-05, DOI: 10.1177/1729881421989546
Zhuang Wang, Hui Li, Zhaoxin Wu, Haolin Wu
To enhance the performance of guiding an aircraft to a moving destination from a specified direction in three-dimensional continuous space, an efficient intelligent algorithm is essential. In this article, a pretrained proximal policy optimization (PPO) algorithm with reward shaping, which does not require an accurate model, is proposed to solve the guidance problem for manned aircraft and unmanned aerial vehicles. A continuous action reward function and a position reward function are presented, which increase the training speed and improve the quality of the generated trajectory. Using pretrained PPO, a new agent can be trained efficiently for a new task. A reinforcement learning framework is built in which an agent can be trained to generate either a reference trajectory or a series of guidance instructions. General simulation results show that the proposed method significantly improves training efficiency and trajectory performance, and a carrier-based aircraft approach simulation demonstrates the practical value of the proposed approach.
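To make the reward-shaping idea concrete, the following minimal Python sketch shows one way a shaped reward could be composed from a position term (progress toward the moving destination) and a continuous action term (penalizing abrupt control changes). The specific functional forms, weights (w_pos, w_act), and helper names are illustrative assumptions for this sketch, not the formulation given in the paper.

```python
import numpy as np

# Illustrative sketch only: the paper presents a continuous action reward and a
# position reward for shaping; the forms and weights below are assumptions.

def position_reward(aircraft_pos, dest_pos, prev_dist, w_pos=1.0):
    """Dense reward for reducing the 3D distance to the (moving) destination."""
    dist = np.linalg.norm(dest_pos - aircraft_pos)
    # Positive when the aircraft gets closer between consecutive steps.
    return w_pos * (prev_dist - dist), dist

def action_reward(action, prev_action, w_act=0.1):
    """Penalize large, jerky changes in the continuous control action."""
    return -w_act * np.linalg.norm(action - prev_action)

def shaped_reward(aircraft_pos, dest_pos, prev_dist, action, prev_action):
    """Combine the position term and the continuous action term."""
    r_pos, dist = position_reward(aircraft_pos, dest_pos, prev_dist)
    r_act = action_reward(action, prev_action)
    return r_pos + r_act, dist

if __name__ == "__main__":
    pos = np.array([0.0, 0.0, 1000.0])          # aircraft position (m)
    dest = np.array([500.0, 200.0, 900.0])      # moving destination position (m)
    prev_dist = np.linalg.norm(dest - pos)
    prev_act = np.zeros(3)
    act = np.array([0.2, -0.1, 0.0])            # hypothetical control command
    pos = pos + np.array([50.0, 20.0, -10.0])   # one simulated step toward the goal
    r, prev_dist = shaped_reward(pos, dest, prev_dist, act, prev_act)
    print(f"shaped reward: {r:.3f}")
```

In a PPO training loop, such a shaped reward would be returned by the environment at every step, giving the agent a dense learning signal instead of a sparse terminal reward when it reaches the destination.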



Updated: 2021-02-05