Meta-reinforcement learning for adaptive spacecraft guidance during finite-thrust rendezvous missions
Acta Astronautica (IF 3.1) Pub Date: 2022-09-03, DOI: 10.1016/j.actaastro.2022.08.047
Lorenzo Federici, Andrea Scorsoglio, Alessandro Zavoli, Roberto Furfaro

In this paper, a meta-reinforcement learning approach is investigated to design an adaptive guidance algorithm capable of carrying out multiple space rendezvous missions. Specifically, both a standard fully-connected network and a recurrent neural network are trained by proximal policy optimization on a wide distribution of finite-thrust rendezvous transfers between circular co-planar orbits. The recurrent network is also provided with the control and reward from the previous simulation step, allowing it to build, thanks to its history-dependent state, an internal representation of the considered task distribution. The ultimate goal is to generate a model that can adapt to unseen tasks and produce a nearly-optimal guidance law along any transfer leg of a multi-target mission. As a first step towards the solution of a complete multi-target problem, a sensitivity analysis on the single rendezvous leg is carried out in this paper, by varying the radius of either the initial or the final orbit, the transfer time, and the initial phasing between the chaser and the target. Numerical results show that the recurrent-network-based meta-reinforcement learning approach is able to better reconstruct the optimal control in almost all the analyzed scenarios and, at the same time, to meet the terminal rendezvous condition with greater accuracy, even when considering problem instances that fall outside the original training domain.
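The central architectural idea described above, namely augmenting the recurrent policy's input with the control and reward from the previous step so that its history-dependent hidden state can identify the current task instance, can be illustrated with a minimal sketch. This is not the authors' code: the state, control, and hidden-layer dimensions are assumptions chosen for illustration, the dynamics and reward in the rollout are placeholders, and the proximal policy optimization training loop is omitted.

```
import numpy as np

STATE_DIM = 6    # assumed: chaser position + velocity relative to the target
CONTROL_DIM = 3  # assumed: thrust components
HIDDEN_DIM = 32  # assumed hidden-state size

rng = np.random.default_rng(0)

class RecurrentPolicy:
    """Elman-style recurrent policy: h_t = tanh(W_x x_t + W_h h_{t-1} + b),
    where x_t concatenates the state with the previous control and reward."""
    def __init__(self):
        in_dim = STATE_DIM + CONTROL_DIM + 1  # state + u_{t-1} + r_{t-1}
        self.W_x = rng.normal(0.0, 0.1, (HIDDEN_DIM, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (HIDDEN_DIM, HIDDEN_DIM))
        self.b = np.zeros(HIDDEN_DIM)
        self.W_u = rng.normal(0.0, 0.1, (CONTROL_DIM, HIDDEN_DIM))

    def reset(self):
        # Hidden state is reset at the start of each rendezvous leg,
        # then accumulates task information as the episode unfolds.
        return np.zeros(HIDDEN_DIM)

    def step(self, h, state, prev_control, prev_reward):
        x = np.concatenate([state, prev_control, [prev_reward]])
        h = np.tanh(self.W_x @ x + self.W_h @ h + self.b)
        u = self.W_u @ h  # mean of the stochastic control distribution
        return h, u

# Rollout over one (dummy) transfer leg: because the policy observes its own
# past controls and rewards, the hidden state can adapt to an unseen task.
policy = RecurrentPolicy()
h = policy.reset()
state = rng.normal(size=STATE_DIM)          # placeholder initial condition
u_prev, r_prev = np.zeros(CONTROL_DIM), 0.0
for t in range(10):
    h, u = policy.step(h, state, u_prev, r_prev)
    state = rng.normal(size=STATE_DIM)      # placeholder: propagate dynamics here
    r_prev, u_prev = -np.linalg.norm(u), u  # placeholder reward
print("final control:", u)
```

By contrast, the standard fully-connected baseline would map the current state alone to a control, with no memory of past controls or rewards, which is why it cannot form the internal task representation the abstract attributes to the recurrent network.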




Updated: 2022-09-03