Deep Learning Techniques for Autonomous Spacecraft Guidance During Proximity Operations
Journal of Spacecraft and Rockets (IF 1.6), Pub Date: 2021-08-02, DOI: 10.2514/1.a35076
Lorenzo Federici, Boris Benedikter, Alessandro Zavoli

This paper investigates the use of deep learning techniques for real-time optimal spacecraft guidance during terminal rendezvous maneuvers, in the presence of both operational constraints and stochastic effects, such as inaccurate knowledge of the initial spacecraft state and random in-flight disturbances. The performance of two well-studied deep learning methods, behavioral cloning (BC) and reinforcement learning (RL), is investigated on a linear multi-impulsive rendezvous mission. To this end, a multilayer perceptron network with a custom architecture is designed to map any observation of the actual spacecraft relative position and velocity to the propellant-optimal control action, which corresponds to a bounded-magnitude impulsive velocity variation. In the BC approach, the deep neural network is trained by supervised learning on a set of optimal trajectories, generated by repeatedly solving the deterministic optimal control problem via convex optimization, starting from scattered initial conditions. Conversely, in the RL approach, a state-of-the-art actor-critic algorithm, proximal policy optimization, is used to train the network through repeated interactions with the stochastic environment. Finally, the robustness and propellant efficiency of the obtained closed-loop control policies are assessed and compared by means of a Monte Carlo analysis, carried out over different test cases with increasing levels of perturbations.
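The guidance network described above can be sketched in a few lines of code. The following is a minimal, illustrative NumPy implementation of a multilayer perceptron that maps a 6-dimensional observation (relative position and velocity) to a 3-dimensional impulsive velocity variation with bounded magnitude. The layer sizes, the tanh-based magnitude bounding, and all numerical values are assumptions for illustration, not the paper's exact architecture or trained weights:

```python
import numpy as np

class GuidancePolicyMLP:
    """Illustrative MLP policy: maps relative position/velocity to a
    bounded-magnitude impulsive delta-v. Architecture and the tanh
    magnitude-bounding scheme are assumptions, not the paper's design."""

    def __init__(self, obs_dim=6, hidden=32, act_dim=3, dv_max=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random (untrained) weights; BC or RL would fit these.
        self.W1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.b2 = np.zeros(act_dim)
        self.dv_max = dv_max  # maximum impulse magnitude (e.g., km/s)

    def __call__(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)   # hidden layer
        raw = h @ self.W2 + self.b2            # unbounded action
        norm = np.linalg.norm(raw)
        if norm < 1e-12:
            return np.zeros_like(raw)
        # Bound the magnitude: direction kept, norm squashed through tanh
        # so that ||dv|| <= dv_max for any network output.
        return raw / norm * self.dv_max * np.tanh(norm)

# Example: one closed-loop evaluation of the policy.
policy = GuidancePolicyMLP()
obs = np.array([1.0, -0.5, 0.2, 0.01, 0.0, -0.02])  # [rel. pos, rel. vel]
dv = policy(obs)
```

In a BC pipeline, the weights would be fit by regression against the convex-optimization solutions; in the RL pipeline, the same network would serve as the actor updated by proximal policy optimization.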




Updated: 2021-08-03