Model-Based Reinforcement Learning and Neural-Network-Based Policy Compression for Spacecraft Rendezvous on Resource-Constrained Embedded Systems
IEEE Transactions on Industrial Informatics ( IF 12.3 ) Pub Date : 2022-07-18 , DOI: 10.1109/tii.2022.3192085
Zhibin Yang 1 , Linquan Xing 1 , Zonghua Gu 2 , Yingmin Xiao 1 , Yong Zhou 1 , Zhiqiu Huang 1 , Lei Xue 3

Autonomous spacecraft rendezvous is highly challenging in increasingly complex space missions. In this article, we present our approach, model-based reinforcement learning for spacecraft rendezvous guidance (MBRL4SRG). We build a Markov decision process (MDP) model based on the Clohessy-Wiltshire equations of spacecraft relative dynamics, solve it with dynamic programming, and generate a decision table as the optimal agent policy. Since the onboard computing system of a spacecraft is resource constrained in both memory size and processing speed, we train a neural network (NN) as a compact and efficient function approximation of the tabular decision table. The NN outputs are formally verified with the verification tool ReluVal, and the verification results show that the robustness of the NN is maintained. Experimental results indicate that MBRL4SRG achieves lower computational overhead than the conventional proportional-integral-derivative (PID) algorithm, and exhibits higher trustworthiness and better computational efficiency during training than model-free reinforcement learning algorithms.
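To make the dynamics concrete, the following is a minimal sketch of the planar Clohessy-Wiltshire relative-motion equations with a simple Euler discretization. This is an illustrative assumption only: the paper's actual discretization, state/action spaces, and MDP construction are not given in the abstract, and the function name `cw_step`, the mean motion `n`, and the time step `dt` are hypothetical choices.

```python
import numpy as np

def cw_step(state, thrust, n=0.0011, dt=1.0):
    """One Euler step of the planar Clohessy-Wiltshire equations.

    state  = [x, y, vx, vy]  relative position/velocity in the target's frame
    thrust = [ax, ay]        control acceleration of the chaser
    n      = orbital mean motion of the target [rad/s] (assumed value)
    dt     = integration step [s] (assumed value)
    """
    x, y, vx, vy = state
    # In-plane CW accelerations: x'' = 3 n^2 x + 2 n y', y'' = -2 n x'
    ax = 3.0 * n**2 * x + 2.0 * n * vy + thrust[0]
    ay = -2.0 * n * vx + thrust[1]
    return np.array([x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay])
```

A transition model of this kind would feed the dynamic-programming solver that produces the decision table; the NN policy then approximates that table with a far smaller memory footprint.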

Updated: 2022-07-18