Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm
International Journal of Aerospace Engineering (IF 1.1). Pub Date: 2020-12-30. DOI: 10.1155/2020/8874619
ZhiBin Zhang, XinHong Li, JiPing An, WanXin Man, GuoHui Zhang

This paper addresses model-free attitude control of rigid spacecraft subject to control torque saturation and external disturbances. A model-free deep reinforcement learning (DRL) controller is proposed that learns continuously from environmental feedback and achieves high-precision spacecraft attitude control without repeated retuning of controller parameters. Because both the state space and the action space are continuous, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, built on an actor-critic architecture, is adopted; it outperforms the Deep Deterministic Policy Gradient (DDPG) algorithm. However, TD3 obtains its optimal policy purely by interacting with the environment, without any prior knowledge, so the learning process is time-consuming. To address this, the PID-Guide TD3 algorithm is proposed, which accelerates training and improves the convergence precision of TD3. To address the difficulty of deploying reinforcement learning (RL) in real environments, a pretraining/fine-tuning deployment scheme is proposed, which saves training time and computing resources while quickly achieving good results. Experimental results show that the DRL controller achieves high-precision attitude stabilization and attitude tracking with fast response and small overshoot, and that the proposed PID-Guide TD3 algorithm trains faster and is more stable than TD3.
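The core PID-Guide idea, using a conventional PID law to guide the TD3 actor's exploration toward sensible torques, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gains, the blending weight `beta`, and the torque limit are all hypothetical, and the paper's exact guidance scheme may differ.

```python
import numpy as np

class PIDController:
    """Per-axis PID law on the attitude error vector (hypothetical gains)."""
    def __init__(self, kp=2.0, ki=0.0, kd=4.0, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(3)
        self.prev_err = np.zeros(3)

    def act(self, err):
        # Standard PID terms: proportional, integral, finite-difference derivative.
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err.copy()
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def guided_action(policy_action, pid_action, beta, torque_limit=1.0):
    """Blend the DRL policy's torque with the PID torque, then saturate.

    beta is an assumed guidance weight in [0, 1]; annealing it from 1 to 0
    over training would hand control from the PID guide to the learned policy.
    """
    a = beta * pid_action + (1.0 - beta) * policy_action
    # Saturation models the control torque limit from the problem statement.
    return np.clip(a, -torque_limit, torque_limit)

if __name__ == "__main__":
    pid = PIDController()
    err = np.array([0.1, 0.0, 0.0])        # small attitude error on one axis
    tau_pid = pid.act(err)                  # PID-suggested torque
    tau_rl = np.zeros(3)                    # stand-in for the actor's output
    print(guided_action(tau_rl, tau_pid, beta=1.0))
```

With `beta=1.0` the agent simply follows the (saturated) PID command, which supplies informative transitions to the replay buffer early in training; as `beta` decays, TD3 takes over. The same structure also suggests how the pretraining/fine-tuning deployment could work: train against a simulator with PID guidance, then fine-tune with small `beta` on the real plant.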

Updated: 2020-12-30