Proximal policy optimization with an integral compensator for quadrotor control
Frontiers of Information Technology & Electronic Engineering ( IF 2.7 ) Pub Date : 2020-05-21 , DOI: 10.1631/fitee.1900641
Huan Hu , Qing-ling Wang

We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for speed control of a "model-free" quadrotor. The vehicle is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme comprising both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase; the flight policy is then continuously optimized in the online phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
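The core idea of the integral compensator can be illustrated with a minimal sketch: the accumulated speed-tracking error is appended to the raw observation before it is fed to the PPO actor, so the learned policy can cancel steady-state error much as the integral term of a PID controller does. All names here (`IntegralCompensator`, `dt`, the anti-windup clamp) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: augmenting a quadrotor observation with an integral
# of the speed-tracking error for an actor-critic (PPO) policy.
# This is an assumed design, not the paper's exact code.

class IntegralCompensator:
    def __init__(self, dt=0.01, limit=5.0):
        self.dt = dt          # control-loop time step in seconds (assumed)
        self.limit = limit    # anti-windup clamp on the integral term (assumed)
        self.integral = 0.0

    def step(self, target_speed, measured_speed):
        """Accumulate the tracking error and return the integral feature."""
        error = target_speed - measured_speed
        self.integral += error * self.dt
        # Clamp to avoid integral windup during large transients.
        self.integral = max(-self.limit, min(self.limit, self.integral))
        return self.integral


def augment_state(raw_state, compensator, target_speed, measured_speed):
    """Append the integral of the speed error to the raw observation,
    giving the policy the signal it needs to drive steady-state error
    toward zero."""
    i_term = compensator.step(target_speed, measured_speed)
    return list(raw_state) + [i_term]


comp = IntegralCompensator(dt=0.01)
# A raw 3-element state (e.g., velocity components) becomes a 4-element
# observation once the integral feature is appended.
obs = augment_state([0.1, -0.2, 9.8], comp,
                    target_speed=2.0, measured_speed=1.5)
```

In a full PPO pipeline, `obs` would replace the raw state as the input to both the actor and critic networks; the compensator state persists across time steps within an episode.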

