An identifier-actor-optimizer policy learning architecture for optimal control of continuous-time nonlinear systems
Science China Physics, Mechanics & Astronomy (IF 6.4) Pub Date: 2020-03-19, DOI: 10.1007/s11433-019-1481-2
Lin Cheng, ZhenBo Wang, FangHua Jiang, JunFeng Li

An intelligent solution method is proposed to achieve real-time optimal control for continuous-time nonlinear systems using a novel identifier-actor-optimizer (IAO) policy learning architecture. In this IAO-based policy learning approach, a dynamical identifier is developed to approximate the unknown part of the system dynamics using deep neural networks (DNNs). Then, an indirect-method-based optimizer is proposed to generate high-quality optimal actions for system control, considering both the constraints and the performance index. Furthermore, a DNN-based actor is developed to approximate the obtained optimal actions and to return good initial guesses to the optimizer. In this way, traditional optimal control methods and state-of-the-art DNN techniques are combined in the IAO-based optimal policy learning method. Compared with reinforcement learning algorithms built on actor-critic architectures, which suffer from difficult reward design and low computational efficiency, the IAO-based optimal policy learning algorithm requires fewer user-defined parameters and offers higher learning speed and steadier convergence when solving complex continuous-time optimal control problems (OCPs). Simulation results for three space flight control missions substantiate the effectiveness of this IAO-based policy learning strategy and illustrate the performance of the developed DNN-based optimal control method for continuous-time OCPs.
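The abstract describes the three IAO components only at a high level, so the following is a minimal Python/PyTorch sketch of how such a loop could be wired together; it is not the authors' implementation. All names (`Identifier`, `Actor`, `optimize_action`, the quadratic running cost) are hypothetical, and a simple SciPy-based direct shooting stands in for the paper's indirect-method optimizer.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import minimize


class Identifier(nn.Module):
    """DNN approximating the unknown part of x_dot = f(x, u)."""
    def __init__(self, n_x, n_u, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_x + n_u, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_x))

    def forward(self, x, u):
        return self.net(torch.cat([x, u], dim=-1))


class Actor(nn.Module):
    """DNN mapping state to control; supplies warm starts to the optimizer."""
    def __init__(self, n_x, n_u, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_x, hidden), nn.Tanh(),
            nn.Linear(hidden, n_u))

    def forward(self, x):
        return self.net(x)


def train_identifier(identifier, X, U, Xdot, epochs=200, lr=1e-3):
    """Fit the identifier to observed (state, control, state-derivative) tensors."""
    opt = torch.optim.Adam(identifier.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(identifier(X, U), Xdot)
        opt.zero_grad(); loss.backward(); opt.step()
    return identifier


def rollout_cost(u_seq, x0, identifier, dt, n_u):
    """Simulate the identified dynamics under a control sequence (assumed
    quadratic running cost)."""
    u_seq = u_seq.reshape(-1, n_u)
    x = torch.as_tensor(x0, dtype=torch.float32)
    cost = 0.0
    with torch.no_grad():
        for u in u_seq:
            u_t = torch.as_tensor(u, dtype=torch.float32)
            x = x + dt * identifier(x, u_t)          # Euler step on learned dynamics
            cost += float(x @ x + 0.1 * (u_t @ u_t)) * dt
    return cost


def optimize_action(x0, identifier, actor, horizon=20, dt=0.05, n_u=1):
    """Optimizer stage: refine the actor's warm start. Direct shooting with
    SciPy's Powell method is used here purely as a stand-in for the paper's
    indirect-method-based optimizer."""
    with torch.no_grad():
        u0 = actor(torch.as_tensor(x0, dtype=torch.float32)).numpy()
    guess = np.tile(u0, horizon)                     # actor provides the initial guess
    res = minimize(rollout_cost, guess,
                   args=(x0, identifier, dt, n_u), method="Powell")
    return res.x.reshape(horizon, n_u)


def train_actor(actor, states, optimal_actions, epochs=200, lr=1e-3):
    """Regress the actor onto optimizer-generated optimal actions."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(actor(states), optimal_actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return actor
```

In a full loop, the optimizer's refined action sequences would be collected as (state, action) pairs and fed back through `train_actor`, so that over iterations the actor's warm starts converge toward the optimizer's solutions, which is the interplay the abstract describes.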

Updated: 2020-03-19