Data-Driven Dynamic Multiobjective Optimal Control: An Aspiration-Satisfying Reinforcement Learning Approach
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2021-04-22, DOI: 10.1109/tnnls.2021.3072571
Majid Mazouchi 1 , Yongliang Yang 2 , Hamidreza Modares 1

This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then introduced whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton–Jacobi–Bellman (HJB) equations, expressed as HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions. The relation to the satisficing (good enough) decision-making framework is shown. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectory data measured over a time interval. Finally, two simulation examples are used to verify the analytical results of the proposed algorithm.
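
To make the role of the Hamiltonian inequalities concrete, the following sketch works through the standard bound for a single objective. The input-affine dynamics, the quadratic-in-control running cost, and the symbols $f$, $g$, $Q_i$, $R_i$, $V_i$, and $\delta_i$ are assumptions chosen for illustration; the abstract does not state the paper's exact formulation, so this should not be read as the authors' derivation.

```latex
% Assumed setting (illustrative only, not taken from the abstract):
% dynamics \dot{x} = f(x) + g(x) u, objectives i = 1, \dots, N.
% Requires the amsmath package for the align environment.
\begin{align}
J_i(x_0; u) &= \int_0^\infty \big( Q_i(x) + u^\top R_i u \big)\, dt, \\
H_i(V_i, u) &= \nabla V_i(x)^\top \big( f(x) + g(x) u \big) + Q_i(x) + u^\top R_i u .
\end{align}
% Suppose V_i \ge 0 and H_i(V_i, u) \le 0 along the closed-loop trajectory.
% Since \dot{V}_i = \nabla V_i^\top ( f + g u ), integrating over [0, T] gives
\begin{align}
\int_0^T \big( Q_i(x) + u^\top R_i u \big)\, dt
  \;\le\; V_i(x_0) - V_i(x(T))
  \;\le\; V_i(x_0) \quad \text{for all } T \ge 0,
\end{align}
% and letting T \to \infty yields J_i(x_0; u) \le V_i(x_0).
```

Under these assumptions, any nonnegative $V_i$ that satisfies the inequality certifies an upper bound on the $i$-th cost. Requiring $V_i(x_0) \le \delta_i$ for a hypothetical aspiration level $\delta_i$ would then be enough to meet that aspiration without solving the HJB equation exactly.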

Updated: 2021-04-22