Data-Driven Dynamic Multiobjective Optimal Control: An Aspiration-Satisfying Reinforcement Learning Approach.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2), Pub Date: 2021-04-22, DOI: 10.1109/tnnls.2021.3072571
Majid Mazouchi, Yongliang Yang, Hamidreza Modares

This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then introduced whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton-Jacobi-Bellman (HJB) equations, expressed as HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions. The relation to the satisficing (good-enough) decision-making framework is also shown. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the need for complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectories measured over a time interval. Finally, two simulation examples verify the analytical results of the proposed algorithm.
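As a minimal sketch of why a Hamiltonian inequality can certify an aspiration level, consider a hypothetical scalar linear system x' = a*x + b*u under a fixed policy u = -k*x, with objectives J_i = ∫ (q_i x² + r_i u²) dt. This is not the authors' SOS algorithm, which targets nonlinear systems and polynomial value functions; the system, gains, and function names below are illustrative assumptions. For a candidate V_i(x) = p_i x² with p_i ≥ 0, the Hamiltonian reduces to c_i x² with c_i = 2 p_i (a - b k) + q_i + r_i k². If c_i ≤ 0, then dV_i/dt ≤ -(q_i x² + r_i u²), and integrating gives J_i(x0) ≤ p_i x0², so p_i x0² ≤ aspiration_i is enough to certify that objective i is "good enough".

```python
# Hypothetical scalar example (not the paper's method):
#   dynamics  x' = a*x + b*u, linear policy u = -k*x
#   objective i: J_i = integral of (q_i*x^2 + r_i*u^2) dt
#   candidate value function V_i(x) = p_i*x^2


def hamiltonian_coeff(a, b, k, q, r, p):
    """Return c such that H(V, u) = c * x^2 for V = p*x^2 and u = -k*x."""
    closed_loop = a - b * k
    # dV/dt = 2*p*x*x' = 2*p*(a - b*k)*x^2, plus the running cost terms
    return 2.0 * p * closed_loop + q + r * k * k


def exact_cost_coeff(a, b, k, q, r):
    """Closed-form p* with J = p* * x0^2; valid only if a - b*k < 0."""
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("policy is not stabilizing")
    return (q + r * k * k) / (-2.0 * closed_loop)


def aspirations_certified(a, b, k, objectives, x0):
    """objectives: list of (q, r, p, aspiration) tuples.

    An objective is certified when V = p*x^2 satisfies the Hamiltonian
    inequality (c <= 0, p >= 0) and its bound p*x0^2 meets the aspiration.
    """
    results = []
    for q, r, p, aspiration in objectives:
        inequality_holds = p >= 0.0 and hamiltonian_coeff(a, b, k, q, r, p) <= 0.0
        results.append(inequality_holds and p * x0 * x0 <= aspiration)
    return results


# Two objectives for the same policy k = 3 on a = b = 1 (closed loop -2).
objectives = [(1.0, 1.0, 2.5, 3.0), (4.0, 0.1, 1.5, 2.0)]
print(aspirations_certified(1.0, 1.0, 3.0, objectives, x0=1.0))  # [True, True]
```

The check only needs a value-function candidate that satisfies an inequality, not the exact solution of the HJB equation; in this example p = 1.5 certifies the second objective even though its exact cost coefficient is 1.225. That slack is what makes the relaxed, satisficing formulation easier to solve than an exact optimality condition.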

Updated: 2021-04-22