Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2021-04-21, DOI: 10.1109/tnnls.2021.3071960
Marta Zaniolo, Matteo Giuliani, Andrea Castelletti

Direct policy search (DPS) is emerging as one of the most effective and widely applied reinforcement learning (RL) methods for designing optimal control policies for multiobjective Markov decision processes (MOMDPs). Traditionally, DPS defines the control policy within a preselected functional class and searches for its optimal parameterization with respect to a given set of objectives. The functional class should be tailored to the problem at hand, and its selection is crucial, as it determines the search space within which solutions can be found. In MOMDP problems, each objective tradeoff induces a different fitness landscape, calling for a tradeoff-dynamic selection of the functional class. Yet, in state-of-the-art applications, the policy class is generally selected a priori and kept constant across the multidimensional objective space. In this work, we present a novel policy search routine called neuro-evolutionary multiobjective DPS (NEMODPS), which extends the DPS problem formulation to jointly search the policy functional class and its parameterization in a hyperspace containing policy architectures and coefficients. NEMODPS begins with a population of minimally structured approximating networks and progressively builds more sophisticated architectures through topological and parametric mutation and crossover, and selection of the fittest individuals with respect to multiple objectives. We tested NEMODPS on the problem of designing the control policy of a multipurpose water system. Numerical results show that the tradeoff-dynamic structural and parametric policy search of NEMODPS is consistent across multiple runs and outperforms solutions designed via traditional DPS with predefined policy topologies.
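To make the mechanics concrete, below is a minimal Python sketch of the kind of search the abstract describes: a population of minimally structured networks is grown through topological mutation (adding hidden units) and parametric mutation (perturbing coefficients), with Pareto non-domination as the multiobjective selection criterion. This is an illustrative toy under stated assumptions, not the authors' NEMODPS implementation: the network encoding, mutation rates, the two toy objectives, and the omission of crossover are all simplifications chosen for brevity.

```python
# Illustrative sketch of neuro-evolutionary multiobjective policy search.
# NOT the authors' NEMODPS code: encoding, operators, rates, and objectives
# are assumptions chosen for brevity; crossover is omitted.
import math
import random

random.seed(0)

class PolicyNet:
    """Tiny feedforward policy: one hidden layer whose width can grow (topology)."""
    def __init__(self, n_in=2, n_hidden=1):
        self.n_in = n_in
        self.n_hidden = n_hidden
        self.w1 = [[random.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [random.gauss(0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def act(self, state):
        # hidden activations, then a single bounded control output in [-1, 1]
        h = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
             for row, b in zip(self.w1, self.b1)]
        return math.tanh(sum(w * v for w, v in zip(self.w2, h)) + self.b2)

def mutate(net):
    # copy the parent, then apply topological and parametric mutations
    child = PolicyNet(net.n_in, net.n_hidden)
    child.w1 = [row[:] for row in net.w1]
    child.b1 = net.b1[:]
    child.w2 = net.w2[:]
    child.b2 = net.b2
    if random.random() < 0.2:            # topological mutation: add one hidden unit
        child.n_hidden += 1
        child.w1.append([random.gauss(0, 0.5) for _ in range(net.n_in)])
        child.b1.append(0.0)
        child.w2.append(random.gauss(0, 0.5))
    # parametric mutation: perturb every coefficient
    child.w1 = [[w + random.gauss(0, 0.1) for w in row] for row in child.w1]
    child.b1 = [b + random.gauss(0, 0.1) for b in child.b1]
    child.w2 = [w + random.gauss(0, 0.1) for w in child.w2]
    child.b2 += random.gauss(0, 0.1)
    return child

def evaluate(net):
    # two conflicting toy objectives (both minimized); stand-ins for the
    # simulation-based objectives of a real water-system control problem
    states = [(math.sin(t), math.cos(t)) for t in range(20)]
    u = [net.act(s) for s in states]
    j1 = sum((ui - s[0]) ** 2 for ui, s in zip(u, states))
    j2 = sum((ui + s[1]) ** 2 for ui, s in zip(u, states))
    return (j1, j2)

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# main loop: start from minimal networks, mutate, keep the non-dominated set
pop = [PolicyNet() for _ in range(20)]
for gen in range(30):
    pop = pop + [mutate(random.choice(pop)) for _ in range(20)]
    scored = [(evaluate(p), p) for p in pop]
    pop = [p for f, p in scored
           if not any(dominates(g, f) for g, _ in scored)][:20]

print(len(pop), "non-dominated policies; hidden sizes:", sorted(p.n_hidden for p in pop))
```

Because the two toy objectives conflict, the surviving population approximates a Pareto front whose members can differ in topology as well as in coefficients, which is the tradeoff-dynamic behavior the abstract argues for.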
