Policy manifold generation for multi-task multi-objective optimization of energy flexible machining systems
IISE Transactions ( IF 2.6 ) Pub Date : 2021-07-01 , DOI: 10.1080/24725854.2021.1934756
Qinge Xiao 1 , Ben Niu 1, 2 , Ying Chen 3

Abstract

Contemporary organizations recognize the importance of lean and green production for realizing ecological and economic benefits. Compared with existing optimization methods, multi-task multi-objective reinforcement learning (MT-MORL) offers an attractive means of addressing the dynamic, multi-target process-optimization problems associated with Energy-Flexible Machining (EFM). Despite recent advances in reinforcement learning, obtaining an accurate representation of the Pareto frontier remains a major challenge. This article presents a generative manifold-based policy-search method to approximate the continuously distributed Pareto frontier for EFM optimization. To this end, multi-pass operations are formulated as part of a multi-policy Markov decision process in which the machining configurations change dynamically. Because a traditional Gaussian distribution cannot accurately fit complex upper-level policies, a multi-layered generator is designed to map a simple Gaussian distribution to the high-dimensional policy manifold without performing complex calculations. Additionally, a hybrid multi-task training approach is proposed to handle the mode collapse and large task differences that arise when improving generalization performance. Extensive computational tests and comparisons against existing baseline methods demonstrate the improved Pareto frontier quality and computational efficiency of the proposed algorithm.




Updated: 2021-07-01