Multi-Task Policy Adversarial Learning for Human-Level Control with Large State Spaces
IEEE Transactions on Industrial Informatics (IF 12.3) Pub Date: 2019-04-01, DOI: 10.1109/tii.2018.2881266
Jun Ping Wang, You Kang Shi, Wen Sheng Zhang, Ian Thomas, Shi Hui Duan

The sequential decision-making problem with large-scale state spaces is an important and challenging topic in multitask reinforcement learning (MTRL). Training near-optimal policies across tasks suffers from a deficiency of prior knowledge in discrete-time nonlinear environments, especially under continuous task variations; scalable approaches are therefore required to transfer prior knowledge to new tasks when the number of tasks is large. This paper proposes a multitask policy adversarial learning (MTPAL) method for learning a nonlinear feedback policy that generalizes across multiple tasks, bringing a robot's cognitive ability much closer to human-level decision making. The key idea is to construct a parameterized policy model directly from large high-dimensional observations using deep function approximators, and then to train an optimal sequential decision policy for each new task through an adversarial process in which two models are trained simultaneously: a multitask policy generator transforms samples drawn from a prior distribution into samples from a complex, higher-dimensional data distribution, while a multitask policy discriminator decides whether a given sample comes from the human-level, empirically derived prior distribution or from the generator. All of the related human-level empirical knowledge is integrated into the sequential decision policy, transferring human-level policy at every layer of a deep policy network. Extensive experimental results on four different WeiChai Power manufacturing data sets show that our approach can surpass human performance on tasks ranging from cart-pole to production assembly control.
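The generator/discriminator interplay described in the abstract can be illustrated with a deliberately tiny sketch. This is a hypothetical simplification, not the paper's implementation: the "policy generator" here is a one-dimensional linear map from noise to actions, the "policy discriminator" is a logistic classifier, and the "human-level demonstrations" are synthetic Gaussian samples. The adversarial updates (discriminator labels demonstrations 1 and generated actions 0; generator uses the non-saturating objective to fool it) follow the standard GAN recipe the abstract alludes to.

```python
# Minimal adversarial sketch (hypothetical toy, not MTPAL itself):
# a generator maps noise to actions; a discriminator scores whether
# an action looks like a human demonstration or a generated one.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy "human-level" demonstrations: 1-D actions centred on 2.0.
expert = rng.normal(loc=2.0, scale=0.1, size=(256, 1))

# Generator: z -> g_w * z + g_b.  Discriminator: sigmoid(d_w * a + d_b).
g_w, g_b = 1.0, 0.0
d_w, d_b = 0.0, 0.0
lr = 0.05

for step in range(500):
    z = rng.normal(size=(64, 1))
    fake = g_w * z + g_b

    # Discriminator ascent on log D(expert) + log(1 - D(fake)).
    batch = rng.choice(expert.ravel(), size=64).reshape(-1, 1)
    p_real = sigmoid(d_w * batch + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * float(np.mean((1 - p_real) * batch - p_fake * fake))
    d_b += lr * float(np.mean((1 - p_real) - p_fake))

    # Generator ascent on log D(fake) (non-saturating loss).
    p_fake = sigmoid(d_w * fake + d_b)
    grad_fake = (1 - p_fake) * d_w          # d log D(fake) / d fake
    g_w += lr * float(np.mean(grad_fake * z))
    g_b += lr * float(np.mean(grad_fake))

# Generated actions should have drifted toward the demonstration mean.
z = rng.normal(size=(1000, 1))
print(float(np.mean(g_w * z + g_b)))
```

In MTPAL the same adversarial signal is applied per task with deep networks over high-dimensional observations, so the discriminator's verdict on "human-level or generated" shapes every layer of the policy network rather than two scalar parameters.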
