Adaptable automation with modular deep reinforcement learning and policy transfer
Engineering Applications of Artificial Intelligence (IF 7.5) Pub Date: 2021-05-21, DOI: 10.1016/j.engappai.2021.104296
Zohreh Raziei, Mohsen Moghaddam

Future industrial automation systems are anticipated to be shaped by intelligent technologies that allow machines to adapt to variations and uncertainties in processes and work environments. This paper is motivated by the need for new intelligent methods that enable efficient and scalable training of collaborative robots on a variety of tasks, fostering their adaptability to new tasks and environments. Recent advances in deep Reinforcement Learning (RL) provide new possibilities to realize this vision. The state of the art in deep RL offers proven algorithms that enable autonomous learning and mastery of a variety of robotic manipulation tasks with minimal human intervention. However, current deep RL algorithms predominantly specialize in a narrow range of tasks, are sample inefficient, and lack sufficient stability, which hinders their adoption in real-life industrial settings. This paper develops and tests a Hyper-Actor Soft Actor-Critic (HASAC) deep RL framework based on the notions of task modularization and transfer learning to tackle this limitation. The goal of the proposed HASAC is to enhance an agent's adaptability to new tasks by transferring the learned policies of former tasks to the new task through a "hyper-actor". The HASAC framework is tested on the virtual robotic manipulation benchmark, Meta-World. Numerical experiments indicate superior performance by HASAC over state-of-the-art deep RL algorithms in terms of reward value, success rate, and task completion time.
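The abstract does not detail how the hyper-actor composes former policies, so the following is only a minimal, hypothetical sketch of the general idea: task-specific actor modules learned on former tasks are blended by a hyper-actor into a warm-start policy for a new task. All class and method names (`ActorModule`, `HyperActor`, `transfer`) are illustrative assumptions, not the paper's actual API; the real HASAC operates inside a Soft Actor-Critic training loop rather than on scalar toy policies.

```python
class ActorModule:
    """Toy task-specific policy: maps a scalar state to an action via one gain.
    (Stand-in for a learned actor network; illustrative only.)"""
    def __init__(self, gain):
        self.gain = gain

    def act(self, state):
        return self.gain * state


class HyperActor:
    """Hypothetical 'hyper-actor': stores modules learned on former tasks and
    composes them into an initial policy for a new task by weighted blending."""
    def __init__(self):
        self.modules = {}

    def add_module(self, task_name, module):
        self.modules[task_name] = module

    def transfer(self, weights):
        # Blend the learned gains of former tasks into a warm-start policy
        # for the new task, weighted by assumed task similarity.
        gain = sum(weights[t] * m.gain for t, m in self.modules.items())
        return ActorModule(gain)


# Two former tasks with learned (toy) policies, transferred to a new task.
hyper = HyperActor()
hyper.add_module("reach", ActorModule(0.5))
hyper.add_module("push", ActorModule(1.5))
new_policy = hyper.transfer({"reach": 0.4, "push": 0.6})
print(round(new_policy.act(2.0), 2))  # blended gain 0.4*0.5 + 0.6*1.5 = 1.1
```

In the actual framework, the blended policy would then be fine-tuned on the new task with SAC updates; the sketch only shows the transfer step conceptually.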




Updated: 2021-05-22