Acquiring reusable skills in intrinsically motivated reinforcement learning
Journal of Intelligent Manufacturing (IF 8.3), Pub Date: 2020-07-22, DOI: 10.1007/s10845-020-01629-3
Marzieh Davoodabadi Farahani, Nasser Mozayani

This paper proposes a novel incremental model for acquiring skills and using them in Intrinsically Motivated Reinforcement Learning (IMRL). In this model, the learning process is divided into two phases. In the first phase, the agent explores the environment and acquires task-independent skills by using different intrinsic motivation mechanisms. We present two intrinsic motivation factors for acquiring skills: detecting states that can lead to other states (being a cause), and detecting states that help the agent transition to a different region (discounted relative novelty). In the second phase, the agent evaluates the acquired skills to find those suitable for accomplishing a specific task. Although assessing task-independent skills is important for performing a task, the idea of evaluating and pruning skills has not been considered in the IMRL literature. In this article, two methods are presented for evaluating previously learned skills based on the value function of the assigned task. This two-phase learning model, together with the skill evaluation capability, helps the agent acquire task-independent skills that can be transferred to other similar tasks. Experimental results in four domains show that the proposed method significantly increases learning speed.
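The abstract does not give the two evaluation methods, so the following is only a minimal sketch of what value-based skill pruning in the second phase could look like. Everything in it is an assumption made for illustration: the `Skill` structure, the `skill_gain` criterion (value at the skill's subgoal minus the mean value over its initiation states), and the `min_gain` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

State = int  # hypothetical: states are plain integer ids


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record: where it may start and where it ends."""
    name: str
    initiation: FrozenSet[State]  # states in which the skill can be invoked
    subgoal: State                # state in which the skill terminates


def skill_gain(skill: Skill, value: Dict[State, float]) -> float:
    """Assumed criterion: task value at the subgoal minus the mean task
    value over the initiation states. Unknown states count as 0.0."""
    if not skill.initiation:
        return 0.0
    start = sum(value.get(s, 0.0) for s in skill.initiation) / len(skill.initiation)
    return value.get(skill.subgoal, 0.0) - start


def prune_skills(skills: List[Skill], value: Dict[State, float],
                 min_gain: float = 0.0) -> List[Skill]:
    """Keep only skills whose assumed gain exceeds min_gain."""
    return [k for k in skills if skill_gain(k, value) > min_gain]


# Toy task value function over four states (made-up numbers).
value = {0: 0.1, 1: 0.2, 2: 0.9, 3: 0.05}
skills = [
    Skill("to_door", frozenset({0, 1}), 2),    # leads toward high value
    Skill("to_corner", frozenset({1, 2}), 3),  # leads toward low value
]
kept = prune_skills(skills, value)
print([k.name for k in kept])
```

In this toy setting only the skill that moves the agent toward higher-valued states survives pruning. A real implementation would use the task's learned value function and whichever of the paper's two evaluation methods applies.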




Updated: 2020-07-23