A Critical Period for Robust Curriculum-Based Deep Reinforcement Learning of Sequential Action in a Robot Arm
Topics in Cognitive Science (IF 2.9), Pub Date: 2022-01-10, DOI: 10.1111/tops.12595
Roy de Kleijn, Deniz Sen, George Kachergis

Many everyday activities are sequential in nature. That is, they can be seen as a sequence of subactions and, sometimes, subgoals. In the motor execution of sequential action, context effects are observed in which later subactions modulate the execution of earlier subactions (e.g., when reaching for an overturned mug, people will optimize their grasp to achieve a comfortable end state). A trajectory (movement) adaptation of an often-used paradigm in the study of sequential action, the serial response time task, revealed several context effects, of which centering behavior is of special interest. Centering behavior refers to the tendency (or strategy) of subjects to move their arm or mouse cursor to a position equidistant from all stimuli in the absence of predictive information, thereby reducing movement time to all possible targets. In the current study, we investigated sequential action in a virtual robotic agent trained using proximal policy optimization, a state-of-the-art deep reinforcement learning algorithm. The agent was trained to reach for appearing targets, similar to a serial response time task given to humans. We found that agents were more likely to develop centering behavior similar to that of human subjects after curricularized learning. In our curriculum, we first rewarded agents for reaching targets before introducing a penalty for energy expenditure. When the penalty was applied with no curriculum, many agents failed to learn the task due to a lack of action space exploration, resulting in high variability in agents' performance. Our findings suggest that in virtual agents, as in infants, early energetic exploration can promote robust later learning. This may have the same effect as infants' curiosity-based learning, by which they shape their own curriculum. However, the introduction of new goals cannot be delayed too long, as there may be critical periods in development after which agents (like humans) cannot flexibly learn to incorporate new objectives.
These lessons are making their way into machine learning and offer exciting new avenues for studying both human and machine learning of sequential action.
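The two-phase curriculum described above (reward reaching first, add an energy penalty later) can be sketched as a staged reward function. This is a minimal illustration, not the authors' implementation; the phase threshold, penalty weight, and reach tolerance below are hypothetical parameters chosen for the example.

```python
import numpy as np

def curriculum_reward(distance_to_target, action, step,
                      penalty_start=100_000, energy_weight=0.05,
                      reach_threshold=0.05):
    """Sketch of a curricularized reward for a reaching task.

    Phase 1 (step < penalty_start): reward only for reaching the target,
    encouraging energetic exploration of the action space.
    Phase 2 (step >= penalty_start): an energy-expenditure penalty
    (proportional to squared joint torques) is added, pushing the agent
    toward efficient movements. All parameter values are illustrative.
    """
    reach_reward = 1.0 if distance_to_target < reach_threshold else 0.0
    if step < penalty_start:
        return reach_reward
    energy_cost = energy_weight * float(np.sum(np.square(action)))
    return reach_reward - energy_cost
```

Applying the penalty from step 0 (i.e., `penalty_start=0`) corresponds to the no-curriculum condition, in which early movements are immediately costly and exploration is suppressed.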
