Human Engagement Providing Evaluative and Informative Advice for Interactive Reinforcement Learning
arXiv - CS - Multiagent Systems. Pub Date: 2020-09-21, DOI: arxiv-2009.09575
Adam Bignold, Francisco Cruz, Richard Dazeley, Peter Vamplew, Cameron Foale

Reinforcement learning is an approach used by intelligent agents to autonomously learn new skills. Although reinforcement learning has been demonstrated to be an effective learning approach in several different contexts, a common drawback is the time needed to learn a task satisfactorily, especially in large state-action spaces. To address this issue, interactive reinforcement learning proposes the use of externally sourced information to speed up the learning process. Up to now, different information sources have been used to give advice to the learner agent, among them human-sourced advice. When interacting with a learner agent, humans may provide either evaluative or informative advice. From the agent's perspective, these styles of interaction are commonly referred to as reward-shaping and policy-shaping, respectively. Evaluative advice requires the human to provide feedback on the prior action performed, while informative advice requires the human to advise on the best action to select for a given situation. Prior research has focused on the effect of human-sourced advice on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent while reducing the engagement with the human. This work presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach indicates that they rate the agent's ability to follow the advice as higher and therefore feel their own advice to be of higher accuracy when compared to people providing evaluative advice.
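
The distinction between the two advice styles can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' experimental setup; the action set, the function names (`choose_action`, `shaped_reward`, `q_update`), and the hyperparameters are all hypothetical, chosen only to show where each kind of advice would enter the learning loop. Informative advice is modelled here as directly overriding action selection, and evaluative advice as a scalar added to the environment reward; real policy-shaping and reward-shaping methods can be considerably more elaborate.

```python
import random
from collections import defaultdict

# Hypothetical two-action task, used only for illustration.
ACTIONS = ["left", "right"]


def choose_action(q, state, epsilon, advised_action=None, rng=random):
    """Informative advice (policy shaping): if the advisor recommends
    an action for this state, take it; otherwise act epsilon-greedily."""
    if advised_action is not None:
        return advised_action
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Greedy choice; ties resolve to the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: q[(state, a)])


def shaped_reward(env_reward, evaluation=0.0):
    """Evaluative advice (reward shaping): add the advisor's rating
    of the action just performed to the environment reward."""
    return env_reward + evaluation


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update on a tabular Q-function."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


if __name__ == "__main__":
    q = defaultdict(float)
    # Advisor recommends "right" in state 0 (informative advice).
    action = choose_action(q, state=0, epsilon=0.1, advised_action="right")
    # Advisor then rates that action positively (evaluative advice).
    reward = shaped_reward(env_reward=0.0, evaluation=1.0)
    q_update(q, state=0, action=action, reward=reward, next_state=1)
    print(action, q[(0, action)])
```

In this sketch the informative advisor acts before the agent moves, while the evaluative advisor can only respond afterwards, which mirrors the difference in timing described in the abstract.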

Updated: 2020-09-22