Human Feedback as Action Assignment in Interactive Reinforcement Learning, ACM Transactions on Autonomous and Adaptive Systems



ACM Transactions on Autonomous and Adaptive Systems (IF 2.2), Pub Date: 2020-08-04, DOI: 10.1145/3404197
Syed Ali Raza, Mary-Anne Williams


Teaching by demonstration and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner’s performance and assigning a reward or punishment to it. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional method of reward assignment with action assignment (which is similar to providing a demonstration) in interactive reinforcement learning. The suggested action is used mainly to compute a reward, based on whether or not the self-acting agent followed it. We compared action assignment with reward assignment via a user study conducted over the web using a two-dimensional maze game. The logs of interactions showed that action assignment significantly improved users’ ability to teach the right behaviour. The survey results showed that both action and reward assignment seemed highly natural and usable; that reward assignment required more mental effort; that repeatedly assigning rewards and seeing the agent disobey commands caused frustration in users; and that many users desired to control the agent’s behaviour directly.
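
The core idea in the abstract, turning a suggested action into a reward by checking whether the agent followed it, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The reward values (+1 for a match, -1 for a mismatch, 0 when no suggestion is given), the tabular Q-learning update, the action names, and the helper names `reward_from_suggestion` and `q_update` are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict

# Hypothetical action set for a two-dimensional maze (assumed, not from the paper).
ACTIONS = ("up", "down", "left", "right")


def reward_from_suggestion(agent_action, suggested_action,
                           match_reward=1.0, mismatch_reward=-1.0):
    """Convert a teacher's suggested action into a scalar reward.

    Assumed scheme: positive reward if the agent's own action matched the
    suggestion, negative reward if it did not, zero if nothing was suggested.
    """
    if suggested_action is None:
        return 0.0
    if agent_action == suggested_action:
        return match_reward
    return mismatch_reward


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning update (an assumed learner)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


# Q-table with a default value of 0.0 for unseen state-action pairs.
q = defaultdict(float)

# The agent moved "left" although the teacher suggested "right".
r_disobey = reward_from_suggestion("left", "right")
q_update(q, (0, 0), "left", r_disobey, (0, 0))

# The agent moved "right", matching the teacher's suggestion.
r_obey = reward_from_suggestion("right", "right")
q_update(q, (0, 0), "right", r_obey, (0, 1))
```

Under these assumptions, the teacher never assigns a number directly: the suggestion is compared with what the agent did, and the resulting reward feeds an ordinary reinforcement learning update.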


Updated: 2020-08-04
