Facial feedback for reinforcement learning: a case study and offline analysis using the TAMER framework
Autonomous Agents and Multi-Agent Systems (IF 2.0) Pub Date: 2020-02-12, DOI: 10.1007/s10458-020-09447-w
Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this article, we investigate the potential of agents learning from trainers' facial expressions by interpreting those expressions as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, in Infinite Mario, a reinforcement-learning benchmark problem, and conducted the first large-scale study of TAMER, involving 561 participants. Using a CNN–RNN model that we designed, our analysis shows that instructing trainers to use facial expressions and introducing competition can improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from feedback predicted from facial expressions is possible, and that, with strong/effective prediction models or a regression method, facial responses would significantly improve the performance of agents. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.
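
As a rough illustration of the idea of learning from evaluative feedback, the following minimal sketch shows a TAMER-style loop in which an agent fits a model of the trainer's feedback and greedily picks the action that model rates highest. Everything in it is an assumption made for illustration and not the authors' implementation: the class `TamerAgent`, the per-action linear model, the feature vector, the exploration rate, and the `simulated_feedback` function, which stands in for a human trainer or for a facial-expression predictor emitting +1/-1.

```python
import random


class TamerAgent:
    """Hypothetical TAMER-style learner with a linear feedback model H(s, a)."""

    def __init__(self, n_features, actions, step_size=0.1):
        self.actions = list(actions)
        self.step_size = step_size
        # One weight vector per action (an illustrative parameterisation).
        self.weights = {a: [0.0] * n_features for a in self.actions}

    def predict(self, features, action):
        # Predicted human feedback for taking `action` in this state.
        return sum(w * x for w, x in zip(self.weights[action], features))

    def act(self, features):
        # Myopic choice: the action with the highest predicted feedback.
        return max(self.actions, key=lambda a: self.predict(features, a))

    def update(self, features, action, feedback):
        # One gradient step on the squared error between feedback and prediction.
        error = feedback - self.predict(features, action)
        w = self.weights[action]
        for i, x in enumerate(features):
            w[i] += self.step_size * error * x


def simulated_feedback(action, good_action="jump"):
    # Stand-in for a trainer or a facial-expression predictor: +1 or -1.
    return 1.0 if action == good_action else -1.0


def train(agent, features, steps=50, explore=0.2, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        # Explore occasionally so that every action receives some feedback.
        if rng.random() < explore:
            action = rng.choice(agent.actions)
        else:
            action = agent.act(features)
        agent.update(features, action, simulated_feedback(action))
    return agent


state = [1.0, 0.5]  # hypothetical state features
agent = train(TamerAgent(n_features=2, actions=["left", "right", "jump"]), state)
```

After training, `agent.act(state)` selects the action that the simulated trainer rewards. Replacing `simulated_feedback` with the output of a prediction model would correspond, loosely, to the setting of learning solely from predicted feedback described in the abstract.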

Updated: 2020-02-12