Interactive task learning via embodied corrective feedback
Autonomous Agents and Multi-Agent Systems (IF 2.0), Pub Date: 2020-09-27, DOI: 10.1007/s10458-020-09481-8
Mattias Appelgren, Alex Lascarides

This paper addresses a task in Interactive Task Learning (Laird et al., IEEE Intell Syst 32:6–21, 2017). The agent must learn to build towers that are constrained by rules, and whenever the agent performs an action that violates a rule, the teacher provides verbal corrective feedback, e.g. “No, red blocks should be on blue blocks”. The agent must learn to build rule-compliant towers from these corrections and the context in which they were given. The agent is not only ignorant of the rules at the start of the learning process; it also has a deficient domain model, which lacks the concepts in which the rules are expressed. An agent that takes advantage of the linguistic evidence must therefore learn the denotations of neologisms and adapt its conceptualisation of the planning domain to incorporate those denotations. We show that by incorporating constraints on interpretation that are imposed by discourse coherence into the models for learning (Hobbs, On the coherence and structure of discourse, Stanford University, Stanford, 1985; Asher et al., Logics of conversation, Cambridge University Press, Cambridge, 2003), an agent which utilizes linguistic evidence outperforms a strong baseline which does not.
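
To make the task setting concrete, here is a minimal Python sketch of the teacher's side of the interaction, using the example correction quoted in the abstract. It is an illustration only and not the authors' implementation. It assumes a tower is a list of colour names from bottom to top, and it assumes the rule means that every red block must rest directly on a blue block. The function names `first_violation` and `teacher_feedback` are hypothetical.

```python
from typing import List, Optional


def first_violation(tower: List[str]) -> Optional[int]:
    """Return the index of the first red block that does not rest
    directly on a blue block, or None if the tower obeys the rule.

    Assumption: `tower` lists block colours from bottom to top, and a
    red block at the very bottom counts as a violation because there is
    no blue block beneath it.
    """
    for i, colour in enumerate(tower):
        if colour == "red":
            below = tower[i - 1] if i > 0 else None
            if below != "blue":
                return i
    return None


def teacher_feedback(tower: List[str]) -> Optional[str]:
    """Return the verbal correction if the rule is violated, else None."""
    if first_violation(tower) is None:
        return None
    return "No, red blocks should be on blue blocks"
```

In the setting the abstract describes, the learner would not have access to a check like this, nor to the colour concepts it relies on. It would only hear the correction and observe the context in which it was given.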




Updated: 2020-09-28