On the Utility of Learning about Humans for Human-AI Coordination
arXiv - CS - Human-Computer Interaction. Pub Date: 2019-10-13, DOI: arxiv-1910.05789
Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human's gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.

Updated: 2020-01-10