Human-level play in the game of Diplomacy by combining language models with strategic reasoning


Science (IF 44.7), Pub Date: 2022-11-22, DOI: 10.1126/science.ade9097
Anton Bakhtin ₁, Noam Brown ₁, Emily Dinan ₁, Gabriele Farina ₁, Colin Flaherty ₁, Daniel Fried _{1,2}, Andrew Goff ₁, Jonathan Gray ₁, Hengyuan Hu _{1,3}, Athul Paul Jacob _{1,4}, Mojtaba Komeili ₁, Karthik Konath ₁, Minae Kwon _{1,3}, Adam Lerer ₁, Mike Lewis ₁, Alexander H Miller ₁, Sasha Mitts ₁, Adithya Renduchintala ₁, Stephen Roller ₁, Dirk Rowe ₁, Weiyan Shi _{1,5}, Joe Spisak ₁, Alexander Wei _{1,6}, David Wu ₁, Hugh Zhang _{1,7}, Markus Zijlstra ₁


Despite much progress in training artificial intelligence (AI) systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.


Updated: 2022-11-22
