Continuous Coordination As a Realistic Scenario for Lifelong Learning
arXiv - CS - Multiagent Systems. Pub Date: 2021-03-04, DOI: arxiv-2103.03216
Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar

Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
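
To make the setup concrete, below is a minimal Python sketch of the evaluation loop such a testbed implies: a lifelong learner is trained against a sequence of fixed partner policies (one task per partner) and, after each task, is scored against held-out partners both zero-shot and few-shot. All names here (Partner, Learner, play_episode) are illustrative placeholders, not the authors' code or the Hanabi Learning Environment API; a real run would substitute actual Hanabi agents, environment rollouts, and an LLL update rule operating under the limited memory and compute budgets the paper studies.

import copy
import random
from typing import List


class Partner:
    """Stand-in for a pre-trained Hanabi partner policy (one 'task')."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)

    def act(self, observation) -> int:
        # A real partner would map a Hanabi observation to a legal move.
        return self.rng.randrange(10)


class Learner:
    """Stand-in for the lifelong learner being benchmarked."""

    def act(self, observation) -> int:
        return 0

    def update(self, trajectory) -> None:
        # A real LLL method (e.g. regularization or replay) would
        # consolidate knowledge here within a fixed memory/compute budget.
        pass


def play_episode(learner: Learner, partner: Partner, turns: int = 20) -> float:
    """Roll out one cooperative episode and return its score.

    Placeholder dynamics: a real rollout would step a Hanabi environment,
    with the learner and the partner alternating turns and the game
    supplying observations and the final score.
    """
    observation = None  # a real env would provide Hanabi observations here
    for turn in range(turns):
        actor = learner if turn % 2 == 0 else partner
        actor.act(observation)
    return random.random()  # stand-in for the episode's Hanabi score


def train_on_task(learner: Learner, partner: Partner, episodes: int) -> None:
    """Train the learner with one fixed partner (a single lifelong task)."""
    for _ in range(episodes):
        learner.update(play_episode(learner, partner))


def evaluate(learner: Learner, partners: List[Partner],
             adaptation_episodes: int = 0) -> float:
    """Average score with held-out partners.

    Zero-shot when adaptation_episodes == 0; few-shot otherwise, adapting
    a copy of the learner so evaluation never leaks into training.
    """
    scores = []
    for partner in partners:
        probe = copy.deepcopy(learner)
        for _ in range(adaptation_episodes):
            probe.update(play_episode(probe, partner))
        scores.append(play_episode(probe, partner))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    train_partners = [Partner(seed=i) for i in range(5)]          # task sequence
    heldout_partners = [Partner(seed=100 + i) for i in range(3)]  # unseen agents
    learner = Learner()

    for task_id, partner in enumerate(train_partners):
        train_on_task(learner, partner, episodes=1000)
        zero_shot = evaluate(learner, heldout_partners)
        few_shot = evaluate(learner, heldout_partners, adaptation_episodes=10)
        print(f"task {task_id}: zero-shot={zero_shot:.2f}, few-shot={few_shot:.2f}")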

Updated: 2021-03-05