ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations,arXiv - CS - Machine Learning


arXiv - CS - Machine Learning. Pub Date: 2021-06-10. DOI: arxiv-2106.05492
Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng

Multi-agent simulations provide a scalable environment for learning policies that interact with rational agents. However, such policies may fail to generalize to the real world, where agents may differ from simulated counterparts due to unmodeled irrationality and misspecified reward functions. We introduce Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps. While existing notions of multi-agent robustness concern perturbations in the actions of agents, we address a novel robustness objective concerning perturbations in the reward functions of agents. ERMAS provides this robustness by anticipating suboptimal behaviors from other agents, formalized as the worst-case epsilon-equilibrium. We show empirically that ERMAS yields robust policies for repeated bimatrix games and optimal taxation problems in economic simulations. In particular, in the two-level RL problem posed by the AI Economist (Zheng et al., 2020), ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
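
To make the "worst-case epsilon-equilibrium" idea concrete, here is a minimal sketch for a single-shot bimatrix game. It is not the ERMAS algorithm from the paper: it only enumerates pure-strategy profiles, and the function names (`epsilon_equilibria`, `worst_case_welfare`), the example payoff matrices, and the choice of welfare as the sum of both payoffs are all assumptions made for illustration. A profile counts as an epsilon-equilibrium when neither player could gain more than `eps` by deviating unilaterally.

```python
from itertools import product


def epsilon_equilibria(A, B, eps):
    """Return all pure profiles (i, j) where no player gains more than eps by deviating.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    (hypothetical example inputs, not taken from the paper).
    """
    n_rows, n_cols = len(A), len(A[0])
    profiles = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Regret: how much each player could gain from the best unilateral deviation.
        row_regret = max(A[k][j] for k in range(n_rows)) - A[i][j]
        col_regret = max(B[i][l] for l in range(n_cols)) - B[i][j]
        if row_regret <= eps and col_regret <= eps:
            profiles.append((i, j))
    return profiles


def worst_case_welfare(A, B, eps):
    """Lowest total payoff over all pure epsilon-equilibria, or None if there are none."""
    profiles = epsilon_equilibria(A, B, eps)
    if not profiles:
        return None
    # Welfare is assumed here to be the sum of both players' payoffs.
    return min(A[i][j] + B[i][j] for i, j in profiles)


# Illustrative coordination game: both players prefer to match actions.
A = [[4, 0], [0, 3]]
B = [[4, 0], [0, 3]]

for eps in (0.0, 4.0):
    print(eps, epsilon_equilibria(A, B, eps), worst_case_welfare(A, B, eps))
```

In this toy game, `eps = 0` admits only the two exact equilibria, (0, 0) and (1, 1), so the worst-case welfare is 6. At `eps = 4` every profile qualifies and the worst case drops to 0, which is the kind of degradation a robust policy would need to anticipate.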


Updated: 2021-06-11
