FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis
arXiv - CS - Multiagent Systems, Pub Date: 2020-03-09, DOI: arxiv-2003.03900
Aman Sinha, Matthew O'Kelly, Hongrui Zheng, Rahul Mangharam, John Duchi, Russ Tedrake

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
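
The abstract names replica-exchange Markov chain Monte Carlo as the engine for generating a diverse opponent population. As a generic illustration of that sampler only (not the authors' implementation), the Python sketch below runs parallel tempering on a one-dimensional toy problem. The energy function `toy_energy` (a hypothetical stand-in for an opponent's cost), the temperature ladder, and the step size are all assumed choices. Hot replicas cross the barrier between modes easily, and accepted swaps pass those states down to the cold replica, whose samples are returned.

```python
import math
import random


def toy_energy(theta: float) -> float:
    """Hypothetical stand-in for an opponent's cost (lower is better).

    Double-well potential with minima at theta = -1 and theta = +1.
    """
    return 4.0 * (theta ** 2 - 1.0) ** 2


def replica_exchange(energy, temperatures, n_steps=5000, step_size=0.5, seed=0):
    """Parallel-tempering MCMC targeting exp(-energy(theta) / T) per replica.

    temperatures[0] should be the coldest temperature; its replica's state is
    recorded once per sweep and returned.
    """
    rng = random.Random(seed)
    states = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    cold_samples = []
    for _ in range(n_steps):
        # 1) Random-walk Metropolis update inside every replica.
        for k, temp in enumerate(temperatures):
            proposal = states[k] + rng.gauss(0.0, step_size)
            delta = energy(proposal) - energy(states[k])
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                states[k] = proposal
        # 2) Propose swapping one random pair of neighbouring replicas.
        #    Acceptance: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
        i = rng.randrange(len(temperatures) - 1)
        j = i + 1
        beta_i, beta_j = 1.0 / temperatures[i], 1.0 / temperatures[j]
        log_accept = (beta_i - beta_j) * (energy(states[i]) - energy(states[j]))
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

For example, `replica_exchange(toy_energy, [0.1, 0.3, 1.0, 3.0])` should yield cold-chain samples from both wells near -1 and +1, which a single chain at T = 0.1 would rarely manage because the barrier is 40 times its temperature. In the paper's setting the sampled object would presumably be a vector of opponent-policy parameters rather than a scalar; that mapping is an assumption here.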
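
The distributionally robust step can likewise be sketched under stated assumptions. Here the belief over a finite set of opponent types is a Laplace-smoothed empirical frequency, the ambiguity set is a KL-divergence ball around that belief, and its radius shrinks as rho0 / (n + 1) after n observations, so the planner becomes less risk-averse as evidence accumulates. The KL ball, the radius schedule, and the names `kl_worst_case` and `choose_action` are hypothetical; the paper's divergence, bandit update, and schedule may differ, and this covers only a single robust decision rather than a full bandit loop. The worst-case expectation is computed through the standard dual, the infimum over λ > 0 of λρ + λ log E_P[exp(c/λ)].

```python
import math


def kl_worst_case(costs, probs, rho, lo=1e-4, hi=1e4, iters=200):
    """Worst-case expected cost over {Q : KL(Q || P) <= rho}.

    Uses the convex dual  inf_{lam > 0} lam * rho + lam * log E_P[exp(cost / lam)],
    minimised by golden-section search over log(lam) on [lo, hi]. The value
    returned is the dual at the best lam found, hence an upper bound.
    """
    support = [(c, p) for c, p in zip(costs, probs) if p > 0]
    if rho <= 0:
        return sum(p * c for c, p in support)
    c_max = max(c for c, _ in support)

    def dual(log_lam):
        lam = math.exp(log_lam)
        # Shift by c_max so every exponent is <= 0 (stable log-sum-exp).
        s = sum(p * math.exp((c - c_max) / lam) for c, p in support)
        return lam * rho + c_max + lam * math.log(s)

    a, b = math.log(lo), math.log(hi)
    r = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = dual(x1), dual(x2)
    for _ in range(iters):
        if f1 < f2:  # minimiser lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = dual(x1)
        else:  # minimiser lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = dual(x2)
    return min(f1, f2)


def choose_action(cost_table, counts, rho0=1.0):
    """Pick the row of cost_table with the smallest worst-case expected cost.

    cost_table[a][k]: hypothetical cost of planner setting a against opponent
    type k; counts[k]: how often opponent type k has been observed so far.
    """
    n, k = sum(counts), len(counts)
    belief = [(c + 1) / (n + k) for c in counts]  # Laplace-smoothed belief
    rho = rho0 / (n + 1)  # assumed schedule: ambiguity shrinks with evidence
    scores = [kl_worst_case(row, belief, rho) for row in cost_table]
    best = min(range(len(cost_table)), key=scores.__getitem__)
    return best, scores
```

With the hypothetical cost table `[[0.0, 1.0], [0.4, 0.4]]` (rows: aggressive and conservative plans; columns: passive and aggressive opponents), `choose_action(table, [0, 0])` picks the conservative row (index 1), while `choose_action(table, [50, 0])`, after 50 sightings of passive opponents, picks the aggressive row (index 0) because its worst-case cost has dropped to roughly 0.05.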

Updated: 2020-08-25