Learning to drive from a world on rails
arXiv - CS - Robotics. Pub Date: 2021-05-03, DOI: arxiv-2105.00636
Dian Chen, Vladlen Koltun, Philipp Krähenbühl

We learn an interactive vision-based driving policy from pre-recorded driving logs via a model-based approach. A forward model of the world supervises a driving policy that predicts the outcome of any potential driving trajectory. To support learning from pre-recorded logs, we assume that the world is on rails, meaning neither the agent nor its actions influence the environment. This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle. Our approach computes action-values for each training trajectory using a tabular dynamic-programming evaluation of the Bellman equations; these action-values in turn supervise the final vision-based driving policy. Despite the world-on-rails assumption, the final driving policy acts well in a dynamic and reactive world. Our method ranks first on the CARLA leaderboard, attaining a 25% higher driving score while using 40 times less data. Our method is also an order of magnitude more sample-efficient than state-of-the-art model-free reinforcement learning techniques on navigational tasks in the ProcGen benchmark.
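The core computational step described above, evaluating the Bellman equations by tabular dynamic programming over a recorded log with a non-reactive world and a compact ego forward model, can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the discretization sizes, the `ego_forward` dynamics, and the `reward` function are all invented for demonstration.

```python
import numpy as np

GAMMA = 0.9
N_SPEED = 5        # discretized ego-state bins (toy: speed only)
N_ACTIONS = 3      # e.g. brake / keep / accelerate
T = 4              # length of the recorded driving log

def ego_forward(speed_bin, action):
    """Toy low-dimensional ego-vehicle forward model: the action
    shifts the speed bin by -1, 0, or +1."""
    return int(np.clip(speed_bin + (action - 1), 0, N_SPEED - 1))

def reward(t, speed_bin, action):
    """Toy reward looked up against the replayed (non-reactive) world
    at log time t; here it simply rewards the top speed bin."""
    return 1.0 if speed_bin == N_SPEED - 1 else 0.0

# Backward dynamic programming over the log horizon:
# V[t, s] and Q[t, s, a] are plain tables, so each backup is exact.
V = np.zeros((T + 1, N_SPEED))
Q = np.zeros((T, N_SPEED, N_ACTIONS))
for t in reversed(range(T)):
    for s in range(N_SPEED):
        for a in range(N_ACTIONS):
            s_next = ego_forward(s, a)
            Q[t, s, a] = reward(t, s, a) + GAMMA * V[t + 1, s_next]
        V[t, s] = Q[t, s].max()

# These action-values would then supervise the vision-based policy,
# e.g. as soft targets for the action taken from each logged frame.
best_action = Q[0].argmax(axis=1)
```

Because the world is assumed non-reactive, the backup never has to re-simulate other agents: the log fixes the world's trajectory, and only the small ego table is swept, which is what makes the evaluation cheap enough to run over every training trajectory.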

Updated: 2021-05-04