Safer End-to-End Autonomous Driving via Conditional Imitation Learning and Command Augmentation
arXiv - CS - Multiagent Systems, Pub Date: 2019-09-20, DOI: arxiv-1909.09721
Renhao Wang, Adam Scibior, Frank Wood

Imitation learning is a promising approach to end-to-end training of autonomous vehicle controllers. Typically the driving process with such approaches is entirely automatic and black-box, although in practice it is desirable to control the vehicle through high-level commands, such as telling it which way to go at an intersection. In existing work this has been accomplished by the application of a branched neural architecture, since directly providing the command as an additional input to the controller often results in the command being ignored. In this work we overcome this limitation by learning a disentangled probabilistic latent variable model that generates the steering commands. We achieve faithful command-conditional generation without using a branched architecture and demonstrate improved stability of the controller, applying only a variational objective without any domain-specific adjustments. On top of that, we extend our model with an additional latent variable and augment the dataset to train a controller that is robust to unsafe commands, such as asking it to turn into a wall. The main contribution of this work is a recipe for building controllable imitation driving agents that improves upon multiple aspects of the current state of the art relating to robustness and interpretability.
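The abstract contrasts two ways of giving a driving controller a high-level command. The toy Python sketch below illustrates only those two baselines and is not the authors' latent-variable model. In the first, the command is appended to the input features as extra inputs. In the second, a branched architecture selects one output head per command. Everything here is a hypothetical illustration: the command set (`follow`, `left`, `right`, `straight`), the single linear "network", and the class names are assumptions and do not come from the paper.

```python
"""Toy contrast of two ways to condition a steering controller on a
high-level command. Hypothetical names; not the paper's code."""
from typing import Dict, List, Sequence, Tuple

# Assumed command vocabulary (illustrative only).
COMMANDS = ("follow", "left", "right", "straight")


def one_hot(command: str) -> List[float]:
    """Encode a command as a one-hot vector over COMMANDS."""
    return [1.0 if c == command else 0.0 for c in COMMANDS]


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """A single linear unit, w . x + b, standing in for a neural network."""
    if len(weights) != len(x):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


class ConcatController:
    """Command supplied as extra inputs alongside the features.

    Nothing forces training to use those inputs: if their weights end up
    near zero, the controller effectively ignores the command.
    """

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def steer(self, features: Sequence[float], command: str) -> float:
        return linear(self.weights, self.bias, list(features) + one_hot(command))


class BranchedController:
    """One output head per command; the command picks which head runs.

    Because the command selects the head, it cannot be ignored, at the
    cost of a separate head for every command.
    """

    def __init__(self, heads: Dict[str, Tuple[Sequence[float], float]]) -> None:
        self.heads = heads  # command -> (weights, bias)

    def steer(self, features: Sequence[float], command: str) -> float:
        weights, bias = self.heads[command]
        return linear(weights, bias, features)


if __name__ == "__main__":
    features = [0.2, -0.1]  # made-up perception features
    # Zero weights on the command inputs: the command has no effect.
    concat = ConcatController([0.5, 0.5, 0.0, 0.0, 0.0, 0.0], 0.0)
    branched = BranchedController({
        "follow": ([0.5, 0.5], 0.0),
        "left": ([0.5, 0.5], -0.3),
        "right": ([0.5, 0.5], 0.3),
        "straight": ([0.5, 0.5], 0.0),
    })
    for cmd in ("left", "right"):
        print(cmd, concat.steer(features, cmd), branched.steer(features, cmd))
```

In this toy setup the concatenated controller returns the same steering value for `left` and `right`, which mirrors the failure mode the abstract describes: the command is present as an input but is ignored. The branched controller's outputs differ by construction. The paper's stated contribution is to obtain faithful command-conditional behaviour without the branched design, using a disentangled latent variable model trained with a variational objective. This sketch does not attempt to reproduce that model.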

Updated: 2020-03-18