Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving
arXiv - CS - Systems and Control. Pub Date: 2020-07-01. DOI: arxiv-2007.00178
Zhangjie Cao, Erdem Bıyık, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh

Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where an accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely used policy learning methods, cannot model rapid phase transitions and do not scale to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between the driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate that our high-level policy appropriately switches between the low-level policies in near-accident driving situations.
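The hierarchical decomposition described in the abstract can be sketched compactly: several low-level policies (one per discrete driving mode, each trained by imitation) sit under a high-level RL policy that selects which mode to execute at each decision step. The following PyTorch outline is a minimal, hypothetical illustration of that wiring, not the authors' implementation; the network sizes, the two mode labels, and the helper `act` are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ModePolicy(nn.Module):
    """Low-level policy for one discrete driving mode, trained by
    imitation learning (e.g., behavior cloning) on mode-specific demos."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # continuous control action


class HighLevelPolicy(nn.Module):
    """High-level policy trained by RL; outputs a distribution over the
    discrete driving modes so it can switch modes at each decision step."""
    def __init__(self, obs_dim: int, n_modes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_modes),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


def act(obs, high_level, mode_policies):
    """One control step: the RL policy picks a mode, then the chosen
    IL policy produces the actual driving action."""
    mode = high_level(obs).sample().item()
    return mode_policies[mode](obs), mode


# Usage with made-up dimensions and two illustrative modes
# (index 0 = "timid", 1 = "aggressive"); in practice the mode
# policies would first be trained on demonstrations, and the
# high-level policy would then be trained with RL on top of them.
high = HighLevelPolicy(obs_dim=10, n_modes=2)
modes = [ModePolicy(10, 2), ModePolicy(10, 2)]
action, mode = act(torch.randn(10), high, modes)
```

The appeal of this split is that each low-level policy only needs demonstrations of one consistent driving style, while the RL problem is reduced to a small discrete action space (which mode to run), making exploration in near-accident states far more tractable.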

Updated: 2020-07-02