A New Approach for Tactical Decision Making in Lane Changing: Sample Efficient Deep Q Learning with a Safety Feedback Reward
arXiv - CS - Robotics. Pub Date: 2020-09-24, DOI: arxiv-2009.11905
M. Ugur Yavas, N. Kemal Ure, Tufan Kumbasar

Automated lane changing is one of the most challenging tasks for highly automated vehicles due to its safety-critical, uncertain, and multi-agent nature. This paper presents a novel deployment of a state-of-the-art Q-learning method, Rainbow DQN, that uses a new safety-driven rewarding scheme to tackle these issues in a dynamic and uncertain simulation environment. We present comparative results showing that our approach of feeding reward signals back from the safety layer dramatically increases both the agent's performance and its sample efficiency. Furthermore, the deployment of Rainbow DQN makes the agent's actions more interpretable: examining the distributions of the generated Q values yields additional intuition about the agent's decisions. The proposed algorithm shows superior performance to the baseline algorithm in challenging scenarios with only 200,000 training steps (equivalent to roughly 55 hours of driving).
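Since only the abstract is available here, the safety-feedback reward idea can be illustrated with a minimal sketch. The Python wrapper below is an assumption-laden illustration rather than the authors' implementation: a gymnasium-style environment interface is assumed, and the `safety_layer` object, its `filter` method, the `state` attribute, and the penalty value are all hypothetical names introduced for this example.

```python
import gymnasium as gym  # assumed environment API; the paper uses its own simulator


class SafetyFeedbackReward(gym.Wrapper):
    """Sketch of a safety-feedback reward: when a rule-based safety
    layer overrides the agent's proposed lane-change action, a penalty
    is folded into the reward so the agent learns to avoid unsafe
    proposals rather than merely being blocked by the safety layer."""

    def __init__(self, env, safety_layer, penalty=-1.0):
        super().__init__(env)
        self.safety_layer = safety_layer  # hypothetical: maps (state, action) to a safe action
        self.penalty = penalty            # hypothetical penalty magnitude

    def step(self, action):
        # Let the safety layer veto or pass through the proposed action.
        safe_action = self.safety_layer.filter(self.env.unwrapped.state, action)
        obs, reward, terminated, truncated, info = self.env.step(safe_action)
        if safe_action != action:
            # Feed the intervention back into the learning signal.
            reward += self.penalty
            info["safety_override"] = True
        return obs, reward, terminated, truncated, info
```

Under this scheme, any off-the-shelf Rainbow DQN agent can be trained on the wrapped environment unchanged; the safety intervention shows up in the reward stream, which is consistent with the sample-efficiency gains the abstract attributes to the safety-layer feedback.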

Updated: 2020-09-28