A Nash Q-Learning based motion decision algorithm with considering interaction to traffic participants
IEEE Transactions on Vehicular Technology ( IF 6.8 ) Pub Date : 2020-11-01 , DOI: 10.1109/tvt.2020.3027352
Can Xu , Wanzhong Zhao , Lin Li , Qingyun Chen , Dengming Kuang , Jianhao Zhou

To improve the efficiency and comfort of autonomous vehicles while ensuring safety, the decision algorithm needs to interact with human drivers, infer their most probable behavior, and then make an advantageous decision. This paper proposes a Nash-Q learning based motion decision algorithm that accounts for this interaction. First, the local trajectory of the surrounding vehicle is predicted from kinematic constraints, which reflects its short-term motion trend. Then, the future action space, which consists of five basis actions, is built based on the predicted local trajectory. With that, the Nash-Q learning process can be implemented as a game between these basis actions. Through elimination of strictly dominated actions and the Lemke-Howson method, the autonomous vehicle can decide the optimal action and infer the behavior of the surrounding vehicle. Finally, a lane-merging scenario is built to compare performance against existing methods. A driver-in-the-loop experiment is further designed to verify the interaction performance in multi-vehicle traffic. The results show that the Nash-Q learning based algorithm improves efficiency and comfort by 15.75% and 20.71% relative to the Stackelberg game and the no-interaction method, respectively, while safety is ensured. It can also interact with human drivers in real time in multi-vehicle traffic.
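
The elimination step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three action names, the payoff matrices, and the function names below are hypothetical, only domination by pure actions is checked, and pure-strategy equilibria are enumerated directly instead of running the Lemke-Howson method (which would also recover mixed equilibria).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def eliminate_strictly_dominated(A: Matrix, B: Matrix) -> Tuple[List[int], List[int]]:
    """Iteratively remove pure actions that are strictly dominated by another pure action.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when the row player picks action i and the column player picks action j.
    Returns the indices of the surviving row and column actions.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # A row action r is dropped if some other surviving row s pays
        # strictly more against every surviving column action.
        for r in list(rows):
            if any(all(A[s][c] > A[r][c] for c in cols) for s in rows if s != r):
                rows.remove(r)
                changed = True
        # Same check for the column player, using payoff matrix B.
        for c in list(cols):
            if any(all(B[r][d] > B[r][c] for r in rows) for d in cols if d != c):
                cols.remove(c)
                changed = True
    return rows, cols


def pure_nash_equilibria(A: Matrix, B: Matrix,
                         rows: List[int], cols: List[int]) -> List[Tuple[int, int]]:
    """Return all (row, col) pairs where neither player gains by deviating alone."""
    equilibria = []
    for r in rows:
        for c in cols:
            row_is_best = all(A[r][c] >= A[s][c] for s in rows)
            col_is_best = all(B[r][c] >= B[r][d] for d in cols)
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria


# Hypothetical merging game (illustrative numbers, not taken from the paper).
ACTIONS = ["accelerate", "keep", "decelerate"]
# Row player: autonomous vehicle. Column player: surrounding vehicle.
A = [[-5, 2, 4],
     [-1, 1, 3],
     [-2, 0, 1]]
B = [[-5, 1, 0],
     [3, 2, 0],
     [4, 3, 1]]

rows, cols = eliminate_strictly_dominated(A, B)
equilibria = pure_nash_equilibria(A, B, rows, cols)
for r, c in equilibria:
    print(f"AV: {ACTIONS[r]}, surrounding vehicle: {ACTIONS[c]}")
```

With these assumed payoffs, "decelerate" is eliminated for both vehicles and two pure equilibria remain: one where the autonomous vehicle accelerates while the other vehicle keeps its speed, and one with the roles reversed. In a Nash-Q setting, the payoff entries would be the learned Q-values of the current state rather than fixed numbers.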

Updated: 2020-11-01