Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data
IET Intelligent Transport Systems ( IF 2.3 ) Pub Date : 2020-04-30 , DOI: 10.1049/iet-its.2019.0317
Jingliang Duan 1 , Shengbo Eben Li 1 , Yang Guan 1 , Qi Sun 1 , Bo Cheng 1
Decision making for self-driving cars is usually tackled by manually encoding rules from drivers' behaviours or by imitating drivers' manipulation with supervised learning techniques. Both approaches rely on massive amounts of driving data to cover all possible driving scenarios. This study presents a hierarchical reinforcement learning method for self-driving decision making that does not depend on a large amount of labelled driving data. The method comprehensively considers both high-level manoeuvre selection and low-level motion control in the lateral and longitudinal directions. The authors first decompose the driving task into three manoeuvres (driving in lane, right lane change and left lane change) and learn a sub-policy for each manoeuvre. A master policy is then learned to choose which manoeuvre policy to execute in the current state. All policies, including the master policy and the manoeuvre policies, are represented by fully-connected neural networks and trained using asynchronous parallel reinforcement learners, which builds a mapping from sensory outputs to driving decisions. Different state spaces and reward functions are designed for each manoeuvre. The authors apply the method to a highway driving scenario and demonstrate that it can realise smooth and safe decision making for self-driving cars.
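The hierarchical decision structure described in the abstract, a master policy selecting among three manoeuvre sub-policies, each of which outputs low-level lateral and longitudinal controls, can be sketched as follows. This is a minimal illustrative sketch only: the function names, the toy state layout, and the hand-written selection rule are assumptions standing in for the authors' learned fully-connected networks trained with asynchronous parallel reinforcement learners.

```python
# Hypothetical sketch of the two-level hierarchy: a master policy picks
# one of three manoeuvres, and the chosen sub-policy maps the state to
# low-level controls (steering, acceleration). All names and thresholds
# here are illustrative assumptions, not the authors' implementation.

def sub_policy_drive_in_lane(state):
    # Hold the lane centre: steer against the lateral offset, keep speed.
    return {"steer": -0.5 * state["lateral_offset"], "accel": 0.0}

def sub_policy_left_lane_change(state):
    # Fixed toy command toward the left lane.
    return {"steer": 0.3, "accel": 0.1}

def sub_policy_right_lane_change(state):
    # Fixed toy command toward the right lane.
    return {"steer": -0.3, "accel": 0.1}

SUB_POLICIES = {
    "drive_in_lane": sub_policy_drive_in_lane,
    "left_lane_change": sub_policy_left_lane_change,
    "right_lane_change": sub_policy_right_lane_change,
}

def master_policy(state):
    # Toy rule standing in for the learned master network: change to the
    # left lane when blocked by a close lead vehicle, else stay in lane.
    if state["lead_gap"] < 20.0 and state["left_lane_free"]:
        return "left_lane_change"
    return "drive_in_lane"

def decide(state):
    manoeuvre = master_policy(state)         # high-level manoeuvre selection
    action = SUB_POLICIES[manoeuvre](state)  # low-level motion control
    return manoeuvre, action

state = {"lateral_offset": 0.2, "lead_gap": 15.0, "left_lane_free": True}
print(decide(state))  # master picks a manoeuvre, sub-policy picks controls
```

In the paper both levels are learned networks with manoeuvre-specific state spaces and reward functions; the sketch only shows how the master policy's discrete choice dispatches to a continuous-control sub-policy at every step.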
