Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
Frontiers of Information Technology & Electronic Engineering (IF 2.7). Pub Date: 2021-05-28, DOI: 10.1631/fitee.1900637
Yunpeng Wang , Kunxian Zheng , Daxin Tian , Xuting Duan , Jianshan Zhou

Rule-based autonomous driving systems may suffer from increased complexity with large-scale intercoupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system. RL training requires extensive training data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address the problem of poor initial performance before training this RL-based model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel, on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle to be further trained by RL to adapt to the real environment and continuously push beyond its performance limit. The presented pre-training method is evaluated on the race car simulator, TORCS (The Open Racing Car Simulator), to verify that it is sufficiently reliable in improving the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using some demonstrations during a supervised pre-training stage allows significant improvements in initial performance and convergence speed in the RL training stage.
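The core idea of the ASL pre-training stage can be illustrated with a minimal sketch (this is not the authors' code): several workers run supervised learning in parallel, each on its own demonstration dataset, applying lock-free asynchronous updates to a shared policy. The linear policy, the synthetic "expert" demonstrations, and the learning rate below are illustrative assumptions; the resulting weights would serve as the initial policy for the subsequent RL stage.

```python
# Sketch of asynchronous supervised pre-training (Hogwild-style updates).
# All model and data choices here are hypothetical, for illustration only.
import threading
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.3])   # hypothetical expert steering policy
w = np.zeros(2)                  # shared policy weights, pre-trained in place

def make_demo_set(n=200):
    """One driving-demonstration dataset: states -> expert actions."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.01, size=n)
    return X, y

def sl_worker(X, y, lr=0.01, epochs=50):
    """One asynchronous supervised-learning process (no locking)."""
    global w
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = (xi @ w - yi) * xi   # squared-error gradient
            w = w - lr * grad           # asynchronous update of shared weights

# Multiple demonstration datasets, one supervised worker per dataset.
demo_sets = [make_demo_set() for _ in range(4)]
threads = [threading.Thread(target=sl_worker, args=d) for d in demo_sets]
for t in threads:
    t.start()
for t in threads:
    t.join()

# After ASL pre-training, the shared policy approximates the expert,
# giving the RL stage a strong initial policy instead of a random one.
print(np.allclose(w, true_w, atol=0.05))
```

Occasional lost updates from the lock-free writes do not prevent convergence here, which mirrors the appeal of asynchronous schemes: workers never block one another while jointly shaping the shared policy.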

Updated: 2021-05-28