Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving

Wang, Yunpeng; Zheng, Kunxian; Tian, Daxin; Duan, Xuting; Zhou, Jianshan

doi:10.1631/FITEE.1900637

面向强化学习自动驾驶模型的异步监督学习预训练方法

Published: 28 May 2021

Frontiers of Information Technology & Electronic Engineering, Volume 22, pages 673–686 (2021)

Yunpeng Wang (王云鹏)¹,
Kunxian Zheng (郑坤贤)¹ (ORCID: orcid.org/0000-0002-2887-9294),
Daxin Tian (田大新)¹,
Xuting Duan (段续庭)¹ &
Jianshan Zhou (周建山)¹

Abstract

Rule-based autonomous driving systems can become increasingly complex as large numbers of intercoupled rules accumulate, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied to the design of autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major obstacle to the practical deployment of an RL-based autonomous driving system. An RL model requires extensive training data before it achieves reasonable performance, which makes it impractical to train in a real-world setting, particularly when data are expensive. To address poor initial performance before the model is trained in real-world settings, we propose an asynchronous supervised learning (ASL) method for pre-training an RL-based end-to-end autonomous driving model. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle and further trained by RL so that it adapts to the real environment and continually pushes past its performance limit. The pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it reliably improves the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using demonstrations in a supervised pre-training stage significantly improves initial performance and convergence speed in the RL training stage.
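The abstract does not spell out how the parallel supervised learning processes share what they learn, so the following is only a minimal sketch of one plausible reading: several workers, each holding its own demonstration set, compute gradients against a possibly stale copy of shared weights and apply them to those shared weights without waiting for one another. The linear policy, the synthetic "expert" steering rule, the two input features, and every name here (make_demo_set, SharedModel, worker, pretrain) are assumptions made for illustration and are not taken from the paper.

```python
import random
import threading


def make_demo_set(seed, n=200):
    """Build a hypothetical demonstration set of (features, expert steering) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Two made-up features, e.g. lateral offset and heading error.
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        # Synthetic stand-in for an expert's steering command.
        y = 0.8 * x[0] - 0.5 * x[1]
        data.append((x, y))
    return data


class SharedModel:
    """Linear policy whose weights are shared by all supervised workers."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.lock = threading.Lock()

    def snapshot(self):
        # Copy the current weights; the copy may be stale when it is used.
        with self.lock:
            return list(self.w)

    def apply_gradient(self, grad, lr):
        # Apply one worker's gradient to the shared weights.
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= lr * g


def worker(model, data, steps, batch_size, lr, seed):
    """One supervised learning process running on its own demonstration set."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = model.snapshot()
        batch = rng.sample(data, batch_size)
        grad = [0.0] * len(w)
        for x, y in batch:
            # Gradient of the mean squared error between prediction and demonstration.
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / batch_size
        model.apply_gradient(grad, lr)


def pretrain(num_workers=4, steps=500, batch_size=16, lr=0.05):
    """Run all workers in parallel and return the pre-trained shared weights."""
    model = SharedModel(dim=2)
    threads = [
        threading.Thread(
            target=worker,
            args=(model, make_demo_set(seed=k), steps, batch_size, lr, 100 + k),
        )
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.w
```

Calling pretrain() returns the shared weights after all workers finish; in this toy setting they should approach the synthetic rule's coefficients. In a pipeline like the one the abstract describes, such pre-trained weights would then initialize the policy before RL training begins. A lock guards the shared weights here for simplicity; whether the original method synchronizes updates this way is not stated in the source.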

Author information

Authors and Affiliations

Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Transportation Science and Engineering, Beihang University, Beijing, 100191, China
Yunpeng Wang (王云鹏), Kunxian Zheng (郑坤贤), Daxin Tian (田大新), Xuting Duan (段续庭) & Jianshan Zhou (周建山)

Contributions

Yunpeng WANG designed the research. Kunxian ZHENG processed the data. Daxin TIAN drafted the manuscript. Xuting DUAN helped organize the manuscript. Kunxian ZHENG and Jianshan ZHOU revised and finalized the paper.

Corresponding author

Correspondence to Kunxian Zheng (郑坤贤).

Ethics declarations

Yunpeng WANG, Kunxian ZHENG, Daxin TIAN, Xuting DUAN, and Jianshan ZHOU declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61672082 and 61822101), the Beijing Municipal Natural Science Foundation, China (No. 4181002), and the Beihang University Innovation and Practice Fund for Graduate, China (No. YCSJ-02-2018-05).

About this article

Cite this article

Wang, Y., Zheng, K., Tian, D. et al. Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving. Front Inform Technol Electron Eng 22, 673–686 (2021). https://doi.org/10.1631/FITEE.1900637

Received: 20 November 2019
Accepted: 29 December 2020
Published: 28 May 2021
Issue Date: May 2021
DOI: https://doi.org/10.1631/FITEE.1900637
