A cold-start-free reinforcement learning approach for traffic signal control
Journal of Intelligent Transportation Systems (IF 2.8). Pub Date: 2021-06-06. DOI: 10.1080/15472450.2021.1934679
Nan Xiao, Liang Yu, Jinqiang Yu, Peng Chen, Yuehu Liu
Abstract

Typical reinforcement learning (RL) requires a huge amount of data before achieving an acceptable result, and its performance can be rather poor during the initial interaction process. The sample inefficiency and cold-start phenomenon of RL limit its feasibility in a range of real-world applications such as traffic signal control (TSC). On the other hand, a large amount of TSC data can be accumulated by various model-based controllers (MBCs) rooted in traffic engineering. In this context, we propose a new RL approach that avoids cold starts by taking advantage of MBC experiences. First, three frameworks for the joint utilization of RL and MBC in TSC are summarized, and the staged framework is considered to have the edge over the other two. Then, a staged noisy-net prioritized dueling double deep Q-network (NPDD-DQN) for TSC is described in detail, where MBC experiences are used in both the pre-training and online training processes. Experimental evaluation demonstrates that the staged NPDD-DQN achieves a boost in initial performance compared to a pure NPDD-DQN that does not utilize any control experiences, and learns to improve final performance beyond that of the underlying MBC. The effectiveness of the proposed method opens up the possibility of real-world implementation of RL in TSC.
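The staged idea — seed the agent with MBC experience before any online interaction — can be sketched with a toy tabular example. Everything here is illustrative: the two-approach queue environment, the longest-queue-first controller standing in for the MBC, and the plain Q-learning pre-training step are simplifications, not the paper's NPDD-DQN.

```python
from collections import defaultdict

# Toy traffic-signal environment (illustrative only, not the paper's setup):
# state = (ns, ew) queue lengths on two approaches, capped at 5;
# action 0 gives green to north-south, action 1 to east-west.
def step(state, action):
    ns, ew = state
    if action == 0:
        ns, ew = max(0, ns - 2), min(5, ew + 1)
    else:
        ns, ew = min(5, ns + 1), max(0, ew - 2)
    reward = (state[0] + state[1]) - (ns + ew)  # total queue reduction
    return (ns, ew), reward

def mbc_policy(state):
    # A simple model-based controller: serve the longer queue first.
    ns, ew = state
    return 0 if ns >= ew else 1

def collect_mbc_experience(horizon=20):
    # Roll out the MBC from every possible start state and log transitions,
    # mimicking the experience a deployed MBC accumulates in the field.
    buffer = []
    for ns in range(6):
        for ew in range(6):
            state = (ns, ew)
            for _ in range(horizon):
                action = mbc_policy(state)
                nxt, reward = step(state, action)
                buffer.append((state, action, reward, nxt))
                state = nxt
    return buffer

def pretrain_q(buffer, epochs=10, alpha=0.1, gamma=0.9):
    # Offline Q-learning sweeps over the stored MBC transitions: the
    # "pre-training" stage that warm-starts the agent before it ever
    # interacts with the intersection itself.
    q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, ns in buffer:
            target = r + gamma * max(q[(ns, 0)], q[(ns, 1)])
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

buffer = collect_mbc_experience()
q = pretrain_q(buffer)

def greedy(s):
    # Warm-started greedy policy; the online training stage would refine it.
    return 0 if q[(s, 0)] >= q[(s, 1)] else 1
```

After pre-training, the greedy policy already serves the longer queue from the very first online step, which is the cold-start-free behavior the staged framework aims for; online training then continues updating Q-values from live interaction to surpass the MBC.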




Updated: 2021-06-06