Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning
IEEE Transactions on Cybernetics (IF 11.8) Pub Date: 2021-01-01, DOI: 10.1109/tcyb.2020.3015811
Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in scaling to large problems and in modeling, for each individual agent, the behaviors of the other agents. In this article, a new MARL algorithm, called cooperative double $Q$-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double $Q$-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent $Q$-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby helping agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
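To make the two named ingredients concrete, the sketch below is a minimal tabular illustration of independent double $Q$-learning (double estimators, per van Hasselt) combined with UCB action selection. It is not the paper's implementation: the class name `DoubleQUCBAgent`, all hyperparameter values, and the shortcut of folding each neighbor's mean action into the state key (as a crude stand-in for the paper's mean-field term) are our own assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

class DoubleQUCBAgent:
    """Tabular independent double Q-learning with UCB exploration (sketch).

    Two estimators (qa, qb) curb overestimation; a UCB bonus keeps
    exploring. The neighbors' mean action is appended to the state key
    as a stand-in for the mean-field interaction term.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, ucb_c=1.0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.ucb_c = alpha, gamma, ucb_c
        self.qa = defaultdict(lambda: np.zeros(n_actions))  # estimator A
        self.qb = defaultdict(lambda: np.zeros(n_actions))  # estimator B
        self.counts = defaultdict(lambda: np.zeros(n_actions))
        self.t = 0  # global step, drives the UCB bonus

    def act(self, state, neighbor_mean_action):
        """UCB action selection on the average of the two estimators."""
        key = (state, neighbor_mean_action)
        self.t += 1
        q = (self.qa[key] + self.qb[key]) / 2.0
        bonus = self.ucb_c * np.sqrt(np.log(self.t) / (self.counts[key] + 1e-8))
        bonus[self.counts[key] == 0] = np.inf  # try each action once first
        action = int(np.argmax(q + bonus))
        self.counts[key][action] += 1
        return action

    def update(self, state, nma, action, reward, next_state, next_nma, rng):
        """Double Q update: pick argmax with one table, value it with the other."""
        key, nkey = (state, nma), (next_state, next_nma)
        if rng.random() < 0.5:
            a_star = int(np.argmax(self.qa[nkey]))
            target = reward + self.gamma * self.qb[nkey][a_star]
            self.qa[key][action] += self.alpha * (target - self.qa[key][action])
        else:
            b_star = int(np.argmax(self.qb[nkey]))
            target = reward + self.gamma * self.qa[nkey][b_star]
            self.qb[key][action] += self.alpha * (target - self.qb[key][action])
```

One agent of this kind would sit at each intersection and be stepped in an environment loop (e.g., with `rng = np.random.default_rng(0)`). In the full Co-DQL, the mean action enters the $Q$-function through the mean-field approximation rather than the state key, and agents additionally share local state and use the reshaped reward allocation; those parts are omitted here to keep the sketch short.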
