Learning scalable multi-agent coordination by spatial differentiation for traffic signal control
Engineering Applications of Artificial Intelligence (IF 8) Pub Date: 2021-02-10, DOI: 10.1016/j.engappai.2021.104165
Junjia Liu , Huimin Zhang , Zhuang Fu , Yao Wang

The intelligent control of traffic signals is critical to the optimization of transportation systems. To achieve globally optimal traffic efficiency in large-scale road networks, recent works have focused on coordination among intersections and have shown promising results. However, existing studies pay more attention to sharing observations among intersections (both explicitly and implicitly) and do not consider the consequences of decisions. In this paper, we design a multi-agent coordination framework based on deep reinforcement learning for traffic signal control, termed γ-Reward, which includes both the original γ-Reward and γ-Attention-Reward. Specifically, we propose the Spatial Differentiation method for coordination, which uses the temporal–spatial information in the replay buffer to amend the reward of each action. A concise theoretical analysis is given to prove that the proposed model can converge to a Nash equilibrium. By extending the idea of the Markov chain to the space–time dimension, this truly decentralized coordination mechanism replaces the graph attention method and decouples the road network, which is more scalable and more in line with practice. Simulation results show that the proposed model maintains state-of-the-art performance even without a centralized setting. Code is available at https://github.com/Skylark0924/Gamma_Reward.
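
To make the reward-amendment idea more concrete, below is a minimal sketch of how a spatially discounted correction could be read back from a replay buffer. The topology, the DELAY and gamma_spatial parameters, and all function names are illustrative assumptions for this sketch, not the authors' implementation (see the linked repository for the actual code).

```python
# Sketch of a spatial-differentiation style reward amendment (assumed form):
# an intersection's logged reward is shifted by its neighbours' rewards a few
# steps later, approximating "the consequences after decisions".
from collections import defaultdict

# hypothetical road-network topology: intersection -> neighbouring intersections
NEIGHBOURS = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
DELAY = 2            # assumed travel delay between neighbours, in control steps
gamma_spatial = 0.8  # spatial discount factor (the "γ" in γ-Reward)

# replay buffer: per-intersection list of (state, action, reward) tuples per step
buffer = defaultdict(list)

def store(intersection, state, action, reward):
    buffer[intersection].append((state, action, reward))

def amended_reward(intersection, t):
    """Amend the reward logged at step t with neighbours' rewards at t + DELAY."""
    _, _, own = buffer[intersection][t]
    correction = 0.0
    for nb in NEIGHBOURS[intersection]:
        if t + DELAY < len(buffer[nb]):
            correction += buffer[nb][t + DELAY][2]
    # own reward plus spatially discounted downstream consequences
    return own + gamma_spatial * correction / max(len(NEIGHBOURS[intersection]), 1)

if __name__ == "__main__":
    # toy rollout: rewards are negative queue lengths at each intersection
    for qa, qb, qc in [(3, 5, 2), (4, 6, 1), (2, 4, 3), (1, 3, 2)]:
        store("A", None, 0, -qa)
        store("B", None, 0, -qb)
        store("C", None, 0, -qc)
    print(amended_reward("A", 0), amended_reward("B", 1))
```

In this toy version, the amended value stored back for training would reflect not only an intersection's own queue but also how its decision propagated to neighbouring intersections a short time later, which is the coordination signal the abstract attributes to the Spatial Differentiation method.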



