Evaluating semi-cooperative Nash/Stackelberg Q-learning for traffic routes plan in a single intersection
Control Engineering Practice (IF 5.4), Pub Date: 2020-09-01, DOI: 10.1016/j.conengprac.2020.104525
Jian Guo, Istvan Harmati

Abstract As traffic congestion becomes increasingly severe and frequent in urban transportation systems, many efficient models based on reinforcement learning (RL) have been proposed to mitigate it. The traffic problem can be cast as a multi-agent reinforcement learning (MARL) system in which the incoming links (i.e., road sections) are regarded as agents and the agents' actions control the signal lights. This paper proposes a semi-cooperative Nash Q-learning approach built on single-agent Q-learning and Nash equilibrium, in which the agents select actions according to a Nash equilibrium but pursue a common goal with cooperative behaviour when more than one Nash equilibrium exists. An extended version, semi-cooperative Stackelberg Q-learning, is then designed for comparison, replacing the Nash equilibrium with a Stackelberg equilibrium in the Q-learning process. Specifically, the agent with the longest queue is promoted to leader and the others act as followers who react to the leader's decision. Instead of adjusting the green-light timing plan as in other research, this paper contributes to finding the best multi-route plan for passing the most vehicles through a single traffic intersection, combining game theory and RL for decision-making within the multi-agent framework. Both multi-agent Q-learning methods are implemented and compared with a constant strategy (i.e., the green and red intervals are fixed and periodic). The simulation results show that semi-cooperative Stackelberg Q-learning performs better.
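
As a rough illustration of the leader-follower mechanism described in the abstract, the sketch below shows a tabular Stackelberg Q-learning loop for two incoming links at one intersection: the link with the longest queue acts as leader and maximises its own Q-value assuming the other link best-responds. The state and action encoding, the toy queue dynamics, the reward shaping, and all parameter values are assumptions made for illustration only; they are not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)          # 0: request red phase, 1: request green phase (assumed encoding)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# One Q-table per agent (incoming link), indexed by (joint_state, joint_action).
q = [defaultdict(float), defaultdict(float)]

def order(leader, a_leader, a_follower):
    """Return the joint action as (agent0_action, agent1_action)."""
    return (a_leader, a_follower) if leader == 0 else (a_follower, a_leader)

def stackelberg_actions(state, leader):
    """Leader maximises its own Q assuming the follower best-responds."""
    follower = 1 - leader
    best = None
    for a_l in ACTIONS:
        # Follower's best response to a candidate leader action.
        a_f = max(ACTIONS, key=lambda a: q[follower][(state, order(leader, a_l, a))])
        joint = order(leader, a_l, a_f)
        if best is None or q[leader][(state, joint)] > q[leader][(state, best)]:
            best = joint
    return best

def step(queues, joint_action):
    """Toy dynamics: a green link discharges up to 2 vehicles, then new arrivals join."""
    served = [min(queues[i], 2 * joint_action[i]) for i in range(2)]
    queues = [queues[i] - served[i] + random.randint(0, 2) for i in range(2)]
    reward = [served[i] - 0.1 * queues[i] for i in range(2)]  # throughput minus waiting penalty
    return queues, reward

random.seed(0)
queues = [3, 1]
for _ in range(5000):
    state = tuple(min(n, 5) for n in queues)            # coarse, bounded queue state
    leader = 0 if queues[0] >= queues[1] else 1          # longest queue leads
    if random.random() < EPS:
        joint = (random.choice(ACTIONS), random.choice(ACTIONS))
    else:
        joint = stackelberg_actions(state, leader)
    queues, reward = step(queues, joint)
    next_state = tuple(min(n, 5) for n in queues)
    next_leader = 0 if queues[0] >= queues[1] else 1
    next_joint = stackelberg_actions(next_state, next_leader)
    for i in range(2):
        # Stackelberg Q-update: bootstrap on the equilibrium joint action of the next state.
        target = reward[i] + GAMMA * q[i][(next_state, next_joint)]
        q[i][(state, joint)] += ALPHA * (target - q[i][(state, joint)])

print("sample Q-values:", list(q[0].items())[:3])
```

The semi-cooperative Nash variant would differ only in the action-selection step: both agents' Q-values would be treated as a bimatrix game solved for a Nash equilibrium, with cooperative tie-breaking when several equilibria exist.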

Updated: 2020-09-01