Multi-Agent Reinforcement Learning for Dynamic Routing Games: A Unified Paradigm
arXiv - CS - Multiagent Systems. Pub Date: 2020-11-22, DOI: arxiv-2011.10915
Zhenyu Shou, Xuan Di

This paper aims to develop a unified paradigm that models travelers' learning behavior and the system's equilibrating process in a routing game among atomic selfish agents. Such a paradigm can assist policymakers in devising optimal operational and planning countermeasures under both normal and abnormal circumstances. To this end, a multi-agent reinforcement learning (MARL) paradigm is proposed in which each agent learns and updates her own en-route path choice policy while interacting with others on a transportation network. This paradigm is shown to generalize the classical notion of dynamic user equilibrium (DUE) to model-free and data-driven scenarios. We also illustrate that the equilibrium outcomes computed by the proposed MARL paradigm coincide with DUE and dynamic system optimal (DSO), respectively, when the rewards are set accordingly. In addition, with the goal of optimizing some systemic objective of city planners (e.g., overall traffic conditions), we formulate a bilevel optimization problem with city planners at the upper level and, at the lower level, a multi-agent system in which each rational and selfish traveler aims to minimize her travel cost. We demonstrate the effect of two administrative measures, namely tolling and signal control, on the behavior of travelers, and show that the planners' systemic objective can be optimized by proper control. The results show that on the Braess network, the optimal toll charge on the central link is greater than or equal to 25; with such a toll, the average travel time of selfish agents is minimized and the emergence of the Braess paradox is avoided. In a large real-world road network with 69 nodes and 166 links, the optimal offset for signal control on Broadway is derived as 4 seconds, which minimizes the average travel time of all controllable agents.
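To make the learning mechanism concrete, below is a minimal sketch of the kind of setup the abstract describes: independent Q-learning agents repeatedly choosing routes on a toy Braess network, with an outer sweep over the central-link toll playing the role of the upper-level planner. All network parameters, function names (`route_times`, `train`), and latency functions here are illustrative assumptions for exposition, not the paper's actual model or numbers.

```python
import random

# Illustrative Braess network (assumed parameters, not the paper's setup).
# Routes: 0 = O-A-D, 1 = O-B-D, 2 = O-A-B-D (uses the central link A-B).
# Links O-A and B-D cost 1 minute per agent on them; links A-D and O-B
# cost a fixed 45 minutes; the central link A-B is free-flow.
N_AGENTS = 40
FIXED = 45.0

def route_times(choices):
    """Travel time of each agent given everyone's route choice."""
    n_oa = sum(1 for c in choices if c in (0, 2))  # flow on link O-A
    n_bd = sum(1 for c in choices if c in (1, 2))  # flow on link B-D
    t = {0: n_oa + FIXED, 1: FIXED + n_bd, 2: float(n_oa + n_bd)}
    return [t[c] for c in choices]

def train(toll, episodes=3000, alpha=0.1, eps=0.1, seed=0):
    """Independent Q-learning; reward = -(travel time + toll), i.e. each
    selfish agent minimizes her own generalized cost (a DUE-style reward)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0, 0.0] for _ in range(N_AGENTS)]
    for _ in range(episodes):
        # Epsilon-greedy route choice for every agent simultaneously.
        choices = [rng.randrange(3) if rng.random() < eps
                   else q[i].index(max(q[i])) for i in range(N_AGENTS)]
        times = route_times(choices)
        for i, c in enumerate(choices):
            cost = times[i] + (toll if c == 2 else 0.0)
            q[i][c] += alpha * (-cost - q[i][c])
    # Evaluate the learned greedy policies (travel time excludes the toll).
    greedy = [q[i].index(max(q[i])) for i in range(N_AGENTS)]
    return sum(route_times(greedy)) / N_AGENTS

# Upper level: the planner sweeps the toll; the lower level re-equilibrates.
for toll in (0.0, 10.0, 25.0):
    print(f"toll={toll:5.1f}  avg travel time ~ {train(toll):.1f}")
```

In this toy instance, with no toll the greedy policies should tend to pile onto the central route (average time near 80, the Braess paradox), while a sufficiently high toll pushes the learned equilibrium back to an even split over the two outer routes (average time near 65), mirroring the abstract's finding that a proper toll removes the paradox. Replacing the selfish reward with the negative of the system-wide average travel time would correspond to the DSO setting mentioned above.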

Updated: 2020-11-25