Toll-based reinforcement learning for efficient equilibria in route choice
The Knowledge Engineering Review (IF 2.1), Pub Date: 2020-03-06, DOI: 10.1017/s0269888920000119
Gabriel de O. Ramos , Bruno C. Da Silva , Roxana Rădulescu , Ana L. C. Bazzan , Ann Nowé

The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), charges tolls a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned with the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of congestion levels by 9.1%, on average.
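The sketch below illustrates the idea described in the abstract: stateless Q-learning route-choice agents whose observed cost is augmented by a marginal-cost toll charged after each trip. It assumes homogeneous polynomial link costs of the form c(f) = a·f^p, for which the standard marginal-cost toll works out to p·c(f); the toy network, the parameter values, and the names (Driver, link_cost, marginal_cost_toll) are illustrative assumptions, not the paper's actual formulation or experimental setup.

```python
import random
from collections import defaultdict

def link_cost(a, p, flow):
    """Travel cost of a link with flow f, assuming c(f) = a * f**p (hypothetical form)."""
    return a * flow ** p

def marginal_cost_toll(a, p, flow):
    """MCT toll for a polynomial cost: d/df[f * c(f)] - c(f) = a * p * f**p = p * c(f)."""
    return p * link_cost(a, p, flow)

class Driver:
    """Stateless Q-learning agent choosing among a fixed set of routes."""
    def __init__(self, routes, alpha=0.1, epsilon=0.1):
        self.q = defaultdict(float)          # estimated generalised cost per route
        self.routes = routes
        self.alpha, self.epsilon = alpha, epsilon

    def choose(self):
        if random.random() < self.epsilon:   # explore occasionally
            return random.choice(self.routes)
        return min(self.routes, key=lambda r: self.q[r])  # exploit: lowest expected cost

    def update(self, route, generalised_cost):
        # Move the estimate towards the observed travel cost plus toll for the taken route.
        self.q[route] += self.alpha * (generalised_cost - self.q[route])

# Toy two-route network: each route is one link with hypothetical (a, p) parameters.
routes = {"R1": (1.0, 2), "R2": (2.0, 2)}
drivers = [Driver(list(routes)) for _ in range(100)]

for episode in range(500):
    choices = [d.choose() for d in drivers]
    flows = {r: choices.count(r) for r in routes}
    for d, r in zip(drivers, choices):
        a, p = routes[r]
        cost = link_cost(a, p, flows[r])
        toll = marginal_cost_toll(a, p, flows[r])   # charged a posteriori, at trip end
        d.update(r, cost + toll)
```

Because each driver computes its own toll from quantities it can observe at the end of the trip, no central tolling authority is needed, which is the distributed aspect the abstract emphasises.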
