Quantifying the impact of non-stationarity in reinforcement learning-based traffic signal control, PeerJ Computer Science

PeerJ Computer Science (IF 3.5), Pub Date: 2021-05-27, DOI: 10.7717/peerj-cs.575
Lucas N Alegre¹, Ana L C Bazzan¹, Bruno C da Silva²

In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains, such as traffic optimization, are inherently non-stationary, and the causes and effects of this non-stationarity are manifold. In particular, in traffic signal control, addressing non-stationarity is key, since traffic conditions change both over time and as a function of traffic control decisions taken in other parts of the network. In this paper we analyze the effects that different sources of non-stationarity have on a network of traffic signals in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing the context in which an agent learns (e.g., a change in the flow rates it experiences) and the effects of reducing the agent's observability of the true environment state. Partial observability may cause distinct states (in which distinct actions are optimal) to be seen as the same by the traffic signal agents, which in turn may lead to sub-optimal performance. We show that the lack of suitable sensors to provide a representative observation of the real state seems to affect performance more drastically than changes to the underlying traffic patterns.
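
The state-aliasing effect described in the abstract can be illustrated with a toy sketch. This is not the paper's experimental setup: the two-state intersection, the reward function, the uniform state distribution, and every name below are hypothetical, chosen only to show how merging two states that need different actions caps the achievable performance.

```python
from itertools import product

# Hypothetical toy intersection: exactly one approach is congested at a time.
STATES = ["ns_congested", "ew_congested"]
ACTIONS = ["green_ns", "green_ew"]


def reward(state, action):
    """Return 1.0 when the green phase serves the congested approach, else 0.0."""
    serving_action = {"ns_congested": "green_ns", "ew_congested": "green_ew"}
    return 1.0 if serving_action[state] == action else 0.0


def best_policy_value(observe):
    """Best average reward over all deterministic observation-based policies.

    `observe` maps a true state to what the agent's sensors report.
    States are assumed equally likely (an assumption of this sketch).
    """
    observations = sorted({observe(s) for s in STATES})
    best = 0.0
    # Enumerate every mapping from observation to action.
    for choice in product(ACTIONS, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        value = sum(reward(s, policy[observe(s)]) for s in STATES) / len(STATES)
        best = max(best, value)
    return best


# Full observability: the agent can tell the two states apart.
full_value = best_policy_value(lambda s: s)

# Reduced observability: both states produce the same sensor reading.
aliased_value = best_policy_value(lambda s: "some_congestion")

print(f"full observability:    {full_value}")
print(f"aliased observability: {aliased_value}")
```

With full observability a policy can pick the right phase in each state, whereas under aliasing any single action is wrong in one of the two states, so no amount of learning recovers the lost performance in this toy case.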

Updated: 2021-05-27