Deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations
Transportation Research Part C: Emerging Technologies (IF 7.6). Pub Date: 2020-08-08. DOI: 10.1016/j.trc.2020.102715
Venktesh Pandey, Evana Wang, Stephen D. Boyles

This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers’ value of time, origin, and destination. This framework relaxes assumptions in the literature by considering multiple origins and destinations, multiple access locations to the managed lane, en route diversion of travelers, partial observability of the sensor readings, and stochastic demand and observations. The problem is formulated as a partially observable Markov decision process (POMDP), and policy gradient methods are used to determine tolls as a function of real-time observations. Tolls are modeled as continuous and stochastic variables and are determined using a feedforward neural network. The method is compared against a feedback control method used for dynamic pricing. We show that Deep-RL is effective in learning toll policies for maximizing revenue, minimizing total system travel time (TSTT), and other joint weighted objectives when tested on real-world transportation networks. The Deep-RL toll policies outperform the feedback control heuristic, generating up to 8.5% higher revenue under the revenue-maximization objective and up to 8.4% lower TSTT under the TSTT-minimization objective. We also propose reward shaping methods for the POMDP to overcome undesired behavior of toll policies, such as the jam-and-harvest behavior of revenue-maximizing policies. Additionally, we test the transferability of the algorithm trained on one set of inputs to new input distributions and offer recommendations on real-time implementations of Deep-RL algorithms. The source code for our experiments is available online at https://github.com/venktesh22/ExpressLanes_Deep-RL.
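To make the setup above concrete, the following is a minimal policy-gradient sketch in Python (using PyTorch): a feedforward network maps a partial observation vector to a Gaussian distribution over per-access-point tolls, which are sampled (continuous and stochastic, as in the abstract) and updated with a REINFORCE-style gradient on a weighted revenue/TSTT reward. The network sizes, the shaped_reward weights, and the toy rollout data are illustrative assumptions, not the authors' implementation; their actual code is in the linked repository.

import torch
import torch.nn as nn
from torch.distributions import Normal

class TollPolicy(nn.Module):
    # Feedforward network mapping a partial sensor observation to a Gaussian
    # distribution over tolls, one per managed-lane access point.
    def __init__(self, obs_dim, n_access, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, n_access)           # toll means
        self.log_std = nn.Parameter(torch.zeros(n_access))

    def dist(self, obs):
        return Normal(self.mu(self.body(obs)), self.log_std.exp())

def shaped_reward(revenue, tstt, w_rev=1.0, w_tstt=0.5):
    # Weighted joint objective; additional shaping terms (e.g. penalties on
    # congestion built up only to be "harvested" later) would be added here.
    return w_rev * revenue - w_tstt * tstt

obs_dim, n_access = 8, 3                   # illustrative dimensions
policy = TollPolicy(obs_dim, n_access)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One REINFORCE episode over a toy rollout with random stand-in data.
log_probs, rewards = [], []
for t in range(20):
    obs = torch.randn(obs_dim)             # stand-in for noisy sensor readings
    d = policy.dist(obs)
    raw = d.sample()                       # continuous, stochastic toll draw
    toll = raw.clamp(min=0.0)              # enforce nonnegative prices
    log_probs.append(d.log_prob(raw).sum())
    revenue = float(toll.sum())            # stand-in for simulator feedback
    tstt = float(torch.rand(1))
    rewards.append(shaped_reward(revenue, tstt))

returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)   # reward-to-go
loss = -(torch.stack(log_probs) * returns).mean()
opt.zero_grad(); loss.backward(); opt.step()

Sampling tolls from a learned Gaussian mirrors the paper's continuous, stochastic toll formulation; in practice the revenue and TSTT signals would come from a traffic simulation of the managed-lane network rather than the random placeholders used here.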



Updated: 2020-08-09