Markov-game modeling of cyclist-pedestrian interactions in shared spaces: A multi-agent adversarial inverse reinforcement learning approach
Transportation Research Part C: Emerging Technologies (IF 8.3) Pub Date: 2021-05-21, DOI: 10.1016/j.trc.2021.103191
Rushdi Alsaleh , Tarek Sayed

Understanding and modeling road user dynamics and microscopic interaction behaviour at shared space facilities is crucial for several applications, including safety and performance evaluations. Despite the multi-agent nature of road user interactions, the majority of previous studies modeled these interactions within a single-agent framework, i.e., treating the other interacting agents as part of a passive environment. This assumption is unrealistic and can limit a model's accuracy and transferability in non-stationary road user environments. This study proposes a novel Multi-Agent Adversarial Inverse Reinforcement Learning (MA-AIRL) approach to model and simulate road user interactions at shared space facilities. Unlike the traditional game-theoretic framework, which models multi-agent systems as a single-time-step payoff game, the proposed approach is based on Markov Games (MG), which model road users' sequential decisions concurrently. Moreover, the proposed model can handle bounded-rationality agents, e.g., agents with limited information access, through the Logistic Stochastic Best Response Equilibrium (LSBRE) solution concept. The proposed algorithm recovers road users' multi-agent reward functions using adversarial deep neural network discriminators and estimates their optimal policies using Multi-agent Actor-Critic with Kronecker factors (MACK) deep reinforcement learning. Data from three shared space locations in Vancouver, BC and New York City, New York are used in this study. The model's performance is compared to a baseline single-agent Gaussian Process Inverse Reinforcement Learning (GPIRL) model. The results show that the multi-agent modeling framework yields significantly more accurate predictions of road users' behaviour and their evasive action mechanisms. Moreover, the reward functions recovered under the single-agent approach failed to capture an equilibrium solution concept, unlike those recovered by the multi-agent approach.
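To make the adversarial reward-recovery idea concrete, the sketch below illustrates the standard AIRL discriminator structure that MA-AIRL applies per agent: each agent i has a discriminator of the form D_i(s, a_i) = exp(f_i(s, a_i)) / (exp(f_i(s, a_i)) + π_i(a_i | s)), where f_i is the learned reward estimate and π_i the current policy; the policy is then trained on the induced reward log D − log(1 − D) = f − log π. This is a minimal illustrative sketch, not the authors' implementation: the linear reward parameterisation, the feature vector, the per-agent weights, and the fixed policy probability are all hypothetical stand-ins for the deep networks used in the paper.

```python
import numpy as np

def airl_discriminator(f_value, policy_prob):
    """Probability that (s, a) came from expert data rather than the policy:
    D = exp(f) / (exp(f) + pi)."""
    expf = np.exp(f_value)
    return expf / (expf + policy_prob)

def reward_from_discriminator(f_value, policy_prob):
    """Policy-update reward signal: log D - log(1 - D), which equals f - log pi."""
    d = airl_discriminator(f_value, policy_prob)
    return np.log(d) - np.log(1.0 - d)

# Toy two-agent (cyclist, pedestrian) evaluation with hypothetical
# linear reward estimates f_i(s, a_i) = theta_i . s.
state = np.array([1.0, 0.5])  # e.g. relative position/speed features (illustrative)
theta = {
    "cyclist": np.array([0.3, -0.2]),
    "pedestrian": np.array([-0.1, 0.4]),
}

for agent, w in theta.items():
    f = float(state @ w)   # linear reward estimate for this agent
    pi = 0.25              # current policy probability pi_i(a_i | s), fixed for the demo
    d = airl_discriminator(f, pi)
    r = reward_from_discriminator(f, pi)
    print(agent, round(d, 3), round(r, 3))
```

In the full MA-AIRL setup, each f_i and π_i would be a neural network updated adversarially, and the policies would be trained jointly with MACK rather than held fixed as here.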




Updated: 2021-05-22