First-Person Activity Forecasting from Video with Online Inverse Reinforcement Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 10-4-2018. DOI: 10.1109/tpami.2018.2873794
Nicholas Rhinehart , Kris M. Kitani

Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, the task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose the Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries, enabling efficient and accurate prediction of future trajectories. By customizing intention queries for distinct motion modalities, MTR improves multimodal motion prediction while reducing reliance on dense goal candidates. The framework comprises two essential processes: global intention localization, which identifies the agent's intent to enhance overall efficiency, and local movement refinement, which adaptively refines predicted trajectories for improved accuracy. Moreover, we introduce an advanced MTR++ framework that extends MTR to simultaneously predict multimodal motion for multiple agents. MTR++ incorporates symmetric context modeling and mutually-guided intention querying modules to facilitate future behavior interaction among multiple agents, resulting in scene-compliant future trajectories. Extensive experimental results demonstrate that the MTR framework achieves state-of-the-art performance on highly competitive motion prediction benchmarks, while the MTR++ framework surpasses its precursor, exhibiting enhanced performance and efficiency in predicting accurate multimodal future trajectories for multiple agents.
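The two decoding stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the intention points, the distance-based scoring, and the single refinement step are hypothetical stand-ins for the learned intention queries, attention-based scoring, and iterative trajectory refinement of the real MTR decoder.

```python
import math

# Conceptual sketch of MTR's two-stage decoding (illustrative only):
# 1) global intention localization: score a fixed set of K candidate
#    "intention points" (stand-ins for learnable intention queries) and
#    keep the most promising motion modes;
# 2) local movement refinement: nudge each kept endpoint toward the
#    contextual goal to refine the predicted trajectory.

INTENTION_POINTS = [(10.0, 0.0), (7.0, 7.0), (0.0, 10.0), (-7.0, 7.0)]  # K = 4 modes

def score_intentions(context_goal, intention_points):
    """Higher score for intention points closer to the contextual goal."""
    return [-math.dist(p, context_goal) for p in intention_points]

def localize_and_refine(context_goal, top_k=2, step=0.5):
    """Return top_k refined endpoints, one per selected motion mode."""
    scores = score_intentions(context_goal, INTENTION_POINTS)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:top_k]  # global intention localization: pick likely modes
    refined = []
    for i in kept:  # local movement refinement: move endpoint toward context
        x, y = INTENTION_POINTS[i]
        gx, gy = context_goal
        refined.append((x + step * (gx - x), y + step * (gy - y)))
    return refined

# With a contextual goal near the first intention point, that mode is
# selected and its endpoint is pulled halfway toward the goal.
print(localize_and_refine((9.0, 1.0), top_k=1))  # [(9.5, 0.5)]
```

The key idea the sketch mirrors is that mode selection (which intention to pursue) is decoupled from trajectory refinement (where exactly to go), which is what lets MTR avoid scoring dense goal candidates.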

Updated: 2024-08-22