A Real-time Vision Framework for Pedestrian Behavior Recognition and Intention Prediction at Intersections Using 3D Pose Estimation
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-23, DOI: arxiv-2009.10868
Ue-Hwan Kim, Dongho Ka, Hwasoo Yeo, Jong-Hwan Kim

Minimizing traffic accidents between vehicles and pedestrians is one of the primary research goals in intelligent transportation systems. To achieve this goal, pedestrian behavior recognition and prediction of pedestrians' crossing or not-crossing intentions play a central role. Contemporary approaches do not guarantee satisfactory performance due to a lack of generalization, the need for manual data labeling, and high computational complexity. To overcome these limitations, we propose a real-time vision framework for two tasks: pedestrian behavior recognition (100.53 FPS) and intention prediction (35.76 FPS). Our framework achieves satisfactory generalization across multiple sites thanks to the proposed site-independent features. At the center of the feature extraction lies 3D pose estimation. The 3D pose analysis enables robust and accurate recognition of pedestrian behaviors and prediction of intentions across multiple sites. The proposed vision framework achieves 89.3% accuracy in the behavior recognition task on the TUD dataset without any training process and 91.28% accuracy in intention prediction on our dataset, setting new state-of-the-art performance. To contribute to the corresponding research community, we make our source code publicly available at https://github.com/Uehwan/VisionForPedestrian
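The abstract does not specify how the site-independent features are computed from the estimated 3D poses. The sketch below is only an illustration of one plausible approach: root-centering and scale-normalizing the 3D keypoints so the descriptors no longer depend on camera placement or pedestrian distance, then adding a heading angle and simple temporal differences as input to a downstream intention classifier. All names (e.g., extract_pose_features) and the joint indexing are hypothetical, not taken from the authors' code.

```python
import numpy as np

# Illustrative joint indices for a 3D pose array of shape (J, 3); the actual
# skeleton layout used by the paper's pose estimator is not given in the abstract.
HIP_L, HIP_R, SHOULDER_L, SHOULDER_R, HEAD = 0, 1, 2, 3, 4


def extract_pose_features(pose: np.ndarray) -> np.ndarray:
    """Compute a few scale- and site-independent descriptors from one 3D pose."""
    # Root-center and scale-normalize so features do not depend on camera
    # placement or the pedestrian's distance from the sensor (one way to
    # obtain site independence).
    root = (pose[HIP_L] + pose[HIP_R]) / 2.0
    centered = pose - root
    scale = np.linalg.norm(pose[HEAD] - root) + 1e-8
    normalized = centered / scale

    # Torso heading on the ground plane, a common cue for crossing intention.
    shoulder_vec = normalized[SHOULDER_R] - normalized[SHOULDER_L]
    heading = np.arctan2(shoulder_vec[2], shoulder_vec[0])

    return np.concatenate([normalized.ravel(), [heading]])


def features_over_time(pose_sequence: np.ndarray) -> np.ndarray:
    """Stack per-frame features plus first-order temporal differences for a
    sequence of shape (T, J, 3); a downstream classifier can then predict
    crossing / not-crossing intention from this representation."""
    frames = np.stack([extract_pose_features(p) for p in pose_sequence])
    velocity = np.diff(frames, axis=0, prepend=frames[:1])
    return np.concatenate([frames, velocity], axis=1)
```

Root-centering plus scale normalization is a standard way to make pose features invariant to viewpoint and subject size; whether the authors use this or another normalization is not stated in the abstract.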

Updated: 2020-09-24