Socially and Contextually Aware Human Motion and Pose Forecasting
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2020-07-14 , DOI: arxiv-2007.06843
Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we incorporate both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple the two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from the motion and pose of all individuals in the scene using a social pooling layer. Finally, we use a GRU-based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves superior performance compared to several baselines on two social datasets.
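The encode-pool-decode pipeline described in the abstract can be sketched in a few lines of numpy. The sketch below is illustrative only: a shared GRU encodes each person's history (trajectory plus pose features), an element-wise max over all people's hidden states stands in for the social pooling layer, and a GRU decoder rolls out future steps conditioned on the pooled social context. All layer sizes, the max-pooling choice, and the linear readout are assumptions for the sketch, not the paper's exact architecture (which also encodes scene context from video).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru(input_dim, hidden_dim):
    """Random parameters for one GRU cell (update/reset/candidate gates stacked)."""
    s = 0.1
    return {
        "W": rng.normal(0, s, (3 * hidden_dim, input_dim)),   # input weights
        "U": rng.normal(0, s, (3 * hidden_dim, hidden_dim)),  # recurrent weights
        "b": np.zeros(3 * hidden_dim),
    }

def gru_step(p, x, h):
    """One GRU update: h_t = (1 - z) * n + z * h_{t-1}."""
    H = h.shape[0]
    g = p["W"] @ x + p["b"]
    gh = p["U"] @ h
    z = sigmoid(g[:H] + gh[:H])            # update gate
    r = sigmoid(g[H:2*H] + gh[H:2*H])      # reset gate
    n = np.tanh(g[2*H:] + r * gh[2*H:])    # candidate state
    return (1.0 - z) * n + z * h

def forecast(histories, enc, dec, out_W, horizon):
    """histories: (num_people, T_obs, feat) -> predictions (num_people, horizon, feat)."""
    P, T, F = histories.shape
    H = enc["U"].shape[1]
    # 1) shared GRU encoder over each person's trajectory+pose history
    h = np.zeros((P, H))
    for i in range(P):
        for t in range(T):
            h[i] = gru_step(enc, histories[i, t], h[i])
    # 2) social pooling: aggregate everyone's encodings (element-wise max here)
    pooled = h.max(axis=0)
    # 3) GRU decoder, conditioned on [last observation, pooled social context]
    preds = np.zeros((P, horizon, F))
    for i in range(P):
        d, x = h[i].copy(), histories[i, -1]
        for t in range(horizon):
            d = gru_step(dec, np.concatenate([x, pooled]), d)
            x = out_W @ d                  # linear readout of next motion+pose vector
            preds[i, t] = x
    return preds

F, H = 6, 16                       # e.g. 2-D position + a few pose features
enc = make_gru(F, H)
dec = make_gru(F + H, H)           # decoder input: own features + pooled context
out_W = rng.normal(0, 0.1, (F, H))
hist = rng.normal(0, 1, (3, 8, F)) # 3 people, 8 observed frames each
future = forecast(hist, enc, dec, out_W, horizon=12)
print(future.shape)                # (3, 12, 6)
```

Because the trajectory and pose occupy the same output vector, a single distance on that vector (as the abstract's joint loss suggests) penalizes errors in both tasks at once, which is what couples them during training.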

Updated: 2020-07-15