Accurate Long-Term Multiple People Tracking using Video and Body-Worn IMUs.
IEEE Transactions on Image Processing (IF 10.8) | Pub Date: 2020-08-13 | DOI: 10.1109/tip.2020.3013801
Roberto Henschel, Timo von Marcard, Bodo Rosenhahn

Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person’s outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
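The core association idea in the abstract, matching person detections to IMU devices by motion rather than appearance, can be illustrated with a minimal sketch. All names and the cost model below are illustrative assumptions; the paper's actual method uses a neural network to score detection–IMU compatibility and solves a global graph labeling problem, neither of which is reproduced here.

```python
# Hedged sketch: assigning video tracklets to body-worn IMUs by motion
# similarity. This is NOT the authors' implementation; it only illustrates
# the appearance-independent association principle from the abstract.
import math
from itertools import permutations

def cosine(a, b):
    """Cosine similarity between two motion-signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_tracklets_to_imus(tracklet_motion, imu_motion):
    """Brute-force minimum-cost one-to-one assignment (toy sizes only).

    tracklet_motion: per-tracklet motion signatures (e.g. detection-box
        velocity sequences flattened to vectors) -- an assumed encoding.
    imu_motion: per-IMU motion signatures (e.g. orientation-change
        sequences, flattened) -- likewise an assumed encoding.
    Returns perm, where perm[i] is the IMU index matched to tracklet i.
    """
    n = len(tracklet_motion)
    # Cost = 1 - cosine similarity; lower cost means more similar motion.
    cost = [[1.0 - cosine(t, m) for m in imu_motion] for t in tracklet_motion]
    return min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))

# Toy example: two tracklets, two IMUs with swapped motion patterns.
tracklets = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
imus      = [[0.0, 0.9, 0.0, 1.1], [0.9, 0.0, 1.1, 0.0]]
print(assign_tracklets_to_imus(tracklets, imus))  # (1, 0)
```

In practice one would replace the brute-force search with a polynomial-time solver (e.g. the Hungarian algorithm) and, as the paper does, enforce consistency of the assignment over the whole sequence rather than per window.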

Updated: 2020-08-28