Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation
International Journal of Computer Vision (IF 19.5), Pub Date: 2018-09-08, DOI: 10.1007/s11263-018-1118-y
Andrew Gilbert, Matthew Trumble, Charles Malleson, Adrian Hilton, John Collomosse

We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull. The learnt pose stream is processed concurrently with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints. The two streams are then fused in a final fully connected layer. The two complementary data sources allow ambiguities within each sensor modality to be resolved, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human 3.6M dataset (Ionescu et al. in IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339, 2014), the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU data and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at http://cvssp.org/data/totalcapture/.
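To make the "forward kinematic solve of the IMU data" concrete, here is a minimal sketch of that one step in isolation. It is not the authors' implementation. It assumes each IMU already reports a calibrated world-frame orientation quaternion for the body segment it is attached to, and it uses a hypothetical five-joint torso-and-right-arm chain with made-up rest-pose bone vectors in metres; the joint and segment names are illustrative only. Each joint position is its parent's position plus the segment's rest-pose bone vector rotated by that segment's IMU orientation. The 3D CNN, the LSTM and the fully connected fusion layer are not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

IDENTITY: Quat = (1.0, 0.0, 0.0, 0.0)

# Hypothetical skeleton, listed parent-before-child:
# (joint, parent joint, rest-pose bone vector from parent in metres, IMU segment driving that bone)
SKELETON: List[Tuple[str, Optional[str], Vec3, Optional[str]]] = [
    ("hips", None, (0.0, 0.0, 0.0), None),
    ("neck", "hips", (0.0, 0.5, 0.0), "torso"),
    ("r_shoulder", "neck", (-0.2, 0.0, 0.0), "torso"),
    ("r_elbow", "r_shoulder", (0.0, -0.3, 0.0), "r_upper_arm"),
    ("r_wrist", "r_elbow", (0.0, -0.25, 0.0), "r_forearm"),
]


def quat_rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q, i.e. q * v * conj(q)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_xyz cross v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + (q_xyz cross t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def forward_kinematics(orientations: Dict[str, Quat], root: Vec3 = (0.0, 0.0, 0.0)) -> Dict[str, Vec3]:
    """Solve 3D joint positions from per-segment world orientations (one IMU per segment)."""
    positions: Dict[str, Vec3] = {}
    for joint, parent, offset, segment in SKELETON:
        if parent is None or segment is None:
            # Root position is supplied by the caller; orientations alone do not fix it.
            positions[joint] = root
            continue
        # Rotate the rest-pose bone vector into the world frame using the segment's IMU.
        dx, dy, dz = quat_rotate(orientations[segment], offset)
        px, py, pz = positions[parent]
        positions[joint] = (px + dx, py + dy, pz + dz)
    return positions


if __name__ == "__main__":
    h = math.sqrt(0.5)  # cos(45°) = sin(45°): quaternion for a 90° turn about z
    pose = {"torso": IDENTITY, "r_upper_arm": (h, 0.0, 0.0, h), "r_forearm": IDENTITY}
    for name, xyz in forward_kinematics(pose).items():
        print(name, tuple(round(c, 3) for c in xyz))
```

Because the IMUs supply only orientations, bone lengths come from the skeleton: turning the upper-arm segment by 90° about z, as in the `__main__` block, swings the elbow and wrist while every bone keeps its rest length.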

Updated: 2018-09-08