Understanding the limits of 2D skeletons for action recognition
Multimedia Systems (IF 3.9), Pub Date: 2021-02-07, DOI: 10.1007/s00530-021-00754-0
Petr Elias, Jan Sedmidubsky, Pavel Zezula

With the development of motion capture technologies, 3D action recognition has become a popular task with broad applicability in areas such as augmented reality, human–computer interaction, sports, and healthcare. However, acquiring 3D human skeleton data is expensive and time-consuming, mainly because of the high cost of capturing technologies and the lack of suitable actors. We overcome these issues by focusing on the 2D skeleton modality, which can be easily extracted from ordinary videos. The objective of this work is to demonstrate the high descriptive power of the 2D skeleton modality by achieving accuracy on daily action recognition that is competitive with 3D skeleton data. More importantly, we thoroughly analyze the factors that significantly influence 2D recognition accuracy, such as sensitivity to data normalization, scaling, and quantization, and the 3D-to-2D distortions in skeleton orientations and sizes that are caused by the loss of the depth dimension and a fixed-angle camera view. We also provide insights on how to mitigate these problems and significantly increase recognition accuracy. The experimental evaluation is conducted on three datasets that differ in nature. We also report which types of actions are learned better from 2D skeletons and which from 3D skeletons. Throughout the experiments, a generic lightweight LSTM network is used, whose architecture can be easily tuned to achieve the desired trade-off between accuracy and efficiency. We show that the proposed approach not only achieves state-of-the-art results in 2D skeleton action recognition but is also highly competitive with the best-performing methods that classify 3D skeleton sequences or the visual content extracted from ordinary videos.
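The 3D-to-2D distortion mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not specify a camera model or normalization scheme, so the code below assumes a simple orthographic projection (dropping the depth coordinate), root-centered normalization scaled by a reference bone, and grid quantization. The three-joint skeleton and all function names are hypothetical and exist only for illustration.

```python
import math

# Toy 3-joint "skeleton" (hypothetical): hip, neck, hand, each as (x, y, z).
# The spine (hip -> neck) is vertical; the arm (neck -> hand) points sideways.
SKELETON_3D = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]


def rotate_y(joints3d, angle_rad):
    """Rotate joints around the vertical (y) axis, i.e. turn the subject."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints3d]


def project_to_2d(joints3d):
    """Assumed orthographic projection: keep (x, y) and drop depth (z)."""
    return [(x, y) for x, y, _ in joints3d]


def bone_length(joints, a, b):
    """Euclidean distance between joints a and b (works in 2D or 3D)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(joints[a], joints[b])))


def normalize(joints2d, root=0, ref=(0, 1)):
    """Center on the root joint and scale so the reference bone has length 1."""
    rx, ry = joints2d[root]
    centered = [(x - rx, y - ry) for x, y in joints2d]
    scale = bone_length(centered, ref[0], ref[1])
    if scale == 0:
        raise ValueError("reference bone has zero length in this view")
    return [(x / scale, y / scale) for x, y in centered]


def quantize(joints2d, step):
    """Snap each coordinate to the nearest multiple of `step`."""
    return [(round(x / step) * step, round(y / step) * step)
            for x, y in joints2d]


if __name__ == "__main__":
    turned = rotate_y(SKELETON_3D, math.radians(60))
    flat = project_to_2d(turned)
    print("3D arm length:", bone_length(turned, 1, 2))
    print("2D arm length:", bone_length(flat, 1, 2))
    print("normalized + quantized:", quantize(normalize(flat), 0.25))
```

Turning the subject by 60 degrees leaves the 3D arm length at 1 but shrinks its projected 2D length to about 0.5 (the cosine of 60 degrees), while the vertical spine is unaffected. This is one simple way in which a fixed camera view can distort apparent bone sizes once depth is lost.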




Updated: 2021-02-07