A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2). Pub Date: 2020-05-25. DOI: 10.1145/3365212
Jiaying Liu, Sijie Song, Chunhui Liu, Yanghao Li, Yueyu Hu

Large-scale benchmarks provide a solid foundation for the development of action analytics. Most previous activity benchmarks focus on analyzing actions in RGB videos, and there is a lack of large-scale, high-quality benchmarks for multi-modal action analytics. In this article, we introduce the PKU Multi-Modal Dataset (PKU-MMD), a new large-scale benchmark for multi-modal human action analytics. It consists of about 28,000 action instances and 6.2 million frames in total and provides high-quality multi-modal data sources, including RGB, depth, infrared radiation (IR), and skeletons. To make PKU-MMD more practical, our dataset comprises two subsets under different settings for action understanding, namely Part I and Part II. Part I contains 1,076 untrimmed video sequences with 51 action classes performed by 66 subjects, while Part II contains 1,009 untrimmed video sequences with 41 action classes performed by 13 subjects. Compared to Part I, Part II is more challenging due to short action intervals, concurrent actions, and heavy occlusion. PKU-MMD can be leveraged in two scenarios: action recognition with trimmed video clips and action detection with untrimmed video sequences. For each scenario, we provide benchmark performance on both subsets by evaluating representative methods with different modalities under two evaluation protocols. Experimental results show that PKU-MMD poses a significant challenge to many state-of-the-art methods. We further illustrate that features learned on PKU-MMD transfer well to other datasets. We believe this large-scale dataset will boost research in the field of action analytics for the community.
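To make the untrimmed detection scenario concrete, the following is a minimal sketch of temporal-IoU-based average precision for action detection, of the kind commonly used to score untrimmed sequences. The (start_frame, end_frame) interval format, the greedy matching of detections to ground truth, and the 11-point interpolation are illustrative assumptions, not the official PKU-MMD evaluation code.

    # Hypothetical sketch: per-class average precision for temporal action
    # detection on untrimmed videos, using a temporal IoU threshold.
    from collections import defaultdict

    def temporal_iou(a, b):
        """IoU of two (start_frame, end_frame) intervals."""
        inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    def average_precision(detections, ground_truth, iou_thr=0.5):
        """detections: list of (video_id, start, end, score) for one class.
        ground_truth: dict video_id -> list of (start, end) for that class."""
        detections = sorted(detections, key=lambda d: d[3], reverse=True)
        matched = defaultdict(set)              # video_id -> matched GT indices
        n_gt = sum(len(v) for v in ground_truth.values())
        tp, fp = [], []
        for vid, s, e, _ in detections:
            best_iou, best_idx = 0.0, -1
            for idx, gt in enumerate(ground_truth.get(vid, [])):
                if idx in matched[vid]:
                    continue                    # each GT interval matches once
                iou = temporal_iou((s, e), gt)
                if iou > best_iou:
                    best_iou, best_idx = iou, idx
            if best_iou >= iou_thr:
                matched[vid].add(best_idx)
                tp.append(1); fp.append(0)
            else:
                tp.append(0); fp.append(1)
        # Accumulate precision/recall, then 11-point interpolated AP.
        precisions, recalls = [], []
        cum_tp, cum_fp = 0, 0
        for t, f in zip(tp, fp):
            cum_tp += t; cum_fp += f
            precisions.append(cum_tp / (cum_tp + cum_fp))
            recalls.append(cum_tp / n_gt if n_gt else 0.0)
        ap = 0.0
        for r in [i / 10 for i in range(11)]:
            p = max([p for p, rc in zip(precisions, recalls) if rc >= r], default=0.0)
            ap += p / 11
        return ap

Averaging this quantity over all action classes gives a mAP-style detection score at the chosen temporal IoU threshold; the recognition scenario on trimmed clips reduces to ordinary classification accuracy and needs no interval matching.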

Updated: 2020-05-25