Learning Social Spatio-Temporal Relation Graph in the Wild and a Video Benchmark
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2). Pub Date: 2021-09-14. DOI: 10.1109/tnnls.2021.3110682
Haoran Wang, Licheng Jiao, Fang Liu, Lingling Li, Xu Liu, Deyi Ji, Weihao Gan

Social relations are ubiquitous and form the basis of the social structure of our daily life. However, existing studies mainly focus on recognizing social relations from still images and movie clips, which differ from real-world scenarios. For example, movie-based datasets define the task as video classification, recognizing only one relation per scene. In this article, we aim to study the problem of social relation recognition in an open environment. To close this gap, we provide the first video dataset collected from real-life scenarios, named social relation in the wild (SRIW), in which the number of people can be large and variable, and every pair of relations needs to be recognized. To overcome the new challenges, we propose a spatio-temporal relation graph convolutional network (STRGCN) architecture that utilizes correlative visual features to recognize social relations intuitively. Our method decouples the task into two classification tasks: person-level and pair-level relation recognition. Specifically, we propose a person behavior and character module to encode moving and static features in two explicit ways. We then take these features as node features to build a relation graph with meaningful edges in a scene. Based on the relation graph, we introduce a graph convolutional network (GCN) and a local GCN to encode the social relation features used for both recognition tasks. Experimental results demonstrate the effectiveness of the proposed framework, which achieves 83.1% and 40.8% mAP on person-level and pair-level classification, respectively. Moreover, this study also contributes to the practicality of this field.
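To make the graph-based formulation concrete, below is a minimal sketch of the general idea the abstract describes: per-person node features feed a GCN over a per-scene relation graph, and the resulting node embeddings drive a person-level head and a pair-level head. This is not the authors' STRGCN implementation; the layer design, the local GCN, the behavior/character encoders, feature dimensions, and class counts are all assumptions made purely for illustration.

```python
# Hypothetical sketch of a relation-graph classifier (PyTorch).
# All module names, dimensions, and class counts are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: X' = ReLU(A_hat @ X @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (N, N) normalized adjacency of the per-scene relation graph
        return F.relu(adj @ self.linear(x))


class RelationGraphSketch(nn.Module):
    """Person-level and pair-level relation heads on top of GCN node embeddings."""
    def __init__(self, node_dim=512, hidden_dim=256,
                 num_person_classes=8, num_pair_classes=9):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(node_dim, hidden_dim)
        self.gcn2 = SimpleGCNLayer(hidden_dim, hidden_dim)
        self.person_head = nn.Linear(hidden_dim, num_person_classes)
        self.pair_head = nn.Linear(2 * hidden_dim, num_pair_classes)

    def forward(self, node_feats, adj, pairs):
        # node_feats: (N, node_dim) per-person features (e.g., behavior + character)
        # adj:        (N, N) adjacency over the people detected in the same scene
        # pairs:      (P, 2) long tensor of person-index pairs to classify
        h = self.gcn2(self.gcn1(node_feats, adj), adj)
        person_logits = self.person_head(h)                       # person-level task
        pair_feats = torch.cat([h[pairs[:, 0]], h[pairs[:, 1]]], dim=-1)
        pair_logits = self.pair_head(pair_feats)                  # pair-level task
        return person_logits, pair_logits
```

In this sketch, pair-level recognition is reduced to classifying the concatenated embeddings of the two people involved, which mirrors the decoupling into person-level and pair-level tasks; how the paper actually forms edges and fuses spatio-temporal features is not specified in the abstract.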

Updated: 2021-09-14