A Graph-based Interactive Reasoning for Human-Object Interaction Detection
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-07-14. DOI: arXiv-2007.06925
Dongming Yang and Yuexian Zou

Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects by inferring <human, verb, object> triplets. However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions. In this paper, we present a novel graph-based interactive reasoning model, called Interactive Graph (in-Graph for short), to infer HOIs, in which the interactive semantics implied among visual targets are efficiently exploited. The proposed model consists of a projection function that maps related targets from convolution space to a graph-based semantic space, a message-passing process that propagates semantics among all nodes, and an update function that transforms the reasoned nodes back to convolution space. Furthermore, we construct a new framework, named in-GraphNet, that assembles in-Graph models for detecting HOIs. Rather than inferring HOIs from instance features in isolation, the framework dynamically parses pairwise interactive semantics among visual targets by integrating two levels of in-Graphs, i.e., a scene-wide and an instance-wide in-Graph. Our framework is end-to-end trainable and free from costly annotations such as human pose. Extensive experiments show that the proposed framework outperforms existing HOI detection methods on both the V-COCO and HICO-DET benchmarks, improving over the baseline by about 9.4% and 15% relative, respectively, which validates its efficacy in detecting HOIs.
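The three-stage structure the abstract describes (project to a graph semantic space, pass messages among nodes, update back to convolution space) can be made concrete with a minimal sketch. The PyTorch module below is an illustration under stated assumptions, not the authors' released implementation: the class and layer names (InGraphSketch, node_proj, etc.), the soft node-assignment scheme, and all dimensions are hypothetical choices made only to show the reasoning loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InGraphSketch(nn.Module):
    """Minimal sketch of the in-Graph idea: (1) project convolution
    features of two related targets into a graph-based semantic space,
    (2) propagate semantics among all nodes, (3) transform the reasoned
    nodes back to convolution space. Illustrative only."""

    def __init__(self, in_channels: int, node_dim: int, num_nodes: int):
        super().__init__()
        # Projection function: convolution space -> graph semantic space.
        self.node_proj = nn.Conv2d(in_channels, num_nodes, kernel_size=1)  # soft node assignment
        self.feat_proj = nn.Conv2d(in_channels, node_dim, kernel_size=1)   # per-pixel node features
        # Message passing: one graph-convolution step over fully connected nodes.
        self.adj = nn.Linear(num_nodes, num_nodes, bias=False)  # learned adjacency
        self.gcn = nn.Linear(node_dim, node_dim, bias=False)    # node-state transform
        # Update function: graph semantic space -> convolution space.
        self.back_proj = nn.Conv2d(node_dim, in_channels, kernel_size=1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # Fuse the two related targets (e.g., human and object features).
        x = x_a + x_b                                      # (B, C, H, W)
        b, _, h, w = x.shape
        assign = self.node_proj(x).flatten(2)              # (B, N, HW)
        feats = self.feat_proj(x).flatten(2)               # (B, D, HW)
        nodes = torch.bmm(assign, feats.transpose(1, 2))   # (B, N, D) graph nodes
        # Propagate semantics among all nodes, then transform node states.
        nodes = self.adj(nodes.transpose(1, 2)).transpose(1, 2)
        nodes = F.relu(self.gcn(nodes))
        # Re-project the reasoned nodes onto the spatial grid.
        out = torch.bmm(assign.transpose(1, 2), nodes)     # (B, HW, D)
        out = out.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.back_proj(out)                     # residual update in conv space

# Hypothetical usage on ROI-pooled human/object features:
if __name__ == "__main__":
    m = InGraphSketch(in_channels=256, node_dim=128, num_nodes=32)
    human, obj = torch.randn(2, 256, 7, 7), torch.randn(2, 256, 7, 7)
    print(m(human, obj).shape)  # torch.Size([2, 256, 7, 7])
```

The residual connection at the end keeps the module a drop-in refinement of the convolutional features, which is one plausible way the scene-wide and instance-wide in-Graphs could be stacked inside in-GraphNet.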

Updated: 2020-07-15