Three‐stream network with context convolution module for human–object interaction detection
ETRI Journal ( IF 1.4 ) Pub Date : 2020-02-11 , DOI: 10.4218/etrij.2019-0230
Thomhert S. Siadari 1, 2 , Mikyong Han 2 , Hyunjin Yoon 1, 2

Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. It is useful in many applications that require a deeper semantic understanding of scenes. Current HOI detection networks typically consist of a feature extractor followed by detection layers built from small filters (e.g., 1 × 1 or 3 × 3). Although small filters capture local spatial features with few parameters, their small receptive fields prevent them from capturing the larger context needed to recognize interactions between humans and distant objects. Hence, we propose a three‐stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM captures larger contexts from input feature maps by combining large separable convolution layers with residual‐based convolution layers; because it uses fewer large separable filters, it does so without increasing the number of parameters. We evaluate our HOI detection method on two benchmark datasets, V‐COCO and HICO‐DET, and demonstrate its state‐of‐the‐art performance.
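To make the parameter argument concrete, the following minimal sketch counts the weights of a dense convolution and of a large separable convolution (a k × 1 layer followed by a 1 × k layer) and compares their receptive fields. The abstract does not give the CCM's kernel size or channel widths, so the values used here (k = 15, 256 input and output channels, 64 intermediate channels) and all function names are illustrative assumptions, not the authors' configuration.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=False):
    """Number of learnable parameters in one 2D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def separable_params(k, c_in, c_mid, c_out):
    """Parameters of a k x 1 convolution followed by a 1 x k convolution."""
    return conv_params(k, 1, c_in, c_mid) + conv_params(1, k, c_mid, c_out)


def receptive_field(kernel_sizes):
    """Receptive field along one axis for stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


# Hypothetical sizes, chosen only for illustration (not from the paper).
C_IN, C_MID, C_OUT, K = 256, 64, 256, 15

small_dense = conv_params(3, 3, C_IN, C_OUT)           # 3 x 3 dense filter bank
large_dense = conv_params(K, K, C_IN, C_OUT)           # 15 x 15 dense filter bank
large_separable = separable_params(K, C_IN, C_MID, C_OUT)

print("3x3 dense      :", small_dense, "params, receptive field", receptive_field([3]))
print("15x15 dense    :", large_dense, "params, receptive field", receptive_field([K]))
# Along the height axis the separable pair has kernel sizes [K, 1];
# along the width axis it has [1, K]. Both give the same receptive field.
print("15x1 + 1x15 sep:", large_separable, "params, receptive field", receptive_field([K, 1]))
```

Under these assumed sizes, the separable pair covers a 15 × 15 region with fewer weights than a single 3 × 3 dense layer, because the narrow intermediate width (fewer separable filters) offsets the longer kernels.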

Updated: 2020-02-11