Human-centric Relation Segmentation: Dataset and Solution.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-04-27, DOI: 10.1109/tpami.2021.3075846
Si Liu, Zitian Wang, Yulu Gao, Lejian Ren, Yue Liao, Guanghui Ren, Bo Li, Shuicheng Yan

Vision and language techniques have achieved remarkable progress, but they still struggle with problems involving fine-grained details. For example, when a robot is told to "bring me the book in the girl's left hand", existing methods fail if the girl holds a book in each hand. In this work, we introduce a new task named human-centric relation segmentation (HRS), a fine-grained case of human-object interaction detection (HOI-det). It aims to predict the relations between a human and the surrounding entities and to identify the interacting human parts, which are represented as pixel-level masks. Correspondingly, we collect a new Person In Context (PIC) dataset and propose a Simultaneously Matching and Segmentation (SMS) framework to solve the task. The framework contains three parallel branches. Specifically, the entity segmentation branch obtains entity masks by dynamically generated conditional convolutions; the subject-object matching branch links the corresponding subjects and objects by displacement estimation and classifies the interacting human parts; and the human parsing branch generates the pixel-wise human part labels. The outputs of the three branches are fused to produce the final HRS results. Extensive experiments on two datasets show that SMS outperforms the baselines while running at an inference speed of 36 FPS.
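
The abstract does not spell out how displacement estimation links subjects to objects. The following is only a minimal sketch of one plausible reading, not the authors' implementation: each subject center is shifted by its predicted displacement vector, and the subject is paired with the object whose center lies closest to the shifted point. The function name `match_by_displacement`, the `max_dist` threshold, and the hand-written coordinates are all hypothetical. In the actual SMS framework the displacements would be predicted by a network, and the entities are masks rather than points.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def match_by_displacement(
    subject_centers: List[Point],
    displacements: List[Point],
    object_centers: List[Point],
    max_dist: float = float("inf"),
) -> List[Optional[int]]:
    """Pair each subject with the object nearest to (subject center + displacement).

    Returns one entry per subject: the index of the matched object, or None
    when no object center lies within max_dist of the shifted point.
    All names and the threshold are illustrative assumptions.
    """
    matches: List[Optional[int]] = []
    for (sx, sy), (dx, dy) in zip(subject_centers, displacements):
        # Point where the (hypothetical) predicted displacement says the object is.
        tx, ty = sx + dx, sy + dy
        best_idx: Optional[int] = None
        best_dist = max_dist
        for idx, (ox, oy) in enumerate(object_centers):
            dist = math.hypot(ox - tx, oy - ty)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        matches.append(best_idx)
    return matches


# Toy version of the "book in each hand" case: one person, two books.
person = [(50.0, 50.0)]
books = [(30.0, 60.0), (70.0, 60.0)]   # made-up centers of the two books
toward_first_book = [(-18.0, 9.0)]     # made-up displacement for illustration
matches = match_by_displacement(person, toward_first_book, books)
# matches == [0]: the shifted point (32, 59) is closest to the first book
```

Because the match depends on where the displacement points rather than on which object is nearest to the person, two equidistant books can still be told apart, which is the ambiguity the abstract's example describes.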

Updated: 2021-04-27