Learning to transfer focus of graph neural network for scene graph parsing
Pattern Recognition (IF 7.5) Pub Date: 2021-04-01, DOI: 10.1016/j.patcog.2020.107707
Junjie Jiang, Zaixing He, Shuyou Zhang, Xinyue Zhao, Jianrong Tan

Abstract Scene graph parsing has become a new challenge in image understanding and pattern recognition in recent years. It captures objects and their relationships, and provides a structured representation of the visual scene. Among the three types of high-level relationships in scene graphs, semantic relationships, which carry a global understanding of the scene, are the core and the most valuable, whereas geometric and possessive relationships carry only local, limited information. However, semantic relationships span many types with few instances each, so existing detectors recognize most semantic relationships at a low rate. To address this issue, this paper proposes a new architecture, the graphical focal network, which uses a decision-level global detector to capture the dependencies between the local object and relationship detectors. We construct a graphical focal loss that compensates for the scarcity of semantic relationship instances by re-weighting the relationship loss according to relationship rarity and learning difficulty, and that improves the stability of key object recognition by re-weighting the object loss according to node connectivity and the value of neighborhood relationships. The proposed relative depth encoding module and regional layout encoding module introduce, respectively, relative depth information and more effective geometric layout information between objects, further improving performance. Experiments on the Visual Genome benchmark show that our method outperforms state-of-the-art competitors on two types of performance metrics. For semantic relationship types, the recognition rate of our method is 2.0 times that of the baseline.
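The abstract does not give the loss formula, only that the relationship loss is re-weighted by class rarity and learning difficulty in a focal-loss style. As a rough illustration of that idea, below is a minimal PyTorch sketch of a focal-style relationship loss whose per-example weight combines an inverse-frequency rarity term with the standard (1 - p_t)^gamma difficulty term. The function name, the exact rarity weighting, and all tensor shapes are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def graphical_focal_relationship_loss(logits, targets, class_freq, gamma=2.0):
    """Focal-style loss sketch for relationship classification.

    Rare relationship classes (low class_freq) and hard examples
    (low predicted probability on the true class) receive larger
    weights, in the spirit of the graphical focal loss described
    in the abstract. The weighting scheme here is illustrative.

    logits:     (B, C) raw class scores per relationship proposal
    targets:    (B,)   ground-truth class indices
    class_freq: (C,)   training-set instance counts per class
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Probability and log-probability assigned to the true class.
    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    log_p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)

    # Rarity weight: normalized inverse class frequency (assumption).
    rarity = (1.0 / class_freq.clamp(min=1)).to(logits.dtype)
    rarity = rarity / rarity.mean()
    alpha = rarity[targets]

    # Difficulty weight: the standard focal term (1 - p_t)^gamma.
    difficulty = (1.0 - p_t).pow(gamma)

    return -(alpha * difficulty * log_p_t).mean()

# Toy usage with hypothetical sizes: 8 proposals, 50 relationship classes.
logits = torch.randn(8, 50)
targets = torch.randint(0, 50, (8,))
class_freq = torch.randint(1, 1000, (50,))
loss = graphical_focal_relationship_loss(logits, targets, class_freq)
```

Under this weighting, a frequent, easily classified relationship (e.g. a geometric "on") contributes little to the gradient, while a rare semantic relationship that the model still gets wrong dominates the update, which matches the imbalance problem the abstract describes.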

Updated: 2021-04-01