Iterative Visual Relationship Detection via Commonsense Knowledge Graph
Big Data Research (IF 3.3), Pub Date: 2020-12-21, DOI: 10.1016/j.bdr.2020.100175
Hai Wan, Jinrui Liang, Jianfeng Du, Yanan Liu, Jialing Ou, Baoyi Wang, Jeff Z. Pan, Juan Zeng

Scene Graph Generation, which discovers the interactions between pairs of entities in an image, plays a significant role in image understanding. Most recent studies consider only visual features and ignore the implicit effect of commonsense. We propose a novel model that takes advantage of commonsense knowledge in Scene Graph Generation, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC). IVRDC consists of two modules: a feature module, which predicts predicates from visual and semantic features using a bi-directional recurrent neural network; and a commonsense knowledge module, which constructs a specific commonsense knowledge graph for predicate prediction. The two modules roll out iteratively and cross-feed their predictions to each other. The final prediction combines the results of every iteration through an attention mechanism. Experimental results on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that the proposed model is competitive.
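
The iterative cross-feeding and attention-weighted combination described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the two module functions, the co-occurrence scores, and the fixed attention logits below are all hypothetical stand-ins (in IVRDC the feature module is a bi-directional recurrent network and the attention weights are presumably learned).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def feature_module(visual_scores, prior):
    # Hypothetical stand-in for the feature module: combine visual
    # logits with the prior handed over by the knowledge module.
    # Assumes every prior entry is strictly positive.
    return softmax([v + math.log(p) for v, p in zip(visual_scores, prior)])


def knowledge_module(pred, cooccurrence):
    # Hypothetical stand-in for the commonsense module: reweight the
    # current prediction by positive commonsense co-occurrence scores.
    weighted = [p * c for p, c in zip(pred, cooccurrence)]
    total = sum(weighted)
    return [w / total for w in weighted]


def iterative_prediction(visual_scores, cooccurrence, attention_logits):
    """Run one iteration per attention logit, then combine all iterations."""
    n = len(visual_scores)
    prior = [1.0 / n] * n  # start from an uninformative prior
    history = []
    for _ in attention_logits:
        pred = feature_module(visual_scores, prior)   # feature -> knowledge
        history.append(pred)
        prior = knowledge_module(pred, cooccurrence)  # knowledge -> feature
    weights = softmax(attention_logits)  # attention over iterations
    return [sum(w * h[k] for w, h in zip(weights, history)) for k in range(n)]


# Toy example: predicates ("on", "ride", "under") for a (person, horse) pair.
# The visual scores cannot separate "on" from "ride"; the assumed
# commonsense scores favor "ride".
final = iterative_prediction(
    visual_scores=[1.0, 1.0, 0.0],
    cooccurrence=[1.0, 8.0, 1.0],
    attention_logits=[0.0, 0.0, 0.0],
)
```

In this toy setting the first iteration scores "on" and "ride" equally, and later iterations shift probability toward "ride" as the commonsense prior is fed back; the attention step then averages across iterations instead of keeping only the last one.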




Updated: 2021-01-06