Cross-modal learning with prior visual relation knowledge, Knowledge-Based Systems

Knowledge-Based Systems (IF 7.2), Pub Date: 2020-06-16, DOI: 10.1016/j.knosys.2020.106150
Jing Yu, Weifeng Zhang, Zhuoqian Yang, Zengchang Qin, Yue Hu

Visual relational reasoning, which aims to reason about the visual relationships between objects and their properties, is a central component of recent cross-modal analysis tasks. These relationships provide rich semantics and help enhance visual representations, improving cross-modal learning. Previous work has succeeded in modeling latent visual relationships or rigidly categorized visual relationships. However, these methods overlook the ambiguity inherent in visual relationships, which arises because different visual appearances carry diverse relational semantics. In this work, we explore modeling visual relationships with context-aware representations based on human prior knowledge. Based on these representations, we propose a novel plug-and-play visual relational reasoning module to enhance image encoding. Specifically, we design an Anisotropic Graph Convolution that uses relation embeddings and the directionality of relations between objects to generate relation-aware image representations. We demonstrate the effectiveness of the relational reasoning module by applying it to both Visual Question Answering (VQA) and Cross-Modal Information Retrieval (CMIR) tasks. Extensive experiments on the VQA 2.0 and CMPlaces datasets show superior performance compared with state-of-the-art methods.
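The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of a graph convolution that is sensitive to relation embeddings and edge direction. It is not the paper's Anisotropic Graph Convolution. The function name `relation_aware_conv`, the weight matrices `W_self`, `W_out`, `W_in`, and the gate vector `w_gate` are all hypothetical. The assumption is that each directed edge (subject, object) carries a relation embedding, that a scalar gate derived from that embedding scales the message, and that separate weights are used for the two directions.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relation_aware_conv(feats, edges, W_self, W_out, W_in, w_gate):
    """One hypothetical direction- and relation-aware graph convolution layer.

    feats : list of node feature vectors (one per detected object)
    edges : list of (subject, object, relation_embedding) triples
    W_out : weights for messages flowing object -> subject
    W_in  : weights for messages flowing subject -> object
    w_gate: vector turning a relation embedding into a scalar gate
    """
    # Start from a self-transformation of every node.
    out = [matvec(W_self, h) for h in feats]
    for s, o, rel in edges:
        # Sigmoid gate computed from the relation embedding (assumption).
        score = sum(w * r for w, r in zip(w_gate, rel))
        gate = 1.0 / (1.0 + math.exp(-score))
        # Different weights per direction make the aggregation asymmetric.
        msg_to_s = matvec(W_out, feats[o])
        msg_to_o = matvec(W_in, feats[s])
        out[s] = [a + gate * m for a, m in zip(out[s], msg_to_s)]
        out[o] = [a + gate * m for a, m in zip(out[o], msg_to_o)]
    # ReLU non-linearity.
    return [[max(0.0, v) for v in h] for h in out]

# Toy example: two objects joined by one directed relation.
feats = [[1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1, [1.0])]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_out = [[2.0, 0.0], [0.0, 2.0]]
W_in = [[3.0, 0.0], [0.0, 3.0]]
w_gate = [0.0]  # zero gate weights give a gate of exactly 0.5
result = relation_aware_conv(feats, edges, W_self, W_out, W_in, w_gate)
print(result)
```

In this toy setup the subject and object nodes receive differently weighted messages from the same edge, which is the sense in which the aggregation is direction-dependent. In a real model the weights and relation embeddings would be learned rather than fixed.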

Updated: 2020-06-23
