Image and Sentence Matching via Semantic Concepts and Order Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2018-11-28, DOI: 10.1109/tpami.2018.2883466
Yan Huang, Qi Wu, Wei Wang, Liang Wang

Image and sentence matching has made great progress recently, but it remains challenging due to the large visual-semantic discrepancy. This discrepancy mainly arises from two aspects: 1) images consist of unstructured content that is not as semantically abstract as the words in sentences, so the two are not directly comparable, and 2) arranging the same semantic concepts in different orders can lead to quite different meanings. The words in a sentence are arranged sequentially according to grammar, whereas the semantic concepts in an image are usually unorganized. In this work, we propose a semantic concepts and order learning framework for image and sentence matching, which improves the image representation by first predicting semantic concepts and then organizing them in a correct semantic order. Given an image, we first use a multi-regional multi-label CNN to predict the semantic concepts it contains, in terms of objects, properties and actions. These word-level semantic concepts are directly comparable with the nouns, adjectives and verbs in the matched sentence. Then, to organize these concepts so that they express meanings similar to the matched sentence, we use a context-modulated attentional LSTM to learn the semantic order. At each timestep, it regards the predicted semantic concepts and the global image scene as context, and selectively attends to concept-related image regions by referring to this context in a sequential order. To further enhance the semantic order, we perform additional sentence generation from the image representation, using the ground-truth order in the matched sentence as supervision. After obtaining the improved image representation, we learn the sentence representation with a conventional LSTM, and then jointly perform image and sentence matching and sentence generation for model learning. Extensive experiments demonstrate the effectiveness of the learned semantic concepts and order, with state-of-the-art results on two public benchmark datasets.
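The concept-prediction step can be illustrated with a minimal sketch. It assumes, hypothetically, that the multi-regional multi-label CNN has already produced a probability for every concept in a fixed vocabulary for each image region, and that image-level scores come from an element-wise maximum over regions. This is a common multi-instance pooling choice and may differ from the paper's exact aggregation; the helper names `aggregate_region_scores` and `top_k_concepts` are illustrative only.

```python
from typing import List, Sequence


def aggregate_region_scores(region_scores: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-region concept probabilities into image-level scores.

    Assumption (hypothetical): element-wise max pooling, so a concept counts
    as present if at least one region strongly supports it.
    """
    if not region_scores:
        raise ValueError("at least one region is required")
    num_concepts = len(region_scores[0])
    return [max(region[c] for region in region_scores) for c in range(num_concepts)]


def top_k_concepts(scores: Sequence[float], vocabulary: Sequence[str], k: int) -> List[str]:
    """Return the k highest-scoring concept words (objects, properties, actions)."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return [vocabulary[c] for c in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical vocabulary mixing an object, a property, an action and a distractor.
    vocab = ["dog", "brown", "running", "car"]
    regions = [
        [0.9, 0.1, 0.2, 0.0],  # region covering the dog's body
        [0.2, 0.7, 0.6, 0.1],  # region covering fur and legs
    ]
    scores = aggregate_region_scores(regions)
    print(top_k_concepts(scores, vocab, k=3))  # ['dog', 'brown', 'running']
```

Because the selected concepts are words, they can be compared directly with the nouns, adjectives and verbs of a candidate sentence, which is the point the abstract makes about closing the visual-semantic gap.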
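A single step of the context-modulated attention can be sketched as follows, under a deliberately simplified, hypothetical formulation: the query is the element-wise sum of the previous LSTM hidden state and a context vector (standing in for the predicted concepts plus global scene), and each region is scored by a dot product with that query. The paper's actual modulation and scoring functions are not given in this abstract, so `context_modulated_attention` should be read as an illustration of the idea, not the published model.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_modulated_attention(
    regions: Sequence[Sequence[float]],
    hidden: Sequence[float],
    context: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """One attention step over image regions.

    Simplification (hypothetical): query = hidden + context (element-wise),
    relevance = dot(region, query), weights = softmax(relevance).
    Returns the attention weights and the attended (weighted-sum) feature.
    """
    query = [h + c for h, c in zip(hidden, context)]
    scores = [sum(r_j * q_j for r_j, q_j in zip(region, query)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    attended = [sum(w * region[j] for w, region in zip(weights, regions)) for j in range(dim)]
    return weights, attended


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    hidden = [0.0, 0.0]                  # initial LSTM state
    context = [2.0, 0.0]                 # context that favours the first region
    weights, attended = context_modulated_attention(regions, hidden, context)
    print([round(w, 3) for w in weights])  # [0.881, 0.119]
```

In a full model the attended feature would be fed to the LSTM, whose updated hidden state changes the query at the next timestep; repeating this produces the sequence of concept-related regions that the abstract calls the learned semantic order.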
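The abstract says matching and sentence generation are learned jointly but does not spell out the objective. A minimal sketch, assuming a bidirectional hinge-based ranking loss over cosine similarities for matching and a negative log-likelihood of the ground-truth words for generation, combined with a hypothetical weight `lam`. The function names, the margin of 0.2 and the weighting scheme are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bidirectional_ranking_loss(
    images: Sequence[Sequence[float]],
    sentences: Sequence[Sequence[float]],
    margin: float = 0.2,
) -> float:
    """Hinge-based ranking loss; images[i] and sentences[i] are a matched pair."""
    n = len(images)
    sim = [[cosine(images[i], sentences[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i as query: its matched sentence should beat sentence j by the margin.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Sentence i as query: its matched image should beat image j by the margin.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss


def generation_loss(gt_token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the ground-truth words under the generator."""
    return -sum(math.log(p) for p in gt_token_probs)


def joint_loss(images, sentences, gt_token_probs, lam: float = 1.0, margin: float = 0.2) -> float:
    """Matching loss plus lam-weighted sentence generation loss (hypothetical weighting)."""
    return bidirectional_ranking_loss(images, sentences, margin) + lam * generation_loss(gt_token_probs)


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    sents = [[0.9, 0.1], [0.1, 0.9]]  # roughly aligned sentence embeddings
    probs = [0.5, 0.8, 0.9]           # generator probabilities of the ground-truth words
    print(joint_loss(imgs, sents, probs, lam=0.5))
```

The generation term only rewards the image representation for reproducing the words in their ground-truth order, which is how the sketch mirrors the abstract's use of sentence order as extra supervision.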

Updated: 2020-02-07