Enhancing image retrieval for complex queries using external knowledge sources, Multimedia Tools and Applications

Multimedia Tools and Applications ( IF 3.6 ) Pub Date : 2020-07-28 , DOI: 10.1007/s11042-020-09360-0
Haitham Samih , Sherine Rady , Tarek F. Gharib

Annotation-based image retrieval associates textual descriptions with images based on human perception. A user query, composed of keywords chosen by the user, is usually matched lexically against the textual descriptions associated with the stored images to extract the best matches. Without a semantic approach, this paradigm does not produce the desired results for complex queries. This paper proposes an image retrieval framework that integrates external knowledge sources to obtain a higher-level inference that can both handle complex queries and increase the number of relevant retrievals. The framework includes a parser that first generates a semantic representation graph from both the image captions and the query. The semantic representation of the image captions is stored as Resource Description Framework (RDF) triples, while the user query is translated into a SPARQL query. For better query understanding, the external knowledge sources (ConceptNet, WordNet) are then fused with the parser's output in a process named query expansion, which infers combined and expanded knowledge about the terms used in the query. The expansion process also generates a set of expansion rules that semantically expand the user query to incorporate the inferred knowledge. The expanded query is matched against the stored RDF triples to identify the best-matching images. The retrievals are finally ranked using a relation similarity metric to obtain a ranked list of relevant images. Experimental studies carried out on two Flickr datasets show that the proposed framework outperforms related work, with a 40% increase in the number of relevant retrievals at almost full accuracy. The framework additionally achieves an average increase in accuracy at a given k in the range of 50–72% for up to the tenth retrieval.
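To make the pipeline in the abstract concrete, the following is a minimal Python sketch of its three stages: caption triples, query expansion, and ranked matching. It is not the authors' implementation. The in-memory `CAPTION_TRIPLES` dictionary stands in for an RDF store queried with SPARQL, the hand-written `RELATED_TERMS` table stands in for knowledge drawn from ConceptNet and WordNet, and the `score` function is a toy substitute for the paper's relation similarity metric, whose definition the abstract does not give. All names and data here are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, as in an RDF triple."""
    subject: str
    predicate: str
    obj: str


# Hypothetical caption triples per image (assumption: stand-in for an RDF store).
CAPTION_TRIPLES = {
    "img1": [Triple("dog", "chase", "ball")],
    "img2": [Triple("puppy", "run_after", "ball")],
    "img3": [Triple("man", "ride", "bicycle")],
}

# Hypothetical related terms (assumption: stand-in for ConceptNet/WordNet lookups).
RELATED_TERMS = {
    "dog": {"puppy", "hound"},
    "chase": {"run_after", "pursue"},
}


def expand(term):
    """Return the term together with its related terms (query expansion)."""
    return {term} | RELATED_TERMS.get(term, set())


def score(query, triple):
    """Toy score: 1.0 per exact slot match, 0.5 per expanded match,
    and 0.0 overall if any slot fails to match. Normalised to [0, 1]."""
    total = 0.0
    for query_term, caption_term in zip(query, triple):
        if query_term == caption_term:
            total += 1.0
        elif caption_term in expand(query_term):
            total += 0.5
        else:
            return 0.0
    return total / 3


def retrieve(query, store=CAPTION_TRIPLES):
    """Return (image_id, score) pairs with a non-zero score, best first."""
    ranked = []
    for image_id, triples in store.items():
        best = max((score(query, t) for t in triples), default=0.0)
        if best > 0:
            ranked.append((image_id, best))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    for image_id, value in retrieve(Triple("dog", "chase", "ball")):
        print(image_id, round(value, 2))
```

In this sketch a purely lexical match on the query `dog chase ball` would find only `img1`. With expansion, `img2` is also retrieved and ranked below the exact match, which illustrates, on toy data only, how expansion can increase the number of relevant retrievals.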

Updated: 2020-07-28
