Enhancing the Quality of Image Tagging Using a Visio-Textual Knowledge Base
IEEE Transactions on Multimedia ( IF 7.3 ) Pub Date : 2020-04-01 , DOI: 10.1109/tmm.2019.2937181
Chandramani Chaudhary , Poonam Goyal , Dhanashree Nellayi Prasad , Yi-Ping Phoebe Chen

Auto-tagging of images is important for image understanding and for tag-based applications such as image retrieval, visual question answering, and image captioning. Although existing tagging methods incorporate both visual and textual information to assign and refine tags, they fall short in tag-image relevance, completeness, and preciseness, resulting in unsatisfactory performance of tag-based applications. To bridge this gap, we propose a novel framework for tag assignment using knowledge embedding (TAKE) from a proposed external knowledge base, considering properties such as Rarity, Newness, Generality, and Naturalness (RNGN properties). These properties help provide a rich semantic representation of images. Existing knowledge bases provide multiple types of relations extracted through only one modality, either text or visual, which is not effective in image-related applications. We construct a simple yet effective Visio-Textual Knowledge Base (VTKB) with only four relations, using reliable resources such as Wikipedia, thesauruses, and dictionaries. Our large-scale experiments demonstrate that the proposed combination of TAKE and VTKB assigns a larger number of high-quality tags than TAKE used in conjunction with the ConceptNet and ImageNet knowledge bases. The effectiveness of knowledge embedding through VTKB is also evaluated for image tagging and tag-based image retrieval (TBIR).
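The abstract does not include code, but the core idea — using typed relations from an external knowledge base to expand and refine tags predicted by a visual model — can be sketched in a few lines. The following is a hypothetical illustration only: the tags, relation types, scores, and the `refine_tags` propagation rule are all invented for this sketch and are not the actual TAKE/VTKB method, which involves richer knowledge embedding and the RNGN properties described above.

```python
# Hypothetical sketch of knowledge-base-assisted tag refinement.
# All data, relation names, and weights below are invented for
# illustration; they do not come from the paper.

# Candidate tags with confidence scores from a visual model (invented).
visual_tags = {"dog": 0.9, "grass": 0.6, "frisbee": 0.4}

# A toy knowledge base of (head, relation, tail) triples (invented).
kb = [
    ("dog", "hypernym", "animal"),
    ("dog", "related", "pet"),
    ("grass", "related", "lawn"),
    ("frisbee", "related", "disc"),
]

def refine_tags(scores, kb, propagate=0.5, threshold=0.3):
    """Propagate a fraction of each predicted tag's score to its
    knowledge-base neighbours, then keep tags above a threshold."""
    refined = dict(scores)
    for head, _rel, tail in kb:
        if head in scores:
            # A neighbour inherits a damped share of the head tag's score.
            refined[tail] = max(refined.get(tail, 0.0),
                                propagate * scores[head])
    # Sort by score, descending, and drop low-confidence tags.
    return {t: s for t, s in sorted(refined.items(), key=lambda kv: -kv[1])
            if s >= threshold}

print(refine_tags(visual_tags, kb))
```

In this toy run, "animal" and "pet" are added via the knowledge base (inheriting half of "dog"'s score), while "disc" falls below the threshold and is dropped — a crude analogue of how relation-aware expansion can improve tag completeness without sacrificing relevance.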

Updated: 2020-04-01