Incorporating external knowledge for image captioning using CNN and LSTM
Modern Physics Letters B (IF 1.9), Pub Date: 2020-07-17, DOI: 10.1142/s0217984920503157
Himanshu Sharma, Anand Singh Jalal

Image captioning is a multidisciplinary artificial intelligence (AI) research task that has captured the interest of both image processing and natural language processing experts. It is a complex problem because it sometimes requires information that is not directly visible in a given scene: it may call for common-sense interpretation or detailed knowledge about the objects present in the image. In this paper, we present a method that uses both visual knowledge and external knowledge from knowledge bases such as ConceptNet to better describe images. We demonstrate the usefulness of the method on two publicly available datasets, Flickr8k and Flickr30k. The results show that the proposed model outperforms state-of-the-art approaches for generating image captions. Finally, we discuss possible future directions in image captioning.
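The abstract does not say how the ConceptNet knowledge is combined with the CNN and LSTM. One simple way to picture the general idea is to bias the LSTM decoder's next-word scores toward concepts related to the objects a CNN detects in the image. The sketch below is a hypothetical illustration under that assumption, not the authors' method: `TOY_KNOWLEDGE`, `retrieve_knowledge`, `knowledge_biased_distribution`, the `bonus` parameter, and the example logits are all invented here, and a real system would query ConceptNet and use a trained decoder instead.

```python
import math

# Hypothetical, hand-written stand-in for a few ConceptNet-style edges
# (concept -> related concepts). A real system would query ConceptNet itself.
TOY_KNOWLEDGE = {
    "dog": ["pet", "bark", "leash", "park"],
    "frisbee": ["throw", "catch", "play"],
    "grass": ["park", "field", "green"],
}


def retrieve_knowledge(detected_labels, graph):
    """Collect related concepts for every object label detected in the image."""
    terms = set()
    for label in detected_labels:
        terms.update(graph.get(label, []))
    return terms


def softmax(scores):
    """Numerically stable softmax over a dict mapping word -> score."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def knowledge_biased_distribution(decoder_logits, knowledge_terms, bonus=1.0):
    """Add a fixed (assumed) bonus to the logits of words found in the
    retrieved knowledge, then normalize into next-word probabilities."""
    biased = {
        w: s + (bonus if w in knowledge_terms else 0.0)
        for w, s in decoder_logits.items()
    }
    return softmax(biased)


# Usage: labels assumed to come from a CNN-based detector.
detected = ["dog", "frisbee"]
knowledge = retrieve_knowledge(detected, TOY_KNOWLEDGE)

# Hypothetical next-word logits an LSTM decoder might emit.
logits = {"park": 1.2, "street": 1.3, "beach": 0.5}
probs = knowledge_biased_distribution(logits, knowledge, bonus=1.0)
best = max(probs, key=probs.get)
print(best)  # park
```

Without the knowledge bonus the decoder would slightly prefer "street"; the retrieved concept "park" tips the choice. A fixed additive bonus is only the simplest possible fusion rule; learned attention over knowledge embeddings is a common alternative in the literature.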

Updated: 2020-07-17