Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention
Wireless Communications and Mobile Computing (IF 2.146) Pub Date: 2020-10-21, DOI: 10.1155/2020/8909458
Yan Chu, Xiao Yue, Lei Yu, Mikhailov Sergei, Zhengkui Wang

Captioning images automatically with proper descriptions has become an interesting and challenging problem. In this paper, we present a joint model, AICRL, which performs automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder adopts ResNet50, a convolutional neural network, which creates an extensive representation of the given image by embedding it into a fixed-length vector. The decoder is designed with LSTM, a recurrent neural network, and a soft attention mechanism that selectively focuses attention over certain parts of the image to predict the next word of the sentence. We trained AICRL on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training image, and evaluated it with metrics such as BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective in generating captions for images.
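
To make the encoder-decoder structure described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code: a ResNet50 backbone produces a grid of image features, and an LSTM decoder with additive (Bahdanau-style) soft attention re-weights those features at every decoding step. The dimension sizes, vocabulary size, and the additive form of the attention are illustrative assumptions.

```python
# Illustrative sketch of an AICRL-style captioner (assumed hyperparameters).
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """ResNet50 feature extractor returning a grid of spatial features."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)                         # pretrained weights optional
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])   # drop avgpool + fc

    def forward(self, images):                        # (B, 3, 224, 224)
        feats = self.backbone(images)                 # (B, 2048, 7, 7)
        return feats.flatten(2).permute(0, 2, 1)      # (B, 49, 2048): one vector per region


class SoftAttention(nn.Module):
    """Additive soft attention over the image regions."""
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hid_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):                 # feats: (B, 49, 2048), hidden: (B, 512)
        e = self.score(torch.tanh(self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)               # attention weights over regions
        context = (alpha * feats).sum(dim=1)          # (B, 2048) weighted context vector
        return context, alpha


class Decoder(nn.Module):
    """LSTM decoder that attends to the encoder features at every step."""
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):               # captions: (B, T) token ids
        h = self.init_h(feats.mean(dim=1))            # initialize state from mean image feature
        c = self.init_c(feats.mean(dim=1))
        logits = []
        for t in range(captions.size(1)):
            context, _ = self.attention(feats, h)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)             # (B, T, vocab): next-word scores


# Usage: maximize the likelihood of the target caption given the image,
# i.e. train with cross-entropy between scores and the shifted caption tokens.
encoder, decoder = Encoder(), Decoder()
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
scores = decoder(encoder(images), captions)
```

The per-step attention weights are what make the attention "soft": every region contributes to the context vector, with the weighting re-computed from the current LSTM hidden state at each word.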

Updated: 2020-10-30