Image Captioning using Reinforcement Learning with BLUDEr Optimization



Pattern Recognition and Image Analysis (IF 0.7), Pub Date: 2021-01-14, DOI: 10.1134/s1054661820040094
P. R. Devi, V. Thrivikraman, D. Kashyap, S. S. Shylaja

Abstract

Image captioning is a growing field of research. It is a challenging task owing to the complexity of natural language generation and the difficulty of extracting features from a diverse collection of images. Many models have been proposed to tackle the problem, such as state-of-the-art encoder-decoder (sequential CNN-RNN) systems, which have proved effective. Recently, reinforcement learning has emerged as a new approach to the problem and has surpassed many of the state-of-the-art paradigms. We propose a new reward system, the BLUDEr metric, which is a linear combination of the non-differentiable metrics BLEU and CIDEr. We directly optimize this metric for our model on natural language generation tasks. In our experiments, we use the Flickr30k and Flickr8k datasets, two of the benchmark datasets for image captioning systems. We achieve state-of-the-art results on these two datasets when compared with other models.
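
The abstract describes the reward only as a linear combination of BLEU and CIDEr that is optimized directly with reinforcement learning. The following is a minimal sketch of how such a reward could feed a policy-gradient (REINFORCE-style) loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convex weighting through `alpha`, its default value, and the `baseline` term are all hypothetical, and the BLEU and CIDEr scores are assumed to be computed elsewhere.

```python
from typing import Sequence


def bluder_reward(bleu: float, cider: float, alpha: float = 0.5) -> float:
    """Hypothetical BLUDEr-style reward: a weighted sum of BLEU and CIDEr.

    `alpha` is an illustrative weight, not a value taken from the paper;
    the abstract does not state the coefficients of the combination.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * bleu + (1.0 - alpha) * cider


def reinforce_loss(
    log_probs: Sequence[float], reward: float, baseline: float = 0.0
) -> float:
    """REINFORCE surrogate loss for one sampled caption.

    `log_probs` holds the log-probability the model assigned to each
    sampled token. The reward is non-differentiable, so it only scales
    the log-likelihood of the sample; `baseline` (an assumption here)
    reduces the variance of the gradient estimate.
    """
    advantage = reward - baseline
    return -advantage * sum(log_probs)


# Example: a sampled caption scored with BLEU 0.5 and CIDEr 1.0.
reward = bluder_reward(bleu=0.5, cider=1.0, alpha=0.5)
loss = reinforce_loss(log_probs=[-1.0, -2.0], reward=reward, baseline=0.25)
```

Because the metrics cannot be differentiated, the gradient flows only through the token log-probabilities: captions that score above the baseline become more likely, and those below it become less likely.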


Updated: 2021-01-14
