Past is important: Improved image captioning by looking back in time
Signal Processing: Image Communication (IF 3.5), Pub Date: 2021-02-10, DOI: 10.1016/j.image.2021.116183
Yiwei Wei, Chunlei Wu, ZhiYang Jia, XuFei Hu, Shuang Guo, Haitao Shi

A major development in the area of image captioning is the incorporation of visual attention into the design of language generative models. However, most previous studies only emphasize its role in enhancing the visual composition at the current moment, while neglecting its role in global sequence reasoning. This problem appears not only in the captioning model but also in the reinforcement learning structure. To tackle this issue, we first propose a Visual Reserved model that enables previous visual context to be considered in current sequence reasoning. Next, an Attentional-Fluctuation Supervised model is proposed for the reinforcement learning structure. In contrast to traditional strategies that take only non-differentiable Natural Language Processing (NLP) metrics as the incentive standard, the proposed model regards the fluctuation of previous attention matrices as an important indicator for judging the convergence of the captioning model. The proposed methods have been tested on the MS-COCO captioning dataset and achieve competitive results as evaluated by the evaluation server of the MS COCO captioning challenge.
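
To make the two ideas concrete, the following minimal Python/NumPy sketch (an illustration under our own assumptions, not the authors' implementation) shows how a decoding step might keep a reserve of previously attended visual contexts and look back at them, and how the distance between consecutive attention maps could serve as an auxiliary convergence indicator. All shapes, the dot-product scoring, the equal-weight fusion, and the names (attend, decode_step, attention_fluctuation) are assumptions made for illustration.

# Illustrative sketch of (1) reserving past attended visual contexts for later
# decoding steps and (2) measuring the fluctuation of consecutive attention maps.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """Dot-product attention over a set of feature vectors; returns weights and context."""
    scores = features @ query                     # (num_items,)
    weights = softmax(scores)                     # attention distribution
    context = weights @ features                  # weighted context vector
    return weights, context

def decode_step(hidden, features, reserve):
    """One decoding step that also attends over previously reserved contexts."""
    w_img, ctx_img = attend(hidden, features)     # attention over image regions
    if reserve:                                   # look back at past visual contexts
        past = np.stack(reserve)                  # (t, feat_dim)
        _, ctx_past = attend(hidden, past)
        context = 0.5 * ctx_img + 0.5 * ctx_past  # assumed fusion (a learned gate in practice)
    else:
        context = ctx_img
    reserve.append(ctx_img)                       # store current context for later steps
    return w_img, context

def attention_fluctuation(prev_weights, curr_weights):
    """L2 distance between consecutive attention maps; small values suggest the
    attention (and hence the captioner) has stabilized."""
    return float(np.linalg.norm(curr_weights - prev_weights))

# Toy usage with random stand-ins for region features and the decoder hidden state.
rng = np.random.default_rng(0)
features = rng.normal(size=(36, 512))             # e.g. 36 region features
reserve, prev_w = [], None
for t in range(5):
    hidden = rng.normal(size=512)                 # stand-in for the LSTM hidden state
    w, ctx = decode_step(hidden, features, reserve)
    if prev_w is not None:
        print(f"step {t}: attention fluctuation = {attention_fluctuation(prev_w, w):.3f}")
    prev_w = w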




Updated: 2021-02-26