Show, Recall, and Tell: Image Captioning with Recall Mechanism

arXiv - CS - Machine Learning. Pub Date: 2020-01-15, DOI: arxiv-2001.05876
Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of recalled words. The SG branch generates a recalled context, which guides the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
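
The soft switch mentioned in the abstract can be pictured with a small sketch. The abstract does not give the paper's equations, so everything below is an assumption modeled on the usual pointer-style mixture from text summarization: a scalar switch in [0, 1] weights a vocabulary distribution (standing in for the SG branch) against a copy distribution over recalled words (standing in for the RWS branch). The function names, the toy vocabulary, and the fixed switch value are hypothetical; in a trained model the switch would presumably be predicted at each decoding step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_with_soft_switch(sg_logits, slot_scores, recalled_ids, switch):
    """Blend two word distributions with a scalar switch in [0, 1].

    sg_logits:    one score per vocabulary word (assumed SG-branch output)
    slot_scores:  one score per recalled word (assumed RWS-branch output)
    recalled_ids: vocabulary index of each recalled word
    switch:       weight on the SG branch; (1 - switch) goes to RWS
    """
    p_sg = softmax(sg_logits)
    p_slot = softmax(slot_scores)
    # Scatter the copy distribution onto the vocabulary;
    # a word recalled more than once accumulates probability.
    p_rws = [0.0] * len(sg_logits)
    for idx, p in zip(recalled_ids, p_slot):
        p_rws[idx] += p
    return [switch * a + (1.0 - switch) * b for a, b in zip(p_sg, p_rws)]


# Toy example: "dog" and "frisbee" are the recalled words.
vocab = ["a", "dog", "cat", "frisbee", "park"]
probs = mix_with_soft_switch(
    sg_logits=[1.0, 0.5, 0.5, 0.0, 0.2],
    slot_scores=[2.0, 1.0],
    recalled_ids=[1, 3],
    switch=0.7,
)
print(dict(zip(vocab, [round(p, 3) for p in probs])))
```

Because both inputs are proper distributions and the weights sum to one, the mixture is again a distribution. In this toy setup "dog" and "cat" receive the same SG score, so any extra probability on "dog" comes entirely from the copy branch.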


Updated: 2020-09-14
