Show, Recall, and Tell: Image Captioning with Recall Mechanism
arXiv - CS - Computation and Language. Pub Date: 2020-01-15, DOI: arxiv-2001.05876
Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu
Generating natural and accurate descriptions in image captioning has always
been a challenge. In this paper, we propose a novel recall mechanism to
imitate the way humans caption images. Our recall mechanism has three parts:
a recall unit, a semantic guide (SG) and a recalled-word slot (RWS). The
recall unit is a text-retrieval module designed to retrieve recalled words
for images. SG and RWS are designed to make the best use of these recalled
words. The SG branch generates a recalled context that guides caption
generation, and the RWS branch copies recalled words into the caption.
Inspired by the pointing mechanism in text summarization, we adopt a soft
switch to balance the generated-word probabilities between SG and RWS. In the
CIDEr optimization step, we also introduce an individual recalled-word reward
(WR) to boost training. Our proposed method (SG+RWS+WR) achieves BLEU-4 /
CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and
38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test
split, surpassing other state-of-the-art methods.
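The soft switch between SG and RWS can be illustrated with a pointer-generator style mixture, the usual form of the pointing mechanism from text summarization that the abstract names as its inspiration. This is a minimal sketch under stated assumptions, not the authors' implementation: the final word distribution is assumed to be P(w) = p * P_SG(w) + (1 - p) * sum of a_i over recalled words w_i equal to w, where p is the switch value, P_SG is the SG branch's vocabulary distribution, and a_i is an RWS attention weight over the recalled words. The function name `soft_switch_mix` and its arguments are hypothetical, and how p itself is computed (for example, a sigmoid over the decoder state) is left out.

```python
from typing import List, Sequence


def soft_switch_mix(
    p_gen: float,
    p_sg: Sequence[float],
    copy_attn: Sequence[float],
    recalled_ids: Sequence[int],
) -> List[float]:
    """Blend an SG vocabulary distribution with an RWS copy distribution.

    Hypothetical sketch (pointer-generator style mixture, assumed here):
      p_gen        -- soft-switch weight in [0, 1] given to the SG branch
      p_sg         -- SG probabilities over the whole vocabulary (sums to 1)
      copy_attn    -- RWS attention over the K recalled words (sums to 1)
      recalled_ids -- vocabulary index of each of the K recalled words
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(copy_attn) != len(recalled_ids):
        raise ValueError("copy_attn and recalled_ids must have the same length")

    # Scatter the copy attention onto the vocabulary; a word recalled more
    # than once accumulates all of its attention mass.
    p_rws = [0.0] * len(p_sg)
    for attn, idx in zip(copy_attn, recalled_ids):
        p_rws[idx] += attn

    # Convex combination of the two branches keeps a valid distribution.
    return [p_gen * s + (1.0 - p_gen) * r for s, r in zip(p_sg, p_rws)]


# Usage: a 5-word vocabulary where word 2 is recalled twice.
mixed = soft_switch_mix(0.6, [0.1, 0.2, 0.3, 0.2, 0.2], [0.5, 0.3, 0.2], [2, 4, 2])
```

In this example the output is approximately [0.06, 0.12, 0.46, 0.12, 0.24]: the recalled words 2 and 4 gain probability from the copy branch, while the mixture still sums to 1 because both inputs are distributions and p_gen is a convex weight.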
Updated: 2020-09-14