Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

Wu, Jialin; Mooney, Raymond J.

arXiv:1910.14208 (cs)

[Submitted on 31 Oct 2019 (v1), last revised 14 Jan 2020 (this version, v2)]

Abstract: Most RNN-based image captioning models receive supervision on the output words to mimic human captions. As a result, the hidden states receive only noisy gradient signals via layers of back-propagation through time, which leads to less accurate generated captions. We therefore propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on the easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics distributed equally to each generated word, regardless of the word's relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using either raw images or detected objects as inputs.
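
To make the idea of matching hidden states concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the distance function or the exact form of the word-level reward, so the squared L2 distance and the negated-distance reward below are assumptions, and the name `hidden_state_guidance` is hypothetical.

```python
def hidden_state_guidance(student_h, teacher_h):
    """Compare caption-decoder hidden states with teacher-decoder hidden states.

    student_h, teacher_h: lists of T hidden-state vectors (lists of floats),
    one vector per generated word.

    Returns (loss, word_rewards):
      loss         -- mean per-step squared L2 distance (assumed matching loss)
      word_rewards -- one reward per word, here the negated per-step distance
                      (an assumed form of a word-level reward)
    """
    if not student_h or len(student_h) != len(teacher_h):
        raise ValueError("need two non-empty sequences of equal length")

    per_step = []
    for s, t in zip(student_h, teacher_h):
        if len(s) != len(t):
            raise ValueError("hidden-state vectors must have equal dimension")
        # Squared L2 distance between the two hidden states at this time step.
        per_step.append(sum((a - b) ** 2 for a, b in zip(s, t)))

    loss = sum(per_step) / len(per_step)
    # A word whose hidden state is closer to the teacher's gets a higher reward.
    word_rewards = [-d for d in per_step]
    return loss, word_rewards


# Two time steps, two-dimensional hidden states (toy values).
student = [[0.0, 0.0], [1.0, 1.0]]
teacher = [[0.0, 0.0], [1.0, 3.0]]
loss, rewards = hidden_state_guidance(student, teacher)
```

In this toy case the first word's hidden state matches the teacher exactly and the second does not, so the second word receives the lower reward. This contrasts with a sentence-level metric, which would assign both words the same value.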

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:1910.14208 [cs.CV]
	(or arXiv:1910.14208v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1910.14208

Submission history

From: Jialin Wu
[v1] Thu, 31 Oct 2019 01:56:33 UTC (5,063 KB)
[v2] Tue, 14 Jan 2020 19:21:02 UTC (5,064 KB)