BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling
Computer Speech & Language (IF 3.1), Pub Date: 2020-11-26, DOI: 10.1016/j.csl.2020.101169
Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou

Visual storytelling is a creative and challenging task that aims to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they rely on word-level sequence generation and do not adequately model sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework that models sentence-level and word-level semantics separately. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM takes the BERT sentence embeddings as input and learns the dependencies between the sentences corresponding to the images, while the top LSTM, taking its input from the bottom LSTM, generates the corresponding word vector representations. Experimental results demonstrate that our model outperforms the most closely related baselines on the automatic evaluation metrics BLEU and CIDEr, and human evaluation further confirms the effectiveness of our method.
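The two-level decoding flow described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the dimensions, parameter initialization, and the way the top LSTM is conditioned on the bottom LSTM's state are all assumptions, and the BERT sentence embeddings are stood in for by random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    d = h.size
    i, f = sigmoid(z[:d]), sigmoid(z[d:2*d])
    g, o = np.tanh(z[2*d:3*d]), sigmoid(z[3*d:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def init_params(in_dim, hid_dim):
    """Random small weights (W, U, b) for a single LSTM."""
    return (rng.normal(0, 0.1, (4 * hid_dim, in_dim)),
            rng.normal(0, 0.1, (4 * hid_dim, hid_dim)),
            np.zeros(4 * hid_dim))

# Illustrative sizes, not from the paper.
SENT_DIM, HID, WORD_DIM, N_IMAGES, MAX_WORDS = 16, 8, 16, 5, 4

bottom = init_params(SENT_DIM, HID)          # sentence-level LSTM
top = init_params(HID, HID)                  # word-level LSTM
W_out = rng.normal(0, 0.1, (WORD_DIM, HID))  # projects hidden state to a word vector

# Stand-in for BERT sentence embeddings, one per image in the sequence.
sent_embs = rng.normal(size=(N_IMAGES, SENT_DIM))

h_b, c_b = np.zeros(HID), np.zeros(HID)
story = []
for s in sent_embs:
    # Bottom LSTM consumes the sentence embedding -> inter-sentence context.
    h_b, c_b = lstm_step(s, h_b, c_b, *bottom)
    # Top LSTM unrolls over word positions, conditioned on the bottom state.
    h_t, c_t = np.zeros(HID), np.zeros(HID)
    words = []
    for _ in range(MAX_WORDS):
        h_t, c_t = lstm_step(h_b, h_t, c_t, *top)
        words.append(W_out @ h_t)  # word vector for this position
    story.append(np.stack(words))

story = np.stack(story)
print(story.shape)  # (sentences, words per sentence, word-vector dim)
```

The point of the hierarchy is visible in the loop structure: the bottom LSTM advances once per image/sentence, while the top LSTM advances once per word within that sentence, so inter-sentence coherence and intra-sentence wording are modeled at different timescales.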




Updated: 2020-12-03