Delving Deeper into the Decoder for Video Captioning
arXiv - CS - Computation and Language
Pub Date: 2020-01-16, DOI: arxiv-2001.05614
Haoran Chen, Jianmin Li and Xiaolin Hu
Video captioning is an advanced multi-modal task which aims to describe a
video clip using a natural language sentence. The encoder-decoder framework is
the most popular paradigm for this task in recent years. However, there exist
some problems in the decoder of a video captioning model. We make a thorough
investigation into the decoder and adopt three techniques to improve the
performance of the model. First of all, a combination of variational dropout
and layer normalization is embedded into a recurrent unit to alleviate the
problem of overfitting. Secondly, a new online method is proposed to evaluate
the performance of a model on a validation set so as to select the best
checkpoint for testing. Finally, a new training strategy called professional
learning is proposed which uses the strengths of a captioning model and
bypasses its weaknesses. Experiments on the Microsoft Research Video
Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets show that
our model achieves the best results on the BLEU, CIDEr, METEOR and ROUGE-L
metrics, with gains of up to 18% on MSVD and 3.5% on MSR-VTT over the previous
state-of-the-art models.
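
The first technique, variational dropout combined with layer normalization inside a recurrent unit, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which recurrent unit is used or where the normalization is placed, so the code below assumes a plain tanh recurrent unit, normalizes the input and recurrent projections separately, and omits learned gain and bias terms. All function names (`layer_norm`, `sample_mask`, `run_recurrent`) are hypothetical. The point it shows is that a variational dropout mask is sampled once per sequence and reused at every time step, rather than resampled per step.

```python
import math
import random


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def sample_mask(rng, size, keep_prob):
    """Inverted-dropout mask: each entry is 0 or 1/keep_prob."""
    return [1.0 / keep_prob if rng.random() < keep_prob else 0.0
            for _ in range(size)]


def matvec(matrix, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def run_recurrent(xs, W, U, keep_prob=0.5, training=True, seed=0):
    """Run an assumed tanh recurrent unit over a sequence.

    xs: list of input vectors, one per time step.
    W:  hidden x input weight matrix; U: hidden x hidden weight matrix.
    """
    rng = random.Random(seed)
    in_dim, hidden = len(xs[0]), len(U)

    # Variational dropout: sample the masks ONCE and reuse them at every step.
    if training:
        mask_x = sample_mask(rng, in_dim, keep_prob)
        mask_h = sample_mask(rng, hidden, keep_prob)
    else:
        mask_x = [1.0] * in_dim
        mask_h = [1.0] * hidden

    h = [0.0] * hidden
    outputs = []
    for x in xs:
        x_drop = [a * m for a, m in zip(x, mask_x)]
        h_drop = [a * m for a, m in zip(h, mask_h)]
        # Assumed placement: normalize each projection before summing.
        pre = [a + b for a, b in zip(layer_norm(matvec(W, x_drop)),
                                     layer_norm(matvec(U, h_drop)))]
        h = [math.tanh(p) for p in pre]
        outputs.append(h)
    return outputs
```

Calling `run_recurrent(xs, W, U, training=False)` disables both masks, which corresponds to the usual inference-time behavior of dropout.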
Updated: 2020-02-18