Image captioning with transformer and knowledge graph
Pattern Recognition Letters ( IF 5.1 ) Pub Date : 2021-01-07 , DOI: 10.1016/j.patrec.2020.12.020
Yu Zhang , Xinyu Shi , Siya Mi , Xu Yang

The Transformer model has achieved strong results in machine translation tasks. In this paper, we adopt the Transformer model for the image captioning task. To improve captioning performance, we enhance the Transformer model in two ways. First, we augment the maximum likelihood estimation (MLE) objective with an extra Kullback-Leibler (KL) divergence term so that different incorrect predictions are penalized differently rather than treated uniformly. Second, we introduce a method that leverages a knowledge graph to help the Transformer model generate captions. Experiments on benchmark datasets demonstrate the effectiveness of our method.
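The KL-augmented objective described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the KL term is computed between a soft reference distribution over the vocabulary (hypothetically derived from, e.g., word similarity to the ground-truth token) and the model's predicted distribution, so that near-miss predictions cost less than unrelated ones. The weight `lam` is an assumed hyperparameter.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_augmented_loss(logits, target_idx, soft_targets, lam=0.1):
    """MLE (cross-entropy) loss augmented with a KL divergence term.

    logits       : (V,) unnormalized vocabulary scores for one time step
    target_idx   : index of the ground-truth word
    soft_targets : (V,) hypothetical reference distribution over the
                   vocabulary (e.g. built from similarity to the ground-truth
                   word), so incorrect predictions are penalized non-uniformly
    lam          : weight of the KL term (assumed hyperparameter)
    """
    p = softmax(logits)
    eps = 1e-12                                   # numerical floor
    mle = -np.log(p[target_idx] + eps)            # standard MLE term
    kl = np.sum(soft_targets * np.log((soft_targets + eps) / (p + eps)))
    return mle + lam * kl
```

With `lam=0` the loss reduces to plain cross-entropy; increasing `lam` pulls the predicted distribution toward the reference distribution, which is one way to make the objective distinguish between different wrong answers.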




Updated: 2021-01-18