A semi-supervised deep learning image caption model based on Pseudo Label and N-gram
International Journal of Approximate Reasoning (IF 3.2), Pub Date: 2020-12-28, DOI: 10.1016/j.ijar.2020.12.016
Cheng Cheng, Chunping Li, Youfang Han, Yan Zhu

Image captioning is an important application field of artificial intelligence. When a machine can describe a picture as reasonably as a human does, this shows that the machine has a deeper understanding of the picture. However, for complex machine learning tasks such as image captioning, data annotation is time-consuming and laborious. In a new application scenario, annotated data is usually scarce, which results in poor model performance. The large amount of easily available unlabeled image data makes semi-supervised learning for image captioning possible. Building on the existing end-to-end deep learning paradigm, this paper proposes a semi-supervised deep learning method called the N-gram + Pseudo Label NIC method. The method combines a current mainstream deep neural network model, the NIC (Neural Image Caption) model, with the semi-supervised deep learning idea of pseudo labels and with N-gram statistics. It generates pseudo labels with an N-gram Search algorithm and improves the model by exploiting the prior knowledge in the N-gram table, which reflects people's descriptive habits. Under the BLEU-1 evaluation criterion, the method achieves better results than the original NIC model on sub-datasets of 0.5k, 1k, 2k, and 3k drawn from the Flickr 8K and MSCOCO datasets.

Updated: 2021-01-14