Image captioning in Hindi language using transformer networks
Computers & Electrical Engineering (IF 4.0) Pub Date: 2021-04-17, DOI: 10.1016/j.compeleceng.2021.107114
Santosh Kumar Mishra, Rijul Dhir, Sriparna Saha, Pushpak Bhattacharyya, Amit Kumar Singh

Neural encoder–decoder architectures have been used extensively for image captioning, with Convolutional Neural Networks (CNNs) commonly serving as the encoder and Recurrent Neural Networks (RNNs) as the decoder. RNNs are popular architectures for language modeling in natural language processing, but they process tokens sequentially. The transformer model removes this sequential dependency by relying on an attention mechanism instead. Many works address image captioning in English, but models for generating Hindi captions are limited; hence, we have tried to fill this gap. We have created a Hindi dataset for image captioning by manually translating the popular MSCOCO dataset from English to Hindi. Experimental results show that our proposed model outperforms other models, attaining a BLEU-1 score of 62.9, a BLEU-2 score of 43.3, a BLEU-3 score of 29.1, and a BLEU-4 score of 19.0.
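
To make the architecture concrete, the sketch below pairs a CNN image encoder with a Transformer decoder that attends over image regions through cross-attention, rather than unrolling a recurrent state over time. This is a minimal PyTorch illustration, not the authors' implementation: the ResNet-50 backbone, the layer sizes, the vocabulary size, and the maximum caption length are all assumptions made for the example.

import torch
import torch.nn as nn
import torchvision.models as models

class CaptionTransformer(nn.Module):
    """Illustrative CNN-encoder / Transformer-decoder captioning model (not the paper's exact configuration)."""
    def __init__(self, vocab_size=8000, d_model=512, nhead=8, num_layers=4, max_len=40):
        super().__init__()
        # CNN encoder: ResNet-50 backbone without its pooling/classification head, kept frozen;
        # the 7x7 grid of 2048-d region features is projected to the model dimension.
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(2048, d_model)
        # Transformer decoder: attends over all image regions at once instead of
        # carrying a recurrent hidden state step by step.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):
        feats = self.cnn(images)                                # (B, 2048, 7, 7)
        memory = self.proj(feats.flatten(2).transpose(1, 2))    # (B, 49, d_model)
        tgt = self.embed(captions) + self.pos[:, :captions.size(1)]
        # Causal mask so each position only attends to earlier caption tokens.
        T = captions.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"), device=captions.device), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)                                 # (B, T, vocab_size)

# Dummy forward pass: a batch of 2 images and 20-token caption prefixes.
model = CaptionTransformer()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 8000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 8000])

In practice, such a decoder is trained with teacher forcing and a cross-entropy loss over the Hindi token vocabulary, and captions are generated by greedy or beam-search decoding before being scored against reference captions with n-gram metrics such as the BLEU-1 to BLEU-4 values reported above.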



Updated: 2021-04-18