A Hindi Image Caption Generation Framework Using Deep Learning, ACM Transactions on Asian and Low-Resource Language Information Processing


A Hindi Image Caption Generation Framework Using Deep Learning
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 1.8), Pub Date: 2021-03-15, DOI: 10.1145/3432246
Santosh Kumar Mishra, Rijul Dhir, Sriparna Saha, Pushpak Bhattacharyya


Image captioning is the task of generating a textual description of the salient parts of a given image. It is an important problem because it combines computer vision, which is used to understand the image, with natural language processing, which is used for language modeling. Much work has been done on image captioning for English. In this article, we develop a model for image captioning in Hindi. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and elsewhere in South Asia. To the best of our knowledge, this is the first attempt to generate image captions in Hindi. A dataset is created manually by translating the well-known MSCOCO dataset from English to Hindi. Several attention-based architectures are then developed for Hindi image captioning; these attention mechanisms have not previously been applied to Hindi. The proposed model is compared with several baselines in terms of BLEU scores, and the results show that it outperforms them. Manual evaluation of the generated captions in terms of adequacy and fluency further supports the effectiveness of the proposed approach. Availability of resources: The code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language; the dataset will be made available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html.


Updated: 2021-03-15
