Data augmentation to improve robustness of image captioning solutions,arXiv - CS - Computation and Language

Data augmentation to improve robustness of image captioning solutions
arXiv - CS - Computation and Language. Pub Date: 2021-06-10, DOI: arxiv-2106.05437
Shashank Bujimalla, Mahesh Subedar, Omesh Tickoo

In this paper, we study the impact of motion blur, a common quality flaw in real-world images, on a state-of-the-art two-stage image captioning solution, and observe that performance degrades as blur intensity increases. We investigate techniques to improve the solution's robustness to motion blur by augmenting the training data at either or both stages of the solution, i.e., object detection and captioning, and observe improved results. In particular, augmenting both stages reduces the CIDEr-D degradation at high motion blur intensity from 68.7 to 11.7 on the MS COCO dataset, and from 22.4 to 6.8 on the Vizwiz dataset.
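
The abstract does not say how the motion blur is generated or applied, so the following is only a minimal sketch of one common way to build a motion blur training augmentation, not the authors' method. It assumes a straight-line blur kernel, uint8 images, and kernel side length as a stand-in for "blur intensity". The function names, the kernel sizes (3, 7, 11), and the probability `p` are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_blur_kernel(size: int, angle_deg: float = 0.0) -> np.ndarray:
    """Return a normalized size x size kernel holding a line through the center.

    Assumption: linear motion blur; a larger `size` means a stronger blur.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    kernel = np.zeros((size, size), dtype=np.float64)
    center = size // 2
    theta = np.deg2rad(angle_deg)
    # Sample the line densely enough that every cell it crosses is marked.
    for t in np.linspace(-center, center, 2 * size):
        row = int(round(center - t * np.sin(theta)))
        col = int(round(center + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Correlate an HxW or HxWxC image with the kernel, using edge padding."""
    k = kernel.shape[0]
    pad = k // 2
    img = image.astype(np.float64)
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Accumulate one shifted copy of the image per non-zero kernel weight.
    for i in range(k):
        for j in range(k):
            if kernel[i, j] != 0.0:
                out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def augment(image: np.ndarray, rng: np.random.Generator,
            p: float = 0.5, sizes=(3, 7, 11)) -> np.ndarray:
    """With probability p, blur a uint8 image with a random size and angle.

    The values of p and sizes are hypothetical, not taken from the paper.
    """
    if rng.random() >= p:
        return image
    size = int(rng.choice(sizes))
    angle = float(rng.uniform(0.0, 180.0))
    blurred = apply_motion_blur(image, motion_blur_kernel(size, angle))
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)
```

In a two-stage pipeline like the one described, a function such as `augment` could in principle be applied to the training images of the object detector, of the captioner, or of both. Which stages to augment is the comparison the abstract reports on.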

Updated: 2021-06-11
