Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild
International Journal of Computer Vision (IF 19.5) Pub Date: 2021-01-05, DOI: 10.1007/s11263-020-01411-1
Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen

Scene text recognition is an important task in computer vision. Despite tremendous progress in recent years, issues such as varying font styles, arbitrary shapes, and complex backgrounds keep the problem challenging. In this work, we improve text recognition from a new perspective: by separating the text content from complex backgrounds, recognition becomes considerably easier and accuracy improves significantly. To this end, we exploit generative adversarial networks (GANs) to remove backgrounds while retaining the text content. As vanilla GANs are not sufficiently robust to generate sequence-like characters in natural images, we propose an adversarial learning framework for the generation and recognition of multiple characters in an image. The framework consists of an attention-based recognizer and a generative adversarial architecture. Furthermore, to tackle the lack of paired training samples, we design an interactive joint training scheme that shares attention masks from the recognizer with the discriminator, enabling the discriminator to extract the features of each character for further adversarial training. Benefiting from this character-level adversarial training, our framework requires only unpaired simple data for style supervision: each target-style sample contains a single randomly chosen character and can be synthesized online during training. This is significant because training requires neither costly paired samples nor character-level annotations; only the input images and their text labels are needed. In addition to style normalization of the backgrounds, we refine character patterns to ease the recognition task. A feedback mechanism bridges the gap between the discriminator and the recognizer: the discriminator guides the generator according to the recognizer's confusion, so that the generated patterns are clearer for recognition. Experiments on various benchmarks, covering both regular and irregular text, demonstrate that our method significantly reduces the difficulty of recognition. Our framework can be integrated into recent recognition methods to achieve new state-of-the-art accuracy.
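The attention-mask sharing described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: all shapes and variable names are hypothetical. It shows how per-character attention masks from a recognizer's decoder can pool a feature vector for each character from a shared feature map, which is what lets the discriminator operate at the character level rather than on the whole image.

```python
import numpy as np

# Hypothetical shapes: a feature map (e.g., from the discriminator backbone)
# and per-decoding-step attention logits from the recognizer.
H, W, C = 8, 32, 16      # feature-map height, width, channels
T = 5                    # decoding steps, one per character

rng = np.random.default_rng(0)
features = rng.standard_normal((H, W, C))   # shared feature map
logits = rng.standard_normal((T, H, W))     # recognizer attention logits

# Softmax over spatial positions yields one normalized mask per character.
flat = logits.reshape(T, -1)
masks = np.exp(flat - flat.max(axis=1, keepdims=True))
masks /= masks.sum(axis=1, keepdims=True)
masks = masks.reshape(T, H, W)

# Sharing the masks lets the discriminator pool one feature vector per
# character: a weighted spatial average of the feature map under each mask.
char_feats = np.einsum('thw,hwc->tc', masks, features)

print(char_feats.shape)  # one C-dim vector per character: (5, 16)
```

Each row of `char_feats` would then be scored by a character-level discriminator, so a single-character target-style sample suffices as the "real" distribution during adversarial training.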

Updated: 2021-01-05