PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text, IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-05-04, DOI: 10.1109/tpami.2021.3077555
Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

Scene text detection and recognition have been well explored in the past few years. Despite this progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also reliably distinguish adjacent text instances. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design the following components: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head that works together with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and these tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments demonstrate the superiority of our method. For example, PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, significantly outperforming the previous best method. Code will be available at git.io/PAN.
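
To illustrate how a kernel representation can separate adjacent text, here is a minimal pure-Python sketch of kernel-based pixel aggregation. It follows the general description of PA in the original PAN paper but is not the official PAN++ code. It assumes the network has already produced a binary text-region mask, a binary kernel mask, and a per-pixel similarity vector. Kernels are labelled by 4-connected components. Each kernel then grows breadth-first into neighbouring text pixels whose similarity vector lies within a distance threshold of the kernel's mean vector. The function names, the nested-list input format, and the `dist_threshold` parameter are hypothetical choices for this illustration.

```python
import math
from collections import deque
from typing import Dict, Iterator, List, Sequence, Tuple

Grid = List[List[int]]
Embeddings = List[List[Sequence[float]]]  # H x W x C similarity vectors (assumed layout)


def _neighbours(y: int, x: int, h: int, w: int) -> Iterator[Tuple[int, int]]:
    """Yield the in-bounds 4-connected neighbours of pixel (y, x)."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def label_kernels(kernel_mask: Grid) -> Grid:
    """Label 4-connected kernel components 1, 2, ...; background stays 0."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if kernel_mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in _neighbours(cy, cx, h, w):
                        if kernel_mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def pixel_aggregation(text_mask: Grid, kernel_mask: Grid,
                      embeddings: Embeddings, dist_threshold: float) -> Grid:
    """Grow each kernel into nearby text pixels with similar embeddings (hypothetical sketch)."""
    labels = label_kernels(kernel_mask)
    h, w = len(labels), len(labels[0])

    # Mean similarity vector of every kernel.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if k:
                vec = embeddings[y][x]
                if k not in sums:
                    sums[k] = [0.0] * len(vec)
                    counts[k] = 0
                sums[k] = [s + v for s, v in zip(sums[k], vec)]
                counts[k] += 1
    means = {k: [s / counts[k] for s in sums[k]] for k in sums}

    # Breadth-first expansion from all kernel pixels at once.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        k = labels[cy][cx]
        for ny, nx in _neighbours(cy, cx, h, w):
            if text_mask[ny][nx] and labels[ny][nx] == 0:
                # Merge only pixels whose vector is close to this kernel's mean.
                if math.dist(embeddings[ny][nx], means[k]) < dist_threshold:
                    labels[ny][nx] = k
                    queue.append((ny, nx))
    return labels
```

For example, a single row of seven touching text pixels with kernels at positions 1 and 5 and one-dimensional vectors `[0, 0, 0, 5, 10, 10, 10]` is split into two instances, `[[1, 1, 1, 0, 2, 2, 2]]`, when `dist_threshold=3.0`. The ambiguous middle pixel is too far from both kernel means to be merged. Because growth starts from kernels rather than from the full text mask, touching text lines are not fused into one region.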

Updated: 2021-05-04