PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text. IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-05-04, DOI: 10.1109/tpami.2021.3077555
Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

Scene text detection and recognition have been well explored in the past few years. Despite this progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (its central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that the kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design the following components: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from these designs, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments demonstrate the superiority of our method.
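
The abstract does not say how a text kernel is derived from an annotated text polygon. A rule commonly used by kernel-based detectors such as PSENet shrinks the polygon inward by an offset d = A * (1 - r^2) / P, where A is the polygon's area, P its perimeter, and r a shrink ratio. The sketch below computes that offset, assuming this rule and an illustrative r = 0.5; neither is a detail confirmed by this page. Applying the offset to an arbitrary polygon is usually done with a polygon-clipping library such as pyclipper and is omitted here.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    s = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def kernel_offset(points, r=0.5):
    """Inward offset d = A * (1 - r^2) / P for shrinking a text polygon.

    Assumption: PSENet-style shrink rule with an illustrative ratio r = 0.5.
    """
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        raise ValueError("degenerate polygon")
    return polygon_area(points) * (1.0 - r * r) / perimeter


box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(kernel_offset(box))
```

For a 100 x 20 axis-aligned box, A = 2000 and P = 240, so d = 2000 * 0.75 / 240 = 6.25; shrinking each side by d leaves an 87.5 x 7.5 kernel along the center line of the text.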
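
Pixel Aggregation (PA) is only named in the abstract. The sketch below illustrates the general idea under stated assumptions: each kernel is grown outward by breadth-first search over 4-connected text pixels, and a neighbouring pixel joins a kernel only if its predicted similarity vector lies within a distance threshold of that kernel's mean similarity vector. The function name pixel_aggregation, the nested-list data layout, 4-connectivity, and the default threshold of 3.0 are all hypothetical choices for illustration, not the paper's settings.

```python
import math
from collections import deque


def pixel_aggregation(text_mask, kernel_labels, sim, threshold=3.0):
    """Grow labeled kernels into full text instances (simplified, hypothetical).

    text_mask:     H x W of 0/1, 1 where a pixel is predicted as text.
    kernel_labels: H x W of ints, 0 = no kernel, k > 0 = kernel id.
    sim:           H x W of similarity vectors (lists of floats).
    """
    h, w = len(text_mask), len(text_mask[0])
    labels = [row[:] for row in kernel_labels]

    # Mean similarity vector of each kernel.
    sums, counts = {}, {}
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if k > 0:
                vec = sim[y][x]
                if k not in sums:
                    sums[k] = [0.0] * len(vec)
                    counts[k] = 0
                sums[k] = [a + b for a, b in zip(sums[k], vec)]
                counts[k] += 1
    means = {k: [v / counts[k] for v in sums[k]] for k in sums}

    # Breadth-first growth from all kernel pixels at once.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x] > 0)
    while queue:
        y, x = queue.popleft()
        k = labels[y][x]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and text_mask[ny][nx] and labels[ny][nx] == 0:
                # Join kernel k only if the pixel's vector is close to the kernel's mean.
                if math.dist(sim[ny][nx], means[k]) < threshold:
                    labels[ny][nx] = k
                    queue.append((ny, nx))
    return labels
```

In a one-row example with kernels at both ends and similarity values [0, 10, 10, 10, 10, 10], the second pixel is rejected by the left kernel and claimed by the right one, giving [1, 2, 2, 2, 2, 2] rather than a split at the midpoint. This is the sense in which learned similarity vectors can separate adjacent text better than purely spatial growth.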
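
The abstract pairs the recognition head with Masked RoI without describing it. One plausible reading, assumed here, is that the features inside a text line's bounding box are cropped and every position that does not belong to that text instance is zeroed, so background and neighbouring text do not leak into recognition. The hypothetical masked_roi function below shows only that masking step on nested lists; resizing the crop to a fixed size, as a real RoI operator would, is omitted.

```python
def masked_roi(features, labels, k):
    """Crop the bounding box of instance k and zero positions not labeled k.

    features: H x W of feature values; labels: H x W instance ids.
    Hypothetical sketch of the masking idea only; no resizing is performed.
    """
    coords = [(y, x) for y, row in enumerate(labels) for x, v in enumerate(row) if v == k]
    if not coords:
        return []
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    # Keep features of instance k; mask out background and other instances.
    return [
        [features[y][x] if labels[y][x] == k else 0.0 for x in range(x0, x1 + 1)]
        for y in range(y0, y1 + 1)
    ]


labels = [[1, 1, 2, 0], [1, 2, 2, 0], [0, 0, 0, 0]]
features = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
print(masked_roi(features, labels, 1))
```

Here the bounding box of instance 1 also covers a pixel of the adjacent instance 2; the mask zeroes it, so the crop for instance 1 becomes [[1.0, 2.0], [5.0, 0.0]].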

Updated: 2021-05-04
