PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-05-04, DOI: 10.1109/tpami.2021.3077555
Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

Scene text detection and recognition have been well explored in the past few years. Despite the progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that our kernel representation not only describes arbitrarily-shaped text but also distinguishes adjacent text well. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications. Taking advantage of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the above designs, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method.
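
The kernel representation described in the abstract can be illustrated with a small toy sketch. This is not the authors' Pixel Aggregation, which the abstract only names and does not specify; as a stated simplification, the sketch assigns each peripheral text pixel to the nearest kernel by a breadth-first expansion over 4-connected neighbors. The function names `label_kernels` and `assign_peripheral_pixels`, and the binary-mask inputs, are hypothetical and chosen only for illustration.

```python
from collections import deque

# 4-connected neighborhood offsets (down, up, right, left).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_kernels(kernel_mask):
    """Give each 4-connected kernel region its own label 1, 2, ..."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if kernel_mask[r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny][nx]
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


def assign_peripheral_pixels(text_mask, kernel_mask):
    """Spread kernel labels outward over the surrounding text pixels.

    Simplified stand-in for a learned grouping step: a multi-source
    breadth-first search, so each text pixel takes the label of the
    kernel that reaches it first. Non-text pixels keep label 0.
    Kernel pixels are assumed to lie inside the text mask.
    """
    labels, _ = label_kernels(kernel_mask)
    h, w = len(text_mask), len(text_mask[0])
    queue = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny][nx]
                    and labels[ny][nx] == 0):
                labels[ny][nx] = labels[y][x]
                queue.append((ny, nx))
    return labels


# Two text lines that touch in the text mask but have separate kernels.
text = [[1, 1, 1, 1, 1, 1, 1, 1]]
kernels = [[0, 1, 0, 0, 0, 0, 1, 0]]
print(assign_peripheral_pixels(text, kernels))
# prints [[1, 1, 1, 1, 2, 2, 2, 2]]
```

In this toy case the text mask alone would merge both lines into one region, while the two separate kernels split it into two instances, which is the sense in which a kernel-based representation can keep adjacent text apart.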

Updated: 2021-05-04