Arbitrary Shape Text Detection via Segmentation with Probability Maps
IEEE Transactions on Pattern Analysis and Machine Intelligence ( IF 20.8 ) Pub Date : 5-20-2022 , DOI: 10.1109/tpami.2022.3176122
Shi-Xue Zhang 1 , Xiaobin Zhu 2 , Lei Chen 2 , Jie-Bo Hou 2 , Xu-Cheng Yin 3

Arbitrary shape text detection is a challenging task due to significantly varied sizes and aspect ratios, arbitrary orientations or shapes, inaccurate annotations, etc. Owing to the scalability of pixel-level prediction, segmentation-based methods can adapt to texts of various shapes and have hence attracted considerable attention recently. However, accurate pixel-level annotations of texts are formidable to obtain, and the existing datasets for scene text detection only provide coarse-grained boundary annotations. Consequently, numerous misclassified text pixels or background pixels always exist inside annotations, degrading the performance of segmentation-based text detection methods. Generally speaking, whether a pixel belongs to text or not is highly related to its distance to the adjacent annotation boundary. With this observation, in this paper, we propose an innovative and robust segmentation-based detection method via probability maps for accurately detecting text instances. To be concrete, we adopt a Sigmoid Alpha Function (SAF) to transform the distances between boundaries and their inside pixels into a probability map. However, one probability map cannot cover complex probability distributions well because of the uncertainty of coarse-grained text boundary annotations. Therefore, we adopt a group of probability maps computed by a series of Sigmoid Alpha Functions to describe the possible probability distributions. In addition, we propose an iterative model that learns to predict and assimilate probability maps, providing enough information to reconstruct text instances. Finally, simple region growth algorithms are adopted to aggregate probability maps into complete text instances. Experimental results demonstrate that our method achieves state-of-the-art performance in terms of detection accuracy on several benchmarks.
Notably, our method with Watershed Algorithm as post-processing achieves the best F-measure on Total-Text (88.79%), CTW1500 (85.75%), and MSRA-TD500 (88.93%). Besides, our method achieves promising performance on multi-oriented datasets (ICDAR2015) and multilingual datasets (ICDAR2017-MLT). Code is available at: https://github.com/GXYM/TextPMs.
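
The idea of turning boundary distances into a group of probability maps can be illustrated with a small sketch. The abstract does not give the exact form of the Sigmoid Alpha Function, so the function `saf` below is an assumed sigmoid-style mapping chosen only for illustration, and the names `distance_to_boundary` and `probability_maps` are hypothetical helpers rather than parts of the authors' released code. The sketch computes, for each pixel inside a coarse binary annotation, its distance to the nearest background pixel, normalizes it, and maps it to a probability under several values of alpha.

```python
import math


def saf(distance, alpha):
    """Assumed sigmoid-style mapping (NOT the paper's definition).

    Maps a normalized distance in [0, 1] to a value in [0, 1):
    0 at the annotation boundary, rising faster for larger alpha.
    """
    return 2.0 / (1.0 + math.exp(-alpha * distance)) - 1.0


def distance_to_boundary(mask):
    """Brute-force Euclidean distance from each inside pixel to the nearest
    background pixel. Background pixels keep distance 0."""
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    if not background:
        raise ValueError("mask must contain at least one background pixel")
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def probability_maps(mask, alphas):
    """Return one probability map per alpha for a single annotated region."""
    dist = distance_to_boundary(mask)
    peak = max(max(row) for row in dist) or 1.0  # avoid division by zero
    return [[[saf(d / peak, a) for d in row] for row in dist] for a in alphas]


if __name__ == "__main__":
    # A 5x7 toy annotation: a 3x5 rectangle of "text" surrounded by background.
    mask = [[1 if 1 <= r <= 3 and 1 <= c <= 5 else 0 for c in range(7)]
            for r in range(5)]
    for alpha, pmap in zip([1.0, 4.0], probability_maps(mask, [1.0, 4.0])):
        print(f"alpha={alpha}")
        for row in pmap:
            print(" ".join(f"{p:.2f}" for p in row))
```

Under this assumed form, a small alpha yields a map that rises gently from the boundary toward the region center, while a large alpha saturates quickly, so a set of alphas gives several candidate descriptions of how confident one can be that a pixel near a coarse boundary is text. A practical implementation would likely replace the brute-force distance computation with a library distance transform.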

Updated: 2024-08-26