Detecting dense text in natural images, IET Computer Vision

Detecting dense text in natural images
IET Computer Vision (IF 1.5), Pub Date: 2020-12-15, DOI: 10.1049/iet-cvi.2019.0916
Dianzhuan Jiang, Shengsheng Zhang, Yaping Huang, Qi Zou, Xingyuan Zhang, Mengyang Pu, Junbo Liu

Most existing text detection methods are adapted from deep learning-based object detection approaches, which can produce seriously overlapping detections of text lines, especially in dense text scenarios. The mismatch arises because, unlike general objects in natural scenes, text boxes rarely overlap. Moreover, text detection requires higher localisation accuracy than object detection. To tackle these problems, the authors propose a novel dense text detection network (DTDN) that localises tighter text lines without overlap. The main contributions are: (i) an intersection-over-union overlap loss, which considers the correlations between an anchor and the ground-truth (GT) boxes and measures how many text areas the anchor contains; and (ii) a novel anchor sample selection strategy, named CMax-OMin, which selects tighter positive samples for training. The CMax-OMin strategy not only considers whether an anchor has the largest overlap with its corresponding GT box (CMax), but also keeps the overlap between that anchor and the other GT boxes as small as possible (OMin). In addition, the authors train a bounding-box regressor as a post-processing step to further improve text localisation performance. Experiments on scene text benchmark datasets and on the authors' own dense text dataset demonstrate that DTDN achieves competitive performance, especially in dense text scenarios.
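
The abstract does not give the exact CMax-OMin rule, so the following is only a minimal Python sketch of the idea as described above: keep an anchor as a positive sample when its best-matching GT box overlaps it strongly (CMax) and every other GT box overlaps it only slightly (OMin). The function names, the `[x1, y1, x2, y2]` box layout, and the two thresholds (`pos_thresh`, `other_thresh`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between anchors (N, 4) and GT boxes (M, 4).

    Boxes are assumed to be [x1, y1, x2, y2] with positive area.
    """
    a = anchors[:, None, :]   # shape (N, 1, 4)
    g = gt_boxes[None, :, :]  # shape (1, M, 4)
    inter_w = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def select_positive_anchors(anchors, gt_boxes, pos_thresh=0.5, other_thresh=0.1):
    """Hypothetical CMax-OMin-style selection (thresholds are illustrative).

    Returns the indices of the kept anchors and the GT index matched to each.
    """
    iou = iou_matrix(anchors, gt_boxes)
    best_gt = iou.argmax(axis=1)    # CMax: GT box with the largest overlap
    best_iou = iou.max(axis=1)
    # OMin: largest overlap with any GT box other than the matched one.
    masked = iou.copy()
    masked[np.arange(len(anchors)), best_gt] = 0.0
    other_iou = masked.max(axis=1)
    keep = (best_iou >= pos_thresh) & (other_iou <= other_thresh)
    return np.flatnonzero(keep), best_gt[keep]

# Two vertically stacked text lines and three candidate anchors.
gt_boxes = np.array([[0, 0, 100, 10], [0, 10, 100, 20]], dtype=float)
anchors = np.array([
    [0, 0, 100, 10],       # tight fit on the first line
    [0, 0, 100, 14],       # covers the first line but spills into the second
    [200, 200, 210, 210],  # background
], dtype=float)
kept, matched = select_positive_anchors(anchors, gt_boxes)
print(kept, matched)
```

In this toy case the second anchor passes a plain IoU threshold against the first text line, yet it is rejected because it also overlaps the neighbouring line, which is the behaviour the OMin condition is meant to capture in dense text.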

Updated: 2020-12-18
