IOS-Net: An Inside-to-outside Supervision Network for Scale Robust Text Detection in the wild
Pattern Recognition (IF 8), Pub Date: 2020-07-01, DOI: 10.1016/j.patcog.2020.107304
Yuanqiang Cai, Weiqiang Wang, Yuting Chen, Qixiang Ye

Abstract Accurately detecting scene text is a challenging task due to perspective distortion, scale variance, varied orientations, and uneven illumination. Among these, scale variance has always been a core issue and generally takes two forms: varied sizes and diverse aspect ratios of the text regions. In contrast to most existing approaches, which address only one type of scale variance, this paper presents a novel inside-to-outside supervision network (IOS-Net) that tackles both. Specifically, we design a hierarchical supervision module (HSM), which consists of a new inception unit with parallel asymmetric convolution and a skip-layer fusion structure. Inside the HSM, we introduce hierarchical supervision into the new inception unit to effectively capture texts with diverse aspect ratios. Outside the HSM, we apply multi-scale supervision to the stacked HSMs to accurately detect texts of various sizes. Moreover, position-sensitive segmentation is used to enhance the representation of difficult text objects and the discrimination of adjacent ones. The proposed method achieves state-of-the-art performance on representative public benchmarks, reaching 86% F-score and 11.5 frames per second (FPS) on the ICDAR 2015 incidental text dataset, 47% F-score and 16.1 FPS on the COCO-Text dataset, and 69% F-score and 11.7 FPS on the ICDAR 2013 video text dataset.

Updated: 2020-07-01