R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection
IEEE Transactions on Multimedia ( IF 8.4 ) Pub Date : 2020-05-19 , DOI: 10.1109/tmm.2020.2995290
Yuxin Wang , Hongtao Xie , Zhengjun Zha , Youliang Tian , Zilong Fu , Yongdong Zhang

This paper introduces a novel bi-directional convolutional framework to cope with the large-variance scale problem in scene text detection. Because recent CNN-based methods lack scale normalization, text instances of widely varying scales are activated inconsistently in the feature maps, which makes it hard for these methods to accurately locate multi-size text instances. We therefore propose the Relationship Network (R-Net), which maps multi-scale convolutional features to a scale-invariant space so that multi-size text instances are activated consistently. First, we implement an FPN-like backbone with a Spatial Relationship Module (SPM) to extract multi-scale features with strong spatial semantics. Then, a Scale Relationship Module (SRM) built on the feature pyramid propagates contextual scale information across sequential features through a bi-directional convolutional operation. The SRM supplements the multi-scale information in the different feature maps to obtain consistent activation of multi-size text instances. Compared with previous approaches, R-Net handles the large-variance scale problem effectively without complicated post-processing or hand-crafted hyperparameter tuning. Extensive experiments on several benchmarks verify that R-Net achieves state-of-the-art performance in both accuracy and efficiency. Specifically, R-Net achieves an F-measure of 85.6% at 21.4 frames/s on ICDAR 2015 and an F-measure of 81.7% at 11.8 frames/s on MSRA-TD500. The code is available at https://github.com/wangyuxin87/R-Net.
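The abstract describes the SRM only at a high level: contextual scale information is propagated through the feature pyramid in both directions (coarse-to-fine and fine-to-coarse) and used to supplement each level. The exact operation is not given here, so the following is a minimal hypothetical NumPy sketch of that idea; the blending weight `alpha`, the nearest-neighbour resampling, and the function names are illustrative assumptions, not the paper's implementation (which uses learned convolutions).

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # 2x downsampling by striding.
    return x[::2, ::2, :]

def bidirectional_scale_fusion(pyramid, alpha=0.5):
    """Propagate contextual scale information through a feature pyramid
    in both directions, then fuse the two passes level by level.

    pyramid: list of (H, W, C) arrays, finest level first, each level
    half the spatial size of the previous one.
    """
    # Top-down pass: coarse levels inform finer ones.
    top_down = [pyramid[-1]]
    for feat in reversed(pyramid[:-1]):
        top_down.append(alpha * feat + (1 - alpha) * upsample2x(top_down[-1]))
    top_down = top_down[::-1]  # restore finest-first order

    # Bottom-up pass: fine levels inform coarser ones.
    bottom_up = [pyramid[0]]
    for feat in pyramid[1:]:
        bottom_up.append(alpha * feat + (1 - alpha) * downsample2x(bottom_up[-1]))

    # Fuse the two directions so every level carries context from all scales.
    return [0.5 * (t + b) for t, b in zip(top_down, bottom_up)]
```

Because every output level mixes in resampled information from both coarser and finer levels, a text instance of any size contributes to all levels, which is the intuition behind the consistent activation the paper targets.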

Updated: 2020-05-19