A Character Flow Framework for Multi-Oriented Scene Text Detection
Journal of Computer Science and Technology (IF 1.2), Pub Date: 2021-05-31, DOI: 10.1007/s11390-021-1362-4
Wen-Jun Yang, Bei-Ji Zou, Kai-Wen Li, Shu Liu

Scene text detection plays a significant role in applications such as object recognition, document management, and visual navigation. Instance-segmentation-based methods are the most commonly used in existing research because they handle multi-oriented text well. However, the training labels contain a large number of non-text pixels, which leads to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework with two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary orientations. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500, and MLT datasets demonstrate the effectiveness of our approach, with F-measures of 92.62%, 88.02%, 83.69%, and 77.81%, respectively.
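For readers unfamiliar with the reported metric, the F-measure is the harmonic mean of detection precision and recall. The following is a minimal Python sketch of that standard formula only. The abstract does not report precision or recall, and the matching rule that decides which detections count as correct differs per benchmark and is not described here, so the function names and the counts in the usage note are hypothetical illustrations, not the paper's evaluation code or results.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Both are zero: define the F-measure as zero to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_positives: int, num_detections: int, num_ground_truth: int):
    """Compute (precision, recall, F-measure) from hypothetical match counts.

    true_positives:   detections matched to a ground-truth text region
    num_detections:   all regions the detector produced
    num_ground_truth: all annotated text regions
    """
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)
```

As a usage example with made-up counts, `detection_scores(8, 10, 16)` gives a precision of 0.8 and a recall of 0.5. Because the harmonic mean is pulled toward the lower of the two values, the resulting F-measure is lower than their arithmetic mean of 0.65.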




Updated: 2021-06-15