Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection,International Journal of Computer Vision



International Journal of Computer Vision (IF 11.6), Pub Date: 2021-04-19, DOI: 10.1007/s11263-021-01459-7
Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

Multi-orientation scene text detection has recently gained significant research attention. Previous methods directly predict words or text lines, typically as quadrilaterals. However, many of these methods neglect the significance of consistent labeling, which is important for maintaining a stable training process, especially when training on a large amount of data. Here we solve this problem by proposing a new method, Orderless Box Discretization (OBD), which first discretizes the quadrilateral box into several key edges containing all potential horizontal and vertical positions. To decode accurate vertex positions, a simple yet effective matching procedure is proposed for reconstructing the quadrilateral bounding boxes. Our method resolves the labeling ambiguity, which has a significant impact on the learning process. Extensive ablation studies quantitatively validate the effectiveness of the proposed method. More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors. Combining OBD with these refinements, we achieve state-of-the-art performance on various benchmarks, including ICDAR 2015 and MLT. Our method also won first place in the text detection task at the recent ICDAR2019 Robust Reading Challenge for Reading Chinese Text on Signboards, further demonstrating its superior performance. The code is available at https://git.io/TextDet.
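The abstract does not spell out how the key edges or the matching step are encoded, so the following is only an illustrative sketch and not the authors' implementation. It assumes the key edges are the four vertex x-coordinates and the four vertex y-coordinates, each sorted independently, and that the matching is a permutation recording which y rank pairs with each x rank. The names `discretize` and `reconstruct` are hypothetical. The point it shows is that such an encoding does not depend on the order in which an annotator listed the vertices.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def discretize(quad: List[Point]) -> Tuple[List[float], List[float], Tuple[int, ...]]:
    """Encode a quadrilateral as sorted x key edges, sorted y key edges, and a matching.

    Assumption (not from the abstract): match[i] is the rank of the y value
    that pairs with the i-th smallest x value.
    """
    xs = sorted(p[0] for p in quad)
    ys = sorted(p[1] for p in quad)
    pts = sorted(quad)  # vertices ordered by x, then y, so pts[i][0] == xs[i]
    remaining = list(range(4))  # y ranks not yet assigned to a vertex
    match = []
    for _, y in pts:
        rank = next(r for r in remaining if ys[r] == y)
        remaining.remove(rank)
        match.append(rank)
    return xs, ys, tuple(match)


def reconstruct(xs: List[float], ys: List[float], match: Tuple[int, ...]) -> Set[Point]:
    """Recover the vertex set by pairing each x rank with its matched y rank."""
    return {(xs[i], ys[j]) for i, j in enumerate(match)}


# The same box annotated with two different vertex orders.
quad_a = [(10, 20), (50, 10), (60, 40), (15, 45)]
quad_b = [(60, 40), (15, 45), (10, 20), (50, 10)]

code_a = discretize(quad_a)
code_b = discretize(quad_b)
print(code_a == code_b)                      # both orderings give one encoding
print(reconstruct(*code_a) == set(quad_a))   # vertices are recovered exactly
```

Because the coordinates are sorted before anything is predicted, two annotations of the same box that differ only in starting vertex or direction map to one training target under this assumed encoding.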


Updated: 2021-04-19
