MOSTL: An Accurate Multi-Oriented Scene Text Localization
Circuits, Systems, and Signal Processing (IF 1.8), Pub Date: 2021-02-19, DOI: 10.1007/s00034-021-01674-0
Fatemeh Naiemi, Vahid Ghods, Hassan Khalesi

Automatic text localization in natural environments is a key component of many applications, including self-driving cars, vehicle identification, and providing scene information to visually impaired people. However, text in natural and irregular scenes varies widely in orientation, shape, and color, which makes it difficult to detect. This paper presents an accurate multi-oriented scene text localization method (MOSTL), based on convolutional neural networks, that detects text with high efficiency. The proposed method introduces an improved ReLU layer (i.ReLU) and an improved inception layer (i.inception). First, the proposed structure extracts low-level visual features. Then, an extra layer is used to improve feature extraction. The i.ReLU and i.inception layers improve the extraction of information that is valuable for text detection: the i.ReLU layers help extract low-level features appropriately, and the i.inception layers (especially their 3 × 3 convolutions) capture text of widely varying sizes more effectively than a linear chain of convolution layers without inception layers. The outputs of the i.ReLU and i.inception layers are fed to an extra layer, which enables MOSTL to detect multi-oriented text, including curved and vertical text. We conducted text detection experiments on well-known datasets, including ICDAR 2019, ICDAR 2017, ICDAR 2015, ICDAR 2003, and MSRA-TD500, and MOSTL yielded remarkable performance improvements.
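The multi-branch idea behind inception layers can be illustrated with a small sketch. This is not the authors' i.inception or i.ReLU layer, whose details the abstract does not give. It assumes a standard GoogLeNet-style design with parallel 1 × 1, 3 × 3, and 5 × 5 convolutions, uses a plain ReLU, and replaces learned weights with fixed averaging (box) kernels. The helper names (`conv2d_same`, `box`, `inception_block`, `fuse_1x1`, `receptive_field`, `count_active`) are hypothetical, and `fuse_1x1` is only a stand-in for the unspecified "extra layer". The sketch shows why parallel branches see several receptive field sizes at once, whereas a linear chain of stride-1 3 × 3 convolutions grows its receptive field by only 2 pixels per layer.

```python
from typing import List, Sequence

Grid = List[List[float]]  # single-channel 2D feature map


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Stride-1 2D cross-correlation with zero padding (output size == input size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2  # odd kernel sizes assumed, so padding is symmetric
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # positions outside the map count as zero
                        s += x[r][c] * k[a][b]
            out[i][j] = s
    return out


def relu(x: Grid) -> Grid:
    """Plain ReLU; the paper's improved i.ReLU is not specified in the abstract."""
    return [[max(0.0, v) for v in row] for row in x]


def box(n: int) -> Grid:
    """Hypothetical fixed n x n averaging kernel standing in for learned weights."""
    return [[1.0 / (n * n)] * n for _ in range(n)]


def inception_block(x: Grid, sizes: Sequence[int] = (1, 3, 5)) -> List[Grid]:
    """Parallel branches with different kernel sizes, each followed by ReLU.

    Returning the list of branch maps corresponds to concatenating them
    along the channel axis."""
    return [relu(conv2d_same(x, box(n))) for n in sizes]


def fuse_1x1(channels: List[Grid], weights: Sequence[float]) -> Grid:
    """1x1 convolution across stacked channels, i.e. a per-pixel weighted sum.

    Hypothetical stand-in for the unspecified 'extra layer'."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[i][j] for wt, ch in zip(weights, channels))
             for j in range(w)] for i in range(h)]


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field (in pixels) of a linear chain of stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def count_active(x: Grid) -> int:
    """Number of positions with a positive response."""
    return sum(1 for row in x for v in row if v > 0)


if __name__ == "__main__":
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 1.0  # one bright pixel in the centre
    branches = inception_block(img)
    print([count_active(b) for b in branches])  # [1, 9, 25]
    print(receptive_field([3, 3]))  # 5: a linear chain needs two 3x3 layers
    fused = fuse_1x1(branches, [0.5, 0.3, 0.2])
    print(count_active(fused))  # 25
```

For a single bright pixel, the 3 × 3 branch responds over a 3 × 3 neighbourhood and the 5 × 5 branch over the whole 5 × 5 image, so one block already mixes small and large context. A linear chain needs two stacked 3 × 3 layers to reach the same 5-pixel receptive field, and it never exposes the smaller scales alongside it at the same depth.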

Updated: 2021-02-19
