Deep OCR for Arabic script‐based language like Pastho, Expert Systems



Deep OCR for Arabic script‐based language like Pastho
Expert Systems ( IF 3.0 ) Pub Date : 2020-05-07 , DOI: 10.1111/exsy.12565
Saeeda Naz ₁ , Naila H. Khan ₂ , Shizza Zahoor ₁ , Muhammad I. Razzak ₃


Developing recognition systems for cursive scripts has always been a challenging task for researchers. This article proposes a ligature‐based recognition system for the cursive Pashto script that fine‐tunes four pre‐trained CNN models. The SqueezeNet, ResNet, MobileNet and DenseNet models are evaluated for the classification and recognition of Pashto sub‐words (ligatures). Overall, the proposed system is divided into two domains, source and target. The source domain contains the models pre‐trained on the ImageNet dataset. These models are then fine‐tuned, following the transfer learning approach, for Pashto ligature recognition in the target domain. Two data augmentation techniques, negative and contour, are used to increase the variety of ligature images and the size of the dataset. The CNN models are evaluated on the benchmark FAST‐NU Pashto ligature dataset. The proposed system achieves its highest recognition rate, up to 99.31%, with the DenseNet architecture.
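The abstract names the negative and contour augmentations without defining them. The following is a minimal, dependency‐free sketch of what they might look like, assuming that "negative" means intensity inversion and "contour" means keeping only the ink pixels that border the background. The function names, the 4‐neighbour rule and the threshold of 128 are illustrative assumptions, not the authors' implementation.

```python
from typing import List

# Grayscale image as rows of 0..255 intensities (assumed: dark ink, light paper).
Image = List[List[int]]


def negative(img: Image) -> Image:
    """Invert intensities so dark ink on light paper becomes light on dark."""
    return [[255 - p for p in row] for row in img]


def contour(img: Image, threshold: int = 128) -> Image:
    """Keep only ink pixels that touch background in the 4-neighbourhood.

    The threshold and the neighbourhood are assumptions made for this sketch.
    """
    h, w = len(img), len(img[0])

    def is_ink(y: int, x: int) -> bool:
        # Pixels outside the image are treated as background.
        return 0 <= y < h and 0 <= x < w and img[y][x] < threshold

    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if is_ink(y, x) and not all(is_ink(ny, nx) for ny, nx in neighbours):
                out[y][x] = 0  # boundary ink pixel: part of the outline
    return out


def augment(img: Image) -> List[Image]:
    """Return the original image plus its negative and contour variants."""
    return [img, negative(img), contour(img)]
```

Under these assumptions each ligature image yields three training samples, which is one way the dataset size could grow without collecting new data. In practice an array or imaging library would replace the nested lists.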


Updated: 2020-05-07
