Performance Analysis of State of the Art Convolutional Neural Network Architectures in Bangla Handwritten Character Recognition, Pattern Recognition and Image Analysis

Pattern Recognition and Image Analysis. Pub Date: 2021-04-08. DOI: 10.1134/s1054661821010089
Tapotosh Ghosh, Min-Ha-Zul Abedin, Hasan Al Banna, Nasirul Mumenin, Mohammad Abu Yousuf

Abstract

Bangla handwritten character recognition is a popular research topic because it is more difficult than character recognition in other languages, owing to the multiple forms that compound characters can take. State-of-the-art convolutional neural network (CNN) architectures are very useful in computer vision applications. Some work has been carried out on Bangla handwritten character recognition, but most of it is either not very efficient or cannot classify a large number of characters. In this work, state-of-the-art pre-trained CNN architectures are used to classify 231 different Bangla handwritten characters from the CMATERdb dataset. The images were first converted to black and white, with white as the foreground color, and then reduced to 28 × 28. These images are used as input to the CNN architectures. The weights of the state-of-the-art CNN models were kept as they were. The learning rate for training was set to 0.001, with categorical cross-entropy as the error function. After 50 epochs, InceptionResNetV2 achieved the best accuracy (96.99%). DenseNet121 and InceptionNetV3 also provided remarkable recognition accuracy (96.55% and 96.20%, respectively). We also considered a combination of the trained InceptionResNetV2, InceptionNetV3, and DenseNet121 architectures, which provided better recognition accuracy (97.69%) than any single CNN architecture, but it is not practical to use because it requires a lot of computation power and memory. The models were also tested on characters that look confusing to humans, and all the architectures showed equal capability in recognizing these images. Considering computational complexity, memory, and the capability of recognizing confusing characters, InceptionResNetV2 can be regarded as the best-performing model.
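
The preprocessing and model-combination steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code. The abstract does not state how the images were binarized, which interpolation was used for resizing, or how the three trained models were combined, so the fixed threshold of 128, the nearest-neighbour resize, the averaging of class probabilities, and all function names below are assumptions made only for illustration.

```python
import numpy as np


def to_white_foreground(gray, threshold=128):
    """Binarize a grayscale image so that ink is white (255) on black (0).

    Assumption: the scan has dark ink on a light background, and a fixed
    threshold is good enough. The abstract does not specify either.
    """
    gray = np.asarray(gray)
    return np.where(gray < threshold, 255, 0).astype(np.uint8)


def resize_nearest(img, size=(28, 28)):
    """Resize a 2-D image by nearest-neighbour sampling (assumed method)."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]  # source row for each output row
    cols = np.arange(size[1]) * w // size[1]  # source column for each output column
    return img[rows[:, None], cols]


def preprocess(gray):
    """Black-and-white conversion followed by reduction to 28 x 28."""
    return resize_nearest(to_white_foreground(gray))


def ensemble_predict(prob_list):
    """Combine models by averaging their class probabilities (assumed rule).

    prob_list holds one (n_samples, n_classes) array per model.
    Returns the index of the highest averaged probability for each sample.
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=-1)
```

In this sketch, `ensemble_predict` would receive the softmax outputs of the three trained networks for the same batch of images. Every model has to be evaluated for each prediction, which is consistent with the abstract's remark that the combination costs more computation and memory than a single network.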

Updated: 2021-04-08
