Word Segmentation from Unconstrained Handwritten Bangla Document Images using Distance Transform
arXiv - CS - Multimedia, Pub Date: 2020-09-17, DOI: arxiv-2009.08037
Pawan Kumar Singh, Shubham Sinha, Sagnik Pal Chowdhury, Ram Sarkar, Mita Nasipuri

Segmentation of handwritten document images into text lines and words is one of the most significant and challenging tasks in developing a complete Optical Character Recognition (OCR) system. This paper addresses the automatic segmentation of words directly from unconstrained handwritten Bangla document images. The widely used distance transform (DT) algorithm is applied to locate the outer boundary of each word image. This technique does not produce over-segmented words. A simple post-processing procedure is then applied to separate any under-segmented word images. The proposed technique is tested on 50 randomly selected images from the CMATERdb1.1.1 database. It achieves a segmentation accuracy of 91.88%, which the authors take as confirmation of the robustness of the proposed methodology.
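
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary image given as a list of rows (1 = ink, 0 = background), a two-pass city-block distance transform, and a hypothetical `radius` parameter: background pixels within `radius` of ink are merged with that ink, and each resulting connected region is reported as one word box. The names `distance_transform`, `segment_words`, `radius`, and `IMG` are illustrative assumptions.

```python
from collections import deque


def distance_transform(img):
    """City-block distance from each pixel to the nearest ink pixel (value 1)."""
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any possible city-block distance in the grid
    dist = [[0 if img[y][x] else inf for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top-left corner.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom-right corner.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def segment_words(img, radius):
    """Return (x_min, y_min, x_max, y_max) boxes, one per merged ink region.

    Assumption: strokes closer than about 2 * radius belong to the same word.
    """
    h, w = len(img), len(img[0])
    dist = distance_transform(img)
    region = [[dist[y][x] <= radius for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not region[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected region and collect its ink pixels.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ink = []
            while queue:
                y, x = queue.popleft()
                if img[y][x]:
                    ink.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and region[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if ink:
                ys = [p[0] for p in ink]
                xs = [p[1] for p in ink]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# Toy "text line": two strokes one pixel apart, then a four-pixel gap.
IMG = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(segment_words(IMG, 1))  # [(0, 1, 4, 1), (9, 1, 10, 1)]
```

In this toy example the choice of `radius` shows the trade-off the abstract alludes to: `radius=0` splits the first word into two strokes (over-segmentation), while `radius=2` fuses everything into a single box (under-segmentation), which is the kind of case a post-processing step would have to split.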

Updated: 2020-09-18