

Shirorekha based character segmentation for medieval handwritten Devnagari manuscript
International Journal of Information Technology, Pub Date: 2021-04-15, DOI: 10.1007/s41870-021-00664-4
Nikita Mehta, Jyotika Doshi

In optical character recognition (OCR), segmentation is always a crucial phase. Here, segmentation covers all levels: page, line, word and character segmentation. The character recognition rate of any OCR system depends largely on correct and accurate segmentation. This paper addresses character segmentation for medieval handwritten Devnagari manuscripts, which are hundreds of years old. In recent Devnagari, the shirorekha (upper horizontal line) is placed over each word, whereas in medieval Devnagari a separate shirorekha is placed over each character. Using this unique feature as a key, a novel Shirorekha Based Character Segmentation (SBCS) method is proposed. In this technique, the shirorekha is first identified and then examined horizontally to find breaks in it. Each break in the shirorekha is treated as a possible segmentation point for a character. The possible segmentation points are then scanned vertically for spacing between two characters, and the segmentation points are finalized according to the gap between characters. This approach achieves a segmentation accuracy of 88.28%, which is better than that of many existing approaches applied to recent Devnagari script. To our knowledge, no previous research on character segmentation for medieval Devnagari script exists; this is the first attempt of its kind.
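The two-stage idea in the abstract (find breaks in the shirorekha, then confirm each break with a vertical scan) can be illustrated with a minimal sketch. This is not the authors' SBCS implementation: the binary-image representation, the function names, the choice of the densest upper row as the shirorekha, and the `min_gap` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a binarized word image, 1 = ink, 0 = background.
Image = List[List[int]]


def find_shirorekha_row(img: Image, search_fraction: float = 0.5) -> int:
    """Pick the row with the most ink in the upper part of the image.

    Treating the densest upper row as the shirorekha is an assumption.
    """
    limit = max(1, int(len(img) * search_fraction))
    return max(range(limit), key=lambda r: sum(img[r]))


def find_breaks(row: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the
    headline row has no ink between two inked stretches."""
    breaks = []
    start = None
    seen_ink = False
    for col, px in enumerate(row):
        if px:
            if start is not None:
                breaks.append((start, col))
                start = None
            seen_ink = True
        elif seen_ink and start is None:
            start = col
    return breaks


def segmentation_points(img: Image, min_gap: int = 1) -> List[int]:
    """Keep a headline break only if it contains a run of at least
    `min_gap` columns with no ink over the full image height, and cut
    in the middle of the widest such run. `min_gap` is hypothetical."""
    headline = img[find_shirorekha_row(img)]
    cuts = []
    for start, end in find_breaks(headline):
        best_start, best_len, run_start = 0, 0, None
        for c in range(start, end + 1):
            clear = c < end and not any(line[c] for line in img)
            if clear and run_start is None:
                run_start = c
            elif not clear and run_start is not None:
                if c - run_start > best_len:
                    best_start, best_len = run_start, c - run_start
                run_start = None
        if best_len >= min_gap:
            cuts.append(best_start + best_len // 2)
    return cuts


# Toy image: the headline breaks at column 3 (ink continues below it,
# so it is rejected) and at columns 6-7 (a true gap between characters).
SAMPLE: Image = [
    [1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
]

if __name__ == "__main__":
    print(segmentation_points(SAMPLE))  # prints [7]
```

In the toy image, the break at column 3 is discarded because a stroke crosses that column below the headline, while the break at columns 6 and 7 is confirmed as a cut. Real manuscripts would also need binarization, skew correction and line and word segmentation first, none of which is shown here.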


Updated: 2021-04-16
