Recognition-based character segmentation for multi-level writing style
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2018-04-30, DOI: 10.1007/s10032-018-0302-5
Papangkorn Inkeaw, Jakramate Bootkrajang, Phasit Charoenkwan, Sanparith Marukatat, Shinn-Ying Ho, Jeerayut Chaijaruwanich

Character segmentation is an important task in optical character recognition (OCR). The quality of any OCR system depends heavily on its character segmentation algorithm. Although many character segmentation methods have been proposed, existing methods cannot satisfactorily segment characters in some complex writing styles, such as Lanna Dhamma characters. This paper proposes a new method, graph partitioning-based character segmentation, to address the problem. The proposed method handles multi-level writing styles as well as touching and broken characters, and can be regarded as a generalization of existing approaches to multi-level writing styles. It consists of three phases. In the first phase, a newly devised over-segmentation technique based on the morphological skeleton produces redundant fragments of a word image. In the second phase, the fragments are used to form a segmentation hypotheses graph. In the last phase, the hypotheses graph is partitioned into subgraphs, each corresponding to a segmented character, using a partitioning algorithm developed specifically for character segmentation. Experimental results on handwritten Lanna Dhamma character datasets show that the proposed method achieves a high correct segmentation rate and outperforms existing methods for the Lanna Dhamma alphabet.
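
The three-phase idea (fragments, hypotheses graph, partition into subgraphs) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. Fragments are stood in for by hand-written bounding boxes rather than skeleton-based over-segmentation, `build_graph` links fragments whose boxes nearly touch, `toy_score` is a hypothetical lookup table playing the role of a recognizer's confidence, and `best_partition` uses a brute-force search in place of the paper's dedicated partitioning algorithm.

```python
from itertools import combinations

def build_graph(boxes, gap=2):
    """Link fragments whose boxes (x0, y0, x1, y1) touch or nearly touch.

    Adjacency is tested in both axes, so a fragment stacked above another
    (multi-level writing) is linked just like a left/right neighbour.
    """
    def near(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        return (ax0 - gap <= bx1 and bx0 - gap <= ax1 and
                ay0 - gap <= by1 and by0 - gap <= ay1)

    adj = {i: set() for i in range(len(boxes))}
    for i, j in combinations(range(len(boxes)), 2):
        if near(boxes[i], boxes[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def is_connected(nodes, adj):
    """Return True if `nodes` induce a connected subgraph of `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def best_partition(adj, score):
    """Brute-force search over partitions into connected blocks.

    Each block is one character hypothesis; the partition with the highest
    summed score wins. Exponential in the number of fragments, so this is
    only suitable for tiny examples.
    """
    def solve(remaining):
        if not remaining:
            return 0.0, []
        first = min(remaining)
        rest = sorted(remaining - {first})
        best = (float("-inf"), [])
        for k in range(len(rest) + 1):
            for extra in combinations(rest, k):
                block = frozenset((first,) + extra)
                if not is_connected(block, adj):
                    continue
                sub_total, sub_blocks = solve(remaining - block)
                total = score(block) + sub_total
                if total > best[0]:
                    best = (total, [block] + sub_blocks)
        return best

    return solve(frozenset(adj))

# Hypothetical fragments: 0 and 1 are two halves of a broken base character,
# 2 is a mark written above them, 3 is a separate character to the right.
boxes = [(0, 10, 5, 20), (5, 10, 10, 20), (2, 0, 8, 9), (14, 10, 20, 20)]

# Hypothetical recognizer confidences; unknown groupings get a low default.
confidence = {
    frozenset({0, 1}): 0.9,
    frozenset({2}): 0.8,
    frozenset({3}): 0.95,
    frozenset({0, 1, 2}): 0.6,
}

def toy_score(block):
    return confidence.get(block, 0.1)

adj = build_graph(boxes)
total, blocks = best_partition(adj, toy_score)
print([sorted(b) for b in blocks])
```

In this toy case the two halves of the broken character are merged into one block, while the mark above them and the character to the right stay separate. Summing per-block confidences is a deliberately simple objective chosen for the sketch; a real system would need a scoring rule that does not favour splitting into many small blocks.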

Updated: 2018-04-30