HanFont: large-scale adaptive Hangul font recognizer using CNN and font clustering
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2019-07-31, DOI: 10.1007/s10032-019-00337-w
Jinhyeok Yang, Heebeom Kim, Hyobin Kwak, Injung Kim

We propose a large-scale Hangul font recognizer that is capable of recognizing 3300 Hangul fonts. Large-scale Hangul font recognition is a challenging task. Typically, Hangul fonts are distinguished by small differences in detailed shapes, which are often ignored by the recognizer. There are additional issues in practical applications, such as the existence of almost indistinguishable fonts and the release of new fonts after the training of the recognizer. Only a few recently developed font recognizers are scalable enough to recognize thousands of fonts, most of which focus on the fonts for western languages. The proposed recognizer, HanFont, is composed of a convolutional neural network (CNN) model designed to effectively distinguish the detailed shapes. HanFont also contains a font clustering algorithm to address the issues caused by indistinguishable fonts and untrained new fonts. In the experiments, HanFont exhibits a recognition rate of 94.11% for 3300 Hangul fonts including numerous similar fonts, which is 2.49% higher than that of ResNet. The cluster-level recognition accuracy of HanFont was 99.47% when the 3300 fonts were grouped into 1000 clusters. In a test on 100 new fonts without retraining the CNN model, HanFont exhibited 57.87% accuracy. The average accuracy for the top 56 untrained fonts was 75.76%.
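
The abstract reports both a font-level recognition rate and a cluster-level accuracy. The following is a minimal sketch of how the two metrics could differ, assuming a prediction counts as correct at the cluster level whenever the predicted font falls in the same cluster as the true font. The font names, the cluster assignment, and both function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def font_level_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of samples whose predicted font is exactly the true font."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def cluster_level_accuracy(
    predicted: Sequence[str],
    actual: Sequence[str],
    font_to_cluster: Dict[str, int],
) -> float:
    """Fraction of samples whose predicted font shares a cluster with the true font."""
    hits = sum(
        font_to_cluster[p] == font_to_cluster[a]
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Hypothetical toy data: two near-identical fonts share cluster 0.
font_to_cluster = {"GothicA": 0, "GothicB": 0, "MyeongjoA": 1, "BrushA": 2}
actual = ["GothicA", "GothicB", "MyeongjoA", "BrushA"]
predicted = ["GothicB", "GothicB", "MyeongjoA", "MyeongjoA"]

font_acc = font_level_accuracy(predicted, actual)
cluster_acc = cluster_level_accuracy(predicted, actual, font_to_cluster)
```

In this toy case, confusing GothicA with GothicB counts as an error at the font level but not at the cluster level, which is why grouping almost indistinguishable fonts raises the reported accuracy.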

Updated: 2019-07-31