Ancient text recognition: a review
Artificial Intelligence Review (IF 12.0), Pub Date: 2020-04-10, DOI: 10.1007/s10462-020-09827-4
Sonika Rani Narang, M. K. Jindal, Munish Kumar

Optical character recognition (OCR) is an important research area in the field of pattern recognition, and a great deal of research has been done on it over the last 60 years. Libraries and offices hold large volumes of paper-based data, and ancient text documents contain a wealth of knowledge. Maintaining and searching this paper-based data is a challenge, and efforts to digitize it are under way in many places. Paper documents are scanned to digitize them, but the scanned data is in pictorial form. Computers cannot interpret it directly, because they understand only standard alphanumeric characters encoded as ASCII or some other code. The alphanumeric information must therefore be retrieved from the scanned images. An OCR system converts a document into electronic text that can then be edited, searched, and otherwise processed. OCR is the machine replication of human reading and has been the subject of intensive research for more than six decades. This paper presents a comprehensive survey of the work done on the various phases of an OCR system, with special focus on OCR for ancient text documents. It aims to help novice researchers by providing a comprehensive study of the phases required for an OCR system, namely segmentation, feature extraction, and classification, especially for ancient documents. It has been observed that only limited work has been done on the recognition of ancient documents, especially those in Devanagari script. This article also presents future directions for researchers in the field of ancient text recognition.
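
The three phases named in the abstract (segmentation, feature extraction, classification) can be illustrated with a minimal sketch. This is not a method taken from the paper: the techniques chosen here (vertical projection profile, zoning densities, nearest-prototype matching) are common textbook choices, and every name in the code (`segment_columns`, `zoning_features`, `classify`, `recognize`, `IMG`, `PROTOTYPES`) is hypothetical. The input is a toy binary image, where 1 is ink and 0 is background, rather than a real scanned page.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Segmentation: split one text line into character spans (start, end)
    wherever the vertical projection profile drops to zero ink."""
    width = len(img[0])
    profile = [sum(row[x] for row in img) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            spans.append((start, x))       # a blank column ends it
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans


def zoning_features(img: Image, span: Tuple[int, int], zones: int = 2) -> List[float]:
    """Feature extraction: ink density in each cell of a zones x zones grid."""
    x0, x1 = span
    height, width = len(img), x1 - x0
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            ys = range(zy * height // zones, (zy + 1) * height // zones)
            xs = range(x0 + zx * width // zones, x0 + (zx + 1) * width // zones)
            cells = [img[y][x] for y in ys for x in xs]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def classify(feats: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Classification: return the label of the nearest prototype
    (squared Euclidean distance)."""
    return min(
        prototypes,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, prototypes[label])),
    )


def recognize(img: Image, prototypes: Dict[str, List[float]]) -> str:
    """Run the three phases in order and join the predicted labels."""
    return "".join(
        classify(zoning_features(img, span), prototypes)
        for span in segment_columns(img)
    )


# Toy line image: a 2-pixel-wide bar, a blank column, then an L shape.
IMG: Image = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1],
]

# Hand-written prototypes (assumed for illustration, not learned from data).
PROTOTYPES: Dict[str, List[float]] = {
    "I": [1.0, 1.0, 1.0, 1.0],
    "L": [0.5, 0.0, 1.0, 0.5],
}

print(recognize(IMG, PROTOTYPES))
```

Real ancient documents add problems this sketch ignores, such as degraded ink, touching or broken characters, and skewed lines, so each phase would need a far more robust technique in practice.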

Updated: 2020-04-10