Automatic Estimation of Age Distributions from the First Ottoman Empire Population Register Series by Using Deep Learning
Electronics (IF 2.9) · Pub Date: 2021-09-13 · DOI: 10.3390/electronics10182253
Yekta Said Can , M. Erdem Kabadayı

Recently, an increasing number of studies have applied deep learning algorithms to extract information from handwritten historical documents. To do so, documents must first be divided into smaller parts. Page and line segmentation are vital stages in Handwritten Text Recognition systems: they directly affect the character segmentation stage, which in turn determines recognition success. In this study, we first applied deep learning-based layout analysis techniques to detect individuals in the first Ottoman population register series, collected between the 1840s and the 1860s. Then, we applied horizontal projection profile-based line segmentation to the demographic information of the detected individuals. We further trained a CNN model to recognize the automatically detected ages of individuals and estimated the age distribution of the people recorded in these historical documents. Extracting age information from these registers is significant because it has enormous potential to revolutionize the historical demography of the roughly 20 present-day successor states of the Ottoman Empire. We achieved approximately 60% digit accuracy in recognizing the numbers in these registers and estimated the age distribution with a root mean square error (RMSE) of 23.61.
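Two steps named above, horizontal projection profile line segmentation and comparing age distributions by RMSE, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes the cropped record is already binarized (ink = 1, background = 0) and stored row by row. It also assumes the RMSE is taken over per-bin counts of age histograms. The abstract does not state the binning, thresholds, or normalization actually used, so `min_ink`, `min_height`, `bin_width`, and `max_age` are hypothetical parameters introduced here for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def horizontal_projection(binary: Sequence[Sequence[int]]) -> List[int]:
    """Horizontal projection profile: number of ink pixels (value 1) in each row."""
    return [sum(row) for row in binary]


def segment_lines(binary: Sequence[Sequence[int]],
                  min_ink: int = 1,        # hypothetical: rows with fewer ink pixels count as gaps
                  min_height: int = 1      # hypothetical: drop bands thinner than this
                  ) -> List[Tuple[int, int]]:
    """Return half-open (start_row, end_row) bands made of consecutive 'inked' rows."""
    profile = horizontal_projection(binary)
    bands: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                                  # a text line begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))               # a text line ends at a gap row
            start = None
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))            # line touching the bottom edge
    return bands


def age_histogram(ages: Sequence[int], bin_width: int = 5, max_age: int = 100) -> List[int]:
    """Count ages in fixed-width bins; ages above max_age fall into the last bin."""
    bins = [0] * (max_age // bin_width + 1)
    for age in ages:
        bins[min(age // bin_width, len(bins) - 1)] += 1
    return bins


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences (e.g. histograms)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    print(segment_lines(page))  # [(1, 3), (4, 5)]

    true_hist = age_histogram([20, 22, 35, 40], bin_width=10, max_age=50)
    pred_hist = age_histogram([20, 35, 35, 40], bin_width=10, max_age=50)
    print(round(rmse(true_hist, pred_hist), 4))  # 0.5774
```

Splitting at rows whose ink count drops below a threshold is the simplest form of projection-profile segmentation. Real register scans usually need smoothing of the profile or an adaptive threshold, because touching or slanted handwriting leaves few fully empty rows.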

Updated: 2021-09-13