Reading the Ransom: Methodological advancements in extracting the Swedish Wealth Tax of 1571
Explorations in Economic History (IF 1.857), Pub Date: 2022-07-16, DOI: 10.1016/j.eeh.2022.101470
Christopher Blomqvist , Kerstin Enflo , Andreas Jakobsson , Kalle Åström

We describe a deep learning method to read handwritten records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: The Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the layout analysis implemented in state-of-the-art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.
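
For readers unfamiliar with the CER metric mentioned in the abstract, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between the reference transcription and the recognized text, divided by the length of the reference. The function names `levenshtein` and `cer` and the sample strings are illustrative assumptions, not the authors' evaluation code, and the paper may normalize or aggregate the metric differently.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a six-character word.
    print(cer("silver", "silber"))
```

Because the denominator is the reference length, a CER above 1.0 is possible when the recognized text is much longer than the reference.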




Updated: 2022-07-16