Digitization and Data Frames for Card Index Records
Explorations in Economic History (IF 1.857) Pub Date: 2022-07-15, DOI: 10.1016/j.eeh.2022.101469
Someswar Amujala, Angela Vossmeyer, Sanjiv R. Das

We develop a methodology for converting card index archival records into usable data frames for statistical and textual analyses. Leveraging machine learning and natural-language processing tools from Amazon Web Services (AWS), we overcome hurdles associated with character recognition, inconsistent data reporting, column misalignment, and irregular naming. In this article, we detail the step-by-step conversion process and discuss remedies for common problems and edge cases, using historical records from the Reconstruction Finance Corporation.
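Two of the hurdles named above, column misalignment and irregular naming, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the OCR step has already produced rows of text cells, uses only the Python standard library rather than any AWS service, and the helper names (`align_row`, `normalize_name`) and the canonical name list are hypothetical.

```python
import difflib

# Hypothetical canonical names, for illustration only (not taken from the paper).
CANONICAL = ["First National Bank", "Farmers State Bank", "Union Trust Company"]


def align_row(cells, width):
    """Force a row of OCR'd cells to exactly `width` columns.

    Short rows are padded with None; overflow cells are joined into the
    last column so that no text is silently dropped.
    """
    cells = list(cells)
    if len(cells) < width:
        return cells + [None] * (width - len(cells))
    return cells[: width - 1] + [" ".join(cells[width - 1:])]


def normalize_name(raw, canonical=CANONICAL, cutoff=0.8):
    """Map a noisy name to its closest canonical spelling.

    If no canonical name is at least `cutoff` similar, the input is
    returned with its whitespace collapsed.
    """
    cleaned = " ".join(raw.split())
    matches = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


def rows_to_records(rows, columns, name_column):
    """Turn raw rows into a list of dicts, one per card entry."""
    records = []
    for row in rows:
        record = dict(zip(columns, align_row(row, len(columns))))
        if record[name_column] is not None:
            record[name_column] = normalize_name(record[name_column])
        records.append(record)
    return records
```

The resulting list of dicts could then be passed to a data-frame library for analysis. The similarity cutoff of 0.8 is an arbitrary choice here and would need tuning against real records.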




Updated: 2022-07-15