Simple Strategies for Improving Inference with Linked Data: A Case Study of the 1850-1930 IPUMS Linked Representative Historical Samples.
Historical Methods: A Journal of Quantitative and Interdisciplinary History (IF 1.6). Pub Date: 2019-10-31. DOI: 10.1080/01615440.2019.1630343
Martha Bailey (1, 2), Connor Cole (1), Catherine Massey (1)

New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: the use of validation variables to identify higher-quality links, and a simple, regression-based weighting procedure to increase the representativeness of custom research samples. We demonstrate the potential value of these strategies using the 1850-1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS), a high-quality, publicly available linked historical dataset. We show that, while incorrect linking rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can reweight linked samples to balance observed characteristics in the linked sample with those in a reference population using a simple regression-based procedure.
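
The abstract names both procedures without spelling out their mechanics. The following is a minimal sketch of one plausible reading of each, not the authors' implementation. It makes these assumptions: (1) a link is kept only if every validation variable, meaning a characteristic that should not change between the two censuses, agrees across both records (the field names `race` and `bpl` are hypothetical); (2) the regression-based reweighting is implemented as inverse-probability weighting. The linked sample is stacked with a reference sample, a logistic regression of linked-sample membership on observables gives a fitted probability p, and each linked record receives the weight w = (1 - p)/p × q/(1 - q), where q is the linked share of the pooled data. The paper may use a different link function (for example, probit) and different covariates. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: each validation variable maps to the pair of
# values reported for the same person in the two linked censuses.
LinkedRecord = Dict[str, Tuple[object, object]]


def keep_validated(links: List[LinkedRecord], fields: Sequence[str]) -> List[LinkedRecord]:
    """Keep only links whose validation variables agree across both census years."""
    return [rec for rec in links if all(rec[f][0] == rec[f][1] for f in fields)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _fit_logit(X: List[List[float]], y: List[int], iters: int = 25) -> List[float]:
    """Fit a logistic regression by Newton-Raphson; each row of X starts with an intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        grad = [sum((y[i] - p[i]) * X[i][j] for i in range(len(X))) for j in range(k)]
        hess = [[sum(p[i] * (1 - p[i]) * X[i][j] * X[i][l] for i in range(len(X)))
                 for l in range(k)] for j in range(k)]
        beta = [b + s for b, s in zip(beta, _solve(hess, grad))]
    return beta


def reweight(linked_X: List[List[float]], ref_X: List[List[float]]) -> List[float]:
    """Inverse-probability weights that make the linked sample resemble the reference sample."""
    rows = [[1.0] + r for r in linked_X + ref_X]  # stack both samples, add an intercept
    y = [1] * len(linked_X) + [0] * len(ref_X)     # 1 = record belongs to the linked sample
    beta = _fit_logit(rows, y)
    odds_scale = len(linked_X) / len(ref_X)        # q / (1 - q), q = linked share of pooled data
    weights = []
    for r in rows[: len(linked_X)]:
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
        weights.append((1 - p) / p * odds_scale)
    return weights


if __name__ == "__main__":
    # Toy data: one binary covariate; 70% of linked vs. 30% of reference records have x = 1.
    linked = [[1.0] for _ in range(7)] + [[0.0] for _ in range(3)]
    reference = [[1.0] for _ in range(6)] + [[0.0] for _ in range(14)]
    w = reweight(linked, reference)
    share = sum(wi for wi, r in zip(w, linked) if r[0] == 1.0) / sum(w)
    print(round(share, 4))
```

In this toy example, one binary covariate plus an intercept forms a saturated model. Each linked record with x = 1 therefore gets weight 3/7 and each with x = 0 gets 7/3, which moves the weighted share of x = 1 from the raw 0.70 to the reference share of 0.30. With richer, non-saturated covariate sets, balance on means is approximate rather than exact. Checking it after reweighting, as the abstract's "balance observed characteristics" suggests, is prudent.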
