Improving estimates of neighborhood change with constant tract boundaries
Applied Geography (IF 4.732), Pub Date: 2021-05-24, DOI: 10.1016/j.apgeog.2021.102476
John R Logan, Wenquan Zhang, Brian J Stults, Todd Gardner

Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. Spatial data from different sources are often based on different geographies that need to be reconciled, and some boundaries (e.g., administrative or political boundaries) change frequently. This study calls attention to the potential for substantial error when data are harmonized to constant boundaries using standard approaches to areal and population interpolation. The case in point is census tract boundaries in the United States, which are redefined before every decennial census. Research on neighborhood effects and neighborhood change relies heavily on estimates of local area characteristics for a consistent area over time, and researchers now routinely use interpolation-based estimates offered by sources such as the Neighborhood Change Data Base (NCDB) and the Longitudinal Tract Data Base (LTDB). We identify a fundamental problem with how these estimates are created, and we reveal an alarming level of error in estimates of 2000 population characteristics within 2010 tract boundaries. We do this by comparing estimates from one of these sources (the LTDB) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using “differential privacy” (DP) methods to inject random noise that meets Census Bureau standards for protecting the confidentiality of the raw data. We show that the DP estimates are considerably more accurate than the LTDB estimates based on interpolation, and we examine the conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source.
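The kind of error described above can be illustrated with a toy example. The Python sketch below assumes a hypothetical overlay in which two 2000 tracts (A, B) are split into fragments that fall in three 2010 tracts (X, Y, Z); every area, population, and count is invented. `interpolate` allocates each 2000 tract's published total to 2010 tracts in proportion to a weight, either overlapping area (areal interpolation) or population (population interpolation), which implicitly assumes the characteristic is distributed like that weight. `reaggregate` stands in for summing the underlying microdata directly, the benchmark the study uses. This is a generic illustration, not the LTDB's actual procedure, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Piece(NamedTuple):
    """One fragment from overlaying a 2000 tract on a 2010 tract (made-up data)."""
    src: str     # 2000 tract ID
    tgt: str     # 2010 tract ID
    area: float  # overlapping area, e.g. sq km
    pop: int     # total population in the fragment
    count: int   # persons with some characteristic (known only from microdata)


PIECES = [
    Piece("A", "X", 3.0, 600, 500),
    Piece("A", "Y", 1.0, 400, 100),
    Piece("B", "Y", 2.0, 300, 30),
    Piece("B", "Z", 2.0, 200, 170),
]


def interpolate(pieces: list[Piece], weight: Callable[[Piece], float]) -> dict[str, float]:
    """Allocate each 2000 tract's published total to 2010 tracts by weight share."""
    src_total = defaultdict(float)   # published 2000 tract total (only this is public)
    src_weight = defaultdict(float)  # total weight (area or population) per 2000 tract
    for p in pieces:
        src_total[p.src] += p.count
        src_weight[p.src] += weight(p)
    est = defaultdict(float)
    for p in pieces:
        # Each fragment gets the tract total times its share of the tract's weight.
        est[p.tgt] += src_total[p.src] * weight(p) / src_weight[p.src]
    return dict(est)


def reaggregate(pieces: list[Piece]) -> dict[str, float]:
    """Benchmark: sum fragment-level counts directly, as microdata would allow."""
    truth = defaultdict(float)
    for p in pieces:
        truth[p.tgt] += p.count
    return dict(truth)


if __name__ == "__main__":
    truth = reaggregate(PIECES)
    for name, w in [("areal", lambda p: p.area), ("population", lambda p: p.pop)]:
        est = interpolate(PIECES, w)
        for tract in sorted(truth):
            err = (est[tract] - truth[tract]) / truth[tract]
            print(f"{name:>10} {tract}: est={est[tract]:.0f} "
                  f"true={truth[tract]:.0f} error={err:+.0%}")
```

In this made-up overlay, tract Y's true count is 130, but areal weighting gives 250 and population weighting gives 360, because the characteristic is concentrated in the fragments of A and B that went to X and Z. Both methods still preserve the overall total, so the error is invisible at the aggregate level.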
Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.
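The abstract does not say which differential privacy mechanism was used beyond meeting Census Bureau standards. As a generic illustration only, the Python sketch below applies the textbook Laplace mechanism to a set of re-aggregated tract counts. It assumes each person is counted in exactly one tract, so adding or removing one person changes a single count by 1 (sensitivity 1); the `epsilon` value, the counts, and the function names are hypothetical, not the parameters or code used in the study.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate (1 / mean)
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_counts(true_counts: dict[str, int], epsilon: float,
              rng: random.Random, sensitivity: float = 1.0) -> dict[str, int]:
    """Laplace mechanism over disjoint tract counts (hypothetical helper).

    Because each person falls in one tract, the whole vector has L1 sensitivity 1,
    so adding Laplace(sensitivity / epsilon) noise to every count is epsilon-DP.
    Rounding and clamping at zero are post-processing and keep the guarantee.
    """
    scale = sensitivity / epsilon
    return {tract: max(0, round(count + laplace_noise(scale, rng)))
            for tract, count in true_counts.items()}


if __name__ == "__main__":
    # Seeded only so the demo is repeatable; a real release would use a
    # cryptographically secure RNG and a floating-point-safe sampler.
    rng = random.Random(2021)
    truth = {"X": 500, "Y": 130, "Z": 170}          # invented re-aggregated counts
    noisy = dp_counts(truth, epsilon=0.5, rng=rng)  # epsilon chosen arbitrarily
    for tract in sorted(truth):
        print(tract, truth[tract], noisy[tract])
```

A smaller `epsilon` means a larger noise scale (`1 / epsilon`) and stronger privacy. Unlike interpolation error, this noise (before rounding and clamping) is centered on the true re-aggregated value with a known spread.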




Updated: 2021-05-25