Elsevier

Applied Geography

Volume 132, July 2021, 102476

Improving estimates of neighborhood change with constant tract boundaries

https://doi.org/10.1016/j.apgeog.2021.102476

Highlights

  • Interpolated estimates are subject to substantial errors that are consequential for analyses of neighborhood change.

  • Error results from assuming the composition of areas within tracts is spatially uniform.

  • An alternative estimation procedure infuses random noise in the original census data, allowing its disclosure.

  • Such estimates for several population characteristics studied here are now publicly available.

Abstract

Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. Spatial data from different sources often are based on different geographies that need to be reconciled, and some boundaries (e.g., administrative or political boundaries) change frequently. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. The case in point is census tract boundaries in the United States, which are redefined before every decennial census. Research on neighborhood effects and neighborhood change relies heavily on estimates of local area characteristics for a consistent area over time, for which researchers now routinely use estimates based on interpolation offered by sources such as the Neighborhood Change Data Base (NCDB) and Longitudinal Tract Data Base (LTDB). We identify a fundamental problem with how these estimates are created, and we reveal an alarming level of error in estimates of population characteristics in 2000 within 2010 boundaries. We do this by comparing estimates from one of these sources (the LTDB) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using “differential privacy” (DP) methods to inject random noise that meets Census Bureau standards for protecting confidentiality of the raw data. We show that the DP estimates are considerably more accurate than the LTDB estimates based on interpolation, and we examine conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source. Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.

Introduction

Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. Spatial data from different sources often are based on different geographies that need to be reconciled, and some boundaries (e.g., administrative or political boundaries) change frequently. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. The case in point is census tract boundaries in the United States, which are redefined before every decennial census. We study the accuracy of standard methods of harmonizing such data over time to deal with changing boundaries. Previous research by Logan et al. (2016) took advantage of the public release of population counts from the 2000 Census using 2010 boundaries, showing that estimates using current methods from the Geographic Information Systems (GIS) toolkit were close to the true values in most tracts. We confirm that finding here, drawing on the original confidential 2000 data in a Federal Statistical Research Data Center (FSRDC). We also compare the “true” and estimated values of a selected set of other population characteristics. We find first that there is much more error in these estimates than in estimates of simple population counts. Second, an alternative approach that injects random noise into the true values, so that they can be publicly disclosed, provides considerably more accurate estimates. And third, there are identifiable conditions under which the interpolation-based tract estimate is more prone to error and therefore requires closer attention by researchers.
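The contrast between accurate population counts and less accurate estimates of other characteristics can be illustrated with a minimal sketch. It assumes a simple weighting scheme in which each piece of a 2000 tract receives one population weight that is applied to every variable, so the piece is treated as having the same composition as the whole tract. The `Fragment` class, the weights, and all counts are invented for illustration; they are not LTDB values or the LTDB procedure.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical piece of a 2000 tract lying inside one 2010 tract."""
    source_total: float  # total population of the whole 2000 tract
    source_group: float  # subgroup count (e.g., under 18) in the whole 2000 tract
    weight: float        # share of the 2000 tract's population assigned to this piece
    true_total: float    # invented "true" population of the piece
    true_group: float    # invented "true" subgroup count of the piece


def interpolate(fragments):
    """Apply one weight per fragment to every variable (uniform composition)."""
    total = sum(f.weight * f.source_total for f in fragments)
    group = sum(f.weight * f.source_group for f in fragments)
    return total, group


def true_values(fragments):
    """Re-aggregate the (invented) true counts of the pieces."""
    total = sum(f.true_total for f in fragments)
    group = sum(f.true_group for f in fragments)
    return total, group


# Invented example: half of tract A (where children are concentrated)
# plus all of tract B make up one 2010 tract.
fragments = [
    Fragment(4000, 1000, 0.5, 2000, 800),
    Fragment(3000, 600, 1.0, 3000, 600),
]
est_total, est_group = interpolate(fragments)
tru_total, tru_group = true_values(fragments)
print("total population: estimated", est_total, "true", tru_total)
print("subgroup count:   estimated", est_group, "true", tru_group)
```

In this invented case the weight splits the total population correctly, yet the subgroup estimate is off because the subgroup is not spread evenly across the split tract.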

Section snippets

Geographical approaches to adjusting boundaries

Geographers have devoted much attention to the effect of discrepancies in the boundaries of the areal units used for the analysis of spatial data. A typical situation is when data are being drawn from different sources. For example, population data may be reported in census tracts, while crime data may be reported in police precincts, or election data in voting districts, or school data in school attendance zones. Another situation, the one we tackle here, is when there are changes over time in

Interpolation to harmonize census tract boundaries over time

We consider here the specific case of harmonizing data for census tracts in the United States over time, and we point to a critical source of error in the available interpolated estimates. The earliest national source was the Neighborhood Change Data Base (NCDB, originally developed by the Urban Institute) that first became available in 2002 and was quickly adopted by most social scientists studying tract data in the 1970–2000 period. Its great attraction was that for the first time it promised

Assessing interpolation estimates and a new alternative

We now turn to an effort to gauge the quality of the estimates provided by the LTDB. Records held in a Federal Statistical Research Data Center (FSRDC) allow us to determine the 2010 tract area where persons and households lived when enumerated in the 2000 Census, either for short form (intended to cover the full population) or long form (covering one in six households) samples. We can then aggregate these 2000 census records within 2010 geography to provide the best, unbiased estimate of the

Research design

This study includes all populated census tracts in 2000 and 2010 in the continental United States. These tracts can be categorized according to how their 2000 and 2010 boundaries compare. We treat as “unchanged” those cases where the difference in boundaries between a tract in year 1 and year 2 involves less than 1 percent of the land area of the year 2 tract. There are three main categories of changes: consolidations, splits, and complex changes. Consolidation is when several 2000 tracts are

Errors in LTDB and DP estimates

Table 2 summarizes the level of errors in the LTDB and DP estimates in terms of RMSE for every type of tract and for all six population variables. The key finding with respect to the purpose of this study is that the LTDB estimates of total population are much better than estimates of the under 18, non-Hispanic white, college-educated, and homeowner populations. This differential is small for unchanged and consolidated tracts. Here no interpolation was conducted for the LTDB; data were taken

Conclusion

The results are clear. Although interpolated estimates of tract “total population” are very reliable, there is even less error in the DP estimates. For other demographic characteristics, interpolation introduces considerable error, while the DP estimates are generally very close to the true values. How great is the problem? In a substantial share of cases for tracts with complex boundary changes, the LTDB estimates differ from the true value by five or ten percent or more.
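The kind of comparison summarized above can be sketched as follows. This is an illustrative outline only: it uses the generic Laplace mechanism as a stand-in for noise injection, which is not necessarily the DP procedure applied in this study, and the privacy parameter `epsilon`, the `sensitivity` value, and all tract counts are assumptions made up for the example.

```python
import math
import random


def rmse(estimates, truths):
    """Root mean squared error between estimates and true values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


def share_exceeding(estimates, truths, threshold):
    """Share of tracts (with a positive true value) whose relative error
    is at least `threshold`, e.g. 0.05 or 0.10."""
    pairs = [(e, t) for e, t in zip(estimates, truths) if t > 0]
    flagged = sum(1 for e, t in pairs if abs(e - t) / t >= threshold)
    return flagged / len(pairs)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Generic Laplace mechanism; epsilon and sensitivity are assumed values."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Invented tract-level counts for one variable.
truths = [1400, 950, 2100]
interpolated = [1100, 960, 2350]          # hypothetical interpolated estimates
rng = random.Random(0)                    # seeded for reproducibility
noisy = [noisy_count(t, epsilon=1.0, rng=rng) for t in truths]

print("RMSE, interpolated:", rmse(interpolated, truths))
print("RMSE, noisy:       ", rmse(noisy, truths))
print("share off by 10%+, interpolated:", share_exceeding(interpolated, truths, 0.10))
```

With noise of this scale the perturbation is small relative to tract-level counts, which is why noisy re-aggregated values can sit closer to the truth than interpolated ones in an example like this; the actual gap depends on the noise scale that disclosure rules require.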

Fortunately, though

Declaration of interest

No potential conflict of interest was reported by the authors.

Data availability statement

The publicly available data analyzed here are available at this website: https://s4.ad.brown.edu/Projects/Diversity/Researcher/Bridging.htm.

Acknowledgments

This research was supported by the Sociology Program of the National Science Foundation (grant 1756567). The Population Studies and Training Center at Brown University (P2CHD041020) provided general support. We thank John Friedman (Brown University) and Adam Smith (Boston University) for assistance with differential privacy methods. All results have been reviewed by the U.S. Census Bureau to ensure that no confidential information is disclosed (approval number CBDRB-FY20-208). Any opinions and

References (24)

  • R. Flowerdew et al.

    Using areal interpolation methods in geographic information systems

    Papers in Regional Science

    (1991)
  • Y. Xie

    The overlaid network algorithms for areal interpolation problem

    Computers, Environment and Urban Systems

    (1995)
  • I. Bracken et al.

    The generation of spatial population distributions from census centroid data

    Environment & Planning A

    (1989)
  • I. Bracken et al.

    Linkage of the 1981 and 1991 UK censuses using surface modelling concepts

    Environment & Planning A

    (1995)
  • R. Chetty et al.

    A practical method to reduce privacy loss when disclosing statistics based on small samples

    NBER Working Paper 25626

    (2019)
  • C. Dwork et al.

    Calibrating noise to sensitivity in private data analysis

  • C. Eicher et al.

    Dasymetric mapping and areal interpolation: Implementation and evaluation

    Cartography and Geographic Information Science

    (2001)
  • M.F. Goodchild et al.

    A framework for the areal interpolation of socioeconomic data

    Environment & Planning A

    (1993)
  • M.F. Goodchild et al.

    Areal interpolation: A variant of the traditional spatial problem

    Geo-Processing

    (1980)
  • P.C. Kyriakidis

    A geostatistical framework for area-to-point spatial interpolation

    Geographical Analysis

    (2004)
  • P.C. Kyriakidis et al.

    Geostatistical prediction and simulation of point values from areal data

    Geographical Analysis

    (2005)
  • J.R. Logan et al.

    Validating population estimates for harmonized census tract data, 2000-2010

    Annals of the Association of American Geographers

    (2016)