Using gridded population and quadtree sampling units to support survey sample design in low-income settings



International Journal of Health Geographics (IF 4.9), Pub Date: 2020-03-26, DOI: 10.1186/s12942-020-00205-5
Sarchil Hama Qader¹,², Veronique Lefebvre³, Andrew J Tatem¹,³, Utz Pape⁴, Warren Jochem¹, Kristen Himelein⁴, Amy Ninneman³, Philip Wolburg⁴, Gonzalo Nunez-Chaim⁴, Linus Bengtsson³, Tomas Bird³


BACKGROUND: Household surveys are the main source of demographic, health and socio-economic data in low- and middle-income countries (LMICs). To conduct such a survey, census population information mapped into enumeration areas (EAs) typically serves as a sampling frame from which to generate a random sample. However, using census information to generate this sampling frame can be problematic: in many LMIC contexts such data are outdated or incomplete, potentially introducing coverage issues into the frame. Increasingly, where census data are outdated or unavailable, modelled gridded population datasets are being used to create household survey sampling frames.

METHODS: Previously this was done in one of two ways: either by sampling from a set of uniform grid cells (UGC), which are then manually subdivided to achieve the desired population size, or by sampling very small grid cells and then aggregating them into larger units to achieve a minimum population per survey cluster. The former approach is time- and resource-intensive and results in substantial heterogeneity in the output sampling units, while the latter can complicate the calculation of unbiased sampling weights. Using the context of Somalia, which has not had a full census since 1987, we implemented a quadtree algorithm for the first time to create a population sampling frame. The approach uses gridded population estimates and is based on the idea of a quadtree decomposition, in which an area is successively subdivided into four equal-sized quadrants until the content of each quadrant is homogeneous.

RESULTS: The quadtree approach used here produced much more homogeneous sampling units than the UGC (1 × 1 km and 3 × 3 km) approach. At the national and pre-war regional scales, the standard deviation and coefficient of variation of the output sampling units were calculated as indicators of homogeneity for frames created with the quadtree and with the UGC 1 × 1 km and 3 × 3 km approaches, and the results showed outstanding performance for the quadtree approach.

CONCLUSION: Our approach reduces the burden of manually subdividing UGC in highly populated areas, while allowing for correct calculation of sampling weights. The algorithm produces relatively homogeneous population counts within the sampling units, reducing the variation in the weights and improving the precision of the resulting estimates. Furthermore, a protocol of creating approximately equal-sized blocks and using tablets for randomized selection of a household in each block mitigated potential selection bias by enumerators. The approach saves labour, time and cost, and points to potential use in wider contexts.
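The quadtree decomposition and the homogeneity measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stopping rule (split a quadrant while its population exceeds a threshold `max_pop`), the dropping of unpopulated quadrants, the function names and the toy 4 × 4 grid are all assumptions made for illustration. The abstract only states that quadrants are subdivided until their content is homogeneous.

```python
import statistics


def quadtree_units(grid, max_pop):
    """Split a square grid of population counts into quadtree sampling units.

    grid:    list of equal-length rows; the side length is assumed to be a
             power of two so every quadrant splits into four equal parts.
    max_pop: hypothetical threshold; a quadrant is split again while its
             population exceeds this value (an assumed stopping rule).
    Returns a list of (row, col, size, population) tuples.
    """
    units = []

    def total(r, c, size):
        return sum(grid[i][j]
                   for i in range(r, r + size)
                   for j in range(c, c + size))

    def split(r, c, size):
        pop = total(r, c, size)
        if pop == 0:
            return  # assumption: unpopulated quadrants are left out of the frame
        if pop <= max_pop or size == 1:
            units.append((r, c, size, pop))
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                split(r + dr, c + dc, half)

    split(0, 0, len(grid))
    return units


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Toy gridded population: a dense settlement in the lower-left corner.
grid = [
    [0, 0, 1, 2],
    [0, 4, 3, 2],
    [40, 35, 5, 6],
    [38, 42, 7, 4],
]

units = quadtree_units(grid, max_pop=50)
quadtree_pops = [pop for (_, _, _, pop) in units]

# Comparison frame: every populated cell of the uniform grid is its own unit.
uniform_pops = [v for row in grid for v in row if v > 0]

cv_quadtree = coefficient_of_variation(quadtree_pops)
cv_uniform = coefficient_of_variation(uniform_pops)
print(f"quadtree units: {len(units)}, CV = {cv_quadtree:.2f}")
print(f"uniform cells:  {len(uniform_pops)}, CV = {cv_uniform:.2f}")
```

In this toy grid the dense corner is split down to single cells while the sparse quadrants stay whole, so the quadtree units have a lower coefficient of variation than the uniform cells. Real frames would need the threshold tuned to the target cluster size.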


Updated: 2020-04-22
