Connecting family trees to construct a population-scale and longitudinal geo-social network for the U.S.
International Journal of Geographical Information Science (IF 5.7) · Pub Date: 2020-09-30 · DOI: 10.1080/13658816.2020.1821885
Caglar Koylu¹, Diansheng Guo², Yuan Huang², Alice Kasakoff², Jack Grieve³

ABSTRACT

We collected 92,832 user-contributed, publicly available family trees from rootsweb.com, containing 250 million individuals born in North America and Europe between 1630 and 1930. We cleaned and connected these trees into a population-scale, longitudinal family tree dataset using a workflow of data collection and cleaning, geocoding, fuzzy record linkage, and a relation-based iterative search that connects trees and deduplicates records. The resulting dataset contains a total of 80 million individuals, with a largest connected component of nearly 40 million, making it, to date, the largest population-scale and longitudinal geo-social network spanning centuries. We evaluated how well the family tree dataset represents historical population demography and mobility by comparing it with the 1880 Census. Our results showed that the family trees were biased towards males, the elderly, farmers, and the native-born white segment of the population. Individuals were highly mobile: in our 1880 sample of parent-child pairs in which both were born in the U.S., parent and child were born in different states in 47% of pairs. Our findings agree with prior studies showing that people migrated from East to West in horizontal bands, a trend reflected in the dialects and regional structure of the U.S.
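The abstract names several steps (fuzzy record linkage, deduplication, connecting trees into components, and a parent-child mobility measure) without detailing them. The Python sketch below illustrates the general idea under simplifying assumptions; it is not the authors' workflow. The `Person` fields, the matching rule (name similarity via `difflib.SequenceMatcher` with a hypothetical 0.85 threshold plus a ±2-year birth-year tolerance), and all function names are hypothetical, and the relation-based iterative search described in the paper is replaced here by a plain name-and-year comparison.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    pid: str          # hypothetical record id, unique across all collected trees
    name: str
    birth_year: int
    birth_state: str  # assumed already geocoded to a U.S. state


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def is_fuzzy_match(a, b, name_threshold=0.85, year_tolerance=2):
    """Hypothetical rule: similar names and birth years within a tolerance."""
    if abs(a.birth_year - b.birth_year) > year_tolerance:
        return False
    similarity = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return similarity >= name_threshold


def connect_trees(people, parent_child_edges):
    """Return (distinct individuals, size of the largest connected component)."""
    # Step 1: cluster records that likely describe the same person (deduplication).
    # Pairwise comparison is O(n^2); a real pipeline would block candidates first.
    same = UnionFind()
    for p in people:
        same.find(p.pid)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if is_fuzzy_match(a, b):
                same.union(a.pid, b.pid)
    individual = {p.pid: same.find(p.pid) for p in people}
    individuals = set(individual.values())

    # Step 2: connect deduplicated individuals through parent-child edges.
    family = UnionFind()
    for ind in individuals:
        family.find(ind)
    for parent_id, child_id in parent_child_edges:
        family.union(individual[parent_id], individual[child_id])

    sizes = {}
    for ind in individuals:
        root = family.find(ind)
        sizes[root] = sizes.get(root, 0) + 1
    return len(individuals), max(sizes.values(), default=0)


def share_born_in_different_states(pairs):
    """Fraction of (parent, child) pairs whose birth states differ."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(p.birth_state != c.birth_state for p, c in pairs) / len(pairs)


if __name__ == "__main__":
    # Two toy trees that both record the same father under slightly different names.
    a1 = Person("a1", "John Smith", 1820, "VA")
    a2 = Person("a2", "Mary Smith", 1845, "OH")
    b1 = Person("b1", "Jon Smith", 1821, "VA")
    b2 = Person("b2", "William Smith", 1848, "IN")
    c1 = Person("c1", "Anna Berg", 1850, "MN")
    people = [a1, a2, b1, b2, c1]
    edges = [("a1", "a2"), ("b1", "b2")]
    print(connect_trees(people, edges))  # (4, 3)

    d1 = Person("d1", "Sarah Smith", 1870, "OH")
    print(share_born_in_different_states([(a1, a2), (b1, b2), (a2, d1)]))
```

In the toy data, "John Smith" and "Jon Smith" merge into one individual, which joins the two trees into a three-person component; "Anna Berg" remains isolated. Union-find is used because merging duplicates and joining trees are both incremental "these belong together" decisions, which it handles in near-constant amortized time per operation.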



Updated: 2020-09-30