Recognizing Sample-Selection Bias in Historical Data
Social Science History (IF 0.5), Pub Date: 2020-07-06, DOI: 10.1017/ssh.2020.11
Ariell Zimran

Recent research has ignited a debate in social science history over whether and how to draw conclusions for whole populations from sources that describe only select subsets of these populations. The idiosyncratic availability and survival of historical sources create a threat of sample-selection bias—an error that arises when there are systematic differences between the observed sample and the population of interest. This danger is common in studying trends in health as measured by average stature—scholars can often observe these trends only for soldiers and other similar groups; but whether these patterns are representative of those of the broader population is unclear. This article illustrates what simple patterns in a potentially selected sample can be used to recognize the presence of sample-selection bias in a source, and to understand how such bias might affect conclusions drawn from this source. Applying this intuition to the use of military data to describe stature in the antebellum United States, I present several simple empirical exercises based on these patterns. Finally, I use the results of these exercises to describe how sample-selection bias might affect the use of these data in testing for differences in average stature between the Northeast and the Midwest.
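
The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the article's method or data: the two regions, their mean statures, the standard deviation, and the minimum-height rule used as the selection mechanism are all hypothetical values chosen for illustration. It shows how a sample that admits only taller individuals can overstate average stature and understate the true gap between two regions.

```python
import random
import statistics

# All numbers below are hypothetical and chosen only for illustration;
# none of them come from the article or from any historical source.
SD_CM = 6.5            # assumed standard deviation of stature
MIN_HEIGHT_CM = 168.0  # assumed minimum height needed to enter the sample


def simulate_region(true_mean, n, rng):
    """Draw n statures (cm) for a whole regional population."""
    return [rng.gauss(true_mean, SD_CM) for _ in range(n)]


def select_sample(heights, min_height):
    """Keep only individuals who pass the selection rule."""
    return [h for h in heights if h >= min_height]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Two hypothetical regions whose true mean statures differ by 2 cm.
region_a = simulate_region(171.0, 50_000, rng)
region_b = simulate_region(173.0, 50_000, rng)

sample_a = select_sample(region_a, MIN_HEIGHT_CM)
sample_b = select_sample(region_b, MIN_HEIGHT_CM)

pop_mean_a = statistics.fmean(region_a)
pop_mean_b = statistics.fmean(region_b)
sample_mean_a = statistics.fmean(sample_a)
sample_mean_b = statistics.fmean(sample_b)

pop_gap = pop_mean_b - pop_mean_a           # gap in the full populations
sample_gap = sample_mean_b - sample_mean_a  # gap seen in the selected samples

print(f"population means: A={pop_mean_a:.2f}  B={pop_mean_b:.2f}  gap={pop_gap:.2f}")
print(f"selected means:   A={sample_mean_a:.2f}  B={sample_mean_b:.2f}  gap={sample_gap:.2f}")
```

Under these assumptions the selected samples are taller than their populations, and the shorter region loses a larger share of its members to the cutoff, so the observed regional gap shrinks. Other selection mechanisms could bias the comparison in a different direction.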

Updated: 2020-07-06