How to Borrow Information From Unlinked Data? A Relative Density Approach for Predicting Unobserved Distributions
Sociological Methods & Research (IF 6.5), Pub Date: 2020-07-10, DOI: 10.1177/0049124120926214
Siwei Cheng

One of the most important developments in the current era of social sciences is the growing availability and diversity of data, big and small. Social scientists increasingly combine information from multiple data sets in their research. While conducting statistical analyses with linked data is relatively straightforward, borrowing information across unlinked data can be much more challenging due to the absence of unit-to-unit linkages. This article proposes a new methodological approach for borrowing information across unlinked surveys to predict unobserved distributions. The gist of the proposed approach lies in the idea of using the relative density between the observed and unobserved distributions in the reference data to characterize the difference between the two distributions and borrow that information to the base data. Relying on the assumption that the relative density between the observed and unobserved distributions is similar between data sets, the proposed relative density approach has the key advantage of allowing the researcher to borrow information about the shape of the distribution, rather than a few summary statistics. The approach also comes with a method for incorporating and quantifying the uncertainty in its output. We illustrate the formulation of this approach, demonstrate with simulation examples, and finally apply it to address the problem of employment selection in wage inequality research.
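The core idea can be illustrated with a minimal sketch. This is not the article's implementation; it assumes a simple histogram estimate of the relative density and reweights the observed values in the base data accordingly. The function name `relative_density_weights`, its parameters, and the choice of `n_bins` are hypothetical, and the article's method for quantifying uncertainty is not covered here.

```python
import numpy as np


def relative_density_weights(ref_observed, ref_unobserved, base_observed, n_bins=10):
    """Weights that reshape base_observed toward a predicted unobserved distribution.

    Illustrative assumption: the relative density of the unobserved group with
    respect to the observed group is the same in the reference and base data.
    """
    ref_sorted = np.sort(np.asarray(ref_observed, dtype=float))
    ref_unobserved = np.asarray(ref_unobserved, dtype=float)
    base_observed = np.asarray(base_observed, dtype=float)

    # Rank (empirical CDF value in [0, 1]) of each reference "unobserved" value
    # within the reference observed distribution.
    ranks_ref = np.searchsorted(ref_sorted, ref_unobserved, side="right") / ref_sorted.size

    # Histogram estimate of the relative density on [0, 1]. A flat histogram
    # would mean the two reference distributions have the same shape.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rel_density, _ = np.histogram(ranks_ref, bins=edges, density=True)

    # Rank of each base observed value within the base observed distribution.
    base_sorted = np.sort(base_observed)
    ranks_base = np.searchsorted(base_sorted, base_observed, side="right") / base_sorted.size

    # Look up the borrowed relative density at each base rank and normalize.
    bin_idx = np.clip(np.digitize(ranks_base, edges[1:-1]), 0, n_bins - 1)
    weights = rel_density[bin_idx]
    return weights / weights.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_obs = rng.normal(0.0, 1.0, 5000)     # e.g. wages of the employed (reference)
    ref_unobs = rng.normal(-1.0, 1.0, 5000)  # e.g. wages of the non-employed (reference)
    base_obs = rng.normal(2.0, 1.0, 5000)    # only the observed group exists in the base data
    w = relative_density_weights(ref_obs, ref_unobs, base_obs)
    print("observed mean in base data:", base_obs.mean())
    print("predicted unobserved mean:", np.sum(w * base_obs))
```

The weighted sample approximates the whole predicted distribution rather than a single summary statistic, so quantiles or inequality measures can be computed from the same weights. A finer or smoothed estimate of the relative density would be a natural refinement.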




Updated: 2020-07-10