Semiparametric inference for merged data from multiple data sources
Journal of Statistical Planning and Inference (IF 0.9), Pub Date: 2021-05-12, DOI: 10.1016/j.jspi.2021.05.002
Takumi Saegusa

We consider general semiparametric inference when data are merged from multiple overlapping sources. Merged data exhibit several characteristics, including heterogeneity across data sources, potentially unidentified duplicated records, and dependence due to sampling without replacement within each data source. In this paper, we establish a large-sample theory for weighted semiparametric M-estimation with data integration. Our estimator is computable without identifying duplication, yet it corrects the bias due to overlapping data sources. The main challenge is that the asymptotic variance, in general, either has no closed form or contains expectations of unknown functions. We propose a novel computational procedure for variance estimation and show its consistency. The finite-sample performance is evaluated through a simulation study. A Wilms tumor example is provided.




Updated: 2021-05-14