Nonparametric targeted Bayesian estimation of class proportions in unlabeled data.
Biostatistics (IF 1.8), Pub Date: 2020-06-11, DOI: 10.1093/biostatistics/kxaa022
Iván Díaz¹, Oleksander Savenkov¹, Hooman Kamel²

We introduce a novel Bayesian estimator for the class proportion in an unlabeled dataset, based on the targeted learning framework. The procedure requires the specification of a prior (and outputs a posterior) only for the target of inference, and yields a tightly concentrated posterior. When the scientific question can be characterized by a low-dimensional parameter functional, this focus on target prior and posterior distributions perfectly aligns with Bayesian subjectivism. We prove a Bernstein–von Mises-type result for the proposed Bayesian procedure, which guarantees that the posterior distribution converges to the distribution of an efficient, asymptotically linear estimator. In particular, the posterior is Gaussian, doubly robust, and efficient in the limit, under the sole assumption that certain nuisance parameters are consistently estimated, possibly at slower-than-parametric rates. We perform numerical studies illustrating the frequentist properties of the method. We also illustrate its use in a motivating application: estimating the proportion of embolic strokes of undetermined source that arise from occult cardiac sources or from large-artery atherosclerotic lesions. Although we focus on the motivating example of the proportion of cases in an unlabeled dataset, the procedure is general and can be adapted to estimate any pathwise differentiable parameter in a nonparametric model.
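This page does not spell out the estimator, so the following is only a hedged illustration of the kind of doubly robust, asymptotically linear estimator that the Bernstein–von Mises result above refers to. It assumes a pooled sample in which `s` marks labeled units, that P(Y = 1 | X) is the same in labeled and unlabeled data, and that nuisance estimates m(X) ≈ P(Y = 1 | X, labeled) and g(X) ≈ P(labeled | X) were already fit by some flexible learner. It computes an augmented inverse-probability-weighted (one-step) estimate of the class proportion among unlabeled units with an influence-function standard error, then combines it with a normal prior via a normal-normal update. That update is a swapped-in Gaussian approximation suggested by the limiting behavior, not the authors' targeted Bayesian procedure, and the function names `one_step_proportion` and `gaussian_posterior` are hypothetical.

```python
import math
from statistics import NormalDist


def one_step_proportion(s, y, m, g):
    """Doubly robust (AIPW-style) estimate of P(Y = 1) among unlabeled units.

    Hypothetical helper. Assumes P(Y = 1 | X) is the same for labeled and
    unlabeled units, with per-unit inputs:
      s: 1 if the unit is labeled, 0 if unlabeled
      y: observed class label (ignored when s == 0)
      m: estimate of P(Y = 1 | X, labeled)
      g: estimate of P(labeled | X), assumed bounded away from 0
    Returns the point estimate and an influence-function standard error.
    """
    n = len(s)
    n0 = sum(1 for si in s if si == 0)
    p0 = n0 / n

    # Plug-in term over unlabeled units plus an odds-weighted residual
    # correction over labeled units; the correction gives double robustness.
    total = 0.0
    for si, yi, mi, gi in zip(s, y, m, g):
        if si == 1:
            total += (1 - gi) / gi * (yi - mi)
        else:
            total += mi
    psi = total / n0

    # Estimated efficient influence function per unit (mean zero by construction).
    eif = []
    for si, yi, mi, gi in zip(s, y, m, g):
        if si == 1:
            eif.append((1 - gi) / gi * (yi - mi) / p0)
        else:
            eif.append((mi - psi) / p0)
    var = sum(d * d for d in eif) / n
    return psi, math.sqrt(var / n)


def gaussian_posterior(psi_hat, se, prior_mean, prior_sd, level=0.95):
    """Normal-normal update treating psi_hat ~ N(psi, se^2) as the likelihood.

    A swapped-in Gaussian approximation, not the paper's procedure.
    Returns posterior mean, posterior sd, and an equal-tailed credible interval.
    """
    precision = 1 / prior_sd**2 + 1 / se**2
    mean = (prior_mean / prior_sd**2 + psi_hat / se**2) / precision
    sd = math.sqrt(1 / precision)
    post = NormalDist(mean, sd)
    alpha = (1 - level) / 2
    return mean, sd, (post.inv_cdf(alpha), post.inv_cdf(1 - alpha))
```

With a diffuse prior (large `prior_sd`) the posterior mean approaches the one-step estimate, in line with a Bernstein–von Mises-type limit in which the prior's influence vanishes; with an informative prior the two are precision-weighted.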

Updated: 2020-06-11