Regularized Bayesian transfer learning for population-level etiological distributions.
Biostatistics (IF 2.1), Pub Date: 2021-10-13, DOI: 10.1093/biostatistics/kxaa001
Abhirup Datta 1, Jacob Fiksel 1, Agbessi Amouzou 2, Scott L Zeger 1

Computer-coded verbal autopsy (CCVA) algorithms predict cause of death from high-dimensional family questionnaire data (verbal autopsy) of a deceased individual, which are then aggregated to generate national and regional estimates of cause-specific mortality fractions. These estimates may be inaccurate if CCVA is trained on non-local training data that differ from the local population of interest. This problem is a special case of transfer learning, i.e., improving classification within a target domain (e.g., a particular population) using a classifier trained in a source domain. Most transfer learning approaches concern individual-level (e.g., a person's) classification. Social and health scientists such as epidemiologists are often more interested in understanding etiological distributions at the population level. The sample sizes of their data sets are typically orders of magnitude smaller than those used for common transfer learning applications like image classification, document identification, etc. We present a parsimonious hierarchical Bayesian transfer learning framework to directly estimate population-level class probabilities in a target domain, using any baseline classifier trained on the source domain and a small labeled target-domain dataset. To address small sample sizes, we introduce a novel shrinkage prior for the transfer error rates guaranteeing that, in the absence of any labeled target-domain data or when the baseline classifier is perfectly accurate, our transfer learning agrees with direct aggregation of predictions from the baseline classifier, thereby subsuming the default practice as a special case. We then extend our approach to use an ensemble of baseline classifiers producing a unified estimate. Theoretical and empirical results demonstrate how the ensemble model favors the most accurate baseline classifier. We present data analyses demonstrating the utility of our approach.
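
The shrinkage idea in the abstract can be illustrated with a deliberately simplified, non-Bayesian sketch. This is not the authors' hierarchical model: it assumes a plug-in estimate in which each row of the error-rate matrix (true class to predicted class) gets identity-centered pseudo-counts, followed by an EM-style update that recovers class fractions from the aggregated predictions. The function names and the `prior_strength` parameter are hypothetical and chosen only for this illustration.

```python
import numpy as np

def shrunk_error_matrix(confusion_counts, prior_strength=5.0):
    """Estimate P(predicted = j | true = i), shrunk toward the identity.

    confusion_counts[i, j] counts labeled target-domain cases with true
    class i that the baseline classifier assigned to class j.
    prior_strength is a hypothetical pseudo-count placed on the diagonal.
    """
    counts = np.asarray(confusion_counts, dtype=float)
    total = counts + prior_strength * np.eye(counts.shape[0])
    return total / total.sum(axis=1, keepdims=True)

def calibrate_class_fractions(predicted_fractions, error_matrix, n_iter=500):
    """Recover true class fractions p from predicted fractions q ~ p @ M.

    Uses the standard EM update for mixture weights with known components,
    which keeps p on the probability simplex at every iteration.
    """
    q = np.asarray(predicted_fractions, dtype=float)
    p = np.full(q.shape, 1.0 / q.size)  # start from the uniform distribution
    for _ in range(n_iter):
        mix = p @ error_matrix  # predicted-class distribution implied by p
        ratio = np.divide(q, mix, out=np.zeros_like(q), where=mix > 0)
        p = p * (error_matrix @ ratio)
    return p / p.sum()

# With no labeled target-domain data the error matrix is the identity,
# so the calibrated fractions equal the directly aggregated predictions.
q = np.array([0.5, 0.3, 0.2])
no_data = shrunk_error_matrix(np.zeros((3, 3)))
p_default = calibrate_class_fractions(q, no_data)
```

In this sketch, a larger `prior_strength` keeps the estimate closer to direct aggregation, while more labeled target-domain cases let the observed misclassification pattern dominate.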

Updated: 2020-02-10