Model-Based Clustering and Prediction With Mixed Measurements Involving Surrogate Classifiers
Statistics in Biopharmaceutical Research (IF 1.5), Pub Date: 2021-01-26, DOI: 10.1080/19466315.2020.1863257
Hua Shenam, Alexander R. de Leon

Abstract

Identification of underlying subpopulations to account for unobserved heterogeneity in the population is a challenging statistical problem, mainly because no explicit information about the latent classes is available. Although latent class analysis via finite mixture models is often used successfully to probabilistically identify subpopulations in applications, it often fails with data for which such subpopulations exhibit high latency. Borrowing strength from readily accessible auxiliary classifiers, even when subject to misclassification, may yield improved results in such settings. We develop in this article a joint modeling approach that combines data from multiple sources, including observed characteristics that are often used alone for clustering and classification, as well as results based on imperfect surrogate classifiers, to better identify the latent classes for more accurate classification and prediction. We outline maximum likelihood estimation for the joint model using the EM algorithm, and we show empirically via simulations that our methodology yields better estimates of the underlying latent class distributions than those obtained by ignoring the auxiliary information, while providing joint assessments of the surrogate classifiers. The advantages are significant when there is high latency and the surrogate classifiers are at least moderately accurate. We use real diagnostic data on dry eye disease, for which no gold standard is available, to illustrate our methodology.
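The idea of borrowing strength from an imperfect surrogate classifier can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes two latent classes, a single Gaussian measurement `x`, a single binary surrogate `s`, and conditional independence of `x` and `s` given the latent class. The function name `fit_joint_mixture`, its parameters, and the starting values are all hypothetical choices made for illustration.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_joint_mixture(x, s, n_iter=200):
    """EM for a two-class mixture with one Gaussian measurement x and one
    binary surrogate classifier s (illustrative assumptions, see lead-in).

    Returns the estimated class-1 prevalence, class means and standard
    deviations, q[k] = P(s = 1 | class k), and the posterior class-1
    probabilities for each observation.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)

    # Assumed starting values: class 1 is the "high x, surrogate-positive" class.
    pi = 0.5
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
    sd = np.array([x.std(), x.std()])
    q = np.array([0.3, 0.7])

    for _ in range(n_iter):
        # E-step: posterior class probabilities given both x and s.
        weights = np.array([1.0 - pi, pi])
        lik = np.stack(
            [
                weights[k]
                * normal_pdf(x, mu[k], sd[k])
                * q[k] ** s
                * (1.0 - q[k]) ** (1.0 - s)
                for k in (0, 1)
            ],
            axis=1,
        )
        resp = lik / lik.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted updates of all parameters.
        nk = resp.sum(axis=0)
        pi = nk[1] / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-6)  # guard against a collapsing component
        q = np.clip((resp * s[:, None]).sum(axis=0) / nk, 1e-6, 1.0 - 1e-6)

    return {"pi": pi, "mu": mu, "sd": sd, "q": q, "posterior": resp[:, 1]}
```

Under this parameterization, `q[1]` plays the role of the surrogate's sensitivity and `1 - q[0]` its specificity, so the fit returns an assessment of the surrogate alongside the latent class distributions. Dropping the two `q[k]` factors from the E-step reduces the sketch to an ordinary two-component Gaussian mixture that ignores the auxiliary information.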




Updated: 2021-01-26