Public health in genetic spaces: a statistical framework to optimize cluster-based outbreak detection, Virus Evolution


Virus Evolution (IF 5.5) Pub Date: 2020-01-01, DOI: 10.1093/ve/veaa011
Connor Chato ₁, Marcia L Kalish ₂, Art F Y Poon ₁,₃,₄


Abstract

Genetic clustering is a popular method for characterizing variation in transmission rates for rapidly evolving viruses, and could potentially be used to detect outbreaks in 'near real time'. However, the statistical properties of clustering are poorly understood in this context, and there are no objective guidelines for setting clustering criteria. Here, we develop a new statistical framework to optimize a genetic clustering method based on the ability to forecast new cases. We analysed the pairwise Tamura-Nei (TN93) genetic distances for anonymized HIV-1 subtype B pol sequences from Seattle (n = 1,653) and Middle Tennessee (n = 2,779), USA, and northern Alberta, Canada (n = 809). Under varying TN93 thresholds, we fit two models to the distributions of new cases relative to clusters of known cases: (1) a null model that assumes cluster growth is strictly proportional to cluster size, i.e. no variation in transmission rates among individuals; and (2) a weighted model that incorporates individual-level covariates, such as recency of diagnosis. The optimal threshold maximizes the difference in information loss between models, where covariates are used most effectively. Optimal TN93 thresholds varied substantially between data sets, e.g. 0.0104 in Alberta and 0.016 in Seattle and Tennessee, such that the optimum for one population would potentially misdirect prevention efforts in another. For a given population, the range of thresholds where the weighted model conferred greater predictive accuracy tended to be narrow (±0.005 units), and the optimal threshold tended to be stable over time. Our framework also indicated that variation in the recency of HIV diagnosis among clusters was significantly more predictive of new cases than sample collection dates (ΔAIC > 50). These results suggest that one cannot rely on historical precedent or convention to configure genetic clustering methods for public health applications, especially when translating methods between settings of low-level and generalized epidemics. Our framework not only enables investigators to calibrate a clustering method to a specific public health setting, but also provides a variable selection procedure to evaluate different predictive models of cluster growth.
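The threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the model form, so the Poisson likelihood, the exponential recency weight, the grid of decay rates, and every name and number below (`clusters_at`, `delta_aic`, `KNOWN_DIST`, `AGES`, `NEW_LINKS`, `THRESHOLDS`) are assumptions made for illustration on toy data.

```python
import math

def clusters_at(dist, n, threshold):
    """Connected components among n known cases; two cases are linked
    when their pairwise distance is at or below the threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for (i, j), d in dist.items():
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def poisson_loglik(counts, expected):
    """Log-likelihood of observed counts under independent Poisson means."""
    return sum(c * math.log(e) - e - math.lgamma(c + 1)
               for c, e in zip(counts, expected))

def delta_aic(dist, ages, new_links, threshold,
              betas=(0.1, 0.25, 0.5, 1.0, 2.0)):
    """AIC(null) - AIC(weighted) for cluster growth at one threshold.
    Assumed models: expected growth proportional to cluster size (null)
    or to the summed weights exp(-beta * age) of its members (weighted)."""
    groups = clusters_at(dist, len(ages), threshold)
    owner = {i: g for g, members in enumerate(groups) for i in members}
    growth = [0] * len(groups)
    for nearest, d in new_links:  # (closest known case, distance to it)
        if d <= threshold:
            growth[owner[nearest]] += 1
    total = sum(growth)
    if total == 0:
        return 0.0  # no new case joins any cluster: models are indistinguishable

    def expected(weights):
        s = sum(weights)
        return [total * w / s for w in weights]

    null_ll = poisson_loglik(growth, expected([len(m) for m in groups]))
    # Crude grid search over the decay rate stands in for a proper model fit.
    weighted_ll = max(
        poisson_loglik(growth, expected(
            [sum(math.exp(-b * ages[i]) for i in m) for m in groups]))
        for b in betas)
    aic_null = 2 * 1 - 2 * null_ll          # one parameter: overall rate
    aic_weighted = 2 * 2 - 2 * weighted_ll  # overall rate plus decay rate
    return aic_null - aic_weighted

# Toy data (hypothetical): sparse pairwise distances among 6 known cases;
# pairs that are not listed are treated as too distant to ever link.
KNOWN_DIST = {(0, 1): 0.005, (1, 2): 0.012, (3, 4): 0.008}
AGES = [0.5, 1.0, 6.0, 5.0, 7.0, 0.2]  # years since diagnosis
NEW_LINKS = [(0, 0.004), (1, 0.009), (0, 0.011), (5, 0.014), (3, 0.020)]
THRESHOLDS = [0.005, 0.010, 0.015, 0.020]

best = max(THRESHOLDS,
           key=lambda t: delta_aic(KNOWN_DIST, AGES, NEW_LINKS, t))
print("threshold with the largest AIC difference:", best)
```

The threshold that maximizes the returned difference corresponds to the point where the covariate-weighted model gains the most over the size-only null model. With real data, a fitted regression model would replace the grid search used here.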


Updated: 2020-01-01
