Public health in genetic spaces: a statistical framework to optimize cluster-based outbreak detection
Virus Evolution (IF 5.3), Pub Date: 2020-01-01, DOI: 10.1093/ve/veaa011
Connor Chato, Marcia L Kalish, Art F Y Poon

Abstract
Genetic clustering is a popular method for characterizing variation in transmission rates of rapidly evolving viruses, and could potentially be used to detect outbreaks in ‘near real time’. However, the statistical properties of clustering are poorly understood in this context, and there are no objective guidelines for setting clustering criteria. Here, we develop a new statistical framework that optimizes a genetic clustering method based on its ability to forecast new cases. We analysed the pairwise Tamura-Nei (TN93) genetic distances for anonymized HIV-1 subtype B pol sequences from Seattle (n = 1,653) and Middle Tennessee, USA (n = 2,779), and northern Alberta, Canada (n = 809). Under varying TN93 thresholds, we fit two models to the distributions of new cases relative to clusters of known cases: (1) a null model that assumes cluster growth is strictly proportional to cluster size, i.e. no variation in transmission rates among individuals; and (2) a weighted model that incorporates individual-level covariates, such as recency of diagnosis. The optimal threshold is the one that maximizes the difference in information loss between the two models, i.e. where the covariates are used most effectively. Optimal TN93 thresholds varied substantially between data sets, e.g. 0.0104 in Alberta and 0.016 in Seattle and Tennessee, such that the optimum for one population could misdirect prevention efforts in another. For a given population, the range of thresholds where the weighted model conferred greater predictive accuracy tended to be narrow (±0.005 units), and the optimal threshold tended to be stable over time. Our framework also indicated that variation in the recency of HIV diagnosis among clusters was significantly more predictive of new cases than sample collection dates (ΔAIC > 50).
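The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes pairwise TN93 distances have already been computed, links any two known cases whose distance is at or below the threshold (single linkage, i.e. connected components), and attributes each new case to the cluster of its nearest known case within the threshold. The function `cluster_growth`, the `UnionFind` helper, and the dictionary input format are all hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge known cases into clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_growth(known, known_dists, new_dists, threshold):
    """Cluster known cases at a distance threshold and count new cases per cluster.

    known:       list of IDs of previously sampled (known) cases
    known_dists: dict {(id_a, id_b): distance} among known cases
    new_dists:   dict {(new_id, known_id): distance} from new to known cases
    Returns a dict {cluster_root: (cluster_size, number_of_new_cases)}.
    """
    # Hypothetical single-linkage rule: any pair within the threshold is joined.
    uf = UnionFind(known)
    for (a, b), d in known_dists.items():
        if d <= threshold:
            uf.union(a, b)

    members = defaultdict(list)
    for k in known:
        members[uf.find(k)].append(k)

    # Assumption: each new case joins the cluster of its closest known case
    # within the threshold, so no new case is counted twice.
    best = {}  # new_id -> (distance, known_id)
    for (new_id, k), d in new_dists.items():
        if d <= threshold and (new_id not in best or d < best[new_id][0]):
            best[new_id] = (d, k)

    growth = defaultdict(int)
    for _, k in best.values():
        growth[uf.find(k)] += 1

    return {r: (len(m), growth[r]) for r, m in members.items()}
```

Running this at each candidate threshold gives, for every cluster, its size and the number of new cases it attracted, which is the kind of data the null and weighted models are fitted to.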
These results suggest that one cannot rely on historical precedent or convention to configure genetic clustering methods for public health applications, especially when translating methods between settings of low-level and generalized epidemics. Our framework not only enables investigators to calibrate a clustering method to a specific public health setting, but also provides a variable selection procedure to evaluate different predictive models of cluster growth.
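The model comparison behind this calibration and variable selection can be sketched with Poisson counts and AIC. This is a hypothetical sketch, not the published model. It assumes the number of new cases attracted by each cluster follows a Poisson distribution. Under the null model (one parameter), the mean is proportional to cluster size. Under the weighted model (two parameters), the mean is proportional to a sum of per-case weights w(t) = exp(-θt), where t is years since diagnosis and θ is fitted by grid search. A positive ΔAIC = AIC_null − AIC_weighted favours the weighted model. The function names and the exponential weight are illustrative assumptions.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood summed over clusters."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:  # y*log(mu) is taken as 0 when y == 0
            total += yi * math.log(mi)
        total += -mi - math.lgamma(yi + 1)
    return total


def fit_scaled(y, exposure):
    """MLE of lambda in y_i ~ Poisson(lambda * exposure_i), plus its log-likelihood."""
    lam = sum(y) / sum(exposure)
    mu = [lam * e for e in exposure]
    return lam, poisson_loglik(y, mu)


def delta_aic(clusters, growth, thetas=None):
    """Compare a null (size-proportional) model with a recency-weighted model.

    clusters: list of clusters, each a list of years since diagnosis of its known cases
    growth:   number of new cases attributed to each cluster
    thetas:   grid of decay rates for the hypothetical weight w(t) = exp(-theta * t)
    Returns (AIC_null - AIC_weighted, best theta on the grid).
    """
    if thetas is None:
        thetas = [i * 0.05 for i in range(61)]  # roughly 0.0 to 3.0
    sizes = [len(c) for c in clusters]
    _, ll_null = fit_scaled(growth, sizes)
    aic_null = 2 * 1 - 2 * ll_null  # one parameter: lambda

    best_ll, best_theta = -math.inf, None
    for theta in thetas:
        # theta = 0 gives every case weight 1, which reproduces the null model.
        weights = [sum(math.exp(-theta * t) for t in c) for c in clusters]
        _, ll = fit_scaled(growth, weights)
        if ll > best_ll:
            best_ll, best_theta = ll, theta
    aic_weighted = 2 * 2 - 2 * best_ll  # two parameters: lambda and theta
    return aic_null - aic_weighted, best_theta
```

Because the grid includes θ = 0, the weighted model can never fit worse than the null model, so ΔAIC is at least −2 (the penalty for the extra parameter). Repeating the comparison across thresholds and keeping the one with the largest ΔAIC mirrors the calibration idea; swapping the recency covariate for another (for example, sample collection date) gives a simple form of variable selection.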

Updated: 2020-01-01