A computationally tractable birth-death model that combines phylogenetic and epidemiological data
bioRxiv - Genomics, Pub Date: 2021-06-03, DOI: 10.1101/2020.10.21.349068
Alexander E. Zarebski, Louis du Plessis, Kris V. Parag, Oliver G. Pybus

Inferring the dynamics of pathogen transmission during an outbreak is an important problem in infectious disease epidemiology. In mathematical epidemiology, estimates are often informed by time series of confirmed cases, while in phylodynamics, genetic sequences of the pathogen, sampled through time, are the primary data source. Each data type provides different, and potentially complementary, insight; recent studies have recognised that combining data sources can improve estimates of the transmission rate and the number of infected individuals. However, inference methods are typically highly specialised and field-specific, and are either computationally prohibitive or require intensive simulation, which limits their real-time utility. We present a novel birth-death phylogenetic model and derive a tractable analytic approximation of its likelihood, the computational complexity of which is linear in the size of the dataset. This approach combines epidemiological and phylodynamic data to produce estimates of key parameters of transmission dynamics and of the number of unreported infections. Using simulated data, we (a) show that the approximation agrees well with existing methods, (b) validate the claim of linear complexity, and (c) explore robustness to model misspecification. This approximation facilitates inference on large datasets, which is increasingly important as large genomic sequence datasets become commonplace.

Updated: 2021-06-03