Accuracy in Near-Perfect Virus Phylogenies
Systematic Biology (IF 6.1), Pub Date: 2021-08-11, DOI: 10.1093/sysbio/syab069
Joel O Wertheim, Mike Steel, Michael J Sanderson

Phylogenetic trees from real-world data often include short edges with very few substitutions per site, which can lead to partially resolved trees and poor accuracy. Theory indicates that the number of sites needed to accurately reconstruct a fully resolved tree grows at a rate proportional to the inverse square of the length of the shortest edge. However, when inferred trees are partially resolved due to short edges, “accuracy” should be defined as the rate of discovering false splits (clades on a rooted tree) relative to the actual number found. Thus, accuracy can be high even if short edges are common. Specifically, in a “near-perfect” parameter space in which trees are large, the tree length $\xi$ (the sum of all edge lengths) is small, and rate variation is minimal, the expected false positive rate is less than $\xi/3$; the exact value depends on tree shape and sequence length. This expected false positive rate is far below the false negative rate for small $\xi$ and often well below 5% even when some assumptions are relaxed. We show this result analytically for maximum parsimony and explore its extension to maximum likelihood using theory and simulations. For hypothesis testing, we show that measures of split “support” that rely on bootstrap resampling consistently imply weaker support than that implied by the false positive rates in near-perfect trees. The near-perfect parameter space closely fits several empirical studies of human virus diversification during outbreaks and epidemics, including Ebolavirus, Zika virus, and SARS-CoV-2, reflecting low substitution rates relative to high transmission/sampling rates in these viruses. [Ebolavirus; epidemic; HIV; homoplasy; mumps virus; perfect phylogeny; SARS-CoV-2; virus; West Nile virus; Yule–Harding model; Zika virus.]
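
The two quantitative statements in the abstract can be illustrated with a minimal sketch. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch assumes the near-perfect regime described above (large tree, small $\xi$, minimal rate variation), in which $\xi/3$ is an upper bound on the expected false positive rate rather than its exact value. Because the abstract gives no proportionality constant for the number of sites, only the relative growth under inverse-square scaling is computed.

```python
def fpr_upper_bound(tree_length):
    """Upper bound xi/3 on the expected false positive rate.

    Assumes the near-perfect regime from the abstract; tree_length is xi,
    the sum of all edge lengths.
    """
    if tree_length < 0:
        raise ValueError("tree length must be non-negative")
    return tree_length / 3.0


def relative_sites_needed(shortest_edge, reference_edge):
    """Factor by which the number of sites needed for a fully resolved tree
    grows when the shortest edge shrinks from reference_edge to
    shortest_edge, assuming pure inverse-square scaling."""
    if shortest_edge <= 0 or reference_edge <= 0:
        raise ValueError("edge lengths must be positive")
    return (reference_edge / shortest_edge) ** 2


if __name__ == "__main__":
    xi = 0.09  # hypothetical tree length, not a value from the paper
    print(f"FPR bound for xi={xi}: {fpr_upper_bound(xi):.4f}")

    # Hypothetical edge lengths: shrinking the shortest edge tenfold.
    factor = relative_sites_needed(shortest_edge=0.001, reference_edge=0.01)
    print(f"Sites needed grow by a factor of about {factor:.0f}")
```

Under these assumptions, any tree length below 0.15 keeps the bound under the 5% level mentioned in the abstract, while each tenfold reduction in the shortest edge raises the number of sites needed for full resolution roughly a hundredfold.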

Updated: 2021-08-11