Consistency of SVDQuartets and Maximum Likelihood for Coalescent-Based Species Tree Estimation
Systematic Biology (IF 6.1), Pub Date: 2020-05-16, DOI: 10.1093/sysbio/syaa039
Matthew Wascher, Laura Kubatko

Numerous methods for inferring species-level phylogenies under the coalescent model have been proposed within the last 20 years, and debates continue about their relative strengths and weaknesses. One desirable property of a phylogenetic estimator is statistical consistency: intuitively, as more data are collected, the probability that the estimated tree has the same topology as the true tree goes to 1. To date, consistency results for species tree inference under the multispecies coalescent have been derived only for summary statistics methods, such as ASTRAL and MP-EST. These methods have been found to be consistent given true gene trees, but may be inconsistent when gene trees are estimated from data for loci of finite length (Roch et al., 2019). Here we consider statistical consistency in the four-taxon case for SVDQuartets under general data types, and for the maximum likelihood (ML) method when the data are a collection of sites generated under the multispecies coalescent model such that the sites are conditionally independent given the species tree (we call these Coalescent Independent Sites (CIS) data). We show that SVDQuartets is statistically consistent for all data types (i.e., for both CIS data and multilocus data), and we derive its rate of convergence. We additionally show that ML is consistent for CIS data under the JC69 model, and discuss why a proof for the more general multilocus case is difficult. Finally, we use simulation to compare the performance of ML and SVDQuartets for both data types.
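For readers unfamiliar with SVDQuartets, the following is a minimal sketch of the four-taxon scoring idea, not the authors' implementation. It assumes the standard formulation from the SVDQuartets literature, which this abstract does not spell out: for each of the three possible splits, the 4x4x4x4 array of site-pattern frequencies is flattened into a 16x16 matrix, and the split whose flattening is closest to rank 10 (smallest singular values 11 through 16) is chosen. The function names `flattening`, `svd_score`, and `best_split` are hypothetical, and the toy input is a synthetic low-rank tensor rather than data simulated under the coalescent.

```python
import numpy as np

# The three possible unrooted splits of four taxa labeled 0..3.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def flattening(counts, split):
    """Return the 16x16 matrix of site-pattern frequencies for one split.

    counts[a, b, c, d] is the number of sites showing states a, b, c, d
    (each coded 0..3) at taxa 0, 1, 2, 3. Rows are indexed by the states
    of the first pair in the split, columns by the states of the second.
    """
    rows, cols = split
    freq = counts / counts.sum()
    return np.transpose(freq, axes=rows + cols).reshape(16, 16)


def svd_score(matrix, rank=10):
    """Distance of the matrix from rank `rank`, measured by the root sum
    of squares of the trailing singular values. The default of 10 is an
    assumption taken from the SVDQuartets literature, not this abstract."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sqrt(np.sum(s[rank:] ** 2)))


def best_split(counts):
    """Score all three splits and return the lowest-scoring one."""
    scores = {split: svd_score(flattening(counts, split)) for split in SPLITS}
    return min(scores, key=scores.get), scores


# Toy input (an assumption for illustration only): a tensor that factors
# across the split 01|23, so that flattening has rank 1 while the other
# two flattenings are generically full rank.
rng = np.random.default_rng(0)
toy = np.einsum("ab,cd->abcd", rng.random((4, 4)), rng.random((4, 4)))
split, scores = best_split(toy)
print(split, scores)
```

In this picture, consistency means that as the number of sites grows, the empirical frequencies converge to their expected values, so the score of the true split tends to zero while the scores of the other two splits stay bounded away from it.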

Updated: 2020-05-16