On the Need for New Measures of Phylogenomic Support
Systematic Biology (IF 6.5), Pub Date: 2022-01-25, DOI: 10.1093/sysbio/syac002
Robert C Thomson, Jeremy M Brown

The scale of data sets used to infer phylogenies has grown dramatically in the last decades, providing researchers with an enormous amount of information with which to draw inferences about evolutionary history. However, standard approaches to assessing confidence in those inferences (e.g., nonparametric bootstrap proportions [BP] and Bayesian posterior probabilities [PPs]) are still deeply influenced by statistical procedures and frameworks that were developed when information was much more limited. These approaches largely quantify uncertainty caused by limited amounts of data, which is often vanishingly small with modern, genome-scale sequence data sets. As a consequence, today’s phylogenomic studies routinely report near-complete confidence in their inferences, even when different studies reach strongly conflicting conclusions and the sites and loci in a single data set contain much more heterogeneity than our methods assume or can accommodate. Therefore, we argue that BPs and marginal PPs of bipartitions have outlived their utility as the primary means of measuring phylogenetic support for modern phylogenomic data sets with large numbers of sites relative to the number of taxa. Continuing to rely on these measures will hinder progress towards understanding remaining sources of uncertainty in the most challenging portions of the Tree of Life. Instead, we encourage researchers to examine the ideas and methods presented in this special issue of Systematic Biology and to explore the area further in their own work. The papers in this special issue outline strategies for assessing confidence and uncertainty in phylogenomic data sets that move beyond stochastic error due to limited data and offer promise for more productive dialogue about the challenges that we face in reaching our shared goal of understanding the history of life on Earth. [Big data; gene tree variation; genomic era; statistical bias.]

Updated: 2022-01-25