Estimating Bayesian Phylogenetic Information Content, Systematic Biology


Systematic Biology (IF 6.1), Pub Date: 2016-05-06, DOI: 10.1093/sysbio/syw042
Paul O Lewis₁, Ming-Hui Chen₂, Lynn Kuo₂, Louise A Lewis₃, Karolina Fučíková₃, Suman Neupane₃, Yu-Bo Wang₂, Daoyuan Shi₂


Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data. [Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.]
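The entropy comparison described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the prior is taken to be discrete uniform over all unrooted binary topologies, the posterior is approximated by raw topology frequencies in a hypothetical MCMC sample, and information is reported as prior entropy minus posterior entropy (also as a fraction of the maximum). The function names are illustrative only. The abstract notes that the conditional clade distribution gives more accurate and efficient estimates than this kind of direct frequency count, so this shows the idea, not the authors' estimator.

```python
import math
from collections import Counter


def num_unrooted_topologies(n_taxa):
    """Number of unrooted binary topologies for n_taxa >= 3, i.e. (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def topology_information(sampled_topologies, n_taxa):
    """Return (information, fraction of maximum) for sampled topologies.

    Assumption: uniform prior over topologies, so prior entropy = log(#trees).
    Assumption: posterior probabilities are raw sample frequencies.
    """
    prior_entropy = math.log(num_unrooted_topologies(n_taxa))
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    posterior_entropy = entropy(c / total for c in counts.values())
    info = prior_entropy - posterior_entropy
    fraction = info / prior_entropy if prior_entropy > 0 else 0.0
    return info, fraction


# Hypothetical 4-taxon samples, with topologies labeled by their splits.
concentrated = ["AB|CD"] * 100
spread = ["AB|CD", "AC|BD", "AD|BC"] * 100
info_concentrated, frac_concentrated = topology_information(concentrated, 4)
info_spread, frac_spread = topology_information(spread, 4)
```

In this toy case a sample concentrated on a single topology recovers all of the available information, while a sample spread evenly over the three possible topologies is indistinguishable from the prior and carries essentially none.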


Updated: 2016-05-06
