Information geometry for phylogenetic trees
Journal of Mathematical Biology (IF 1.9), Pub Date: 2021-02-15, DOI: 10.1007/s00285-021-01553-x
M K Garba, T M W Nye, J Lueg, S F Huckemann

We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera–Holmes–Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback–Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
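The statement that the Fisher information metric is infinitesimally proportional to the Kullback–Leibler divergence, and to other f-divergences, can be checked numerically on the smallest possible case. The following is a minimal sketch, not the authors' implementation: it assumes a tree with two leaves joined by a single edge of length t, a uniform root distribution, and a unit substitution rate, so that the two leaf states differ with probability (1 - exp(-2t))/2. All function names are hypothetical. The squared Hellinger distance stands in for a generic f-divergence, with f(x) = (sqrt(x) - 1)^2 and f''(1) = 1/2.

```python
import math


def pattern_probs(t):
    """Joint distribution of the leaf patterns (00, 01, 10, 11) on one edge.

    Assumption (not from the source): uniform root distribution and unit
    substitution rate, so the leaves differ with probability (1 - exp(-2t)) / 2.
    """
    p_diff = 0.5 * (1.0 - math.exp(-2.0 * t))
    p_same = 1.0 - p_diff
    return [0.5 * p_same, 0.5 * p_diff, 0.5 * p_diff, 0.5 * p_same]


def fisher_information(t):
    """Fisher information of the pattern distribution with respect to t."""
    p_diff = 0.5 * (1.0 - math.exp(-2.0 * t))
    dp_dt = math.exp(-2.0 * t)  # derivative of p_diff with respect to t
    return dp_dt ** 2 / (p_diff * (1.0 - p_diff))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive vectors."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def squared_hellinger(p, q):
    """f-divergence with f(x) = (sqrt(x) - 1)^2, for which f''(1) = 1/2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def compare(t, h):
    """Return each divergence next to its second-order Fisher approximation."""
    p, q = pattern_probs(t), pattern_probs(t + h)
    quad = fisher_information(t) * h * h
    return {
        "kl": kl_divergence(p, q),
        "kl_approx": 0.5 * quad,           # (1/2) * I(t) * h^2
        "hellinger": squared_hellinger(p, q),
        "hellinger_approx": 0.25 * quad,   # (f''(1)/2) * I(t) * h^2
    }


if __name__ == "__main__":
    for name, value in compare(0.5, 1e-3).items():
        print(f"{name}: {value:.6e}")
```

For a small step h, each divergence should agree closely with its quadratic approximation, and the two approximations differ only by the constant factor f''(1), which is the sense in which the metric is proportional to either divergence.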




Updated: 2021-02-16