Computing the Rooted Triplet Distance Between Phylogenetic Networks
Algorithmica (IF 0.9), Pub Date: 2021-03-16, DOI: 10.1007/s00453-021-00802-1
Jesper Jansson, Konstantinos Mampentzidis, Ramesh Rajaby, Wing-Kin Sung

The rooted triplet distance measures the structural dissimilarity of two phylogenetic trees or phylogenetic networks by counting the number of rooted phylogenetic trees with exactly three leaf labels (called rooted triplets, or triplets for short) that occur as embedded subtrees in one, but not both, of them. Suppose that \(N_1 = (V_1, E_1)\) and \(N_2 = (V_2, E_2)\) are phylogenetic networks over a common leaf label set of size n, that \(N_i\) has level \(k_i\) and maximum in-degree \(d_i\) for \(i \in \{1,2\}\), and that the networks’ out-degrees are unbounded. Write \(N = \max (|V_1|, |V_2|)\), \(M = \max (|E_1|, |E_2|)\), \(k = \max (k_1, k_2)\), and \(d = \max (d_1, d_2)\). Previous work has shown how to compute the rooted triplet distance between \(N_1\) and \(N_2\) in \(\mathrm {O}(n \log n)\) time in the special case \(k \le 1\). For \(k > 1\), no efficient algorithms are known; applying a classic method from 1980 by Fortune et al. in a direct way leads to a running time of \({\Omega }(N^{6} n^{3})\) and the only existing non-trivial algorithm imposes restrictions on the networks’ in- and out-degrees (in particular, it does not work when non-binary vertices are allowed). In this article, we develop two new algorithms with no such restrictions. Their running times are \(\mathrm {O}(N^{2} M + n^{3})\) and \(\mathrm {O}(M + N k^{2} d^{2} + n^{3})\), respectively. We also provide implementations of our algorithms, evaluate their performance on simulated and real datasets, and make some observations on the limitations of the current definition of the rooted triplet distance in practice. Our prototype implementations have been packaged into the first publicly available software for computing the rooted triplet distance between unrestricted networks of arbitrary levels.
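
To make the definition concrete, here is a minimal brute-force sketch for the special case where both inputs are trees. It is not either of the article's algorithms and does not handle networks with reticulations. It enumerates every set of three leaves, records the rooted triplet each tree induces, and counts the triplets found in one tree but not both. The nested-tuple tree encoding (leaves as strings) and the function names `triplets` and `rooted_triplet_distance` are assumptions made for illustration only.

```python
from itertools import combinations


def _index(tree):
    """Return (parent, depth) maps for a nested-tuple tree whose leaves are strings.

    Internal vertices get integer ids; leaves are identified by their label.
    (Illustrative encoding, not taken from the article.)
    """
    parent, depth = {}, {}
    next_id = [0]

    def visit(node, par, d):
        if isinstance(node, str):
            nid = node
        else:
            nid = next_id[0]
            next_id[0] += 1
        parent[nid] = par
        depth[nid] = d
        if not isinstance(node, str):
            for child in node:
                visit(child, nid, d + 1)

    visit(tree, None, 0)
    return parent, depth


def _lca(parent, depth, a, b):
    """Lowest common ancestor found by walking the deeper vertex upward."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def triplets(tree):
    """Set of rooted triplets induced by a tree.

    A resolved triplet xy|z is stored as (frozenset({x, y}), z);
    a fan triplet x|y|z is stored as (frozenset({x, y, z}), None).
    """
    parent, depth = _index(tree)
    leaves = sorted(v for v in parent if isinstance(v, str))
    found = set()
    for x, y, z in combinations(leaves, 3):
        dxy = depth[_lca(parent, depth, x, y)]
        dxz = depth[_lca(parent, depth, x, z)]
        dyz = depth[_lca(parent, depth, y, z)]
        if dxy > dxz:
            found.add((frozenset((x, y)), z))
        elif dxz > dxy:
            found.add((frozenset((x, z)), y))
        elif dyz > dxy:
            found.add((frozenset((y, z)), x))
        else:
            found.add((frozenset((x, y, z)), None))
    return found


def rooted_triplet_distance(t1, t2):
    """Number of rooted triplets occurring in exactly one of the two trees."""
    return len(triplets(t1) ^ triplets(t2))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(rooted_triplet_distance(t1, t2))
```

In this example the two trees disagree only on the leaves a, b, c: the first contains the triplet ab|c and the second contains ac|b, so the symmetric difference has size 2. The sketch examines all leaf triples and walks parent pointers for each one, so it is only meant to illustrate the definition on small inputs, not to compete with the running times stated above.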




Updated: 2021-03-16