Computing nearest neighbour interchange distances between ranked phylogenetic trees, Journal of Mathematical Biology


Journal of Mathematical Biology (IF 2.2), Pub Date: 2021-01-25, DOI: 10.1007/s00285-021-01567-5
Lena Collienne, Alex Gavryushkin

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is \(\mathbf{NP}\)-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to \(\mathbf{NP}\)-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).
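To make the graph formulation concrete, the following is a minimal Python sketch that computes a distance by brute-force breadth-first search over the rearrangement graph of small ranked trees. It is not the efficient algorithm the abstract refers to, and it only works for a handful of leaves because it may visit every tree. Several things here are assumptions rather than details taken from the abstract: a ranked tree is encoded as a tuple of clusters (the leaf set below each internal node, listed by increasing rank), a rank move swaps two rank-adjacent nodes that are not joined by an edge, an edge move is a nearest neighbour interchange on an edge joining two rank-adjacent nodes, and both move types have weight one. The names `ranked_tree`, `children`, `neighbours`, and `rnni_distance` are hypothetical.

```python
from collections import deque


def ranked_tree(*clusters):
    """Encode a ranked tree as a tuple of clusters ordered by increasing rank."""
    return tuple(frozenset(c) for c in clusters)


def children(tree, i):
    """Return the two child clusters of the internal node with index i."""
    parts, covered = [], set()
    # Scan lower-ranked nodes from highest to lowest; the first cluster found
    # inside tree[i] that is not already covered is a maximal one, i.e. a child.
    for j in range(i - 1, -1, -1):
        if tree[j] <= tree[i] and not tree[j] <= covered:
            parts.append(tree[j])
            covered |= tree[j]
    # Leaves of tree[i] not covered by an internal child are leaf children.
    parts += [frozenset([leaf]) for leaf in tree[i] - covered]
    return parts


def neighbours(tree):
    """List all trees one move away (assumed move definitions, see lead-in)."""
    out = []
    for i in range(len(tree) - 1):
        lower, upper = tree[i], tree[i + 1]
        if lower <= upper:
            # Nodes i and i+1 share an edge: two NNI (edge) moves are possible.
            a, b = children(tree, i)
            (x,) = [c for c in children(tree, i + 1) if c != lower]
            for new in (a | x, b | x):
                out.append(tree[:i] + (new,) + tree[i + 1:])
        else:
            # No edge between them: a rank move swaps their ranks.
            out.append(tree[:i] + (upper, lower) + tree[i + 2:])
    return out


def rnni_distance(start, goal):
    """Breadth-first search for the length of a shortest path of moves."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return dist[current]
        for nxt in neighbours(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    raise ValueError("goal not reachable; check that both trees share a leaf set")


if __name__ == "__main__":
    caterpillar = ranked_tree({1, 2}, {1, 2, 3}, {1, 2, 3, 4})
    balanced = ranked_tree({3, 4}, {1, 2}, {1, 2, 3, 4})
    print(rnni_distance(caterpillar, balanced))
```

The search touches a number of trees that grows super-exponentially with the number of leaves, which is why a polynomial-time method for shortest paths matters in this setting.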


Updated: 2021-01-25
