Tree-metrizable HGT networks

doi:10.1016/j.mbs.2019.108283

Mathematical Biosciences

Volume 318, December 2019, 108283

https://doi.org/10.1016/j.mbs.2019.108283

Highlights

• All phylogenetic trees of height 2 or more can be represented on a non-trivial HGT network.
• There are structural constraints on which HGT networks can support tree metrics.
• All trees can be grafted onto a network to obtain a tree-metrizable network.
• Not all networks can be grafted onto a tree to obtain a tree-metrizable network, but we provide an infinite family that can.

Abstract

Phylogenetic trees are often constructed by using a metric on the set of taxa that label the leaves of the tree. While there are a number of methods for constructing a tree from a given metric, such trees will only display the metric if it satisfies the so-called “four-point condition”, established by Buneman in 1971. This condition guarantees that a unique tree will display the metric, meaning that the distance between any two leaves can be found by adding the distances on arcs in the path between the leaves, but it does not exclude the possibility that a phylogenetic network might also display the metric. This possibility was recently pointed out, and “tree-metrized” networks (networks that display a tree metric) with a single reticulation were characterized. In this paper, we show that in the case of HGT (horizontal gene transfer) networks, there are in fact tree-metrized networks containing many reticulations.

Introduction

Phylogenetic trees have been used to represent the relationships among a set of taxa labelling the leaves since the days of Darwin [2]. Especially in the case of trees drawn with a root, the arcs of such rooted trees represent an evolutionary process proceeding over time away from the root and towards the leaves, and vertices in the tree represent divergence, or speciation, events. Likewise, phylogenetic networks have come to prominence recently as a way to represent evolutionary processes in which branches of the tree interact with each other. Two key examples of such interactions are hybridization, in which genetic contributions from different lineages combine to give rise to a new lineage, and horizontal gene transfer, in which genetic material from one lineage is acquired by a second [7].

In particular, horizontal gene transfer is highly relevant for studies of evolutionary history: it is thought to be the primary driver of early cellular evolution [10], and it is still relevant to ongoing evolution, with over half of the total genes in the genomes of human-associated microbiota involved in horizontal gene transfer [9]. We will therefore focus on HGT networks throughout this article.

While phylogenetic trees and networks can be constructed in many ways, current approaches often involve a metric on the set of taxa, that is, a matrix giving the pairwise distances between the leaves of the tree or network. While such distances are natural to define on a tree, there are different ways one may define the distance between leaves in a network; we give more details of the approach we take below.

In this paper we are concerned with metrics on a set of taxa that can be placed on a tree (“tree metrics”) but that can also be placed on a network. It was recently observed that some tree metrics have this property, and the resulting networks that have a single “reticulation” were characterized [5]. The present paper extends this by investigating networks with more than one reticulation that can nevertheless carry tree metrics. We call such networks “tree-metrizable”.

Fortunately, there is an explicit characterization of when a metric can be placed on a tree. The famous “four-point condition” (Theorem 2.2), due to Buneman [1], says that if a metric d on a set X satisfies the condition then there exists a unique weighted phylogenetic tree (with strictly positive edge weights) with leaf-set X whose induced metric is d. This, incidentally, provides a characterisation of weighted phylogenetic trees on X: a pair of trees are isomorphic as weighted graphs if and only if their induced metrics are identical.
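
As an illustration, the four-point condition can be checked numerically. The sketch below assumes the standard formulation (Theorem 2.2 itself is not reproduced in this excerpt): for any four taxa, the two largest of the three pairwise sums d(x,y) + d(z,w), d(x,z) + d(y,w), d(x,w) + d(y,z) are equal. The function name, the dictionary encoding of d, and the two example metrics are hypothetical choices made for this sketch, and d is assumed to already be a metric.

```python
from itertools import combinations


def satisfies_four_point(d, taxa, tol=1e-9):
    """Check the four-point condition on every 4-subset of taxa.

    d maps frozenset({x, y}) to the distance between distinct taxa x and y
    (an encoding assumed for this sketch, not taken from the paper).
    """
    for x, y, z, w in combinations(taxa, 4):
        # The three ways of splitting four taxa into two pairs.
        sums = sorted([
            d[frozenset((x, y))] + d[frozenset((z, w))],
            d[frozenset((x, z))] + d[frozenset((y, w))],
            d[frozenset((x, w))] + d[frozenset((y, z))],
        ])
        # The condition holds iff the two largest sums coincide.
        if sums[2] - sums[1] > tol:
            return False
    return True


def metric_from_pairs(pairs):
    """Turn {"ab": 2, ...} into a dict keyed by frozenset({"a", "b"})."""
    return {frozenset(k): v for k, v in pairs.items()}


# Quartet ab|cd with unit pendant arcs and an internal arc of length 2.
tree_like = metric_from_pairs(
    {"ab": 2, "cd": 2, "ac": 4, "ad": 4, "bc": 4, "bd": 4})
# A metric whose three pairwise sums (4, 6, 8) are all different.
not_tree_like = metric_from_pairs(
    {"ab": 2, "cd": 2, "ac": 4, "bd": 4, "ad": 3, "bc": 3})
```

Here `satisfies_four_point(tree_like, "abcd")` holds, while `not_tree_like` fails the check because its two largest sums differ.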

Metrics aside, reticulation events such as horizontal gene transfer (HGT) and hybridization cannot be represented on a tree. For these, various forms of phylogenetic network can be defined, generalizing phylogenetic trees.

In particular, reticulation arcs can be regarded as instantaneous events which carry weights representing the proportion of genetic material carried along them. This model considers the phylogenetic network to be a linear combination of the trees that it displays. As such, a metric can be defined from the network by taking a convex linear combination of the metrics corresponding to the displayed trees, with coefficients taken from the weights on the reticulation arcs of the network.
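
The convex combination described above can be sketched as follows. How the coefficients are derived from the reticulation arc weights is part of the paper's formal definitions, which are not reproduced in this excerpt, so this sketch simply assumes that the coefficient of each displayed tree is supplied directly. The function name, the string keys such as "ab", and the two example metrics are hypothetical.

```python
def combine_metrics(weighted_metrics, tol=1e-9):
    """Return sum_i alpha_i * d_i for a list of (alpha_i, d_i) pairs.

    Each d_i is a dict from a pair label (e.g. "ab") to a distance.
    The alpha_i must be non-negative and sum to 1 (a convex combination).
    """
    alphas = [alpha for alpha, _ in weighted_metrics]
    if any(alpha < 0 for alpha in alphas) or abs(sum(alphas) - 1.0) > tol:
        raise ValueError("coefficients must be non-negative and sum to 1")
    keys = set(weighted_metrics[0][1])
    if any(set(d) != keys for _, d in weighted_metrics):
        raise ValueError("all metrics must be defined on the same pairs")
    return {k: sum(alpha * d[k] for alpha, d in weighted_metrics)
            for k in keys}


# Two displayed quartet trees with unit arc lengths (assumed example data):
# d1 comes from ab|cd and d2 from ac|bd.
d1 = {"ab": 2, "cd": 2, "ac": 3, "ad": 3, "bc": 3, "bd": 3}
d2 = {"ac": 2, "bd": 2, "ab": 3, "ad": 3, "bc": 3, "cd": 3}

# Assumed coefficients: 0.75 for the first tree, 0.25 for the second.
network_metric = combine_metrics([(0.75, d1), (0.25, d2)])
```

In this toy example the combined distance between a and b is 0.75 · 2 + 0.25 · 3 = 2.25.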

Surprisingly, it has recently been shown that it is possible for both hybridization networks and HGT networks (defined in Section 2.2) to produce metrics that satisfy the four-point condition [5]. That is, they may carry tree metrics. The implication is that a metric being a tree metric cannot rule out the evolutionary history of the taxa X being explained by a network. In fact, any tree metric can be displayed by a network ([5, Theorem 2]), although it may be a very simple one.

A natural question, then, is which phylogenetic networks can carry tree metrics. We call such networks tree-metrizable networks. This question, for (binary) hybridization networks, was answered in [5]: the answer was “not many”. There are tight restrictions on where hybridizations can occur for the network to carry a tree metric. The case of HGT networks, however, was left open. Conditions on networks with a single HGT arc were established, but an example of an HGT network with two HGT arcs was given that carries a tree metric. Moreover, the tree metric corresponded to a tree that was not a base-tree of the network (in the sense of [6]).

This paper seeks to explore this phenomenon. That is, we are interested in the situation in which the metric inferred from a weighted rooted binary HGT network satisfies the four-point condition, and so is indistinguishable from the metric of a tree. The key approach in this paper is to graft structures onto a leaf of a tree or network, while maintaining the existence of a metric on the leaves. With such tools, complicated tree-metrizable networks can be built up from a base tree or network.

The paper begins with background definitions and results on metrics on trees and HGT networks (Section 2). We then begin our exploration of tree-metrizable HGT networks in Section 3, by first extending the four-point condition to the HGT network context, and then deriving some natural extensions to the results of [5]. These results effectively show that tree-metrizable networks can be constructed with any number of HGT arcs at all, by adding certain HGT arcs (Lemma 3.4).

The later sections show how complicated tree-metrizable networks can be constructed by grafting trees onto small tree-metrizable networks (Section 4), introduce caterpillar networks (Section 5), and then consider the reverse, grafting networks onto trees (Section 6). The paper concludes with a discussion of the results and some further questions, in Section 7.

Section snippets

Trees and tree metrics

Unless otherwise stated, all trees in this paper are rooted binary phylogenetic X-trees, as defined below:

Definition 2.1

A rooted binary phylogenetic X-tree on a set X is a rooted acyclic digraph with the following properties:

1.
the root vertex has in-degree 0 and out-degree 2;
2.
X labels the set of vertices with out-degree 0 and in-degree 1, called leaves; and
3.
all remaining vertices have in-degree 1 and out-degree 2.
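
The three conditions of Definition 2.1 translate directly into degree checks. The following is a minimal sketch, assuming the digraph is given as a list of (tail, head) arcs. The function name and the input encoding are hypothetical, and the reachability test at the end is one way of enforcing acyclicity that is chosen here rather than taken from the paper.

```python
from collections import defaultdict


def is_rooted_binary_phylogenetic_tree(arcs, X):
    """Check Definition 2.1 for a digraph given as (tail, head) arcs."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    children = defaultdict(list)
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
        children[u].append(v)
    vertices = {x for arc in arcs for x in arc}

    # 1. Exactly one root: in-degree 0 and out-degree 2.
    roots = [v for v in vertices if indeg[v] == 0]
    if len(roots) != 1 or outdeg[roots[0]] != 2:
        return False
    root = roots[0]

    # 2. The leaves (out-degree 0, in-degree 1) are exactly the set X.
    leaves = {v for v in vertices if outdeg[v] == 0}
    if leaves != set(X) or any(indeg[v] != 1 for v in leaves):
        return False

    # 3. All remaining vertices have in-degree 1 and out-degree 2.
    internal = vertices - leaves - {root}
    if any(indeg[v] != 1 or outdeg[v] != 2 for v in internal):
        return False

    # Every vertex must be reachable from the root; together with the
    # in-degree conditions above this rules out directed cycles.
    seen, stack = {root}, [root]
    while stack:
        for w in children[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


# A three-leaf example: root r, internal vertex u, leaves a, b, c.
example_arcs = [("r", "u"), ("r", "c"), ("u", "a"), ("u", "b")]
```

For instance, `is_rooted_binary_phylogenetic_tree(example_arcs, {"a", "b", "c"})` returns True, while adding a third arc out of the root makes the check fail.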

The vertices of degree 3 — all other than the root and the leaves — are called internal vertices. We

Tree-metrizability: first results

The following lemma will make our calculations involving the four-point condition easier by phrasing the four-point condition in terms of the lengths of the internal arcs of a quartet, instead of the tree distances between leaves. We defer the proof to the Appendix.

Definition 3.1

For T a rooted tree, let T^U be the unrooted tree obtained by suppressing the root vertex. That is, if r is the root vertex, we delete the vertex r and edges (r, u) and (r, v), then add (u, v). All edges are then interpreted as
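
The root-suppression step of Definition 3.1 can be sketched as below. The sketch assumes that the resulting edges are undirected, as the term “unrooted tree” suggests, and it ignores arc weights, since their treatment is not visible in this excerpt. The function name and the arc-list encoding are hypothetical.

```python
def suppress_root(arcs, root):
    """Return the edge set of T^U as a set of 2-element frozensets.

    arcs is a list of (tail, head) pairs of a rooted binary tree whose
    root has exactly two children u and v.
    """
    root_children = [head for tail, head in arcs if tail == root]
    if len(root_children) != 2:
        raise ValueError("the root must have out-degree 2")
    u, v = root_children
    # Drop the arcs (r, u) and (r, v); keep every other arc, undirected.
    edges = {frozenset(arc) for arc in arcs if arc[0] != root}
    # Join the two former children of the root by a single edge.
    edges.add(frozenset((u, v)))
    return edges


# Root r with children u and c; u has children a and b.
rooted_arcs = [("r", "u"), ("r", "c"), ("u", "a"), ("u", "b")]
unrooted_edges = suppress_root(rooted_arcs, "r")
```

In this example the root r disappears and the result is the star with centre u and leaves a, b and c.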

Leaf-Grafting

Theorem 2.8 provides an example of a network on four leaves that has two non-trivial reticulation arcs but still is tree-metrizable. In the previous section, we addressed the question of whether tree-metrizable networks with more reticulation arcs exist; in this section we address the analogous question for the number of leaves. In particular, we will show how “leaf-grafting” trees onto the leaves of a tree-metrizable network can create a non-trivial tree-metrizable network on any base tree at

Caterpillar networks

Leaf-grafting provides a neat method for constructing tree-metrizable networks on an arbitrary number of leaves with interesting properties. We will now define a class of tree-metrizable networks on n leaves with $n - 2$ non-trivial reticulations, referred to as caterpillar networks. In combination with leaf-grafting, this result shows that any tree T of height h can be represented on a tree-metrizable network with $h - 1$ reticulation arcs.

We will require the following standard definition.

Definition 5.1

Let T be a

Leaf-Grafts with network scions

The question of whether we can form a tree-metrizable network by leaf-grafting a network onto a tree is more complicated. For instance, consider the network N₁ shown in Fig. 8. It is formed by leaf-grafting N, a slight modification of the network from [5], onto a 2-leaf binary tree. The network N is tree-metrizable by a combination of Theorem 2.8 and Lemma 3.4. However, despite the fact that N₁ is formed by grafting a tree-metrizable network into a tree, N₁ is not itself tree-metrizable -

Discussion and future questions

In this paper we have explored the observation from [5] that it is possible for a phylogenetic network to carry a tree metric. This observation means that a tree metric may be consistent with a network (or even many networks), in addition to the unique tree specified by the four-point condition (Theorem 2.2).

Specifically, we have asked which networks could possibly carry a tree metric, and addressed this in two main directions. Firstly, we have shown in Section 4 that one can “grow” a

Declaration of Competing Interest

None.

References (10)

J. Fischer et al., New common ancestor problems in trees and directed acyclic graphs, Inf. Process. Lett. (2010)

A. Francis et al., Tree-like reticulation networks: when do tree-like distances also support reticulate evolution?, Math. Biosci. (2015)

P. Buneman, The recovery of trees from measures of dissimilarity, Mathematics in the Archaeological and Historical Sciences (1971)

J. Felsenstein, Inferring Phylogenies (2004)

A. Francis et al., Tree-based unrooted phylogenetic networks, Bull. Math. Biol. (2018)

There are more references available in the full text version of this article.
