Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution
arXiv - CS - Computational Geometry Pub Date : 2018-04-03 , DOI: arxiv-1804.01398
Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom

A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the $1^{\mathrm{st}}$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For $i>1$, the $i^{\mathrm{th}}$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris–Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the $1^{\mathrm{st}}$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.
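
As a rough illustration of the idea that recombination leaves a trace in the $1^{\mathrm{st}}$ barcode, the sketch below computes the first Betti number of a Vietoris–Rips complex built on a toy set of binary sequences. It is not taken from the paper: the Hamming metric, the convention that an edge appears once the distance is at most the scale, the use of coefficients in GF(2), and all function names (`hamming`, `gf2_rank`, `betti1`) are assumptions made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are encoded as int bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    rank = 0
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                rank += 1
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return rank


def betti1(seqs, scale):
    """First Betti number (over GF(2)) of the Vietoris-Rips complex at `scale`.

    Assumed convention: a simplex is included when all pairwise Hamming
    distances among its vertices are <= scale.
    """
    n = len(seqs)

    def close(i, j):
        return hamming(seqs[i], seqs[j]) <= scale

    edges = [e for e in combinations(range(n), 2) if close(*e)]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(close(i, j) for i, j in combinations(t, 2))]

    # Boundary of an edge: its two endpoints.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges.
    d2 = [sum(1 << edge_index[e] for e in combinations(t, 2))
          for t in triangles]

    # dim H_1 = dim ker(d1) - rank(d2) = (#edges - rank(d1)) - rank(d2)
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Toy history: ancestor 00, two single mutants, and their recombinant 11.
with_recombinant = ["00", "01", "10", "11"]
tree_only = ["00", "01", "10"]

print(betti1(with_recombinant, 1))  # 1: the four sequences form a loop
print(betti1(with_recombinant, 2))  # 0: the loop is filled in at scale 2
print(betti1(tree_only, 1))         # 0: no recombinant, no loop
```

In this toy case the loop exists at scale 1 and disappears at scale 2, which corresponds to a single interval in the first barcode; without the recombinant sequence no such interval appears. Real analyses would use a dedicated persistent homology library rather than this brute-force enumeration.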

Updated: 2020-01-17