A coarse-graining, ultrametric approach to resolve the phylogeny of prokaryotic strains with frequent homologous recombination.
BMC Ecology and Evolution (IF 2.3) Pub Date: 2020-05-07, DOI: 10.1186/s12862-020-01616-5
Tin Yau Pang

BACKGROUND: A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption.

METHODS: Here, we propose a Coarse-Graining approach for Phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction. It accounts for the tendency of a higher effective recombination rate between genomes with a lower phylogenetic distance. It is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance.

RESULTS: Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches.

CONCLUSIONS: The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.
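
The final step described in the abstract, going from a table of pairwise coalescent times to an ultrametric tree, can be illustrated with a small sketch. The abstract does not state which clustering procedure CGP uses, so the code below substitutes plain UPGMA (average linkage) as a stand-in; the function name `upgma_from_coalescent_times`, the strain labels, and the example times are all hypothetical.

```python
def upgma_from_coalescent_times(names, times):
    """Average-linkage (UPGMA) clustering on pairwise coalescent times.

    ASSUMPTION: UPGMA is a generic stand-in, not the confirmed CGP procedure.
    `times[i][j]` is the time back to the common ancestor of strains i and j,
    so the height of a merged node is the (averaged) time itself.
    Returns (newick_string, root_height).
    """
    # Each cluster is stored as (newick_label, node_height, number_of_leaves).
    clusters = {i: (name, 0.0, 1) for i, name in enumerate(names)}
    n = len(names)
    d = {frozenset((i, j)): times[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest coalescent time.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d.pop(pair)
        (la, ha, na), (lb, hb, nb) = clusters.pop(a), clusters.pop(b)
        # Branch length = parent height minus child height (keeps tree ultrametric).
        label = f"({la}:{height - ha:.3f},{lb}:{height - hb:.3f})"
        # Time to every remaining cluster is the size-weighted average.
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[next_id] = (label, height, na + nb)
        next_id += 1
    (label, height, _), = clusters.values()
    return label + ";", height


# Hypothetical coalescent times for four strains (arbitrary units).
example_names = ["A", "B", "C", "D"]
example_times = [
    [0.0, 1.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 2.0],
    [4.0, 4.0, 2.0, 0.0],
]
newick, root_height = upgma_from_coalescent_times(example_names, example_times)
print(newick, root_height)
```

Every leaf in the resulting tree sits at the same distance from the root, which is the defining property of an ultrametric tree; how CGP itself handles noisy or inconsistent pairwise times is not covered by this sketch.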

Updated: 2020-05-07