Accurate reconstruction of bacterial pan- and core genomes with PEPPAN
Genome Research (IF 7) Pub Date: 2020-11-01, DOI: 10.1101/gr.260828.120
Zhemin Zhou, Jane Charlesworth, Mark Achtman

Bacterial genomes can contain traces of a complex evolutionary history, including extensive homologous recombination, gene loss, gene duplications, and horizontal gene transfer. To reconstruct the phylogenetic and population history of a set of multiple bacteria, it is necessary to examine their pangenome, the composite of all the genes in the set. Here we introduce PEPPAN, a novel pipeline that can reliably construct pangenomes from thousands of genetically diverse bacterial genomes that represent the diversity of an entire genus. PEPPAN outperforms existing pangenome methods by providing consistent gene and pseudogene annotations extended by similarity-based gene predictions, and identifying and excluding paralogs by combining tree- and synteny-based approaches. The PEPPAN package additionally includes PEPPAN_parser, which implements additional downstream analyses, including the calculation of trees based on accessory gene content or allelic differences between core genes. To test the accuracy of PEPPAN, we implemented SimPan, a novel pipeline for simulating the evolution of bacterial pangenomes. We compared the accuracy and speed of PEPPAN with four state-of-the-art pangenome pipelines using both empirical and simulated data sets. PEPPAN was more accurate and more specific than any of the other pipelines and was almost as fast as any of them. As a case study, we used PEPPAN to construct a pangenome of approximately 40,000 genes from 3052 representative genomes spanning at least 80 species of Streptococcus. The resulting gene and allelic trees provide an unprecedented overview of the genomic diversity of the entire Streptococcus genus.
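To make the terms in the abstract concrete, the following is a minimal sketch of how a pangenome, a core genome, and an accessory-gene-content distance can be derived once genes have already been grouped into families. It is not PEPPAN's algorithm: the hard parts described above (consistent annotation, pseudogene detection, paralog exclusion) are assumed to be done, and the function names, the toy genomes, and the use of Jaccard distance are illustrative assumptions rather than anything taken from the paper.

```python
from itertools import combinations


def pan_and_core(presence):
    """Return (pangenome, core genome) as sets of gene-family IDs.

    `presence` maps a genome name to the set of gene families found in it.
    This input format is a hypothetical one chosen for the sketch.
    """
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)  # every family seen in any genome
    core = set.intersection(*gene_sets) if gene_sets else set()  # in all genomes
    return pan, core


def jaccard_distances(presence):
    """Pairwise Jaccard distance between genomes based on gene content."""
    dist = {}
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]
        shared = presence[a] & presence[b]
        dist[(a, b)] = 1 - len(shared) / len(union) if union else 0.0
    return dist


# Toy data: three invented genomes and five invented gene families.
toy = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g1", "g2", "g4"},
    "genome_C": {"g1", "g5"},
}

pan, core = pan_and_core(toy)
accessory = pan - core  # families present in some but not all genomes
distances = jaccard_distances(toy)
```

A distance matrix of this kind is one possible input to a standard tree-building method such as neighbor joining, which is the general idea behind a tree based on accessory gene content.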

Updated: 2020-11-02