Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores.
Microbial Genomics (IF 3.9), Pub Date: 2020-10-01, DOI: 10.1099/mgen.0.000398
Oliver Schwengers 1, 2, 3 , Patrick Barth 2 , Linda Falgenhauer 1, 3, 4 , Torsten Hain 1, 3 , Trinad Chakraborty 1, 3 , Alexander Goesmann 1, 2

Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. 
Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at: http://platon.computational.bio/.
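The abstract reports both accuracy and F1, and the gap between the two figures is easier to interpret with a small worked example. The sketch below is only an illustration of how the two metrics are computed from a confusion matrix. The contig counts are hypothetical and are not taken from the Platon benchmark, and the names `ConfusionCounts`, `accuracy`, and `f1_score` are made up for this example. It assumes plasmid contigs are the positive class and are a small minority of all contigs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Contig-level classification counts; plasmid is the positive class."""
    tp: int  # plasmid contigs predicted as plasmid
    fp: int  # chromosomal contigs predicted as plasmid
    tn: int  # chromosomal contigs predicted as chromosomal
    fn: int  # plasmid contigs predicted as chromosomal


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all contigs that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted plasmid contigs that really are plasmid."""
    predicted = c.tp + c.fp
    return c.tp / predicted if predicted else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true plasmid contigs that were recovered."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical data set: 1,000 contigs, of which 50 are plasmid-borne.
classifier = ConfusionCounts(tp=40, fp=10, tn=940, fn=10)

# Trivial baseline that labels every contig as chromosomal.
all_chromosome = ConfusionCounts(tp=0, fp=0, tn=950, fn=50)

for name, counts in [("classifier", classifier), ("all-chromosome", all_chromosome)]:
    print(f"{name}: accuracy={accuracy(counts):.3f}, F1={f1_score(counts):.3f}")
```

With these made-up counts, the trivial baseline still reaches high accuracy because chromosomal contigs dominate, while its F1 drops to zero. This is why a figure such as F1 is a useful complement to accuracy when the classes are imbalanced.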

Updated: 2020-10-27