BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters
GigaScience (IF 9.2), Pub Date: 2021-01-19, DOI: 10.1093/gigascience/giaa154
Satria A Kautsar, Justin J J van der Hooft, Dick de Ridder, Marnix H Medema

Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs).

Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration.

Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
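
To make the "non-pairwise, near-linear" idea concrete, here is a minimal sketch of centroid-based grouping in Euclidean space. It is not BiG-SLiCE's actual algorithm or API: the feature vectors, the `threshold` value, and the function names `cluster_single_pass` and `place_query` are all hypothetical, chosen only for illustration. Each vector is compared against the current family centroids rather than against every other vector, so the cost grows with (number of BGCs × number of families) instead of the square of the number of BGCs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_single_pass(vectors, threshold):
    """Group vectors in one pass (illustrative, not BiG-SLiCE's method).

    Each vector joins the nearest existing centroid if it lies within
    `threshold`; otherwise it starts a new family. Centroids are kept
    as running means of their members.
    """
    centroids, counts, labels = [], [], []
    for v in vectors:
        best, best_d = None, None
        for i, c in enumerate(centroids):
            d = euclidean(v, c)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            n = counts[best]
            # Update the running mean of the chosen family.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], v)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(v))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


def place_query(vector, centroids):
    """Toy 'query mode': return (index, distance) of the nearest centroid."""
    distances = [euclidean(vector, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical 3-dimensional feature vectors standing in for BGCs.
toy_bgcs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.9, 1.1],
]
labels, centroids = cluster_single_pass(toy_bgcs, threshold=0.5)
family, distance = place_query([0.1, 1.0, 0.9], centroids)
print(labels, family)
```

In this toy run the first two vectors fall into one family and the last two into another, and the query vector is placed in the second family without recomputing anything for the existing members. A real tool would derive its vectors from biosynthetic features and would likely use a more robust clustering scheme; the sketch only shows why avoiding all-versus-all comparison keeps the work close to linear.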

Updated: 2021-01-19