Improved metagenome binning and assembly using deep variational autoencoders
Nature Biotechnology (IF 33.1), Pub Date: 2021-01-04, DOI: 10.1038/s41587-020-00777-4
Jakob Nybo Nissen 1, 2 , Joachim Johansen 2 , Rosa Lundbye Allesøe 2 , Casper Kaae Sønderby 3 , Jose Juan Almagro Armenteros 1 , Christopher Heje Grønbech 3, 4 , Lars Juhl Jensen 2 , Henrik Bjørn Nielsen 5 , Thomas Nordahl Petersen 6 , Ole Winther 3, 4, 7 , Simon Rasmussen 2
Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29–98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb.
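The abstract describes two per-sequence inputs, coabundance and k-mer distribution, that are encoded together before clustering. As a rough illustration of what such a combined input could look like, the sketch below builds a feature vector for one contig from normalized k-mer frequencies and normalized per-sample depths. It is not VAMB's code: the function names (`kmer_frequencies`, `normalize_abundance`, `contig_features`), the choice of `k`, and the normalization are assumptions made for illustration only.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalized k-mer frequencies of seq, ordered lexicographically over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def normalize_abundance(depths):
    """Scale per-sample read depths of one contig so that they sum to 1."""
    total = sum(depths)
    return [d / total if total else 0.0 for d in depths]


def contig_features(seq, depths, k=4):
    """Concatenate k-mer and abundance features (hypothetical encoder input)."""
    return kmer_frequencies(seq, k) + normalize_abundance(depths)


# Toy contig with depths in two samples; k=2 keeps the vector short.
features = contig_features("ACGTACGTNACGT", [10.0, 30.0], k=2)
```

In a setup like the one the abstract outlines, one such vector per contig would be passed to an encoder and the resulting latent representations clustered into bins; those steps are outside this sketch.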




Updated: 2021-01-04