GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes
Nucleic Acids Research (IF 14.9), Pub Date: 2018-03-19, DOI: 10.1093/nar/gky174
Joel A Boyd, Ben J Woodcroft, Gene W Tyson

Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
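
The contrast the abstract draws between best-hit classification and classification by tree placement can be illustrated with a small sketch. This is not GraftM's implementation. It is a toy model, assuming a read has already been placed at some node of a reference gene tree, in which the read receives only the taxonomic ranks shared by every reference sequence beneath that node. All lineages, node names, and function names below are hypothetical.

```python
from typing import Dict, List

# Toy reference lineages (hypothetical names), ordered from domain to family.
LEAF_LINEAGE: Dict[str, List[str]] = {
    "ref1": ["Archaea", "PhylumA", "ClassA", "OrderA", "FamilyA"],
    "ref2": ["Archaea", "PhylumA", "ClassA", "OrderA", "FamilyB"],
    "ref3": ["Archaea", "PhylumA", "ClassB", "OrderC", "FamilyC"],
}

# Toy gene tree (hypothetical): each internal node maps to the
# reference leaves found beneath it.
CLADE_LEAVES: Dict[str, List[str]] = {
    "node_order": ["ref1", "ref2"],
    "node_root": ["ref1", "ref2", "ref3"],
}


def shared_lineage(lineages: List[List[str]]) -> List[str]:
    """Return the longest prefix of ranks common to every lineage."""
    shared: List[str] = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank; stop here
        shared.append(ranks[0])
    return shared


def classify_placement(node: str) -> List[str]:
    """Assign a placed read the ranks shared by all leaves under `node`.

    A node that is not an internal node is treated as a single leaf.
    """
    leaves = CLADE_LEAVES.get(node, [node])
    return shared_lineage([LEAF_LINEAGE[leaf] for leaf in leaves])


if __name__ == "__main__":
    for node in ("ref1", "node_order", "node_root"):
        print(node, "->", "; ".join(classify_placement(node)))
```

In this toy model, a read placed deep in the tree is resolved only to the ranks its neighbours agree on, whereas a best-hit approach would copy the full lineage of whichever single reference scored highest.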

Updated: 2018-03-19