SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements
Molecular Ecology Resources (IF 7.7), Pub Date: 2020-09-29, DOI: 10.1111/1755-0998.13255
Pierre Barbera, Lucas Czech, Sarah Lutteropp, Alexandros Stamatakis

Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present SCRAPP, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. SCRAPP employs a novel approach to clustering phylogenetic placements, called placement space clustering, which performs dimensionality reduction efficiently so that the tool scales to large data volumes. Furthermore, it uses the phylogeny‐aware molecular species delimitation method mPTP to quantify diversity. We evaluated SCRAPP using both simulated and empirical data sets. We used the simulated data to verify our approach. Tests on an empirical data set show that SCRAPP‐derived metrics can classify samples by their diversity‐correlated features as well as or better than existing, commonly used approaches. SCRAPP is available at https://github.com/pbdas/scrapp.
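
To make the input concrete: phylogenetic placement results are commonly exchanged in the jplace format, a JSON document that lists, for each query sequence, its candidate branches of the reference tree. The Python sketch below groups query names by their most likely branch, which is one plausible way to organise placements before measuring diversity per branch. It is an illustration only and does not reproduce SCRAPP's pipeline, placement space clustering, or mPTP. The function name `group_queries_by_best_edge` and the toy data are assumptions made for this example; only the jplace field names (`fields`, `placements`, `p`, `n`, `nm`, `edge_num`, `like_weight_ratio`) come from the format itself.

```python
import json
from collections import defaultdict


def group_queries_by_best_edge(jplace):
    """Map each reference-tree edge number to the names of the queries
    whose highest-weight placement lies on that edge.

    Hypothetical helper for illustration; not part of SCRAPP.
    """
    fields = jplace["fields"]
    edge_idx = fields.index("edge_num")
    lwr_idx = fields.index("like_weight_ratio")

    groups = defaultdict(list)
    for entry in jplace["placements"]:
        # Each row of "p" is one candidate placement; keep the one with
        # the largest likelihood weight ratio.
        best = max(entry["p"], key=lambda row: row[lwr_idx])
        # "n" holds plain names, "nm" holds [name, multiplicity] pairs.
        names = list(entry.get("n", []))
        names += [name for name, _ in entry.get("nm", [])]
        groups[best[edge_idx]].extend(names)
    return dict(groups)


# Toy jplace document with made-up values, for demonstration only.
example = json.loads("""
{
  "version": 3,
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3});",
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[0, -100.0, 0.9, 0.01, 0.02], [2, -102.0, 0.1, 0.01, 0.03]],
     "n": ["q1"]},
    {"p": [[0, -101.0, 0.6, 0.02, 0.01], [1, -101.5, 0.4, 0.02, 0.01]],
     "n": ["q2"]},
    {"p": [[3, -99.0, 1.0, 0.05, 0.02]],
     "nm": [["q3", 2.0]]}
  ],
  "metadata": {"invocation": "toy example"}
}
""")

groups = group_queries_by_best_edge(example)
for edge, names in sorted(groups.items()):
    print(edge, names)
```

In this toy input, queries q1 and q2 both end up on edge 0 and q3 on edge 3. A per-branch diversity estimate would then be computed within each such group; how SCRAPP does this is described in the paper and repository, not here.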

Updated: 2020-09-29