ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences

PeerJ (IF 2.7), Pub Date: 2021-07-26, DOI: 10.7717/peerj.11865
Dylan Catlett, Kevin Son, Connie Liang

Background: High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological significance to these genetic data. To assign taxonomy to an ASV, a taxonomic assignment algorithm compares the ASV to a collection of reference sequences (a reference database) with known taxonomic affiliations. However, many taxonomic assignment algorithms and reference databases are available, and the optimal algorithm and database for a particular scientific question is often unclear. Here, we present the ensembleTax R package, which provides an efficient framework for integrating taxonomic assignments predicted with any number of taxonomic assignment algorithms and reference databases to determine ensemble taxonomic assignments for ASVs.

Methods: The ensembleTax R package relies on two core algorithms: taxmapper and assign.ensembleTax. The taxmapper algorithm maps taxonomic assignments derived from one reference database onto the taxonomic nomenclature (a set of taxonomic naming and ranking conventions) of another reference database. The assign.ensembleTax algorithm computes ensemble taxonomic assignments for each ASV in a data set based on any number of taxonomic assignments determined with independent methods. Various parameters allow analysts to prioritize obtaining either more ASVs with more predicted clade names or more robust clade name predictions supported by multiple independent methods in ensemble taxonomic assignments.

Results: The ensembleTax R package is used to compute two sets of ensemble taxonomic assignments for a collection of protistan ASVs sampled from the coastal ocean. Comparisons of taxonomic assignments predicted by individual methods with those predicted by ensemble methods show that conservative implementations of the ensembleTax package minimize disagreements between taxonomic assignments predicted by individual and ensemble methods, but result in ASVs with fewer ranks assigned taxonomy. Less conservative implementations of the ensembleTax package result in an increased fraction of ASVs classified at all taxonomic ranks, but increase the number of ASVs for which ensemble assignments disagree with those predicted by individual methods.

Discussion: We discuss how implementation of the ensembleTax R package may be optimized to address specific scientific objectives based on the results of the application of the ensembleTax package to marine protist communities. While further work is required to evaluate the accuracy of ensemble taxonomic assignments relative to taxonomic assignments predicted by individual methods, we also discuss scenarios where ensemble methods are expected to improve the accuracy of taxonomy prediction for ASVs.
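To make the two steps named in the abstract more concrete, the following is a minimal conceptual sketch in Python. It is not the ensembleTax implementation and does not use the package's API: the function names `map_taxonomy` and `ensemble_assign`, the parameters `threshold` and `count_na`, the name-matching rule, the rank-by-rank voting rule, and the toy lineages are all assumptions made for illustration. The first function maps an assignment onto a target nomenclature by looking for a shared clade name; the second combines several assignments that already share a nomenclature by voting at each rank.

```python
from collections import Counter


def map_taxonomy(assignment, target_lineages):
    """Map one assignment onto a target nomenclature (hypothetical helper).

    `assignment` lists names from highest to lowest rank (None = unassigned).
    Starting at the lowest rank, find the first name that also occurs in a
    target lineage and adopt that lineage down to the matched name.
    Assumes `target_lineages` is a non-empty list of equal-length tuples.
    """
    n_ranks = len(target_lineages[0])
    for name in reversed(assignment):
        if name is None:
            continue
        for lineage in target_lineages:
            if name in lineage:
                depth = lineage.index(name) + 1
                return list(lineage[:depth]) + [None] * (n_ranks - depth)
    return [None] * n_ranks  # no shared name: nothing can be mapped


def ensemble_assign(assignments, threshold=0.5, count_na=True):
    """Combine assignments rank by rank (illustrative voting rule).

    At each rank only methods that agreed with the ensemble so far vote.
    A name is accepted when it is the unique top vote and its share of the
    votes is at least `threshold`. With count_na=True an unassigned rank
    (None) counts as a vote, which makes the ensemble more conservative.
    """
    n_ranks = len(assignments[0])
    ensemble = []
    voters = list(assignments)
    for rank in range(n_ranks):
        votes = [a[rank] for a in voters]
        if not count_na:
            votes = [v for v in votes if v is not None]
        if not votes:
            break
        ranked = Counter(votes).most_common()
        top_name, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if top_name is None or tied or top_count / len(votes) < threshold:
            break  # stop here; lower ranks stay unassigned
        ensemble.append(top_name)
        voters = [a for a in voters if a[rank] == top_name]
    return ensemble + [None] * (n_ranks - len(ensemble))


# Toy data (invented for this example, three ranks in the target nomenclature).
target = [
    ("Eukaryota", "Dinoflagellata", "Dinophyceae"),
    ("Eukaryota", "Ochrophyta", "Bacillariophyta"),
]
# An assignment from a database with a different, four-rank convention.
mapped = map_taxonomy(
    ["Eukaryota", "Alveolata", "Dinoflagellata", "Gymnodiniales"], target
)
methods = [
    ["Eukaryota", "Dinoflagellata", "Dinophyceae"],
    mapped,
    ["Eukaryota", "Ochrophyta", "Bacillariophyta"],
]
conservative = ensemble_assign(methods, threshold=0.6, count_na=True)
permissive = ensemble_assign(methods, threshold=0.5, count_na=False)
print(conservative)
print(permissive)
```

In this toy case the conservative settings leave the lowest rank unassigned because the remaining methods split between a name and no name, while the permissive settings ignore the unassigned vote and fill that rank. This mirrors the trade-off the abstract describes between more assigned ranks and assignments supported by multiple methods.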

Updated: 2021-07-26
