Just keep it simple? Benchmarking the accuracy of taxonomy assignment software in metabarcoding studies
Molecular Ecology Resources (IF 7.7), published 2021-07-16, DOI: 10.1111/1755-0998.13473
Holly M. Bik

How do you put a name on an unknown piece of DNA? From microbes to mammals, high-throughput metabarcoding studies provide a more objective view of natural communities, overcoming many of the inherent limitations of traditional field surveys and microscopy-based observations (Deiner et al., 2017). Taxonomy assignment is one of the most critical steps in any metabarcoding study, yet this important bioinformatics task is routinely overlooked. Biodiversity surveys and conservation efforts often depend on formal species inventories: the presence (or absence) of species, and the number of individuals reported across space and time. However, the computational workflows applied in eukaryotic metabarcoding studies were originally developed for bacterial and archaeal data sets, where microbial researchers rely on a single conserved locus (the 16S rRNA gene) and have access to vast reference databases with good coverage across most prokaryotic lineages, a situation not mirrored in most multicellular taxa. In this issue of Molecular Ecology Resources, Hleap et al. (2021) carry out an extensive benchmarking exercise focused on taxonomy assignment strategies for eukaryotic metabarcoding studies that use the mitochondrial cytochrome c oxidase subunit I (COI) marker gene. They assess the performance and accuracy of software tools representing diverse methodological approaches, from “simple” strategies based on sequence similarity and composition to model-based phylogenetic and probabilistic classification tools. Contrary to popular assumptions, the less complex approaches (BLAST and the QIIME2 feature classifier) consistently outperformed more sophisticated mathematical algorithms and were highly accurate when assigning taxonomy at higher ranks (e.g. family). Assignments at lower ranks (genus and species) still pose a significant challenge for most existing algorithms, and sparse eukaryotic reference databases further limit software performance.
This study clarifies current best practices for taxonomy assignment in metabarcoding, and underscores the need for community-driven efforts to expand the taxonomic and geographic representation of reference DNA barcode databases.
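To make the “simple” similarity-based strategy and rank-by-rank scoring more concrete, here is a minimal sketch of a top-hit classifier and a per-rank accuracy score. It is an illustration under stated assumptions, not BLAST, not the QIIME2 feature classifier, and not the benchmarking protocol of Hleap et al. (2021). It uses ungapped percent identity on pre-aligned, equal-length sequences. The per-rank identity cut-offs, the helper names (`identity`, `assign`, `per_rank_accuracy`) and the toy sequences and lineages are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

RANKS = ("family", "genus", "species")

# Hypothetical identity cut-offs for illustration only; they are NOT values
# reported by Hleap et al. (2021). Increasing cut-offs keep assignments nested.
THRESHOLDS = {"family": 0.80, "genus": 0.90, "species": 0.97}

Lineage = Tuple[str, str, str]        # (family, genus, species)
Assignment = List[Optional[str]]      # None = left unassigned at that rank


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign(query: str, reference: Dict[str, Lineage]) -> Assignment:
    """Top-hit assignment: find the most similar reference sequence, then report
    its lineage only down to the deepest rank whose cut-off the hit satisfies."""
    best_ref = max(reference, key=lambda ref: identity(query, ref))
    best_id = identity(query, best_ref)
    return [name if best_id >= THRESHOLDS[rank] else None
            for rank, name in zip(RANKS, reference[best_ref])]


def per_rank_accuracy(predicted: List[Assignment], truth: List[Lineage]) -> Dict[str, float]:
    """Share of queries whose assigned name equals the true name at each rank.
    An unassigned rank (None) is scored as incorrect."""
    return {rank: sum(p[i] == t[i] for p, t in zip(predicted, truth)) / len(truth)
            for i, rank in enumerate(RANKS)}


# Toy reference "database" of two 20-bp pre-aligned sequences (invented data).
ref1 = "ACGTACGTACGTACGTACGT"
reference = {
    ref1: ("Fam1", "GenA", "GenA sp1"),
    "TTTTTTTTTTTTTTTTTTTT": ("Fam2", "GenB", "GenB sp1"),
}

# Queries derived from ref1 with 0, 1, 3 and 5 substitutions.
q0 = ref1
q1 = ref1[:-1] + "A"        # 19/20 identical
q3 = "CAT" + ref1[3:]       # 17/20 identical
q5 = "GGACC" + ref1[5:]     # 15/20 identical
queries = [q0, q1, q3, q5]

# True lineages; q1 belongs to a species missing from the reference.
truth = [
    ("Fam1", "GenA", "GenA sp1"),
    ("Fam1", "GenA", "GenA sp2"),
    ("Fam1", "GenA", "GenA sp1"),
    ("Fam1", "GenA", "GenA sp1"),
]

preds = [assign(q, reference) for q in queries]
print(preds)
# → [['Fam1', 'GenA', 'GenA sp1'], ['Fam1', 'GenA', None], ['Fam1', None, None], [None, None, None]]
print(per_rank_accuracy(preds, truth))
# → {'family': 0.75, 'genus': 0.5, 'species': 0.25}
```

On this invented data, accuracy drops from family (0.75) to species (0.25). Query `q1` shows how a species absent from the reference can at best be placed to genus. Scoring unassigned ranks as errors is one possible design choice; a benchmark could instead report unassigned and misassigned sequences separately.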

Updated: 2021-09-09