Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes
Molecular Ecology Resources (IF 7.7), Pub Date: 2021-04-27, DOI: 10.1111/1755-0998.13407
Jose S Hleap 1, 2, 3 , Joanne E Littlefair 1, 4 , Dirk Steinke 5 , Paul D N Hebert 5 , Melania E Cristescu 1

The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant and a set of COI fragments was used. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity), which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate reference databases, taking their completeness into account; and (iii) use QIIME, BLAST or LCA methods in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.

Updated: 2021-04-27