To denoise or to cluster? That is not the question. Optimizing pipelines for COI metabarcoding and metaphylogeography
bioRxiv - Genetics Pub Date : 2021-01-08 , DOI: 10.1101/2021.01.08.425760
A. Antich , C. Palacin , O.S. Wangensteen , X. Turon

The recent blooming of metabarcoding applications to biodiversity studies comes with some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, missing the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has a naturally high intraspecies variability that should be assessed and reported, as it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary and both should be used together in COI metabarcoding pipelines. Using a typical dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and the DADA2 algorithms), set suitable parameters for denoising and clustering COI datasets, and compared the outcome of applying these processes in different orders. Our results indicate that denoising based on the UNOISE3 algorithm preserves a higher intra-cluster variability. We suggest and test ways to improve this algorithm taking into account the natural variability of each codon position in coding genes. The order of the steps (denoising and clustering) has little influence on the final outcome. We recommend that researchers consider reporting their results in terms of both denoised sequences (a proxy for haplotypes) and clusters formed (a proxy for species), and avoid collapsing the sequences of the latter into a single representative. This will allow studies at the cluster level (ideally equating to species-level diversity) and at the intra-cluster level, and will ease additivity and comparability between studies.
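
To make the denoising step mentioned above more concrete, the following is a minimal sketch of a UNOISE-style abundance-skew rule, not the authors' pipeline. It assumes the published form of the rule, in which a sequence with abundance a is treated as an error of a more abundant sequence with abundance A at distance d when a/A <= 1/2^(alpha*d + 1). The function names, the use of Hamming distance on equal-length reads, the toy counts, and the value alpha = 2.0 are illustrative assumptions, not taken from the abstract.

```python
def beta(d, alpha=2.0):
    """Maximum abundance ratio for a sequence at distance d to count as an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length (aligned) sequences")
    return sum(x != y for x, y in zip(a, b))


def denoise(abundances, alpha=2.0):
    """Merge presumed erroneous sequences into more abundant centroids.

    abundances: dict mapping unique sequence -> read count (hypothetical input).
    Returns a dict mapping each retained centroid -> total reads assigned to it.
    """
    # Process sequences from most to least abundant (ties broken alphabetically).
    ordered = sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0]))
    centroids = {}
    for seq, count in ordered:
        best = None  # (distance, centroid) of the closest centroid that absorbs seq
        for cen in centroids:
            d = hamming(seq, cen)
            # Compare against the centroid's own original abundance.
            if count / abundances[cen] <= beta(d, alpha):
                if best is None or d < best[0]:
                    best = (d, cen)
        if best is None:
            centroids[seq] = count          # kept as a putative true variant
        else:
            centroids[best[1]] += count     # reads reassigned to the centroid
    return centroids


# Toy counts, invented for illustration only.
toy = {"AAAA": 1000, "AAAT": 10, "TTTT": 100}
result = denoise(toy)
```

In this toy input, "AAAT" differs from "AAAA" at one position and is 100 times less abundant, so it falls under the threshold and is merged, while "TTTT" is too abundant relative to its distance and is kept. A smaller alpha makes the threshold more permissive and merges more sequences; a larger alpha retains more variants, which is the kind of parameter setting the abstract argues must be adjusted for a highly variable marker such as COI.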

Updated: 2021-01-10