Removing the bad apples: A simple bioinformatic method to improve loci‐recovery in de novo RADseq data for non‐model organisms
Methods in Ecology and Evolution (IF 6.3), Pub Date: 2021-01-28, DOI: 10.1111/2041-210x.13562
José Cerca 1,2,3, Marius F. Maurstad 1,4, Nicolas Rochette 5,6, Angel Rivera‐Colón 5, Niraj Rayamajhi 5, Julian Catchen 5, Torsten H. Struck 1

  1. The restriction site‐associated DNA (RADseq) family of protocols involves digesting DNA and sequencing the region flanking the cut site, thus providing a cost‐ and time‐efficient way of obtaining thousands of genomic markers. However, when working with non‐model taxa with few genomic resources, optimization of RADseq wet‐lab and bioinformatic tools may be challenging, often resulting in allele dropout, that is, a given RADseq locus is not sequenced in one or more individuals, which results in missing data. Additionally, as datasets include divergent taxa, rates of dropout will increase, since restriction sites may be lost due to mutation. Mitigating the impacts of allele dropout is crucial, as missing data may lead to incorrect inferences in population genetics and phylogenetics.
  2. Here, we demonstrate a simple pipeline for the optimization of RADseq datasets which involves partitioning datasets into subgroups, namely by reducing and analysing the dataset at a population or species level. By running the software Stacks at a subgroup level, we were able to reliably identify and remove individuals with high levels of missing data (bad apples), likely stemming from artefacts in library preparation, DNA quality issues or sequencing artefacts.
  3. Removal of the bad apples generally led to an increase in loci and decrease in missing data in the final datasets.
  4. The biological interpretability of the data, as measured by the number of retrieved loci and missing data, was considerably increased.
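
The filtering step described in point 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes genotype calls have already been exported from a per-subgroup Stacks run into a plain Python dictionary, and the names `missing_fraction`, `find_bad_apples`, the sample labels and the 0.5 cutoff are all hypothetical choices made for this example.

```python
from collections import defaultdict


def missing_fraction(genotypes):
    """Fraction of loci with no genotype call (None) for one individual."""
    if not genotypes:
        # An individual with no loci at all is treated as fully missing.
        return 1.0
    return sum(g is None for g in genotypes) / len(genotypes)


def find_bad_apples(calls, subgroup_of, threshold=0.5):
    """Return {subgroup: [individuals whose missing fraction exceeds threshold]}.

    calls:       {individual: [genotype or None, one entry per locus]}, where the
                 loci come from a run restricted to that individual's subgroup.
    subgroup_of: {individual: subgroup label (population or species)}.
    threshold:   hypothetical cutoff chosen for illustration, not from the paper.
    """
    flagged = defaultdict(list)
    for individual, genotypes in calls.items():
        if missing_fraction(genotypes) > threshold:
            flagged[subgroup_of[individual]].append(individual)
    return {group: sorted(inds) for group, inds in flagged.items()}


# Toy data: None marks a locus that was not recovered in that individual.
calls = {
    "popA_01": ["A/A", "A/T", "T/T", "A/A"],
    "popA_02": [None, None, None, "A/A"],
    "popB_01": ["C/C", None, "C/G", "G/G"],
}
subgroup_of = {"popA_01": "popA", "popA_02": "popA", "popB_01": "popB"}

bad = find_bad_apples(calls, subgroup_of, threshold=0.5)
print(bad)
```

In this toy input, only `popA_02` exceeds the cutoff. In practice the threshold would need to be chosen per dataset, and the flagged individuals would be dropped before the full dataset is reassembled.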


Updated: 2021-01-28