Estimating and accounting for genotyping errors in RAD-seq experiments.
Molecular Ecology Resources ( IF 5.5 ) Pub Date : 2020-03-06 , DOI: 10.1111/1755-0998.13153
Luisa Bresadola 1 , Vivian Link 1, 2 , C Alex Buerkle 3 , Christian Lexer 4 , Daniel Wegmann 1, 2

In non-model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques because they are inexpensive, easy to use, and do not require genomic resources such as a reference genome. However, evidence is accumulating that such techniques may be affected by specific biases, calling into question the accuracy of the obtained genotypes and, as a consequence, their usefulness in evolutionary studies. Here, we introduce three strategies to estimate genotyping error rates from such data: by comparison to high-quality genotypes obtained with a different technique, from individual replicates, or from a population sample when assuming Hardy-Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD-seq), arguably the most popular reduced representation sequencing technique, revealed per-allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome-wide and local ancestry of well-characterized hybrids of two Eurasian poplar (Populus) species, such high error rates may lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD-seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids. Based on these findings, we strongly recommend carefully assessing genotyping error rates in reduced representation sequencing experiments and properly accounting for them in downstream analyses, for instance using the tools presented here.
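
The first strategy named in the abstract, comparison to high-quality genotypes from a different technique, can be illustrated with a simple counting sketch. This is not the authors' estimator or tool: the function name `genotype_mismatch_rates`, the 0/1/2 genotype coding, and the use of `None` for missing calls are assumptions made here for illustration, and the quantity computed is a per-genotype mismatch rate, which is cruder than the per-allele error rates reported in the paper.

```python
from collections import Counter


def genotype_mismatch_rates(truth, called):
    """Fraction of miscalled genotypes, split by the true genotype class.

    truth, called: equal-length sequences of genotypes coded as the number of
    alternate alleles (0, 1 or 2); None marks a missing call (assumed coding).
    Returns {"hom_error": ..., "het_error": ...}; a rate is NaN when the
    corresponding class has no comparable sites.
    """
    counts = Counter()
    for t, c in zip(truth, called):
        if t is None or c is None:
            continue  # skip sites missing in either data set
        kind = "het" if t == 1 else "hom"
        counts[kind + "_total"] += 1
        if t != c:
            counts[kind + "_wrong"] += 1

    def rate(kind):
        total = counts[kind + "_total"]
        return counts[kind + "_wrong"] / total if total else float("nan")

    return {"hom_error": rate("hom"), "het_error": rate("het")}


# Toy data, invented for illustration only (not from the study).
truth = [0, 1, 1, 2, 1, 0, None, 1]
called = [0, 1, 0, 2, 2, 1, 1, None]
rates = genotype_mismatch_rates(truth, called)
print(rates)
```

Reporting the rate separately for truly heterozygous and truly homozygous sites mirrors the abstract's observation that errors were concentrated at heterozygous sites wrongly inferred as homozygous; a single pooled rate would hide that asymmetry.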

Updated: 2020-03-06