A reference‐free approach to analyse RADseq data using standard next generation sequencing toolkits
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-01-12, DOI: 10.1111/1755-0998.13324
Rasmus Heller 1 , Casia Nursyifa 1 , Genís Garcia-Erill 1 , Jordi Salmona 2 , Lounes Chikhi 2, 3 , Jonas Meisner 1 , Thorfinn Sand Korneliussen 4 , Anders Albrechtsen 1

Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These organisms often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, such software comes with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage, we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2–4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data, we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.
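To make the link between allelic dropout, heterozygosity and the SFS concrete, the following is a minimal toy sketch in Python. It is not the authors' pipeline or simulation: the function names, the allele-frequency range, and the dropout model (a heterozygote is miscalled as a randomly chosen homozygote with a fixed probability) are all assumptions made for illustration only.

```python
import random
from collections import Counter


def simulate_genotypes(n_samples, n_sites, rng):
    """Draw diploid genotypes (0, 1 or 2 derived alleles) under Hardy-Weinberg proportions.

    The per-site allele frequency range is a hypothetical choice, not taken from the paper.
    """
    genotypes = []
    for _ in range(n_sites):
        freq = rng.uniform(0.05, 0.5)
        # Each of the two allele copies is derived with probability freq.
        site = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_samples)]
        genotypes.append(site)
    return genotypes


def apply_allelic_dropout(genotypes, dropout_rate, rng):
    """Toy dropout model (an assumption): with probability dropout_rate a
    heterozygote loses one allele and is called as a homozygote."""
    result = []
    for site in genotypes:
        new_site = []
        for g in site:
            if g == 1 and rng.random() < dropout_rate:
                g = rng.choice((0, 2))
            new_site.append(g)
        result.append(new_site)
    return result


def site_frequency_spectrum(genotypes, n_samples):
    """Count sites by derived-allele count; entry k holds sites with k derived alleles."""
    counts = Counter(sum(site) for site in genotypes)
    return [counts.get(k, 0) for k in range(2 * n_samples + 1)]


def mean_heterozygosity(genotypes):
    """Fraction of all genotype calls that are heterozygous."""
    total = sum(len(site) for site in genotypes)
    hets = sum(g == 1 for site in genotypes for g in site)
    return hets / total


if __name__ == "__main__":
    rng = random.Random(42)
    truth = simulate_genotypes(n_samples=28, n_sites=5000, rng=rng)
    observed = apply_allelic_dropout(truth, dropout_rate=0.16, rng=rng)
    print("heterozygosity without dropout:", mean_heterozygosity(truth))
    print("heterozygosity with dropout:   ", mean_heterozygosity(observed))
    print("first SFS bins without dropout:", site_frequency_spectrum(truth, 28)[:5])
    print("first SFS bins with dropout:   ", site_frequency_spectrum(observed, 28)[:5])
```

In this toy model the observed heterozygosity can only decrease, because dropout converts heterozygous calls into homozygous ones. Real allelic dropout is tied to mutations at restriction sites, so the sketch illustrates only the direction of the effect, not its magnitude.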

Updated: 2021-01-12