Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets. Ecology and Evolution



Ecology and Evolution (IF 2.6), Pub Date: 2020-06-28, DOI: 10.1002/ece3.6483
Justin Bohling


The advent of high-throughput sequencing (HTS) has made genomic-level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species have publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to evaluate these effects. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers; the resulting variants were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that aligned successfully and mapping quality. The choice of reference genome also influenced the number of SNPs that were generated and the depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics.
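The two alignment metrics named in the abstract (proportion of reads aligned and mapping quality) can be illustrated with a minimal sketch. This is not the author's pipeline: the function name `alignment_summary` and the toy records are hypothetical, and the sketch assumes plain-text SAM input in which FLAG bit 0x4 marks an unmapped read and the fifth column holds MAPQ, as defined by the SAM format.

```python
from statistics import mean


def alignment_summary(sam_lines):
    """Return (fraction of primary reads mapped, mean MAPQ of mapped reads).

    Hypothetical helper for illustration; expects an iterable of SAM text lines.
    """
    total = 0
    mapq_values = []
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Ignore secondary (0x100) and supplementary (0x800) records so
        # that each read is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # FLAG bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapq_values.append(int(fields[4]))
    if total == 0:
        return 0.0, 0.0
    fraction_mapped = len(mapq_values) / total
    mean_mapq = mean(mapq_values) if mapq_values else 0.0
    return fraction_mapped, mean_mapq


# Toy records (made up for illustration, not data from the study).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t200\t20\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t256\tchr2\t300\t0\t50M\t*\t0\t0\t*\t*",
]
fraction_mapped, mean_mapq = alignment_summary(toy_sam)
print(fraction_mapped, mean_mapq)
```

Running the same summary on the alignments produced against each candidate reference genome would give one pair of values per genome, which is the kind of comparison the abstract describes.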


Updated: 2020-07-30
