Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species
Molecular Ecology Resources (IF 7.7), published 2021-07-20, DOI: 10.1111/1755-0998.13477
Andrea Parra-Salazar¹, Jorge Gomez¹, Daniela Lozano-Arce¹, Paula H Reyes-Herrera², Jorge Duitama¹

Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to extract the genetic variability information that a GBS experiment can provide. Unfortunately, existing tools for de novo analysis of GBS reads have shortcomings in usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Nonexact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated into the Next Generation Sequencing Experience Platform (NGSEP) so that it can use the state-of-the-art variant detector already implemented in that tool. We performed benchmark experiments on three empirical data sets from plants and animals that differ in population structure and ploidy and were sequenced with different GBS protocols at different read depths. These experiments show that, compared with existing solutions, NGSEP achieves comparable and in some cases better accuracy, and better computational efficiency in every case. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
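The abstract names the data structure but not its details. As a rough illustration only, and not NGSEP's actual implementation, the Python sketch below clusters reads by looking up each read's first `prefix_len` bases in a dict keyed by cluster consensus sequences, falling back to a nonexact search over all one-mismatch (Hamming distance 1) variants when there is no exact hit. The table stays dynamic: when a cluster's majority-vote consensus changes, its key is replaced. The names `ConsensusClusterer`, `Cluster` and `prefix_len`, the fixed-length prefix, and the single-mismatch tolerance are all hypothetical choices made for this example.

```python
from collections import Counter


def hamming1_neighbors(seq, alphabet="ACGT"):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for base in alphabet:
            if base != original:
                yield seq[:i] + base + seq[i + 1:]


class Cluster:
    """Reads assigned to one putative locus, plus per-position base counts."""

    def __init__(self, length):
        self.counts = [Counter() for _ in range(length)]
        self.reads = []

    def add(self, read, key):
        self.reads.append(read)
        for counter, base in zip(self.counts, key):
            counter[base] += 1

    def consensus(self):
        # Majority base at each position (ties go to the base seen first).
        return "".join(c.most_common(1)[0][0] for c in self.counts)


class ConsensusClusterer:
    """Hypothetical sketch: dict mapping cluster consensus -> cluster index."""

    def __init__(self, prefix_len=32):
        self.prefix_len = prefix_len
        self.table = {}  # consensus sequence -> index into self.clusters
        self.clusters = []

    def _find(self, key):
        # Exact hit first, then a nonexact search over all 1-mismatch variants.
        if key in self.table:
            return self.table[key]
        for neighbor in hamming1_neighbors(key):
            if neighbor in self.table:
                return self.table[neighbor]
        return None

    def add(self, read):
        """Assign a read to a cluster and return the cluster index."""
        key = read[: self.prefix_len]
        if len(key) < self.prefix_len:
            return None  # too short to compare with fixed-length consensus keys
        cid = self._find(key)
        if cid is None:
            cid = len(self.clusters)
            self.clusters.append(Cluster(self.prefix_len))
            old = None
        else:
            old = self.clusters[cid].consensus()
        cluster = self.clusters[cid]
        cluster.add(read, key)
        new = cluster.consensus()
        # Keep the table dynamic: re-key the cluster when its consensus changes,
        # unless the new consensus is already the key of another cluster.
        if new != old and new not in self.table:
            if old is not None:
                del self.table[old]
            self.table[new] = cid
        return cid


clusterer = ConsensusClusterer(prefix_len=10)
ids = [clusterer.add(r) for r in ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "TTTTGGGGCC"]]
print(ids)  # the third read differs by one base, so it joins cluster 0
```

Enumerating every one-mismatch variant costs 3 × `prefix_len` dict lookups per unmatched read, which keeps each lookup cheap but only tolerates a single difference; this is one simple way to search a hash table nonexactly and is not necessarily the strategy NGSEP uses. The sketch also covers only clustering, not the sorting, variant detection, or genotyping steps described above.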

Updated: 2021-07-20