Increasing calling accuracy, coverage, and read depth in sequence data by the use of haplotype blocks
bioRxiv - Genetics. Pub Date: 2021-01-08, DOI: 10.1101/2021.01.07.425688
Torsten Pook, Adnane Nemri, Eric Gerardo Gonzalez Segovia, Henner Simianer, Chris-Carolin Schoen

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics: when resources are limited, geneticists and breeders must balance data quality against the number of genotyped lines across a variety of existing technologies. In this work, we propose a new imputation pipeline ("HBimpute") that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is to use haplotype blocks from the software HaploBlocker to identify locally similar lines and merge their reads locally. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. Overall imputation error rates are cut in half compared to the state-of-the-art software BEAGLE, while the average read-depth is increased to 83X, thus enabling the calling of structural variation. The usefulness of the imputed data panel is further evaluated by comparing its performance in common breeding applications to that of genomic data from a 600k array. In particular, for genome-wide association studies the sequence data performs slightly better. Furthermore, genomic prediction based on the markers shared between the array and the sequence data yields a slightly higher predictive ability for the imputed sequence data, indicating that the data quality obtained from low read-depth sequencing is on par with or even slightly higher than that of high-density array data. When all markers of the sequence data are included, the predictive ability is slightly reduced, indicating overall lower data quality in non-array markers.
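
The idea of merging reads locally among lines that share a haplotype block can be illustrated with a small toy sketch. This is not the HBimpute implementation and it does not call HaploBlocker. The function names (`merge_reads_by_block`, `call_allele`), the data layout (per-line lists of `(ref, alt)` read counts per marker) and the block format `(start, end, lines)` are all assumptions made for illustration only. The majority-vote call assumes fully homozygous lines, as is the case for doubled haploids.

```python
def merge_reads_by_block(read_counts, blocks):
    """Pool (ref, alt) read counts among lines that share a block.

    read_counts: {line: [(ref, alt), ...]}, one tuple per marker (hypothetical layout).
    blocks: list of (start, end, lines); markers start..end-1 are covered and
            `lines` are the lines carrying that block (hypothetical layout).
    """
    n_markers = len(next(iter(read_counts.values())))
    # For each line and marker, collect the lines it shares a block with.
    # Using a set avoids counting a line twice when blocks overlap.
    neighbours = {line: [{line} for _ in range(n_markers)] for line in read_counts}
    for start, end, lines in blocks:
        for line in lines:
            for m in range(start, end):
                neighbours[line][m].update(lines)
    pooled = {}
    for line, per_marker in neighbours.items():
        pooled[line] = [
            (sum(read_counts[o][m][0] for o in group),
             sum(read_counts[o][m][1] for o in group))
            for m, group in enumerate(per_marker)
        ]
    return pooled


def call_allele(ref, alt):
    """Majority call for a homozygous line: 0 = ref, 1 = alt, None if undecided."""
    if ref == alt:
        return None  # no reads, or a tie
    return 0 if ref > alt else 1


# Toy data: three lines, three markers, very sparse reads.
read_counts = {
    "A": [(1, 0), (0, 0), (0, 1)],
    "B": [(0, 0), (1, 0), (0, 0)],
    "C": [(0, 1), (0, 0), (0, 0)],
}
# Lines A and B share one block covering markers 0 and 1.
blocks = [(0, 2, ["A", "B"])]

pooled = merge_reads_by_block(read_counts, blocks)
calls = {line: [call_allele(r, a) for r, a in counts] for line, counts in pooled.items()}
print(calls)
```

In this toy case, line B has no reads at marker 0, but can still be called there because it borrows the read observed in line A within their shared block. Outside the block (marker 2) and for line C, which is in no block, the counts are left unchanged.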

Updated: 2021-01-10