An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis
Journal of Bioinformatics and Computational Biology (IF 0.9), Pub Date: 2019-11-25, DOI: 10.1142/s0219720019400122
Yuan Liu 1, Yongchao Ma 1, Evan Salsman 2, Frank A Manthey 2, Elias M Elias 2, Xuehui Li 2, Changhui Yan 1

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations in the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with a large impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of its potential locations. In this study, we explore two alternative methods based on the hypothesis that the probability that an ambiguous read was generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method, which assigns an ambiguous read to the location that has produced the most reads among all the potential locations; and (2) the probability method, which assigns an ambiguous read to a location with a probability proportional to the number of reads the location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method produced better results than both the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). It produced not only more SNP markers but also SNP markers of better quality, as demonstrated using multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
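
The three assignment strategies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `assign_ambiguous_reads`, the input layout (a dictionary of per-location read counts and a dictionary of candidate locations per read), the random tie-breaking, and the fallback to a uniform choice when no candidate has any reads are all assumptions made for illustration. How the per-location read counts are obtained (for example, from uniquely mapped reads) is likewise an assumption.

```python
import random


def assign_ambiguous_reads(location_counts, ambiguous_reads, method="enrichment", seed=0):
    """Assign each ambiguous read to one of its candidate locations.

    location_counts: dict, location -> number of reads attributed to it
                     (hypothetical input; e.g. counts of uniquely mapped reads).
    ambiguous_reads: dict, read id -> list of equally good candidate locations.
    method: "random", "enrichment", or "probability".
    """
    if method not in ("random", "enrichment", "probability"):
        raise ValueError(f"unknown method: {method}")

    rng = random.Random(seed)
    assignment = {}
    for read_id, candidates in ambiguous_reads.items():
        weights = [location_counts.get(loc, 0) for loc in candidates]

        if method == "random" or sum(weights) == 0:
            # Default behaviour: uniform choice among candidates.
            # Also used as a fallback when no candidate has any reads (assumption).
            choice = rng.choice(candidates)
        elif method == "enrichment":
            # Pick the candidate with the most reads; break ties at random (assumption).
            best = max(weights)
            top = [loc for loc, w in zip(candidates, weights) if w == best]
            choice = rng.choice(top)
        else:
            # Probability method: sample proportionally to the read counts.
            choice = rng.choices(candidates, weights=weights, k=1)[0]

        assignment[read_id] = choice
    return assignment


if __name__ == "__main__":
    counts = {"chr1:100": 10, "chr2:500": 2}
    reads = {"r1": ["chr1:100", "chr2:500"]}
    for m in ("random", "enrichment", "probability"):
        print(m, assign_ambiguous_reads(counts, reads, method=m))
```

In this sketch the enrichment method is deterministic whenever one candidate has strictly more reads than the others, whereas the probability method can still place a read at a low-count location, only less often.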

Updated: 2019-11-25