Alignment-free methods for polyploid genomes: quick and reliable genetic distance estimation
bioRxiv - Genetics Pub Date : 2020-10-25 , DOI: 10.1101/2020.10.23.352963
Acer VanWallendael , Mariano Alvarez

Polyploid genomes pose several inherent challenges to population genetic analyses. While alignment-based methods are fundamentally limited in their applicability to polyploids, alignment-free methods bypass most of these limits. We investigated the use of Mash, a k-mer analysis tool that uses the MinHash method to reduce complexity in large genomic datasets, for basic population genetic analyses of polyploid sequences. We measured the degree to which Mash correctly estimated pairwise genetic distance in simulated diploid and polyploid short-read sequences with various levels of missing data. Mash-based estimates of genetic distance were comparable to alignment-based estimates, and were less impacted by missing data. We also used Mash to analyze publicly available short-read data for three polyploid and one diploid species, then compared Mash results to published results. For both simulated and real data, Mash accurately estimated pairwise genetic differences for polyploids as well as diploids as much as 476 times faster than alignment-based methods, though we found that Mash genetic distance estimates could be biased by per-sample read depth. Mash may be a particularly useful addition to the toolkit of polyploid geneticists for rapid confirmation of alignment-based results and for basic population genetics in reference-free systems with poor quality DNA.
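As a rough illustration of the MinHash idea the abstract refers to, the following Python sketch builds a bottom-s sketch of k-mer hashes for two sequences, estimates their Jaccard similarity, and converts it to a distance with the Mash distance formula, D = -(1/k) ln(2j / (1 + j)). It is not the authors' pipeline and not Mash's implementation. The function names, the BLAKE2b hash, and the defaults `k=21` and `size=1000` are assumptions made for this example, and it skips steps a real tool would handle, such as canonical (strand-independent) k-mers and filtering of read errors in short-read input.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate Jaccard similarity of the k-mer sets from two sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # Take the `size` smallest hashes of the union, then count how many
    # of them occur in both sketches.
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k=21):
    """Convert a Jaccard estimate j into a Mash distance for k-mer size k."""
    if j <= 0.0:
        return 1.0  # no shared hashes: report the maximum distance
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)
```

With these hypothetical helpers, a pairwise distance would be computed as `mash_distance(jaccard_estimate(sketch(a), sketch(b)))`. Because each sample is reduced to a fixed-size list of hashes, comparing two samples costs about the same regardless of genome size, which is the kind of complexity reduction the abstract describes.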

Updated: 2020-10-27