Alignment-free methods for polyploid genomes: Quick and reliable genetic distance estimation
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-09-03, DOI: 10.1111/1755-0998.13499
Acer VanWallendael, Mariano Alvarez

Polyploid genomes pose several inherent challenges to population genetic analyses. While alignment-based methods are fundamentally limited in their applicability to polyploids, alignment-free methods bypass most of these limits. We investigated the use of Mash, a k-mer analysis tool that uses the MinHash method to reduce complexity in large genomic data sets, for basic population genetic analyses of polyploid sequences. We measured the degree to which Mash correctly estimated pairwise genetic distance in simulated haploid and polyploid short-read sequences with various levels of missing data. Mash-based estimates of genetic distance were comparable to alignment-based estimates, and were less impacted by missing data. We also used Mash to analyse publicly available short-read data for three polyploid species and one diploid species, then compared the Mash results to published results. For both simulated and real data, Mash accurately estimated pairwise genetic differences for polyploids as well as diploids, and ran as much as 476 times faster than alignment-based methods, though we found that Mash genetic distance estimates could be biased by per-sample read depth. Mash may be a particularly useful addition to the toolkit of polyploid geneticists for rapid confirmation of alignment-based results and for basic population genetics in reference-free systems or those with only poor-quality sequence data available.
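
To make the MinHash idea in the abstract concrete, the following is a minimal Python sketch of how a k-mer sketch can yield a genetic distance without any alignment. It is not Mash's implementation and not the authors' pipeline: the function names, the BLAKE2b hash, and the tiny values of k and s are assumptions chosen for illustration (Mash itself, as we understand it, hashes canonical k-mers with a different hash function and uses much larger parameters). The distance formula, D = -(1/k) ln(2j / (1 + j)), is the one described in the original Mash publication.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, s=50):
    """Bottom-s MinHash sketch: the s smallest hash values of the k-mers.

    Assumption: BLAKE2b stands in for the hash function; strand
    (reverse complement) handling is deliberately omitted.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # Take the s smallest hashes of the merged sketches, then count
    # how many of those appear in both inputs.
    merged = sorted(set_a | set_b)[:s]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate j into a Mash-style distance."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


if __name__ == "__main__":
    a = "ACGTACGTTGCAAGCTTAGCCGATCGATCGGATCCA"
    b = a[:15] + "G" + a[16:]  # same sequence with one substitution
    j = jaccard_estimate(sketch(a), sketch(b), s=50)
    print(f"Jaccard estimate: {j:.3f}, distance: {mash_distance(j, 5):.4f}")
```

Because only the smallest hash values are compared, the cost depends on the sketch size rather than on genome length or ploidy, which is consistent with the speed advantage the abstract reports. The sketch also hints at why read depth can bias the estimate: with raw reads, sequencing errors and uneven coverage change which k-mers enter each sample's set.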

Updated: 2021-09-03