Ancestry inference and grouping from principal component analysis of genetic data
bioRxiv - Genetics Pub Date : 2020-10-26 , DOI: 10.1101/2020.10.06.328203
Florian Privé

Here we propose a simple, robust and effective method for global ancestry inference and grouping from Principal Component Analysis (PCA) of genetic data. The proposed approach is particularly useful for methods that need to be applied in homogeneous samples. First, we show that Euclidean distances in the PCA space are proportional to $F_{ST}$ between populations. Then, we show how to use this PCA-based distance to infer ancestry in the UK Biobank and the POPRES datasets. We propose two solutions: either projecting PCs onto reference populations, such as those from the 1000 Genomes Project, or directly using the internal data. Finally, we conclude that our method and the community would benefit from easy access to a reference dataset with even better coverage of worldwide genetic diversity than the 1000 Genomes Project.
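
As a rough illustration of the PCA-based distance idea, the sketch below assigns each individual to the reference group whose center is closest in PC space, and leaves individuals unassigned when even the closest center is too far away. It is a minimal sketch under stated assumptions, not the author's implementation: the function names `centroid` and `assign_ancestry`, the use of group means as centers, the `max_dist` cutoff, and the toy PC scores are all hypothetical and chosen only for illustration.

```python
import math


def centroid(points):
    """Mean position of a group of individuals in PC space."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def assign_ancestry(pc_scores, reference_groups, max_dist):
    """Label each individual with its nearest reference group.

    pc_scores: list of PC-score vectors, one per individual.
    reference_groups: dict mapping a group label to the PC scores of
        its reference individuals (assumed to share the same PC space).
    max_dist: hypothetical cutoff; individuals farther than this from
        every group center are returned as None (unassigned).
    """
    centers = {label: centroid(pts) for label, pts in reference_groups.items()}
    labels = []
    for x in pc_scores:
        # Euclidean distance from this individual to every group center.
        label, dist = min(
            ((g, math.dist(x, c)) for g, c in centers.items()),
            key=lambda pair: pair[1],
        )
        labels.append(label if dist <= max_dist else None)
    return labels


# Toy data (made up): two reference groups on two PCs.
reference = {
    "A": [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]],
    "B": [[10.0, 10.0], [12.0, 10.0], [11.0, 13.0]],
}
samples = [[1.0, 2.0], [11.0, 10.0], [6.0, 6.0]]
labels = assign_ancestry(samples, reference, max_dist=3.0)
print(labels)
```

In this toy setup the third individual lies midway between the two group centers and is left unassigned, which is one simple way to keep the resulting groups homogeneous. The cutoff would need to be calibrated on real data.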

Updated: 2020-10-27