Evaluation of methods for adjusting population stratification in genome‐wide association studies: Standard versus categorical principal component analysis
Annals of Human Genetics (IF 1.9), Pub Date: 2019-07-19, DOI: 10.1111/ahg.12339
Asuman S Turkmen 1 , Yuan Yuan 2 , Nedret Billor 2

Unaccounted population stratification can lead to false‐positive findings and can mask true association signals when identifying disease‐related genetic variants. The computational simplicity of principal component analysis (PCA) makes it a widely used method for population stratification adjustment. However, this approach has two limitations. First, genotype data are generally coded as the numerical values 0, 1, and 2, corresponding to the number of minor alleles, so it is more reasonable to treat them as categorical data; because PCA is inherently suited only to continuous variables, applying it directly to genotype data is not appropriate. Second, although common variants have been extensively studied, little is known about the stratification of rare variants and its impact on association tests. Over the last decade, genome‐wide association studies have shifted toward low‐frequency (minor allele frequency [MAF] between 0.01 and 0.05) and rare (MAF less than 0.01) variants, which are now widely regarded as determinants of complex traits. Because rare variants are not stratified in the same way as common variants, statistical methods are needed that can capture stratification patterns for low‐frequency and rare variants. To address these limitations, we investigate the performance of generalized PCA and similarity‐matrix‐based PCA methods in detecting underlying structure for rare and common variants. We demonstrate, through simulated and real datasets, that a special case of generalized PCA (i.e., logistic PCA) adjusts for population stratification in rare variants much more effectively than standard PCA, while the two perform comparably for common variants.
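
For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of the baseline only: genotypes coded 0/1/2, variants binned by the MAF thresholds quoted above, and standard PCA applied to the centered genotype matrix. It is not the authors' code and does not implement logistic PCA. The function names, the two-population simulation, the simulation parameters, and the decision to place MAF exactly 0.05 in the low-frequency bin are all assumptions made for illustration.

```python
import numpy as np


def maf_category(maf):
    """Bin a minor allele frequency using the thresholds quoted in the abstract.

    Assumption: a MAF of exactly 0.05 is counted as low-frequency.
    """
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


def minor_allele_freq(genotypes):
    """Per-variant MAF from an (individuals x variants) matrix of 0/1/2 counts."""
    freq = genotypes.mean(axis=0) / 2.0      # frequency of the counted allele
    return np.minimum(freq, 1.0 - freq)      # fold so the result is at most 0.5


def simulate_two_populations(n_per_pop, n_variants, rng):
    """Hypothetical simulation: two populations with shifted allele frequencies."""
    p1 = rng.uniform(0.1, 0.5, n_variants)
    p2 = np.clip(p1 + rng.normal(0.0, 0.15, n_variants), 0.05, 0.95)
    g1 = rng.binomial(2, p1, size=(n_per_pop, n_variants))
    g2 = rng.binomial(2, p2, size=(n_per_pop, n_variants))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([g1, g2]), labels


def standard_pca_scores(genotypes, n_components):
    """Standard PCA scores via SVD of the column-centered genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
genotypes, labels = simulate_two_populations(100, 2000, rng)
scores = standard_pca_scores(genotypes, n_components=2)

# Split individuals by the sign of their first principal component score
# and compare with the simulated population labels (sign is arbitrary).
agreement = np.mean((scores[:, 0] > 0) == labels)
separation = max(agreement, 1.0 - agreement)
```

In this sketch the simulated variants are all common, which is the regime where the abstract reports that standard PCA and logistic PCA perform comparably; the scores would typically be included as covariates in the association model. Reproducing the rare-variant comparison would need a logistic PCA implementation, which is outside the scope of this example.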

Updated: 2019-07-19