On polygenic risk scores for complex traits prediction
Biometrics (IF 1.9), Pub Date: 2021-03-31, DOI: 10.1111/biom.13466
Bingxin Zhao, Fei Zou

Polygenic risk scores (PRS) have gained substantial attention for complex traits prediction in genome-wide association studies (GWAS). Motivated by the polygenic model of complex traits, we study the statistical properties of PRS under the high-dimensional but sparsity free setting where the triplet $(n,p,m) \rightarrow (\infty, \infty, \infty)$ with $n, p, m$ being the sample size, the number of assayed single-nucleotide polymorphisms (SNPs), and the number of assayed causal SNPs, respectively. First, we derive asymptotic results on the out-of-sample (prediction) R-squared for PRS. These results help understand the widespread observed gap between the in-sample heritability (or partial R-squared due to the genetic features) estimate and the out-of-sample R-squared for most complex traits. Next, we investigate how features should be selected (e.g., by a p-value threshold) for constructing optimal PRS. We reveal that the optimal threshold depends largely on the genetic architecture underlying the complex trait and the sample size of the training GWAS, or the $m/n$ ratio. For highly polygenic traits with a large $m/n$ ratio, it is difficult to separate causal and null SNPs and stringent feature selection in principle often leads to poor PRS prediction. We numerically illustrate the theoretical results with intensive simulation studies and real data analysis on 33 complex traits with a wide range of genetic architectures in the UK Biobank database.
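
The p-value thresholding described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes independent standardized SNPs (no linkage disequilibrium), normally distributed causal effects, and a normal approximation for the marginal p-values. The function name `simulate_prs_r2` and all of its parameters and default values are hypothetical choices made for illustration only.

```python
import math
import numpy as np


def simulate_prs_r2(n_train=500, n_test=500, p=1000, m=200, h2=0.5,
                    thresholds=(1.0, 0.05, 1e-3), seed=0):
    """Out-of-sample R-squared of a thresholded PRS on simulated data.

    Assumptions (illustrative only): independent N(0, 1) genotypes,
    m causal SNPs with normal effects, total heritability h2.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # Simulated standardized genotypes; real SNPs would be correlated (LD).
    X = rng.standard_normal((n, p))

    # The first m SNPs are causal; effects are scaled so Var(X @ beta) ~ h2.
    beta = np.zeros(p)
    beta[:m] = rng.standard_normal(m) * math.sqrt(h2 / m)
    y = X @ beta + rng.standard_normal(n) * math.sqrt(1.0 - h2)

    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    # Marginal (one SNP at a time) least-squares estimates, as in a GWAS.
    Xc = X_tr - X_tr.mean(axis=0)
    yc = y_tr - y_tr.mean()
    sxx = (Xc ** 2).sum(axis=0)
    beta_hat = Xc.T @ yc / sxx

    # Per-SNP residual variance and standard error of the marginal estimate.
    resid_var = ((yc ** 2).sum() - beta_hat ** 2 * sxx) / (n_train - 2)
    z = beta_hat / np.sqrt(resid_var / sxx)

    # Two-sided p-values from a normal approximation to the test statistic.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

    results = {}
    for t in thresholds:
        keep = pvals <= t
        if not keep.any():
            # No SNP passes the threshold, so the score carries no signal.
            results[t] = 0.0
            continue
        # PRS: weighted sum of selected SNPs, weights = marginal estimates.
        score = X_te[:, keep] @ beta_hat[keep]
        r = np.corrcoef(score, y_te)[0, 1]
        results[t] = float(r ** 2)
    return results
```

Calling `simulate_prs_r2()` returns a dictionary mapping each threshold to the prediction R-squared on the held-out samples. Varying `m` relative to `n_train` gives a rough way to explore how the $m/n$ ratio interacts with the threshold, within the limits of the simplifying assumptions above.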

Updated: 2021-03-31