Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome-wide association studies.
Genetic Epidemiology (IF 2.1), Pub Date: 2020-03-19, DOI: 10.1002/gepi.22290
James J Fryett 1, Andrew P Morris 2, Heather J Cordell 1

In transcriptome‐wide association studies (TWAS), gene expression values are predicted using genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six different methods—LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model, and Random Forests—by performing cross‐validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when ancestry of the prediction model training and testing populations is different, and (c) when the tissue used to train the model is different from the tissue to be predicted. We find that, for most genes, the expression cannot be accurately predicted, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the model training set size is reduced or when predicting across ancestries and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and the development of large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
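As a rough illustration of the kind of comparison the abstract describes, the sketch below fits penalised regression models to genotypes and scores them by k-fold cross-validation. It is not the authors' pipeline. The data are simulated rather than taken from Geuvadis, the function names (`fit_elastic_net`, `cross_validated_r2`, `simulate`), the penalty value `lam=0.1`, and the use of squared correlation as the accuracy measure are all assumptions made for this example, and only three of the six methods (LASSO, ridge regression, elastic net) are covered, via a single coordinate-descent routine.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t (the operator behind the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam, alpha, n_iter=30):
    """Coordinate descent for
    (1/2n)*||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*||b||^2).
    alpha=1 gives LASSO, alpha=0 ridge, values in between elastic net."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centred genotypes
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals when all coefficients are zero
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:  # SNP with no variation in the training set
                continue
            # Correlation of SNP j with the residual that excludes SNP j.
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(m * b for m, b in zip(means, beta))
    return intercept, beta


def predict(intercept, beta, X):
    return [intercept + sum(x * b for x, b in zip(row, beta)) for row in X]


def squared_correlation(a, b):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((x - mb) ** 2 for x in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return sab * sab / (saa * sbb)


def simulate(n=150, p=20, seed=1):
    """Toy data: p SNPs coded 0/1/2, two of which affect expression (p >= 2)."""
    rng = random.Random(seed)
    mafs = [rng.uniform(0.1, 0.5) for _ in range(p)]
    X = [[(rng.random() < f) + (rng.random() < f) for f in mafs] for _ in range(n)]
    effects = [0.8, -0.5] + [0.0] * (p - 2)  # hypothetical sparse architecture
    y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def cross_validated_r2(X, y, lam, alpha, k=5, seed=0):
    """Pool out-of-fold predictions and compare them with observed expression."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        b0, beta = fit_elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
        for i in test:
            preds[i] = predict(b0, beta, [X[i]])[0]
    return squared_correlation(preds, y)


if __name__ == "__main__":
    X, y = simulate()
    for name, alpha in [("LASSO", 1.0), ("elastic net", 0.5), ("ridge", 0.0)]:
        print(name, round(cross_validated_r2(X, y, lam=0.1, alpha=alpha), 3))
```

Shrinking `n` in `simulate` gives a feel for the sample-size effect mentioned in the abstract. Any numbers this toy example produces say nothing about real expression data.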

Updated: 2020-03-19