Penalized Regression and Risk Prediction in Genome-Wide Association Studies, Statistical Analysis and Data Mining

Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2013-02-22, DOI: 10.1002/sam.11183
Erin Austin₁, Wei Pan₁, Xiaotong Shen₂

An important task in personalized medicine is to predict disease risk from a person's genome, e.g., from a large number of single-nucleotide polymorphisms (SNPs). Genome-wide association studies (GWAS) make SNP and phenotype data available to researchers, and a critical question is how best to predict disease risk from these data. Penalized regression with variable selection, such as the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD), is considered promising in this setting. However, the sparsity assumption made by the LASSO, SCAD, and many other penalized regression techniques may not apply here: it is now hypothesized that many common diseases are associated with many SNPs with small to moderate effects. In this article, we use GWAS data from the Wellcome Trust Case Control Consortium (WTCCC) to investigate the performance of various unpenalized and penalized regression approaches under true sparse or non-sparse models. We find that, in general, penalized regression outperformed unpenalized regression; SCAD, truncated L₁-penalty (TLP), and LASSO performed best for sparse models, while elastic net regression was the winner, followed by ridge, TLP, and LASSO, for non-sparse models. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
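
The abstract names the penalties but does not give their formulas. As a rough illustration only, the sketch below writes out commonly used textbook forms of the five penalties for a single coefficient. The function names, the tuning-parameter names (`lam`, `alpha`, `tau`, `a`), the default `a=3.7`, and the elastic net parametrization are assumptions made for this sketch; the paper may parametrize them differently.

```python
# Illustrative per-coefficient penalty functions (not taken from the paper).
# All names and default values here are assumptions made for this sketch.

def lasso_penalty(beta, lam):
    """L1 penalty: grows linearly with |beta|."""
    return lam * abs(beta)

def ridge_penalty(beta, lam):
    """Squared L2 penalty: shrinks coefficients without setting them to zero."""
    return lam * beta ** 2

def elastic_net_penalty(beta, lam, alpha=0.5):
    """Mix of L1 and squared L2 penalties (one common parametrization)."""
    return lam * (alpha * abs(beta) + (1.0 - alpha) * beta ** 2)

def tlp_penalty(beta, lam, tau):
    """Truncated L1 penalty: L1-like up to tau, constant beyond it."""
    return lam * min(abs(beta) / tau, 1.0)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: L1 near zero, a quadratic transition, then constant."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return lam ** 2 * (a + 1.0) / 2.0

if __name__ == "__main__":
    # Compare how each penalty treats small versus large coefficients.
    for beta in (0.1, 1.0, 10.0):
        print(
            beta,
            lasso_penalty(beta, 1.0),
            ridge_penalty(beta, 1.0),
            elastic_net_penalty(beta, 1.0),
            tlp_penalty(beta, 1.0, 0.5),
            scad_penalty(beta, 1.0),
        )
```

In these forms, the TLP and SCAD penalties stop growing once a coefficient is large, the LASSO penalty keeps growing linearly, and the ridge penalty grows quadratically without producing exact zeros. This is one way to read the abstract's contrast between selection-oriented penalties for sparse models and ridge-type penalties for non-sparse ones.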

Updated: 2013-02-22
