Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression
Genetic Epidemiology (IF 2.1), Pub Date: 2021-09-01, DOI: 10.1002/gepi.22430
Célia Escribe 1, 2 , Tianyuan Lu 1, 3 , Julyan Keller-Baruch 1, 4 , Vincenzo Forgetta 1 , Bowei Xiao 1, 3 , J Brent Richards 1, 4, 5, 6 , Sahir Bhatnagar 5, 7 , Karim Oualkacha 8 , Celia M T Greenwood 1, 4, 5, 9

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.
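The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the API of the BDCoCoLasso R package. It assumes additive measurement error with a known standard deviation `tau` on the corrupted block, and it replaces CoCoLasso's positive semidefinite projection with simple eigenvalue clipping to keep the example short. All function and parameter names (`block_descent_sketch`, `x_clean`, `z_noisy`, `tau`, `lam`) are hypothetical, and the calibrated cross-validation step is omitted.

```python
import numpy as np


def soft_threshold(x, lam):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def lasso_cov_cd(sigma, rho, lam, beta, n_sweeps=100, tol=1e-8):
    """Coordinate descent for 0.5 * b' sigma b - rho' b + lam * ||b||_1."""
    beta = beta.copy()
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(len(beta)):
            if sigma[j, j] <= 0:
                continue  # skip degenerate coordinates
            # Partial residual correlation, excluding coordinate j itself.
            r = rho[j] - sigma[j] @ beta + sigma[j, j] * beta[j]
            new = soft_threshold(r, lam) / sigma[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def clip_to_psd(mat, eps=1e-6):
    """Assumption: eigenvalue clipping stands in for CoCoLasso's projection."""
    w, v = np.linalg.eigh((mat + mat.T) / 2)
    return (v * np.maximum(w, eps)) @ v.T


def block_descent_sketch(x_clean, z_noisy, y, tau, lam, n_outer=50, tol=1e-6):
    """Alternate Lasso updates over the uncorrupted and corrupted blocks."""
    n = len(y)
    p2 = z_noisy.shape[1]
    s11 = x_clean.T @ x_clean / n
    s12 = x_clean.T @ z_noisy / n
    # Error-corrected Gram matrix of the corrupted block, made PSD.
    s22 = clip_to_psd(z_noisy.T @ z_noisy / n - tau**2 * np.eye(p2))
    r1 = x_clean.T @ y / n
    r2 = z_noisy.T @ y / n
    b1 = np.zeros(x_clean.shape[1])
    b2 = np.zeros(p2)
    for _ in range(n_outer):
        b1_old, b2_old = b1, b2
        # Update the uncorrupted block with the corrupted block held fixed.
        b1 = lasso_cov_cd(s11, r1 - s12 @ b2, lam, b1)
        # Update the corrupted block with the uncorrupted block held fixed.
        b2 = lasso_cov_cd(s22, r2 - s12.T @ b1, lam, b2)
        change = max(np.max(np.abs(b1 - b1_old)), np.max(np.abs(b2 - b2_old)))
        if change < tol:
            break
    return b1, b2
```

In this sketch only the corrupted block needs the error-corrected, projected Gram matrix, so the costly projection acts on a smaller matrix than it would if all features were treated as corrupted. That is one plausible reading of how the block structure reduces computational cost.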

Updated: 2021-09-01