Optimized variable selection via repeated data splitting.
Statistics in Medicine (IF 2), Pub Date: 2020-04-13, DOI: 10.1002/sim.8538
Marinela Capanu 1, Mihai Giurcanu 2, Colin B Begg 1, Mithat Gönen 1

Model selection in high-dimensional settings has received substantial attention in recent years; however, similar advancements in the low-dimensional setting have been lacking. In this article, we introduce a new variable selection procedure for low to moderate scale regressions (n > p). This method repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold, which is then used to screen for variables to include in the final model. In an extensive simulation study, we show that the proposed variable selection technique enjoys superior performance compared with candidate methods (backward elimination via repeated data splitting, univariate screening at the 0.05 level, adaptive LASSO, SCAD), being among those with the lowest inclusion of noisy predictors while having the highest power to detect the correct model and being unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.
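
The repeated-splitting idea can be illustrated with a minimal sketch. The abstract does not specify the screening statistic, the split proportion, or the validation criterion, so everything below is an assumption made for illustration: variables are screened by the absolute OLS t-statistic, the data are split in half, and the threshold is chosen to minimize the average validation mean squared error. The function names (`ols_t_stats`, `validation_mse`, `select_variables`) are hypothetical and are not taken from the article.

```python
import numpy as np


def ols_t_stats(X, y):
    """Absolute OLS t-statistics for each column of X (intercept added here)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p - 1)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of the coefficients
    return np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))


def validation_mse(X_est, y_est, X_val, y_val, keep):
    """Fit OLS on the estimation set using columns in `keep`; score on validation."""
    A_est = np.column_stack([np.ones(len(y_est)), X_est[:, keep]])
    A_val = np.column_stack([np.ones(len(y_val)), X_val[:, keep]])
    beta, *_ = np.linalg.lstsq(A_est, y_est, rcond=None)
    return float(np.mean((y_val - A_val @ beta) ** 2))


def select_variables(X, y, thresholds, n_splits=50, seed=0):
    """Pick the threshold with the lowest mean validation error, then screen.

    Assumed scheme (not from the article): 50/50 random splits, screening
    on |t| > threshold, squared-error loss on the validation half.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    errors = np.zeros((n_splits, len(thresholds)))
    for s in range(n_splits):
        perm = rng.permutation(n)
        est, val = perm[: n // 2], perm[n // 2:]
        t_abs = ols_t_stats(X[est], y[est])
        for j, c in enumerate(thresholds):
            keep = t_abs > c                      # boolean mask of retained columns
            errors[s, j] = validation_mse(X[est], y[est], X[val], y[val], keep)
    best = thresholds[int(np.argmin(errors.mean(axis=0)))]
    # Final screen on the full data with the empirically chosen threshold.
    selected = np.flatnonzero(ols_t_stats(X, y) > best)
    return best, selected


# Simulated example: two true predictors and four noise predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
best, selected = select_variables(X, y, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
print("chosen threshold:", best, "selected columns:", selected)
```

In this sketch the threshold is tuned on held-out data rather than fixed in advance (for example at a nominal 0.05 level), which is the aspect of the abstract it is meant to illustrate. The published procedure may differ in every other detail.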

Updated: 2020-04-13