Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics
Biometrics (IF 1.9), Pub Date: 2021-08-18, DOI: 10.1111/biom.13547
Simon Bussy 1,2, Mokhtar Z Alaya 3, Anne-Sophie Jannot 4, Agathe Guilloux 5

We introduce binacox, a prognostic method to deal with the problem of detecting multiple cut-points per feature in a multivariate setting where a large number of continuous features are available. The method is based on the Cox model and combines one-hot encoding with the binarsity penalty, which uses total-variation regularization together with an extra linear constraint, and enables feature selection. Original nonasymptotic oracle inequalities for prediction (in terms of Kullback–Leibler divergence) and estimation with a fast rate of convergence are established. The statistical performance of the method is examined in an extensive Monte Carlo simulation study, and then illustrated on three publicly available genetic cancer data sets. On these high-dimensional data sets, our proposed method outperforms state-of-the-art survival models regarding risk prediction in terms of the C-index, with a computing time orders of magnitude faster. In addition, it provides powerful interpretability from a clinical perspective by automatically pinpointing significant cut-points in relevant variables.
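The encoding and penalty described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of equally spaced quantiles as bin edges, the unweighted total-variation term, and the sum-to-zero form of the linear constraint are all assumptions made here for illustration. The idea is that each continuous feature is one-hot encoded into bins, the coefficients of one feature's bins are penalized by the sum of absolute differences between neighbours, and a bin edge where two neighbouring coefficients differ is read as a candidate cut-point.

```python
import numpy as np

def one_hot_bins(x, n_bins):
    """One-hot encode a continuous feature into n_bins bins (hypothetical helper)."""
    # Assumption: interior bin edges are placed at equally spaced quantiles.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Bin index in 0..n_bins-1 for every observation.
    idx = np.searchsorted(edges, x, side="right")
    return np.eye(n_bins)[idx], edges

def total_variation(beta_block):
    """Unweighted sum of |beta[k+1] - beta[k]| over one feature's block."""
    return np.abs(np.diff(beta_block)).sum()

def satisfies_sum_to_zero(beta_block, tol=1e-8):
    """Assumed form of the extra linear constraint: coefficients sum to zero."""
    return abs(beta_block.sum()) <= tol

def cut_points(beta_block, edges, tol=1e-8):
    """Bin edges at which neighbouring coefficients differ (candidate cut-points)."""
    jumps = np.abs(np.diff(beta_block)) > tol
    return edges[jumps]

# Toy usage with made-up numbers: 8 observations, 4 bins, one jump in the block.
x = np.arange(8.0)
X_bin, edges = one_hot_bins(x, n_bins=4)
beta = np.array([-1.0, -1.0, 1.0, 1.0])
print(X_bin.shape, total_variation(beta), cut_points(beta, edges))
```

In this toy block the coefficients are constant on the first two bins and on the last two, so the total variation is driven by a single jump and only the middle edge is reported. A block whose coefficients are all equal (and hence all zero under the sum-to-zero assumption) has no jumps, which is how a penalty of this kind can also act as feature selection.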

Updated: 2021-08-18