Coefficient tree regression for generalized linear models
Statistical Analysis and Data Mining ( IF 2.1 ) Pub Date : 2021-07-02 , DOI: 10.1002/sam.11534
Özge Sürer 1 , Daniel W. Apley 1 , Edward C. Malthouse 1

Large regression data sets are now commonplace, with so many predictors that they cannot or should not all be included individually. In practice, derived predictors are relevant as meaningful features or, at the very least, as a form of regularized approximation of the true coefficients. We consider derived predictors that are sums of groups of individual predictors, which is equivalent to the predictors within a group sharing the same coefficient. However, the groups of predictors are usually not known in advance and must be discovered from the data. In this paper we develop a coefficient tree regression algorithm for generalized linear models to discover the group structure from the data. The approach results in simple and highly interpretable models, and we demonstrate with real examples that it can provide a clear and concise interpretation of the data. Via simulation studies under different scenarios, we show that our approach performs better than existing competitors in terms of computing time and predictive accuracy.
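
The equivalence stated in the abstract (a derived predictor formed by summing a group is the same as every predictor in that group sharing one coefficient) can be checked numerically. The sketch below is only an illustration of that identity for a GLM linear predictor; it is not the coefficient tree regression algorithm itself, and the group assignments, coefficient values, and helper names (`group_sums`, `expand_coefficients`, `linear_predictor`) are hypothetical choices made for the example.

```python
import math

def linear_predictor(x, coefs, intercept=0.0):
    """GLM linear predictor: intercept + sum_j x_j * coef_j."""
    return intercept + sum(xj * cj for xj, cj in zip(x, coefs))

def group_sums(x, groups):
    """Derived predictors: one value per group, the sum of its members."""
    return [sum(x[j] for j in g) for g in groups]

def expand_coefficients(alpha, groups, p):
    """Per-predictor coefficients in which every member of a group
    carries that group's single shared coefficient."""
    beta = [0.0] * p
    for a, g in zip(alpha, groups):
        for j in g:
            beta[j] = a
    return beta

# Hypothetical data: six predictors split into two assumed groups.
x = [1.0, 0.0, 2.0, 3.0, 1.0, 4.0]
groups = [[0, 1, 2], [3, 4, 5]]
alpha = [0.5, -0.25]  # one shared coefficient per group

beta = expand_coefficients(alpha, groups, len(x))

# Same linear predictor, computed two ways.
eta_individual = linear_predictor(x, beta)                 # six coefficients
eta_derived = linear_predictor(group_sums(x, groups), alpha)  # two coefficients

# With a logit link, the fitted probability is therefore identical too.
prob = 1.0 / (1.0 + math.exp(-eta_derived))

print(eta_individual, eta_derived, prob)
```

Because the two linear predictors coincide for any link function, a model with a handful of group-level coefficients is a constrained version of the full model; the difficulty the paper addresses is finding the groups, which this sketch takes as given.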

Updated: 2021-07-02