Modelling high-dimensional categorical data using nonconvex fusion penalties
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 3.1), Pub Date: 2021-07-22, DOI: 10.1111/rssb.12432
Benjamin G. Stokell, Rajen D. Shah, Ryan J. Tibshirani

We propose a method for estimation in high-dimensional linear models with nominal categorical data. Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal. This is achieved using the minimax concave penalty on differences between the order statistics of the coefficients for a categorical variable, thereby clustering the coefficients. We provide an algorithm for exact and efficient computation of the global minimum of the resulting nonconvex objective in the case with a single variable with potentially many levels, and use this within a block coordinate descent procedure in the multivariate case. We show that an oracle least squares solution that exploits the unknown level fusions is a limit point of the coordinate descent with high probability, provided the true levels have a certain minimum separation; these conditions are known to be minimal in the univariate case. We demonstrate the favourable performance of SCOPE across a range of real and simulated datasets. An R package CatReg implementing SCOPE for linear models and also a version for logistic regression is available on CRAN.
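
As a rough illustration of the penalty described in the abstract, the sketch below evaluates the minimax concave penalty (MCP) on the gaps between consecutive sorted coefficients of a single categorical variable. The function names `mcp` and `scope_penalty` and the parameter names `lam` and `gamma` are assumptions made for this example and are not taken from the CatReg package; the sketch only evaluates the penalty and does not implement the exact univariate solver or the block coordinate descent.

```python
import numpy as np


def mcp(x, lam, gamma):
    """Minimax concave penalty evaluated at x >= 0 (hypothetical helper).

    rho(x) = lam * x - x^2 / (2 * gamma)   for x <= gamma * lam
           = gamma * lam^2 / 2             otherwise
    """
    x = np.asarray(x, dtype=float)
    rising = lam * x - x**2 / (2.0 * gamma)   # concave part near zero
    flat = gamma * lam**2 / 2.0               # constant once x exceeds gamma * lam
    return np.where(x <= gamma * lam, rising, flat)


def scope_penalty(theta, lam, gamma):
    """Sum of MCP terms over gaps between consecutive order statistics.

    theta holds one coefficient per level of a single categorical variable.
    Levels whose coefficients are exactly equal contribute a zero gap and
    therefore no penalty, which is what encourages fusion.
    """
    ordered = np.sort(np.asarray(theta, dtype=float))
    gaps = np.diff(ordered)                   # theta_(k+1) - theta_(k) >= 0
    return float(np.sum(mcp(gaps, lam, gamma)))
```

Because the coefficients are sorted first, the value does not depend on how the levels are labelled, which suits nominal (unordered) categories. Large gaps are charged a constant amount, so well-separated clusters of levels are not shrunk towards each other further.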

Updated: 2021-07-30