Journal of Applied Statistics (IF 1.2). Pub Date: 2020-11-18. Authors: Kuangnan Fang, Peng Wang, Xiaochen Zhang, Qingzhao Zhang
ABSTRACT
In high-dimensional data classification, several attempts have been made to achieve variable selection by replacing the L2-penalty of the support vector machine (SVM) with other penalties. However, these high-dimensional SVM methods usually do not take into account the special structure among covariates (features). In this article, we consider a classification problem in which the covariates are ordered in some meaningful way and the number of covariates p can be much larger than the sample size n. We propose a structured sparse SVM to tackle this type of problem, which combines a non-convex penalty and a cubic spline estimation procedure (i.e. penalizing second-order derivatives of the coefficients) with the SVM. From a theoretical point of view, the proposed method satisfies the local oracle property. Simulations show that the method performs well in both feature selection and classification accuracy. A real application is conducted to illustrate the benefits of the method.
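The abstract does not state the objective function, so the following is only a minimal sketch of how a hinge loss, a non-convex sparsity penalty, and a roughness penalty on ordered coefficients might be combined. It assumes SCAD as the non-convex penalty (the abstract does not name one) and uses squared second-order differences of adjacent coefficients as a discrete stand-in for penalizing second-order derivatives. All function names and the tuning parameters lam1, lam2 and a are hypothetical.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty summed over coefficients (an assumed choice of non-convex penalty)."""
    b = np.abs(beta)
    small = lam * b                                              # |b| <= lam
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))      # lam < |b| <= a*lam
    large = lam**2 * (a + 1) / 2                                 # |b| > a*lam (constant)
    return np.sum(np.where(b <= lam, small, np.where(b <= a * lam, mid, large)))

def second_difference_penalty(beta):
    """Sum of squared second-order differences beta[j+2] - 2*beta[j+1] + beta[j]."""
    return np.sum(np.diff(beta, n=2) ** 2)

def objective(beta, intercept, X, y, lam1, lam2):
    """Average hinge loss plus sparsity and smoothness penalties; y takes values in {-1, +1}."""
    margins = y * (X @ beta + intercept)
    hinge = np.mean(np.maximum(0.0, 1.0 - margins))
    return hinge + scad_penalty(beta, lam1) + lam2 * second_difference_penalty(beta)
```

Under these assumptions, the second-difference term is zero for coefficients that vary linearly along the ordering, so it discourages abrupt changes between neighbouring features, while the SCAD term shrinks small coefficients toward zero without growing further once a coefficient exceeds a*lam1 in magnitude.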
Title: Structured sparse support vector machine with ordered features