1 Introduction

The classification of an object is a decision-making process whose outcome is the assignment of a specific class membership to the object under observation. Medical diagnosis (Mangasarian et al. 1995), chemistry (Jurs 1986), cybersecurity (Astorino et al. 2017), image processing (Khalaf et al. 2017) are only some of the possible application areas of classification.

Each object (sample) is characterized by a finite number of (quantitative and/or qualitative) attributes, usually referred to as the features.

Construction of a classifier is a supervised learning activity, where a dataset of samples whose class membership is known in advance is given as the input. The objective is to obtain a mathematical model capable of correctly classifying newly incoming samples whose class membership is, instead, unknown.

Classification deals, mainly, with the separation of sets of samples in the feature space, which is assumed to be \({\mathbb {R}}^n\). Whenever there are two classes, we are faced with a binary classification problem. In this paper, in fact, the training dataset is partitioned into two subsets, say \(\mathcal {A}\) and \(\mathcal {B}\), and the problem consists in finding a separation surface, if any, between them.

Most of the models rely on setting an appropriate optimization problem whose output is either a separating surface or a nearly separating one, resulting in the minimization of some measure of the classification error.

Starting from the pioneering works by Bennett and Mangasarian (1992) and Vapnik (1995), the hyperplane has been considered as the separation surface of choice, although the use of nonlinear separation surfaces has also been pursued by Rosen (1965), Plastria et al. (2014) and Astorino and Gaudioso (2005, 2009).

The literature on classification is huge. We cite Vapnik (1995), Cristianini and Shawe-Taylor (2000), Schölkopf et al. (1999) and Sra et al. (2011) as basic references in the support vector machine (SVM) framework and Thongsuwan et al. (2020) as a recent approach in deep learning.

It is well known (see, e.g., Cristianini and Shawe-Taylor (2000)) that if the convex hulls of the two sets \(\mathcal {A}\) and \(\mathcal {B}\) do not intersect, there exists a separating hyperplane such that the set \(\mathcal {A}\) is on one side of such hyperplane and \(\mathcal {B}\) is on the other side. It can be calculated by linear programming (Bennett and Mangasarian 1992), and the two sets are referred to as linearly separable. On the other hand, if \(\mathrm{conv}(\mathcal {A}) \cap \mathrm{conv}(\mathcal {B}) \ne \emptyset \), a number of algorithms can be adopted to determine a quasi-separating hyperplane such that the related error functions are minimized (see, for example, Mangasarian (1999)).

In this paper, we deal with binary classification based on the use of a polyhedral surface. The concept of polyhedral separability was introduced in Megiddo (1988) and applied within the classification framework in Astorino and Gaudioso (2002) and Astorino and Fuduli (2015).

Whenever, in fact, the two sets \(\mathcal {A}\) and \(\mathcal {B}\) are not linearly separable, it is possible to resort to polyhedral separation, that is, to determine \(h>1\) hyperplanes such that \(\mathcal {A}\) is contained in the convex polyhedron given by the intersection of h half-spaces and \(\mathcal {B}\) lies outside such polyhedron.

In Astorino and Gaudioso (2002), an optimization model was proposed to calculate a set of h hyperplanes generating a polyhedral separation, whenever possible, of the sets \(\mathcal {A}\) and \(\mathcal {B}\). The model consists, as usual in classification, in minimizing an error function to cope with the case when the two sets are not h-polyhedrally separable. In parallel with SVM, the model was extended in Astorino and Fuduli (2015) to accommodate margin maximization.

The error function adopted in Astorino and Gaudioso (2002) is neither convex nor concave, and it was dealt with by means of successive linearizations.

In this paper, we focus on the numerical treatment of the optimization problem to be solved in order to get a polyhedral separation surface. In particular, we fully exploit the DC (difference of convex) nature of the objective function (Hiriart-Urruty 1986), and thus, differently from Astorino and Gaudioso (2002), we adopt an algorithm designed to treat DC functions. The literature nowadays provides a wide set of efficient algorithms in this area.

In Pham Dinh (2005), Pham Dinh and Le Thi Hoai (2014), an iterative algorithm was introduced to minimize functions of the form \(f = f_1-f_2\), with \(f_1\) and \(f_2\) convex functions. The algorithm, called DCA, considers at iteration t the linearization \(f_2^{(t)}\) of the function \(f_2\) at the point \(x^t\) and determines the next iterate \(x^{t+1}\) as an optimal solution of the convex problem

$$\begin{aligned} \min _x f_1(x)-f_2^{(t)}(x) \end{aligned}$$
(1)
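
To fix ideas, the following toy sketch (in Python, a choice of ours, not anything taken from the cited works) applies the DCA scheme (1) to a one-dimensional DC function whose convex subproblem has a closed-form solution; the function and the starting point are purely illustrative assumptions.

```python
import numpy as np

# Minimal DCA sketch on a toy DC function (an illustrative assumption):
# minimize f(x) = f1(x) - f2(x) with f1(x) = x^2 and f2(x) = |x - 1|,
# both convex.  At each iteration f2 is replaced by its linearization
# at x_t (i.e., by a subgradient g_t of f2), and the convex subproblem
# min_x f1(x) - g_t * x is solved in closed form (x = g_t / 2).

def f1(x): return x ** 2
def f2(x): return abs(x - 1.0)

def subgrad_f2(x):
    # a subgradient of |x - 1|
    return 1.0 if x > 1.0 else -1.0

x = 3.0                      # starting point (arbitrary)
for t in range(50):
    g = subgrad_f2(x)        # linearize f2 at the current iterate
    x_new = g / 2.0          # argmin of x^2 - g*x
    if abs((f1(x_new) - f2(x_new)) - (f1(x) - f2(x))) < 1e-10:
        break                # no descent: critical point reached
    x = x_new

print("DCA critical point:", x, "objective:", f1(x) - f2(x))
```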

DCA has proven to be an efficient method to tackle DC problems, even non-smooth ones, and several artificial intelligence problems have been approached by means of the DCA (Astorino et al. 2010, 2012, 2014; Astorino 2014; Khalaf et al. 2017).

Several other methods have been proposed more recently in the literature (see Gaudioso et al. (2018) and Joki et al. (2017)), which also allow large-scale DC programs to be solved.

In this paper, we build on the model of Astorino and Gaudioso (2002) and adopt a decomposition of the error function for the h-polyhedral separability problem as the difference of two convex functions. We then apply the DCA and carry out extensive experimentation on several classes of benchmark instances.

The paper is organized as follows. In Sect. 2, we describe the h-polyhedral classification model and its reformulation as a difference of convex optimization problem. In Sect. 3, we describe how the DCA has been adapted to the DC reformulations. In Sect. 4, we present the results of our implementation on a number of benchmark classification problems. Some conclusions are drawn in Sect. 5.

2 The polyhedral separability model

Let \(\mathcal {A}=\{a_1, \dots , a_m\}\) and \(\mathcal {B}=\{b_1, \dots , b_k\}\) be two finite sets of \({\mathbb {R}}^n\).

Definition 1

The sets \(\mathcal {A}\) and \(\mathcal {B}\) are h-polyhedrally separable if there exists a set of h hyperplanes \(\{(v^j, \eta ^j)\}\), \(v^j \in {\mathbb {R}}^n\), \(\eta ^j \in {\mathbb {R}}\), \(j=1,\ldots ,h\), such that

$$\begin{aligned} \begin{aligned} a_i^Tv^j&< \eta ^j \quad \forall i=1, \dots , m, j=1, \dots , h\\ b_l^Tv^j&> \eta ^j \quad \forall l=1, \dots , k, \hbox { and at least one}\ j \in \{1, \dots , h\} \end{aligned} \end{aligned}$$
(2)

The following proposition gives an equivalent characterization of h-polyhedral separability:

Proposition 1

The sets \(\mathcal {A}\) and \(\mathcal {B}\) are h-polyhedrally separable if and only if there exist h hyperplanes \(\{(w^j, \gamma ^j)\}\) such that

$$\begin{aligned} \begin{aligned} a_i^Tw^j&\le \gamma ^j-1 \quad \forall i=1, \dots , m, j=1, \dots , h\\ b_l^Tw^j&\ge \gamma ^j +1\quad \forall l=1, \dots , k, \hbox { and at least one}\ j \in \{1, \dots , h\}. \end{aligned} \end{aligned}$$
(3)

Proof

[Astorino and Gaudioso (2002), Proposition 2.1] \(\square \)

Moreover, in Astorino and Gaudioso (2002) [Proposition 2.2] it is proven that a necessary and sufficient condition for the sets \(\mathcal {A}\) and \(\mathcal {B}\) to be h-polyhedrally separable (for some \(h \le |\mathcal {B}|\)) is given by

$$\begin{aligned} \mathrm{conv}(\mathcal {A}) \cap \mathcal {B} = \emptyset . \end{aligned}$$
(4)

Remark 1

The roles of \(\mathcal {A}\) and \(\mathcal {B}\) in (4) are not symmetric.
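
Condition (4) lends itself to a direct computational check: each point of \(\mathcal {B}\) can be tested for membership in \(\mathrm{conv}(\mathcal {A})\) by solving a linear feasibility problem. The sketch below, which relies on scipy.optimize.linprog and is not taken from the paper, illustrates the idea on a toy instance.

```python
import numpy as np
from scipy.optimize import linprog

def in_conv_hull(A, b):
    """Check whether point b lies in conv(rows of A) via an LP
    feasibility problem: find lambda >= 0, sum(lambda) = 1,
    A.T @ lambda = b (a standard test, not the paper's code)."""
    m, n = A.shape
    A_eq = np.vstack([A.T, np.ones((1, m))])          # n+1 equality rows
    b_eq = np.concatenate([b, [1.0]])
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    return res.status == 0                            # feasible => b is in conv(A)

def polyhedrally_separable(A, B):
    """Condition (4): conv(A) must contain no point of B."""
    return not any(in_conv_hull(A, b) for b in B)

# toy example in R^2
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 2.0], [0.2, 0.2]])                # second point lies inside conv(A)
print(polyhedrally_separable(A, B))                   # expected: False
```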

According to Proposition 1, a point \(a_i \in \mathcal {A}\) is well classified by the set of hyperplanes \(\{w^j, \gamma ^j\}\) if \(a_i^Tw^j - \gamma ^j + 1 \le 0\) for all \(j=1, \dots , h\). Therefore, we can compute the classification error of the point \(a_i\) with respect to \(\{w^j, \gamma ^j\}\) as

$$\begin{aligned} \max _{1 \le j \le h}\{0, a_i^Tw^j - \gamma ^j + 1\}. \end{aligned}$$
(5)

Analogously, a point \(b_l \in \mathcal {B}\) is well classified by the set \(\{w^j, \gamma ^j\}\) if \(\min _{1\le j \le h}\{-b_l^Tw^j + \gamma ^j +1\} \le 0\). Thus, the classification error of the point \(b_l\) is

$$\begin{aligned} \max \left\{ 0, \min _{1\le j \le h}\{-b_l^Tw^j + \gamma ^j +1\}\right\} . \end{aligned}$$
(6)

Given a set of h hyperplanes \(\{w^j, \gamma ^j\}\), we denote with \(W = [w^1: \cdots : w^h]\) the matrix whose jth column is the vector \(w^j\) and with \(\varGamma = (\gamma ^1, \dots , \gamma ^h)\) the vector whose components are the \(\gamma ^j\)’s. The classification error function for the h-polyhedral separability problem for the sets \(\mathcal {A}\) and \(\mathcal {B}\), with respect to the hyperplanes \(\{w^j, \gamma ^j\}\), is then given by

$$\begin{aligned} e(W, \varGamma ) := e_1(W, \varGamma )+e_2(W, \varGamma ), \end{aligned}$$
(7)

where

$$\begin{aligned} e_1(W, \varGamma ) := \frac{1}{m}\sum _{i = 1}^m \max _{1\le j \le h}\{\max \{0, a_i^Tw^j - \gamma ^j + 1\}\} \end{aligned}$$
(8)

and

$$\begin{aligned} e_2(W, \varGamma ) := \frac{1}{k}\sum _{l=1}^k\max \left\{ 0, \min _{1\le j\le h}\{-b_l^Tw^j + \gamma ^j + 1\}\right\} \end{aligned}$$
(9)

represent the errors for points of \(\mathcal {A}\) and \(\mathcal {B}\), respectively. Function \(e(W, \varGamma )\) is nonnegative and piecewise affine; \(e_1(W, \varGamma )\) is convex, and \(e_2(W, \varGamma )\) is quasi-concave; moreover, in Astorino and Gaudioso (2002) it has also been proven that the sets \(\mathcal {A}\) and \(\mathcal {B}\) are h-polyhedrally separable if and only if there exists a set of h hyperplanes \((W^*, \varGamma ^*)\) such that \(e(W^*, \varGamma ^*) = 0\) and, in that case, \(w^j=0\) for all \(j=1, \dots , h\) cannot be the optimal solution.
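
As an illustration, the error functions (7)–(9) can be evaluated with a few lines of vectorized code. The following Python sketch is only an illustrative helper, not the authors' implementation; it assumes that W is stored as an \(n\times h\) matrix whose columns are the \(w^j\) and that \(\varGamma\) is an h-vector.

```python
import numpy as np

def classification_error(W, Gamma, A, B):
    """Error function (7)-(9): W is n x h (column j = w^j), Gamma has h
    components; A is m x n, B is k x n.  A sketch, not the authors' code."""
    # (8): for each a_i, the worst violation over the h hyperplanes
    slack_A = A @ W - Gamma + 1.0            # m x h, entries a_i^T w^j - gamma^j + 1
    e1 = np.mean(np.max(np.maximum(slack_A, 0.0), axis=1))
    # (9): for each b_l, the best (smallest) violation over the h hyperplanes
    slack_B = -B @ W + Gamma + 1.0           # k x h, entries -b_l^T w^j + gamma^j + 1
    e2 = np.mean(np.maximum(np.min(slack_B, axis=1), 0.0))
    return e1 + e2, e1, e2
```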

In Astorino and Gaudioso (2002), the problem of minimizing the error function (7) is tackled by solving, at each iteration, a linear program providing a descent direction. Here, instead, we rewrite \(e(W, \varGamma )\) as difference of convex functions, and then, we address its minimization through ad hoc DC techniques.

To obtain such a reformulation, consider the following identity, which is valid for any set of h affine functions \(z^j(x)\), \(j=1, \dots , h\):

$$\begin{aligned} \begin{aligned} \max (0, \min _j z^j(x))&= \max (0, -\max _j(-z^j(x)))\\&= \max (0, \max _j(-z^j(x))) - \max _j(-z^j(x)). \end{aligned} \end{aligned}$$
(10)

By applying (10) to \(e_2(W, \varGamma )\), we obtain the following DC decomposition of \(e(W, \varGamma )\):

$$\begin{aligned} e(W, \varGamma ) = {\hat{e}}_1(W, \varGamma ) - {\hat{e}}_2(W, \varGamma ), \end{aligned}$$
(11)

where both

$$\begin{aligned} {\hat{e}}_1(W, \varGamma ) = e_1(W, \varGamma ) + \frac{1}{k}\sum _{l=1}^k \max \left\{ 0, \max _{1\le j \le h}(b_l^Tw^j - \gamma ^j-1)\right\} \end{aligned}$$
(12)

and

$$\begin{aligned} {\hat{e}}_2(W, \varGamma ) = \frac{1}{k}\sum _{l=1}^k \max _{1\le j \le h}(b_l^Tw^j - \gamma ^j - 1) \end{aligned}$$
(13)

are convex.
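
The identity \(e = {\hat{e}}_1 - {\hat{e}}_2\) can be verified numerically. The short check below, on randomly generated data and with the same matrix convention as above, is only a sanity-check sketch of ours.

```python
import numpy as np

# Numerical sanity check of the DC decomposition (11)-(13) on random
# data (W has the w^j as columns).  A verification sketch, not part of
# the authors' code.

rng = np.random.default_rng(0)
n, h, m, k = 4, 2, 30, 25
W, Gamma = rng.normal(size=(n, h)), rng.normal(size=h)
A, B = rng.normal(size=(m, n)), rng.normal(size=(k, n))

# original error (7): e = e1 + e2
e1 = np.mean(np.max(np.maximum(A @ W - Gamma + 1.0, 0.0), axis=1))
e2 = np.mean(np.maximum(np.min(-B @ W + Gamma + 1.0, axis=1), 0.0))

# DC components (12)-(13)
t = np.max(B @ W - Gamma - 1.0, axis=1)      # max_j (b_l^T w^j - gamma^j - 1)
e1_hat = e1 + np.mean(np.maximum(t, 0.0))    # (12), convex
e2_hat = np.mean(t)                          # (13), convex

print(np.isclose(e1 + e2, e1_hat - e2_hat))  # expected: True
```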

The DC decomposition (11) has already been discussed in Strekalovsky et al. (2015), where the authors suggested an algorithm that combines a local and a global search in order to find a global minimum of the error function. In the numerical experience we are going to discuss in the next sections, we confine ourselves to finding local minima of the error functions involved.

3 Exploiting the function structure in the DCA implementation

We have applied DCA to the minimization of the error function (11). Before discussing our experimental setting, we describe how DCA has been adapted to deal with polyhedral separation, which we then apply to a number of datasets from the classification literature.

At iteration t, given the current configuration \((W_t, \varGamma _t)\) of the h hyperplanes, we calculate for each \(l=1,\ldots ,k\) the index \(j_l\) at which the maximum in (13) is achieved:

$$\begin{aligned} j_l= \arg \max _{1\le j\le h}(b_l^Tw_{(t)}^{j}-\gamma _{(t)}^{j}- 1), \quad l=1, \dots , k \end{aligned}$$
(14)

and we define consequently the linearization of function \(\hat{e}_2\) at iteration t:

$$\begin{aligned} \hat{e}_2^t(W, \varGamma ) := \frac{1}{k}\sum _{l=1}^k(b_l^Tw^{j_l} - \gamma ^{j_l} - 1) \end{aligned}$$
(15)

which satisfies \({\hat{e}}_2^t(W, \varGamma ) \le \hat{e}_2(W, \varGamma )\). We observe that the choice of the index (14) amounts to selecting a subgradient \(g_t\in \partial {\hat{e}}_2(W_t, \varGamma _t)\), which yields the construction of (15), the linearization of \({\hat{e}}_2\) at \((W_t, \varGamma _t)\), according to the DCA scheme.
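
A possible implementation of steps (14)–(15) is sketched below; the helper name and the storage convention for W (columns equal to the \(w^j\)) are assumptions of ours.

```python
import numpy as np

# Sketch of steps (14)-(15): pick, for every b_l, the hyperplane index
# attaining the max in (13), then build the resulting affine minorant
# of e2_hat, which can be evaluated at any (W, Gamma).

def linearize_e2_hat(W_t, Gamma_t, B):
    j_l = np.argmax(B @ W_t - Gamma_t - 1.0, axis=1)      # (14), one index per b_l
    def e2_hat_t(W, Gamma):                               # (15), affine in (W, Gamma)
        rows = np.arange(B.shape[0])
        return np.mean((B @ W)[rows, j_l] - Gamma[j_l] - 1.0)
    return j_l, e2_hat_t
```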

Then, we consider the convexification of the original DC function:

$$\begin{aligned} e^t(W, \varGamma ) = \hat{e}_1(W, \varGamma ) - \hat{e}_2^t(W, \varGamma ), \end{aligned}$$
(16)

so that the next configuration \((W_{t+1}, \varGamma _{t+1})\) is obtained by solving the convex program

$$\begin{aligned} \displaystyle \min _{W, \varGamma } e^t(W, \varGamma ),\end{aligned}$$

which in turn can be put in the form of the following linear program, thanks to the introduction of the additional variables \(\xi _i\), \(i=1,\ldots ,m\) and \(\zeta _l\), \(l=1,\ldots ,k\):

$$\begin{aligned} (W_{t+1}, \varGamma _{t+1}) = \arg \min _{W, \varGamma , \xi , \zeta } \quad&\frac{1}{m}\sum _{i=1}^m \xi _i + \frac{1}{k}\sum _{l=1}^k\left( \zeta _l - b_l^Tw^{j_l} + \gamma ^{j_l} + 1\right) \\ \text {subject to} \quad&\xi _i \ge 0, \quad i=1, \ldots , m\\&\xi _i \ge a_i^Tw^j - \gamma ^j + 1, \quad j=1, \dots , h, \ i=1, \ldots , m\\&\zeta _l \ge 0, \quad l=1,\ldots ,k\\&\zeta _l \ge b_l^Tw^j - \gamma ^j - 1, \quad j=1, \dots , h, \ l=1,\ldots ,k. \end{aligned}$$
(17)
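
For small instances, the linear program (17) can be assembled and solved with an off-the-shelf LP solver. The following dense sketch uses scipy.optimize.linprog instead of the CPLEX library employed in the experiments; the variable ordering and helper names are our own assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def dca_lp_step(W_t, Gamma_t, A, B):
    """One DCA iteration for problem (11): build and solve the LP (17).
    A dense sketch for small instances, not the authors' implementation."""
    m, n = A.shape
    k, _ = B.shape
    h = W_t.shape[1]
    nW = n * h                       # variable order: vec(W), Gamma, xi, zeta
    nvar = nW + h + m + k
    j_l = np.argmax(B @ W_t - Gamma_t - 1.0, axis=1)      # index (14)

    # objective of (17) (constant terms dropped)
    c = np.zeros(nvar)
    c[nW + h: nW + h + m] = 1.0 / m                       # xi_i
    c[nW + h + m:] = 1.0 / k                              # zeta_l
    for l in range(k):
        j = j_l[l]
        c[j * n:(j + 1) * n] -= B[l] / k                  # -(1/k) b_l^T w^{j_l}
        c[nW + j] += 1.0 / k                              # +(1/k) gamma^{j_l}

    # inequality constraints, one block per hyperplane j
    rows, rhs = [], []
    for j in range(h):
        for i in range(m):            # a_i^T w^j - gamma^j - xi_i <= -1
            r = np.zeros(nvar)
            r[j * n:(j + 1) * n] = A[i]
            r[nW + j] = -1.0
            r[nW + h + i] = -1.0
            rows.append(r); rhs.append(-1.0)
        for l in range(k):            # b_l^T w^j - gamma^j - zeta_l <= 1
            r = np.zeros(nvar)
            r[j * n:(j + 1) * n] = B[l]
            r[nW + j] = -1.0
            r[nW + h + m + l] = -1.0
            rows.append(r); rhs.append(1.0)

    bounds = [(None, None)] * (nW + h) + [(0, None)] * (m + k)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    x = res.x
    W_new = x[:nW].reshape(h, n).T    # columns are the new w^j
    Gamma_new = x[nW:nW + h]
    return W_new, Gamma_new
```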

Summing up, the DCA-based algorithm for the minimization of (11) can be stated as follows:

[Algorithm: h-PolSepDC, the DCA-based scheme for the minimization of (11) — step 1: choose an initial configuration \((W_0, \varGamma _0)\) and set \(t=0\); step 2: compute the indices (14) and solve the linear program (17) to obtain \((W_{t+1}, \varGamma _{t+1})\); step 3: if \(e(W_{t+1}, \varGamma _{t+1}) = e(W_{t}, \varGamma _{t})\), stop, otherwise set \(t=t+1\) and return to step 2.]
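
A compact rendering of this scheme, built on the hypothetical helpers classification_error and dca_lp_step sketched earlier, could look as follows; the initialization and tolerance are illustrative choices of ours, not the authors' settings.

```python
import numpy as np

# Overall DCA loop for minimizing (11), using the hypothetical helpers
# classification_error and dca_lp_step sketched above.  The stopping
# test mirrors step 3 of the algorithm: stop when no descent occurs.

def h_pol_sep_dc(A, B, h, max_iter=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    W, Gamma = rng.normal(size=(n, h)), np.zeros(h)       # step 1: initial hyperplanes
    e_old = classification_error(W, Gamma, A, B)[0]
    for t in range(max_iter):
        W_new, Gamma_new = dca_lp_step(W, Gamma, A, B)    # step 2: solve LP (17)
        e_new = classification_error(W_new, Gamma_new, A, B)[0]
        if e_old - e_new <= tol:                          # step 3: no descent, stop
            break
        W, Gamma, e_old = W_new, Gamma_new, e_new
    return W, Gamma, e_old
```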

The above algorithm is a descent method. It is easy to verify that if \(e(W_{t+1}, \varGamma _{t+1}) = e(W_{t}, \varGamma _{t})\), then \((W_{t}, \varGamma _{t})\) is a critical point, i.e.,

$$\begin{aligned} \partial {\hat{e}}_1(W_t, \varGamma _t) \cap \partial {\hat{e}}_2(W_t, \varGamma _t)\ne \emptyset \,. \end{aligned}$$

Hence, this result provides the stopping criterion at step 3. For more theoretical details and the convergence theorem, see Pham Dinh and Le Thi Hoai (2014), Pham Dinh (2005).

Table 1 Datasets
Table 2 The number of variables/constraints
Table 3 Training and testing correctness percentage
Table 4 CPU time (secs)
Table 5 Numerical results

Following the SVM paradigm, with the aim of obtaining a good generalization capability, we have added to \(\hat{e}_1(W, \varGamma )\) in (11) the margin term:

$$\begin{aligned} \frac{1}{2}\sum _{j=1}^h \Vert w^j\Vert ^2, \end{aligned}$$
(18)

thus obtaining the following DC model:

$$\begin{aligned} {\bar{e}}(W, \varGamma ) = \left( \frac{C}{2}\sum _{j=1}^h \Vert w^j\Vert ^2+{\hat{e}}_1(W, \varGamma )\right) - \hat{e}_2(W, \varGamma ), \end{aligned}$$
(19)

where \(C>0\) is a trade-off parameter between the two objectives of maximizing the margin and minimizing the classification error. The minimization of (19) can be addressed by DCA, too. In this case, at each iteration we have to solve a quadratic program that differs from the linear program (17) only in the quadratic margin term (18). Consequently, the algorithmic scheme is unchanged, except that at step 2 a quadratic program has to be solved.
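
As an illustration of the modified step 2, the quadratic subproblem can be written down with a generic modeling tool. The sketch below uses cvxpy rather than the CPLEX/MATLAB setup of the experiments; the helper name and the default value of C are placeholders of ours.

```python
import numpy as np
import cvxpy as cp

def dca_qp_step(W_t, Gamma_t, A, B, C=1.0):
    """One DCA iteration for the margin model (19): the subproblem is the
    LP (17) plus the quadratic term (18).  Sketch using cvxpy as a generic
    QP solver (the paper's experiments rely on CPLEX instead)."""
    m, n = A.shape
    k, h = B.shape[0], W_t.shape[1]
    j_l = np.argmax(B @ W_t - Gamma_t - 1.0, axis=1)      # index (14)

    W = cp.Variable((n, h))
    gamma = cp.Variable(h)
    xi = cp.Variable(m, nonneg=True)
    zeta = cp.Variable(k, nonneg=True)
    cons = []
    for j in range(h):
        cons += [A @ W[:, j] - gamma[j] + 1 <= xi,        # error constraints on A
                 B @ W[:, j] - gamma[j] - 1 <= zeta]      # error constraints on B
    # linear part of the objective coming from the linearization (15)
    lin = sum(B[l] @ W[:, j_l[l]] - gamma[j_l[l]] for l in range(k)) / k
    obj = cp.Minimize(C / 2 * cp.sum_squares(W)           # margin term (18)
                      + cp.sum(xi) / m + cp.sum(zeta) / k - lin)
    cp.Problem(obj, cons).solve()
    return W.value, gamma.value
```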

Table 6 Percentage of correctness with h-PolSepDC (\(h\ge 2\))

4 Numerical experiments

In the numerical experiments, we have implemented two DCA codes:

  • h-PolSepDC, where we minimize (11) (the separation problem with no separation margin) by solving at each iteration the linear program (17);

  • h-PolSepDC-QP, where we minimize (19) (a margin has been accounted for) by solving at each iteration a quadratic program.

In particular, we have chosen \(h=2\) according to the hyperparameter tuning performed in Astorino and Gaudioso (2002).

Moreover, since the roles of the sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are not symmetric in the definition of polyhedral separability, in the numerical experiments one has to decide which set plays the role of \({\mathcal {A}}\) and which plays the role of \({\mathcal {B}}\) in each dataset. We have taken as \({\mathcal {A}}\) the set with the smaller number of points, following, also on this issue, the rule adopted in Astorino and Gaudioso (2002).

We have used MATLAB R2015b calling the CPLEX library, on a 2.6 GHz Intel Core i7 processor under the OS X 10.12.6 operating system.

To evaluate the impact of the DC decomposition of the error function (7) with respect to the classic non-smooth optimization approach, we have reimplemented in MATLAB the algorithm proposed in Astorino and Gaudioso (2002) (code 2-PolSep). Finally, for the sake of completeness, we have also used the standard MATLAB SVM package to solve the linear separation classification problem (code SVM-LINEAR).

We have considered several test problems drawn from the binary classification literature, which are described in Table 1. In particular, all datasets are taken from the LIBSVM (Library for Support Vector Machines) repository (Chang and Lin 2011), except for g50c and g10n, which are described in Chapelle and Zien (2005).

For all datasets, we have performed a standard tenfold cross-validation protocol; Table 2 summarizes the LP/QP problems solved at each fold, in terms of the number of variables and constraints.
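
A hedged sketch of such a protocol is given below: it assumes binary labels with 0 denoting class \({\mathcal {A}}\), uses scikit-learn's KFold for the splits, assigns a test point to \({\mathcal {A}}\) when it falls inside the polyhedron, and relies on the hypothetical h_pol_sep_dc routine sketched earlier; none of these details are taken from the paper.

```python
import numpy as np
from sklearn.model_selection import KFold

# Tenfold cross-validation sketch (shuffling, seeding and the decision
# rule are assumptions of ours, not the authors' protocol details).

def tenfold_correctness(X, y, h=2, seed=0):
    def correctness(W, Gamma, Xs, ys):
        # assign to class A if the point lies in the polyhedron
        # intersection_j { x : x^T w^j <= gamma^j }
        inside = np.all(Xs @ W - Gamma <= 0.0, axis=1)
        return np.mean(inside == (ys == 0))               # label 0 = class A
    tr_acc, te_acc = [], []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=seed).split(X):
        A, B = X[tr][y[tr] == 0], X[tr][y[tr] == 1]
        W, Gamma, _ = h_pol_sep_dc(A, B, h)                # sketched earlier
        tr_acc.append(correctness(W, Gamma, X[tr], y[tr]))
        te_acc.append(correctness(W, Gamma, X[te], y[te]))
    return np.mean(tr_acc), np.mean(te_acc)
```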

For each approach, in the columns train and test of Table 3 we report the average percentage of training and testing correctness, respectively. The best results in terms of testing correctness have been underlined.

A preliminary tuning of the parameter C in the 2-PolSepDC-QP and SVM-LINEAR codes has been performed, and we have selected, for each dataset, the value optimizing the performance on the testing set.

Table 7 Percentage of correctness with h-PolSepDC-QP (\(h\ge 2\))

The numerical results indicate the good performance, in terms of correctness, of the DC-based approaches with respect to both the algorithm of Astorino and Gaudioso (2002) and standard SVM. In particular, the DC model equipped with margin maximization (code 2-PolSepDC-QP) has provided the best testing correctness on 13 out of 15 datasets. By comparing 2-PolSepDC and 2-PolSepDC-QP, we observe that the addition of the margin provides better testing correctness except on the galaxy and a9a datasets. On the other hand, 2-PolSepDC achieves better training correctness except on the Tic-Tac-Toe and letter-a datasets. This indicates that the classifier provided by 2-PolSepDC-QP has a higher generalization capability.

As for computation time (see Table 4), the DC decomposition has been much more effective than the 2-PolSep method. Moreover, the increase in computation time with respect to the single-hyperplane separation model SVM-LINEAR has not been particularly severe.

2-PolSep and 2-PolSepDC are different algorithms for solving the same problem, i.e., the minimization of (11). By comparing the objective function values obtained by the two codes when started from the same initial point, we note that the second approach provides better solutions, although the difference is not particularly significant.

Since in the definition of polyhedral separability the roles of the sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are not symmetric, we also compare the results in terms of recall, specificity, precision and F1-score (see Table 5). The trend of these key performance indicators confirms the good quality of the DC-based approaches.

For completeness, we have also run both codes h-PolSepDC and h-PolSepDC-QP with \(h>2\). The running time is not dramatically larger, but the numerical experiments show no significant improvement in terms of correctness. Even worse, in some cases the improvement in training performance is not accompanied by an improvement in testing performance, which suggests that a large value of h, the number of hyperplanes, yields classifiers with a smaller generalization capability. Some of these results are reported in Tables 6 and 7.

5 Conclusions

We have adopted a difference-of-convex decomposition of the error function in polyhedral separation and have tackled the resulting optimization problem via the DCA algorithm.

The numerical results we have obtained demonstrate the good performance of the approach both in terms of classification correctness and computation time.

Future research will investigate the integration of feature selection (Gaudioso et al. 2017) and polyhedral separation, aimed at detecting a possibly smaller subset of significant attributes in terms of classification correctness.