Incremental DC optimization algorithm for large-scale clusterwise linear regression

https://doi.org/10.1016/j.cam.2020.113323

Abstract

The objective function in the nonsmooth optimization model of the clusterwise linear regression (CLR) problem with the squared regression error is represented as a difference of two convex functions. Then, using the difference of convex algorithm (DCA) approach, the CLR problem is replaced by a sequence of smooth unconstrained optimization subproblems. A new algorithm based on the DCA and the incremental approach is designed to solve the CLR problem. We apply a quasi-Newton method to solve the subproblems. The proposed algorithm is evaluated using several synthetic and real-world data sets for regression and compared with other algorithms for CLR. Results demonstrate that the DCA-based algorithm is efficient for solving CLR problems with a large number of data points and, in particular, outperforms other algorithms when the number of input variables is small.

Introduction

Clusterwise linear regression (CLR) is a method to approximate regression functions using two or more linear functions. It is based on two well-known techniques: clustering and regression. CLR has many applications, including consumer benefit segmentation [1], market segmentation [2], rainfall prediction [3] and PM10 prediction [4]. Algorithms for solving CLR problems include extensions of clustering algorithms such as k-means [5], [6], [7] and expectation–maximization (EM) [8], as well as those based on mixture models [2], [9], [10], [11] and optimization approaches [9], [12], [13], [14], [15], [16], [17], [18], [19].

CLR is a global optimization problem. However, conventional global optimization algorithms, as well as the exact algorithms from [15], [17], [18], are not always applicable to CLR problems in data sets with a relatively large number of data points and/or input variables. In addition, these algorithms are not efficient when a large number of linear functions is required to approximate the data, as they may require prohibitively large computational effort and may not find any solution in reasonable time. Therefore, algorithms capable of finding approximate global solutions to CLR problems in such data sets are usually applied. Such algorithms have been developed, for instance, in [12], [13], [14], [20]; they are based on a nonsmooth optimization formulation of the CLR problem and the incremental approach.

In this paper, we propose an algorithm for solving CLR problems which is particularly well suited for large data sets. The proposed algorithm is based on the DCA [21] and the incremental approach. More specifically, the algorithm employs the DC representation of the CLR problem and applies the DCA to replace the nonsmooth CLR problem by a sequence of smooth subproblems. In addition, it exploits the partial separability of the objective functions in these subproblems and decomposes them into problems with a much smaller number of variables. These novelties distinguish the algorithm from other CLR algorithms and, in particular, from those introduced in [12], [13], [14], [20], [22]. The incremental approach enables us to generate starting points which are rough approximations of the solution of the CLR problem and in this way to address its nonconvexity.

We demonstrate the performance of the proposed algorithm using synthetic and real-world data sets for regression and compare it with several CLR algorithms: the Späth algorithm [5], [6], the EM algorithm [9], [23], [24], the CLR algorithm based on smoothing techniques [14] and the DC based CLR algorithm [20].

The structure of the paper is as follows. Section 2 provides necessary information on nonsmooth DC optimization and CLR. The optimization formulation of the CLR problem and its DC representation are given in Section 3. The DCA for solving CLR problems is presented in Section 4 and the new algorithm is introduced in Section 5. Computational results are reported in Section 6. Section 7 contains some concluding remarks.

Section snippets

Theoretical background

We denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space with the inner product $\langle x,y\rangle=\sum_{i=1}^{n}x_i y_i$, $x,y\in\mathbb{R}^n$, and the associated norm $\|x\|=\langle x,x\rangle^{1/2}$.

A function $f:\mathbb{R}^n\to\mathbb{R}$ is called locally Lipschitz on $\mathbb{R}^n$ if for any bounded subset $X\subset\mathbb{R}^n$ there exists $L>0$ such that $|f(x)-f(y)|\le L\|x-y\|$ for all $x,y\in X$. The generalized directional derivative of the locally Lipschitz function $f$ at a point $x\in\mathbb{R}^n$ with respect to a direction $u\in\mathbb{R}^n$ is defined as [25]
$$f^{\circ}(x,u)=\limsup_{y\to x,\ \alpha\downarrow 0}\frac{f(y+\alpha u)-f(y)}{\alpha}.$$
The subdifferential of the function $f$ at $x\in\mathbb{R}^n$ is $\partial f(x)=\{\xi\in\mathbb{R}^n:\ f^{\circ}(x,u)\ge\langle\xi,u\rangle\ \text{for all}\ u\in\mathbb{R}^n\}$.
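To make the definition concrete, below is a small, purely illustrative Python sketch (not part of the paper) that estimates $f^{\circ}(x,u)$ by sampling points $y$ near $x$ and small steps $\alpha$ and keeping the largest difference quotient; the helper `clarke_directional_estimate` and its parameters are hypothetical.

```python
import numpy as np

def clarke_directional_estimate(f, x, u, radius=1e-4, n_samples=200, seed=0):
    """Crude Monte Carlo estimate of the generalized directional derivative
    limsup_{y -> x, a -> 0+} (f(y + a*u) - f(y)) / a:
    sample points y near x and small steps a, keep the largest quotient."""
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(n_samples):
        y = x + radius * rng.standard_normal(x.shape)
        for a in (1e-5, 1e-6, 1e-7):
            best = max(best, (f(y + a * u) - f(y)) / a)
    return best

# Example: f(x) = |x| at x = 0 in direction u = 1; the exact value is 1.
f = lambda z: np.abs(z).sum()
print(clarke_directional_estimate(f, np.zeros(1), np.ones(1)))  # ~ 1.0
```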

CLR problem and its DC representation

Consider a data set $A=\{(a^i,b_i)\in\mathbb{R}^n\times\mathbb{R}:\ i=1,\ldots,m\}$, where $a^i\in\mathbb{R}^n$ are the values of $n$ input variables and $b_i\in\mathbb{R}$ are their outputs. The aim of $k$-CLR, $k\ge 1$, is to find simultaneously an optimal partition of the set $A$ into $k$ clusters and the regression coefficients within clusters in order to minimize the overall fit function. Let $A^j$, $j\in J_k\equiv\{1,\ldots,k\}$, be clusters such that they are nonempty, pairwise disjoint and $A=\bigcup_{j\in J_k}A^j$. In $k$-CLR, each cluster $A^j$ is approximated by the hyperplane with the coefficients $(x^j,y_j)$, $x^j\in\mathbb{R}^n$, $y_j\in\mathbb{R}$.
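For concreteness, the sketch below evaluates the fit function through a DC split that is standard in the DC-CLR literature (and which we assume matches the paper's): with $h_{ij}=(\langle x^j,a^i\rangle+y_j-b_i)^2$, write $f_k=f_k^1-f_k^2$, where $f_k^1=\sum_i\sum_j h_{ij}$ and $f_k^2=\sum_i\max_{j}\sum_{t\ne j}h_{it}$ are both convex, since $\min_j h_{ij}=\sum_t h_{it}-\max_j\sum_{t\ne j}h_{it}$. All names in the code are illustrative.

```python
import numpy as np

def clr_dc_split(X, b, coef, intercept):
    """Evaluate the k-CLR fit function via the DC split f = f1 - f2.

    X: (m, n) inputs; b: (m,) outputs;
    coef: (k, n) hyperplane coefficients x^j; intercept: (k,) intercepts y_j."""
    # h[i, j]: squared error of point i under hyperplane j (convex in (x^j, y_j))
    h = (X @ coef.T + intercept - b[:, None]) ** 2
    f1 = h.sum()                                               # sum of all errors
    f2 = (h.sum(axis=1, keepdims=True) - h).max(axis=1).sum()  # sum_i max_j sum_{t!=j} h_it
    f = f1 - f2
    # f1 - f2 recovers the nonsmooth min-error objective sum_i min_j h_ij
    assert np.isclose(f, h.min(axis=1).sum())
    return f, f1, f2
```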

DC algorithm for CLR problem

Algorithm 1 can be applied to solve Problem (3) by reformulating the stopping criterion in Step 3 as follows.

Let $\{(x^h,y^h)\}$, $x^h\in\mathbb{R}^{nk}$, $y^h\in\mathbb{R}^{k}$, be the set of linear regression coefficients found at the $h$th iteration of the DCA and $A^1,\ldots,A^k$ be the corresponding cluster partition of the data set $A$. Since the function $f_k^1$ is continuously differentiable, the stopping criterion in Step 3 reduces to $\xi_2^h=\nabla f_k^1(x^h,y^h)$, where $\xi_2^h\in\partial f_k^2(x^h,y^h)$. To simplify this condition, first for any $(a,b)\in A$ we compute the set $R(a,b)$
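The following sketch shows how one DCA iteration might look under the squared-error split assumed above: $f_k^2$ is linearized at the current iterate, and the resulting smooth convex model separates over the $k$ coefficient blocks, each solved with a quasi-Newton method (here SciPy's BFGS). This is a hedged reconstruction, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def dca_step(X, b, coef, intercept):
    """One hypothetical DCA iteration for the squared-error split f = f1 - f2:
    linearize f2 at the current iterate and minimize the smooth convex model,
    which separates into k independent subproblems, one per hyperplane."""
    m = X.shape[0]
    k = coef.shape[0]
    Xa = np.hstack([X, np.ones((m, 1))])             # inputs with intercept column
    h = (X @ coef.T + intercept - b[:, None]) ** 2   # (m, k) squared errors
    jmin = h.argmin(axis=1)                          # best hyperplane per point

    new_coef, new_intercept = coef.copy(), intercept.copy()
    for j in range(k):
        # Block-j subgradient of f2: every point whose best hyperplane is not j
        # contributes the gradient of h_ij at the current iterate.
        mask = jmin != j
        w0 = np.r_[coef[j], intercept[j]]
        r = Xa[mask] @ w0 - b[mask]
        g2 = 2.0 * Xa[mask].T @ r

        def model(w, g2=g2):
            # block of f1 minus the linearized block of f2 (constants dropped)
            res = Xa @ w - b
            return np.sum(res ** 2) - g2 @ w

        w = minimize(model, w0, method="BFGS").x     # smooth convex subproblem
        new_coef[j], new_intercept[j] = w[:-1], w[-1]
    return new_coef, new_intercept
```

At a fixed point of this iteration the optimality condition of each block gives exactly $\nabla f_k^1(x^h,y^h)=\xi_2^h$, which is the stopping criterion stated above.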

The proposed CLR algorithm

In this section, we introduce the new algorithm, called DCA-CLR, for solving Problem (3). Note that this problem is nonconvex and the choice of starting points is very important when a local search method is applied to solve it. To deal with its nonconvexity, the new algorithm is designed based on the combination of the MDCA and an incremental approach. It starts with the calculation of one linear function and gradually adds one linear function at each iteration.
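A minimal sketch of such an incremental scheme, reusing the hypothetical `dca_step` above: the 1-CLR problem is solved by ordinary least squares, and each new hyperplane is initialized on the points the current model fits worst before all hyperplanes are refined with DCA iterations. The initialization rule shown here is a simple stand-in for the paper's construction of starting points.

```python
import numpy as np

def incremental_clr(X, b, k_max, n_dca_iters=50, tol=1e-8):
    """Hypothetical incremental scheme: solve 1-CLR by least squares, then for
    each k add one hyperplane initialized on the worst-fit points and refine
    all k hyperplanes with DCA iterations (dca_step from the sketch above)."""
    m, n = X.shape
    Xa = np.hstack([X, np.ones((m, 1))])
    w, *_ = np.linalg.lstsq(Xa, b, rcond=None)       # k = 1: ordinary least squares
    coef, intercept = w[:-1][None, :], w[-1:]
    for k in range(2, k_max + 1):
        # initialize hyperplane k on the points the current model fits worst
        err = ((X @ coef.T + intercept - b[:, None]) ** 2).min(axis=1)
        worst = np.argsort(err)[-max(m // k, n + 1):]
        w, *_ = np.linalg.lstsq(Xa[worst], b[worst], rcond=None)
        coef = np.vstack([coef, w[:-1]])
        intercept = np.append(intercept, w[-1])
        prev = np.inf
        for _ in range(n_dca_iters):                 # refine until the fit stalls
            coef, intercept = dca_step(X, b, coef, intercept)
            obj = ((X @ coef.T + intercept - b[:, None]) ** 2).min(axis=1).sum()
            if prev - obj < tol * (1 + abs(obj)):
                break
            prev = obj
    return coef, intercept
```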

According to Proposition 3

Computational results

We evaluate the performance of the DCA-CLR and compare it with some existing CLR algorithms using various synthetic and real-world data sets for regression. The following algorithms are used for comparison:

  • the nonsmooth DC programming algorithm for CLR (NDC-CLR) [20];

  • the CLR algorithm based on smoothing techniques (S-CLR) [14];

  • the multistart version of the Späth algorithm (M-Späth) [5], [6]; and

  • the expectation–maximization algorithm for CLR (EM-CLR) [24].

The M-Späth uses a simple randomized

Conclusions

In this paper, we developed an algorithm to solve the clusterwise linear regression (CLR) problem using a DC representation of its objective function. The proposed algorithm, called DCA-CLR, applies the DCA to replace the DC-CLR problem by a sequence of smooth convex optimization problems. Furthermore, the partial separability of the objective functions in these subproblems is utilized to decompose them into problems with a much smaller number of variables. Such an approach allows us to apply smooth

Acknowledgments

The research by Dr. A.M. Bagirov and Dr. S. Taheri was supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (Project No. DP190100580) and Dr. E. Cimen was supported by Anadolu University Scientific Research Projects Commission, Turkey under the Grant No. 1506F499. The authors would like to thank two anonymous referees for their valuable comments that helped to improve the quality of the paper.

References (35)

  • Späth, H., Mathematical Algorithms for Linear Regression, 2014.

  • Gaffney, S., Smyth, P., Trajectory clustering using mixtures of regression models, in: S. Chaudhuri, D. Madigan (Eds.), ...

  • DeSarbo, W., et al., A maximum likelihood methodology for clusterwise linear regression, J. Classification, 1988.

  • Park, Y., et al., Algorithms for generalized cluster-wise linear regression, INFORMS J. Comput., 2017.

  • Bagirov, A., et al., DC programming algorithm for clusterwise linear l1 regression, J. Oper. Res. Soc. China, 2017.

  • Bagirov, A., et al., An algorithm for clusterwise linear regression based on smoothing techniques, Optim. Lett., 2015.

  • Bertsimas, D., et al., Classification and regression via integer optimization, Oper. Res., 2007.