Incremental DC optimization algorithm for large-scale clusterwise linear regression

https://doi.org/10.1016/j.cam.2020.113323

Abstract

The objective function in the nonsmooth optimization model of the clusterwise linear regression (CLR) problem with the squared regression error is represented as a difference of two convex functions. Then, using the difference of convex algorithm (DCA) approach, the CLR problem is replaced by a sequence of smooth unconstrained optimization subproblems. A new algorithm based on the DCA and the incremental approach is designed to solve the CLR problem. We apply a quasi-Newton method to solve the subproblems. The proposed algorithm is evaluated using several synthetic and real-world data sets for regression and compared with other algorithms for CLR. Results demonstrate that the DCA-based algorithm is efficient for solving CLR problems with a large number of data points and, in particular, outperforms other algorithms when the number of input variables is small.

Introduction

Clusterwise linear regression (CLR) is a method to approximate regression functions using two or more linear functions. It is based on two well-known techniques: clustering and regression. CLR has many applications, including consumer benefit segmentation [1], market segmentation [2], rainfall prediction [3] and PM10 prediction [4]. Algorithms for solving CLR problems include extensions of clustering algorithms such as k-means [5], [6], [7] and expectation–maximization (EM) [8], as well as those based on mixture models [2], [9], [10], [11] and optimization approaches [9], [12], [13], [14], [15], [16], [17], [18], [19].

CLR is a global optimization problem. However, conventional global optimization algorithms, as well as the exact algorithms from [15], [17], [18], are not always applicable to CLR problems in data sets with a relatively large number of data points and/or input variables. In addition, these algorithms are not efficient when a large number of linear functions is required to approximate the data, as they may require prohibitively large computational effort and may not find any solution in reasonable time. Therefore, algorithms capable of finding approximate global solutions to CLR problems in such data sets are usually applied. Such algorithms have been developed, for instance, in [12], [13], [14], [20]; they are based on a nonsmooth optimization formulation of the CLR problem and the incremental approach.

In this paper, we propose an algorithm for solving CLR problems which is particularly well suited for large data sets. The proposed algorithm is based on the DCA [21] and the incremental approach. More specifically, the algorithm employs the DC representation of the CLR problem and applies the DCA to replace the nonsmooth CLR problem by a sequence of smooth subproblems. In addition, it exploits the partial separability of the objective functions in these subproblems and decomposes them into problems with a much smaller number of variables. These novelties distinguish the algorithm from other CLR algorithms and, in particular, from those introduced in [12], [13], [14], [20], [22]. The incremental approach enables us to generate starting points which are rough approximations of the solution of the CLR problem and in this way to address its nonconvexity.

We demonstrate the performance of the proposed algorithm using synthetic and real-world data sets for regression and compare it with several CLR algorithms: the Späth algorithm [5], [6], the EM algorithm [9], [23], [24], the CLR algorithm based on smoothing techniques [14] and the DC based CLR algorithm [20].

The structure of the paper is as follows. Section 2 provides necessary information on nonsmooth DC optimization and CLR. The optimization formulation of the CLR problem and its DC representation are given in Section 3. The DCA for solving CLR problems is presented in Section 4 and the new algorithm is introduced in Section 5. Computational results are reported in Section 6. Section 7 contains some concluding remarks.

Section snippets

Theoretical background

We denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space with the inner product $\langle x,y\rangle=\sum_{i=1}^{n}x_i y_i$, $x,y\in\mathbb{R}^n$, and the associated norm $\|x\|=\langle x,x\rangle^{1/2}$.

A function $f:\mathbb{R}^n\to\mathbb{R}$ is called locally Lipschitz on $\mathbb{R}^n$ if for any bounded subset $X\subset\mathbb{R}^n$ there exists $L>0$ such that $|f(x)-f(y)|\le L\|x-y\|$ for all $x,y\in X$. The generalized directional derivative of the locally Lipschitz function $f$ at a point $x\in\mathbb{R}^n$ with respect to a direction $u\in\mathbb{R}^n$ is defined as [25]
$$f^{\circ}(x,u)=\limsup_{y\to x,\ \alpha\downarrow 0}\frac{f(y+\alpha u)-f(y)}{\alpha}.$$
The subdifferential of the function $f$ at $x\in\mathbb{R}^n$ is $\partial f(x)=\{\xi\in\mathbb{R}^n:\ f^{\circ}(x,u)\ge\langle\xi,u\rangle\ \text{for all}\ u\in\mathbb{R}^n\}$.
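To make the definition concrete, below is a small, purely illustrative Python sketch (not part of the paper) that estimates $f^{\circ}(x,u)$ by sampling points $y$ near $x$ and small steps $\alpha$ and keeping the largest difference quotient; the helper `clarke_directional_estimate` and its parameters are hypothetical.

```python
import numpy as np

def clarke_directional_estimate(f, x, u, radius=1e-4, n_samples=200, seed=0):
    """Crude Monte Carlo estimate of the generalized directional derivative
    limsup_{y -> x, a -> 0+} (f(y + a*u) - f(y)) / a:
    sample points y near x and small steps a, keep the largest quotient."""
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(n_samples):
        y = x + radius * rng.standard_normal(x.shape)
        for a in (1e-5, 1e-6, 1e-7):
            best = max(best, (f(y + a * u) - f(y)) / a)
    return best

# Example: f(x) = |x| at x = 0 in direction u = 1; the exact value is 1.
f = lambda z: np.abs(z).sum()
print(clarke_directional_estimate(f, np.zeros(1), np.ones(1)))  # ~ 1.0
```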

CLR problem and its DC representation

Consider a data set $A=\{(a^i,b_i)\in\mathbb{R}^n\times\mathbb{R}:\ i=1,\ldots,m\}$, where $a^i\in\mathbb{R}^n$ are the values of $n$ input variables and $b_i\in\mathbb{R}$ are their outputs. The aim of $k$-CLR, $k\ge 1$, is to find simultaneously an optimal partition of the set $A$ into $k$ clusters and the regression coefficients within clusters in order to minimize the overall fit function. Let $A^j$, $j\in J_k\equiv\{1,\ldots,k\}$, be clusters such that they are nonempty, pairwise disjoint and $A=\bigcup_{j\in J_k}A^j$. In $k$-CLR, each cluster $A^j$ is approximated by the hyperplane with the coefficients $(x^j,y_j)$, $x^j\in\mathbb{R}^n$, $y_j\in\mathbb{R}$.
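For concreteness, the sketch below evaluates the fit function through a DC split that is standard in the DC-CLR literature (and which we assume matches the paper's): with $h_{ij}=(\langle x^j,a^i\rangle+y_j-b_i)^2$, write $f_k=f_k^1-f_k^2$, where $f_k^1=\sum_i\sum_j h_{ij}$ and $f_k^2=\sum_i\max_{j}\sum_{t\ne j}h_{it}$ are both convex, since $\min_j h_{ij}=\sum_t h_{it}-\max_j\sum_{t\ne j}h_{it}$. All names in the code are illustrative.

```python
import numpy as np

def clr_dc_split(X, b, coef, intercept):
    """Evaluate the k-CLR fit function via the DC split f = f1 - f2.

    X: (m, n) inputs; b: (m,) outputs;
    coef: (k, n) hyperplane coefficients x^j; intercept: (k,) intercepts y_j."""
    # h[i, j]: squared error of point i under hyperplane j (convex in (x^j, y_j))
    h = (X @ coef.T + intercept - b[:, None]) ** 2
    f1 = h.sum()                                               # sum of all errors
    f2 = (h.sum(axis=1, keepdims=True) - h).max(axis=1).sum()  # sum_i max_j sum_{t!=j} h_it
    f = f1 - f2
    # f1 - f2 recovers the nonsmooth min-error objective sum_i min_j h_ij
    assert np.isclose(f, h.min(axis=1).sum())
    return f, f1, f2
```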

DC algorithm for CLR problem

Algorithm 1 can be applied to solve Problem (3) by reformulating the stopping criterion in Step 3 as follows.

Let $\{(x^h,y^h)\}$, $x^h\in\mathbb{R}^{nk}$, $y^h\in\mathbb{R}^{k}$, be the set of linear regression coefficients found at the $h$th iteration of the DCA and $A^1,\ldots,A^k$ be the corresponding cluster partition of the data set $A$. Since the function $f_k^1$ is continuously differentiable, the stopping criterion in Step 3 reduces to $\xi_2^h=\nabla f_k^1(x^h,y^h)$, where $\xi_2^h\in\partial f_k^2(x^h,y^h)$. To simplify this condition, first for any $(a,b)\in A$ we compute the set $R(a,b)$
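The following sketch shows how one DCA iteration might look under the squared-error split assumed above: $f_k^2$ is linearized at the current iterate, and the resulting smooth convex model separates over the $k$ coefficient blocks, each solved with a quasi-Newton method (here SciPy's BFGS). This is a hedged reconstruction, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def dca_step(X, b, coef, intercept):
    """One hypothetical DCA iteration for the squared-error split f = f1 - f2:
    linearize f2 at the current iterate and minimize the smooth convex model,
    which separates into k independent subproblems, one per hyperplane."""
    m = X.shape[0]
    k = coef.shape[0]
    Xa = np.hstack([X, np.ones((m, 1))])             # inputs with intercept column
    h = (X @ coef.T + intercept - b[:, None]) ** 2   # (m, k) squared errors
    jmin = h.argmin(axis=1)                          # best hyperplane per point

    new_coef, new_intercept = coef.copy(), intercept.copy()
    for j in range(k):
        # Block-j subgradient of f2: every point whose best hyperplane is not j
        # contributes the gradient of h_ij at the current iterate.
        mask = jmin != j
        w0 = np.r_[coef[j], intercept[j]]
        r = Xa[mask] @ w0 - b[mask]
        g2 = 2.0 * Xa[mask].T @ r

        def model(w, g2=g2):
            # block of f1 minus the linearized block of f2 (constants dropped)
            res = Xa @ w - b
            return np.sum(res ** 2) - g2 @ w

        w = minimize(model, w0, method="BFGS").x     # smooth convex subproblem
        new_coef[j], new_intercept[j] = w[:-1], w[-1]
    return new_coef, new_intercept
```

At a fixed point of this iteration the optimality condition of each block gives exactly $\nabla f_k^1(x^h,y^h)=\xi_2^h$, which is the stopping criterion stated above.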

The proposed CLR algorithm

In this section, we introduce the new algorithm, called DCA-CLR, for solving Problem (3). Note that this problem is nonconvex and the choice of starting points is very important when a local search method is applied to solve it. To deal with its nonconvexity, the new algorithm is designed based on the combination of the MDCA and an incremental approach. It starts with the calculation of one linear function and gradually adds one linear function at each iteration.
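A minimal sketch of such an incremental scheme, reusing the hypothetical `dca_step` above: the 1-CLR problem is solved by ordinary least squares, and each new hyperplane is initialized on the points the current model fits worst before all hyperplanes are refined with DCA iterations. The initialization rule shown here is a simple stand-in for the paper's construction of starting points.

```python
import numpy as np

def incremental_clr(X, b, k_max, n_dca_iters=50, tol=1e-8):
    """Hypothetical incremental scheme: solve 1-CLR by least squares, then for
    each k add one hyperplane initialized on the worst-fit points and refine
    all k hyperplanes with DCA iterations (dca_step from the sketch above)."""
    m, n = X.shape
    Xa = np.hstack([X, np.ones((m, 1))])
    w, *_ = np.linalg.lstsq(Xa, b, rcond=None)       # k = 1: ordinary least squares
    coef, intercept = w[:-1][None, :], w[-1:]
    for k in range(2, k_max + 1):
        # initialize hyperplane k on the points the current model fits worst
        err = ((X @ coef.T + intercept - b[:, None]) ** 2).min(axis=1)
        worst = np.argsort(err)[-max(m // k, n + 1):]
        w, *_ = np.linalg.lstsq(Xa[worst], b[worst], rcond=None)
        coef = np.vstack([coef, w[:-1]])
        intercept = np.append(intercept, w[-1])
        prev = np.inf
        for _ in range(n_dca_iters):                 # refine until the fit stalls
            coef, intercept = dca_step(X, b, coef, intercept)
            obj = ((X @ coef.T + intercept - b[:, None]) ** 2).min(axis=1).sum()
            if prev - obj < tol * (1 + abs(obj)):
                break
            prev = obj
    return coef, intercept
```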

According to Proposition 3

Computational results

We evaluate the performance of the DCA-CLR and compare it with some existing CLR algorithms using various synthetic and real-world data sets for regression. The following algorithms are used for comparison:

  • the nonsmooth DC programming algorithm for CLR (NDC-CLR) [20];

  • the CLR algorithm based on smoothing techniques (S-CLR) [14];

  • the multistart version of the Späth algorithm (M-Späth) [5], [6]; and

  • the expectation–maximization algorithm for CLR (EM-CLR) [24].

The M-Späth uses a simple randomized

Conclusions

In this paper, we developed an algorithm to solve the clusterwise linear regression (CLR) problem using a DC representation of its objective function. The proposed algorithm, called DCA-CLR, applies the DCA to replace the DC-CLR problem by a sequence of smooth convex optimization problems. Furthermore, the partial separability of the objective functions in these subproblems is utilized to decompose them into problems with a much smaller number of variables. Such an approach allows us to apply smooth

Acknowledgments

The research by Dr. A.M. Bagirov and Dr. S. Taheri was supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (Project No. DP190100580) and Dr. E. Cimen was supported by Anadolu University Scientific Research Projects Commission, Turkey under the Grant No. 1506F499. The authors would like to thank two anonymous referees for their valuable comments that helped to improve the quality of the paper.

References (35)

  • Späth, H., Mathematical Algorithms for Linear Regression, 2014.

  • Gaffney, S., Smyth, P., Trajectory clustering using mixtures of regression models, in: S. Chaudhuri, D. Madigan (Eds.), ...

  • DeSarbo, W., et al., A maximum likelihood methodology for clusterwise linear regression, J. Classification, 1988.

  • Park, Y., et al., Algorithms for generalized cluster-wise linear regression, INFORMS J. Comput., 2017.

  • Bagirov, A., et al., DC programming algorithm for clusterwise linear l1 regression, J. Oper. Res. Soc. China, 2017.

  • Bagirov, A., et al., An algorithm for clusterwise linear regression based on smoothing techniques, Optim. Lett., 2015.

  • Bertsimas, D., et al., Classification and regression via integer optimization, Oper. Res., 2007.