Consistent model selection in segmented line regression

https://doi.org/10.1016/j.jspi.2015.09.008

Highlights

  • Segmented line regression models, constrained and unconstrained, are studied.

  • Consistency of the estimated number of change-points is investigated.

  • Simulation studies are conducted to assess the performance of the proposed criteria.

Abstract

The Schwarz criterion, or Bayes Information Criterion (BIC), is often used to select a model dimension, and several variations of the BIC have been proposed in the context of change-point problems. In this paper, we consider a segmented line regression model with an unknown number of change-points and study the asymptotic properties of Schwarz type criteria in selecting the number of change-points. Noting the tendency of the traditional BIC to over-estimate, observed in some empirical studies, and motivated by the asymptotic behavior of the modified BIC proposed by Zhang and Siegmund (2007), we consider a variation of the Schwarz type criterion that applies a harsher penalty, equivalent to that for a model with one additional unknown parameter per segment. For the segmented line regression model without the continuity constraint, we prove that the number of change-points selected by the criterion with this modification is consistent, and we summarize simulation results that support the consistency. Further simulations are conducted for the model with the continuity constraint, where we observe empirically that the asymptotic behavior of this modified BIC is comparable to that of the criterion proposed by Liu et al. (1997).

Introduction

A main concern in regression model selection is how to select the “best” set of independent variables, and the two major approaches to model selection are hypothesis testing and information criteria. In the context of change-point problems, both approaches have been applied to select the number of change-points, and their analytic and empirical properties have been investigated by many researchers. One of the most widely used information criteria is the Bayes Information Criterion (BIC) proposed by Schwarz (1978). The Schwarz criterion selects the model dimension by finding the Bayes solution that maximizes the posterior probability of the model, and Schwarz (1978) derived the following criterion by evaluating the leading terms of its asymptotic expansion:

$$SC(p) = \sup_{\theta_p} \log\{\mathrm{lik}(\theta_p)\} - \frac{p}{2}\log n = \log\{\mathrm{lik}(\hat\theta_p)\} - \frac{p}{2}\log n,$$

where $\mathrm{lik}(\theta_p)$ is the likelihood function of $\theta_p$ for the model with dimension $p$ and $\hat\theta_p$ is the maximum likelihood estimator of $\theta_p$. As with information criteria in general, the Schwarz criterion has two parts, the log of the maximized likelihood function and a penalty function that penalizes the model dimension, and the method selects the model that maximizes $SC(p)$. Note that its validity is established in Schwarz (1978) for “the case of independent, identically distributed observations, and linear models”.
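As a rough illustration of how the two parts of $SC(p)$ trade off, the Python sketch below scores candidate dimensions from their maximized log-likelihoods, which are assumed to have been computed elsewhere. The function names `schwarz_criterion` and `select_dimension` and the log-likelihood values are hypothetical.

```python
import math
from typing import Dict


def schwarz_criterion(max_log_lik: float, p: int, n: int) -> float:
    """SC(p) = log lik(theta_hat_p) - (p / 2) * log(n); larger values are preferred."""
    return max_log_lik - 0.5 * p * math.log(n)


def select_dimension(max_log_liks: Dict[int, float], n: int) -> int:
    """Return the dimension p that maximizes SC(p).

    max_log_liks maps each candidate dimension p to its maximized
    log-likelihood log lik(theta_hat_p), assumed to be precomputed.
    """
    return max(max_log_liks, key=lambda p: schwarz_criterion(max_log_liks[p], p, n))


# Hypothetical values for n = 100: going from p = 2 to p = 3 gains 2.0 in
# log-likelihood, less than the extra penalty 0.5 * log(100) ≈ 2.30.
print(select_dimension({2: -150.0, 3: -148.0, 4: -147.5}, n=100))  # prints 2
```

Each extra parameter must raise the maximized log-likelihood by more than $\tfrac{1}{2}\log n$ to be worth including.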

Yao (1988) studied the problem of selecting the number of change-points in the means of normally distributed random variables, where the total number of unknown parameters for the model with $k$ change-points is $p = 2(k+1)$. For the number of change-points estimated by maximizing $SC(p)$ with $p = 2k$, Yao (1988) proved its consistency. Lee (1997) considered a similar type of criterion to select the number of change-points in a sequence of random variables from an exponential family distribution. Under some mild conditions on the spacings of successive change-points, Lee (1997) proved the consistency of the number of change-points estimated by a Schwarz type criterion whose penalty term is greater than $2k(1+\epsilon_0)\log n$ for some $\epsilon_0 > 0$. Zhang and Siegmund (2007) noted that the use of the Schwarz criterion “is not theoretically justified” in their setting because of irregularities in the likelihood function, and proposed a modified BIC, derived as an asymptotic approximation of the Bayes factor, to determine the number of change-points in the means of normally distributed random variables. For other types of modifications and applications to detecting mean changes, see Ninomiya (2005), Pan and Chen (2006), and Hannart and Naveau (2012).

In the context of segmented line regression, similar approaches have been proposed to select the number of change-points. Kim et al. (2000) proposed a permutation test to select the number of change-points in the segmented line regression model where segments are assumed to be continuous at the change-points, called the joinpoint regression model in their paper. Kim et al. (2009) considered the traditional BIC,

$$\mathrm{BIC}(k) = -\frac{2}{n}SC(2k) = \log(RSS_k/n) + \frac{2k\log n}{n},$$

where $RSS_k$ is the residual sum of squares for the model with $k$ change-points (under Gaussian errors, the second equality holds up to an additive constant that does not depend on $k$), and compared its performance with those of the permutation test procedure of Kim et al. (2000) and the method based on generalized cross validation used in MARS of Friedman (1991). Note that the penalty term $2k\log n/n$ is chosen based on the $2k+3$ unknown parameters of the joinpoint regression model with $k$ change-points. Liu et al. (1997) considered a general segmented line regression model allowing a discontinuity at the change-points and non-Gaussian errors, proposed a penalty term of larger order than that of $\mathrm{BIC}(k)$, and proved the consistency of the dimension selected by minimizing their criterion,

$$\mathrm{MIC}(k) = \log\left(\frac{RSS_k}{n - p^*}\right) + \frac{p^* c_0 (\log n)^{2+\delta_0}}{n},$$

where $p^* = p^*(k) = (k+1)p + k$ for the model with $k$ change-points and $p$ covariates, and $c_0$ and $\delta_0$ are positive constants. Two Bayesian model selection methods, based on the Bayes factor and on a Bayesian version of BIC, were developed by Tiwari et al. (2005), who investigated their empirical properties via simulations and compared their performance with that of the permutation procedure of Kim et al. (2000). Martinez-Beneito et al. (2011) also proposed a Bayesian model selection method that provides posterior probabilities and is flexible enough to work with Poisson count data.
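Both criteria depend on the data only through $RSS_k$, so they are cheap to evaluate once the fits are available. A minimal Python sketch, assuming $RSS_k$ has already been computed; the function names are hypothetical, and $c_0$ and $\delta_0$ are left as required arguments because no particular values are prescribed here.

```python
import math


def bic_joinpoint(rss: float, n: int, k: int) -> float:
    """Traditional BIC(k) = log(RSS_k / n) + 2k log(n) / n, as in Kim et al. (2009)."""
    return math.log(rss / n) + 2 * k * math.log(n) / n


def mic(rss: float, n: int, k: int, p: int, c0: float, delta0: float) -> float:
    """MIC(k) = log(RSS_k / (n - p*)) + p* c0 (log n)^(2 + delta0) / n (Liu et al., 1997).

    p is the number of covariates per segment, and p* = (k + 1) p + k counts
    the regression coefficients of all k + 1 segments plus the k change-points.
    """
    p_star = (k + 1) * p + k
    if p_star >= n:
        raise ValueError("need n > p* for the MIC variance estimate")
    return math.log(rss / (n - p_star)) + p_star * c0 * math.log(n) ** (2 + delta0) / n
```

For a segmented line (intercept and slope, so $p = 2$) this gives $p^* = 3k + 2$. Per parameter, the MIC penalty grows like $(\log n)^{2+\delta_0}/n$ while the BIC penalty grows like $\log n/n$, which is the difference in order mentioned above.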

This paper is motivated by empirical results in which the traditional BIC showed a tendency to over-estimate the number of change-points (see Table 1 of Kim et al., 2009, and Table 1 of Zhang and Siegmund, 2007). When the argument of Zhang and Siegmund (2007) is applied to segmented line regression, the penalty of the modified BIC is harsher than that of the traditional BIC, asymptotically corresponding to one additional unknown parameter per segment under some conditions. This motivated us to consider a BIC type criterion whose penalty is $4k\log n/n$ for the segmented line regression model without the continuity constraint and $3k\log n/n$ for the model with the continuity constraint. Note that for segmented line regression with $k$ change-points, the number of unknown parameters is $3k+3$ for the model without the continuity constraint and $2k+3$ for the model with the continuity constraint. Let

$$\mathrm{BIC}_d(k) = \log(RSS_k/n) + PE_d(k) = \log(RSS_k/n) + \frac{dk\log n}{n}, \qquad (1)$$

for a penalty coefficient $d$. Then the traditional BICs are $\mathrm{BIC}_3$ for the unconstrained model and $\mathrm{BIC}_2$ for the constrained model.
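To see how the penalty coefficient $d$ can change the selected number of change-points, the Python sketch below evaluates $\mathrm{BIC}_d(k)$ from a list of residual sums of squares. The RSS values are made up for illustration (they are not results from this paper), and the function names are hypothetical.

```python
import math
from typing import List


def bic_d(rss: float, n: int, k: int, d: float) -> float:
    """BIC_d(k) = log(RSS_k / n) + d * k * log(n) / n."""
    return math.log(rss / n) + d * k * math.log(n) / n


def select_k(rss_by_k: List[float], n: int, d: float) -> int:
    """Return the k in 0..K (K = len(rss_by_k) - 1) that minimizes BIC_d(k)."""
    return min(range(len(rss_by_k)), key=lambda k: bic_d(rss_by_k[k], n, k, d))


# Hypothetical RSS_0, RSS_1, RSS_2 for n = 100: the drop from k = 1 to k = 2
# (log(85/100) ≈ -0.163) outweighs the BIC_3 increment 3 log(100)/100 ≈ 0.138
# but not the BIC_4 increment 4 log(100)/100 ≈ 0.184.
rss = [300.0, 100.0, 85.0]
print(select_k(rss, n=100, d=3), select_k(rss, n=100, d=4))  # prints 2 1
```

Raising $d$ by one adds $\log n/n$ per change-point, so an extra change-point survives only if it lowers $\log RSS_k$ by more than $d\log n/n$.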

In this paper, we focus on the asymptotic behavior of $\mathrm{BIC}_d$, a simple model selection criterion whose penalty term has the same order as that of the traditional BIC; in particular, we study the asymptotic properties of $\mathrm{BIC}_4$ for the model without the continuity constraint and $\mathrm{BIC}_3$ for the model with the continuity constraint. In Section 2, we formally introduce the unconstrained model and prove the consistency of the model dimension selected by $\mathrm{BIC}_4$ for the unconstrained model with Gaussian errors. This result provides a consistent model selection criterion that imposes an asymptotically milder penalty than MIC does. In Section 3, we present the results of a simulation study comparing the performance of $\mathrm{BIC}_4$ with those of $\mathrm{BIC}_3$ and MIC. Section 4 includes empirical results and a discussion of the constrained case, where the segments are constrained to be continuous at the change-points. Further discussion is presented in Section 5.

Selection methods and consistency: Unconstrained model

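Suppose that we observe $(x_1, y_1), \dots, (x_n, y_n)$ and consider a segmented line regression model such that

$$y_i = \beta_{j,0} + \beta_{j,1}x_i + \epsilon_i \quad \text{if } \tau_{j-1} < x_i \le \tau_j \quad (j = 1, \dots, \kappa+1), \qquad (2)$$

where $\kappa$ is the unknown number of change-points, the $\tau$’s are unknown change-points with $\tau_0 = \min_i x_i - 1/n$ and $\tau_{\kappa+1} = \max_i x_i$, and the $\epsilon_i$ are independent $N(0, \sigma^2)$.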
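Evaluating $\mathrm{BIC}_d(k)$ for this model requires $RSS_k$, the smallest residual sum of squares over all placements of $k$ change-points with a separate least-squares line in each segment. The Python sketch below computes $RSS_0, \dots, RSS_K$ by exhaustive dynamic programming over the sorted $x$ values and then returns the $k$ in $0, \dots, K$ that minimizes $\mathrm{BIC}_d(k)$. The dynamic-programming search, the minimum segment size `min_seg` (default 3), the rule that change-points may only separate distinct $x$ values, and the function names are illustrative assumptions, not the fitting procedure of this paper. It runs in $O(Kn^2)$ time, which is adequate for a few hundred observations.

```python
import math

import numpy as np


def rss_by_k(x, y, K, min_seg=3):
    """Return [RSS_0, ..., RSS_K] for unconstrained segmented line regression.

    RSS_k is the minimum, over change-point placements, of the total RSS from
    fitting a separate OLS line to each of the k + 1 segments (inf if infeasible).
    """
    order = np.argsort(x, kind="stable")
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    n = len(xs)
    x_sorted = xs.tolist()  # original values, used to forbid splits between tied x
    # Centering leaves each segment's line-fit RSS unchanged but reduces cancellation
    xs, ys = xs - xs.mean(), ys - ys.mean()

    def prefix(v):
        # prefix(v)[i] = sum of v[:i], as a list for fast scalar access
        return np.concatenate(([0.0], np.cumsum(v))).tolist()

    Sx, Sy = prefix(xs), prefix(ys)
    Sxx, Sxy, Syy = prefix(xs * xs), prefix(xs * ys), prefix(ys * ys)

    def seg_rss(i, j):
        # RSS of the OLS line through sorted points i, ..., j - 1
        m = j - i
        sx, sy = Sx[j] - Sx[i], Sy[j] - Sy[i]
        cxx = Sxx[j] - Sxx[i] - sx * sx / m
        cxy = Sxy[j] - Sxy[i] - sx * sy / m
        cyy = Syy[j] - Syy[i] - sy * sy / m
        rss = cyy - cxy * cxy / cxx if cxx > 1e-12 else cyy
        return max(rss, 0.0)

    # best[k][j]: smallest RSS for the first j sorted points split into k + 1 segments
    best = [[math.inf] * (n + 1) for _ in range(K + 1)]
    for j in range(min_seg, n + 1):
        best[0][j] = seg_rss(0, j)
    for k in range(1, K + 1):
        for j in range((k + 1) * min_seg, n + 1):
            for i in range(k * min_seg, j - min_seg + 1):
                # last segment is points i..j-1; its left boundary must split distinct x
                if x_sorted[i - 1] < x_sorted[i] and best[k - 1][i] < math.inf:
                    best[k][j] = min(best[k][j], best[k - 1][i] + seg_rss(i, j))
    return [best[k][n] for k in range(K + 1)]


def estimate_kappa(x, y, K, d, min_seg=3):
    """Return (kappa_hat, [RSS_0, ..., RSS_K]), kappa_hat minimizing BIC_d(k) over 0 <= k <= K."""
    n = len(x)
    rss = rss_by_k(x, y, K, min_seg)

    def bic_d(k):
        if rss[k] == math.inf:
            return math.inf  # k change-points cannot be placed under min_seg
        log_term = math.log(rss[k] / n) if rss[k] > 0 else -math.inf
        return log_term + d * k * math.log(n) / n

    kappa_hat = min(range(K + 1), key=bic_d)
    return kappa_hat, rss
```

Calling `estimate_kappa(x, y, K, d=4.0)` corresponds to selection by $\mathrm{BIC}_4$, and `d=3.0` to the traditional $\mathrm{BIC}_3$; only the final comparison changes, so one set of fits serves every $d$.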

Let us consider the Schwarz type criterion (1) defined above and estimate $\kappa$ by $\hat\kappa = \arg\min_{0 \le k \le K} \mathrm{BIC}_d(k)$, where $K$ is a pre-determined maximum number of change-points. For the segmented line regression model …

Simulations: Unconstrained model

This section summarizes simulations conducted to study the asymptotic behavior of $\mathrm{BIC}_4$, whose consistency is proved in Section 2, and to compare its performance with those of $\mathrm{BIC}_3$ (the traditional BIC) and MIC, whose consistency is established in Liu et al. (1997). For each choice of model parameters, 300 data sets are simulated, and $P(\hat\kappa < \kappa_0)$, $P(\hat\kappa = \kappa_0)$, and $P(\hat\kappa > \kappa_0)$ are estimated, where $\kappa_0$ is the true number of change-points and $\hat\kappa$ ($0 \le \hat\kappa \le K$) is the estimate obtained by each …
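The Monte Carlo design described above can be sketched in Python: `simulate_segmented` draws one data set from model (2) for given change-points, segment coefficients, and noise level, and `tabulate` turns the $\hat\kappa$ values from the replications into estimates of $P(\hat\kappa < \kappa_0)$, $P(\hat\kappa = \kappa_0)$, and $P(\hat\kappa > \kappa_0)$. Both helpers and every parameter value shown are hypothetical and do not reproduce the settings used in this paper.

```python
import numpy as np


def simulate_segmented(x, taus, betas, sigma, rng):
    """Draw y_i = beta_{j,0} + beta_{j,1} x_i + eps_i for tau_{j-1} < x_i <= tau_j.

    taus:  increasing interior change-points tau_1 < ... < tau_kappa
    betas: (kappa + 1) x 2 array of (intercept, slope) per segment
    """
    x = np.asarray(x, dtype=float)
    b = np.asarray(betas, dtype=float)
    # side="left" returns index j - 1 such that tau_{j-1} < x_i <= tau_j
    seg = np.searchsorted(np.asarray(taus, dtype=float), x, side="left")
    return b[seg, 0] + b[seg, 1] * x + rng.normal(0.0, sigma, size=x.shape)


def tabulate(kappa_hats, kappa0):
    """Estimate P(kappa_hat < kappa0), P(kappa_hat = kappa0), P(kappa_hat > kappa0)."""
    k = np.asarray(kappa_hats)
    return float(np.mean(k < kappa0)), float(np.mean(k == kappa0)), float(np.mean(k > kappa0))


# One hypothetical replication with a single change-point at 0.5
rng = np.random.default_rng(2015)
x = np.linspace(0.0, 1.0, 100)
y = simulate_segmented(x, taus=[0.5], betas=[[1.0, 2.0], [5.0, -3.0]], sigma=0.1, rng=rng)
```

In a full study, each of the 300 replications would call `simulate_segmented`, apply a selection rule (for example, minimizing $\mathrm{BIC}_4$) to obtain $\hat\kappa$, and pass the collected values to `tabulate`.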

Selection methods in constrained model

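Recall the segmented line regression model introduced in (2): $y_i = \beta_{j,0} + \beta_{j,1}x_i + \epsilon_i$ if $\tau_{j-1} < x_i \le \tau_j$ ($j = 1, \dots, \kappa+1$). When the segments are assumed to be continuous at the change-points, $\beta_{j,0} + \beta_{j,1}\tau_j = \beta_{j+1,0} + \beta_{j+1,1}\tau_j$ for $j = 1, \dots, \kappa$, and this model can also be represented as

$$y_i = \beta_0 + \beta_1 x_i + \delta_1(x_i - \tau_1)^+ + \cdots + \delta_\kappa(x_i - \tau_\kappa)^+ + \epsilon_i,$$

where $a^+ = \max(0, a)$.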
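The hinge representation turns the constrained fit, for fixed change-points, into an ordinary least-squares problem in $\beta_0, \beta_1, \delta_1, \dots, \delta_\kappa$. The Python sketch below builds the design matrix with columns $1, x_i, (x_i - \tau_1)^+, \dots, (x_i - \tau_\kappa)^+$, fits it with `numpy.linalg.lstsq`, and maps the coefficients back to segment-wise intercepts and slopes via $\beta_{j+1,1} = \beta_{j,1} + \delta_j$ and $\beta_{j+1,0} = \beta_{j,0} - \delta_j\tau_j$, which is exactly the continuity condition above solved for the new intercept. Searching over the $\tau$’s is not shown, and the function names are hypothetical.

```python
import numpy as np


def hinge_design(x, taus):
    """Design matrix with columns 1, x, (x - tau_1)^+, ..., (x - tau_kappa)^+."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - t, 0.0) for t in taus]
    return np.column_stack(cols)


def fit_joinpoint(x, y, taus):
    """OLS fit of the continuous model for fixed change-points; returns (coef, RSS)."""
    X = hinge_design(x, taus)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


def segment_coefficients(coef, taus):
    """Map (beta_0, beta_1, delta_1, ..., delta_kappa) to [(beta_{j,0}, beta_{j,1}), ...]."""
    b0, b1 = float(coef[0]), float(coef[1])
    segments = [(b0, b1)]
    for delta, tau in zip(coef[2:], taus):
        # continuity at tau_j: the slope changes by delta_j, the intercept by -delta_j * tau_j
        b0, b1 = b0 - delta * tau, b1 + delta
        segments.append((float(b0), float(b1)))
    return segments


x = np.linspace(0.0, 2.0, 21)
y = 1.0 + 2.0 * x - 3.0 * np.maximum(x - 1.0, 0.0)  # noise-free joinpoint data
coef, rss = fit_joinpoint(x, y, taus=[1.0])
# coef is approximately (1, 2, -3); segments approximately (1, 2) and (4, -1)
print(coef, segment_coefficients(coef, [1.0]))
```

For fixed $\tau$’s the fit has $\kappa + 2$ regression coefficients; adding the $\kappa$ change-points and $\sigma^2$ gives the $2\kappa + 3$ unknown parameters of the constrained model.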

For this segmented line regression model with the continuity constraint, called a joinpoint regression model in Kim et al. (2000), the total number of unknown parameters is $2k+3$ for …

Discussion

In this paper, we considered information-based selection criteria to estimate the number of change-points in segmented line regression and proved the consistency of the number of change-points estimated by $\mathrm{BIC}_4$ for the unconstrained model with normal errors. The simulation results summarized in Tables 1, 2, and 3 for the unconstrained model support the theoretical results that (i) the under-fitting probability of $\mathrm{BIC}_d$ converges to zero for any $d > 0$ (Theorem 1) and (ii) the over-fitting …

Acknowledgments

Part of H-J. Kim’s research was conducted during her visit to the National Cancer Institute. J. Kim’s research was supported by a research grant from Inha University.
