Consistent model selection in segmented line regression
Introduction
A main concern in regression model selection is how to select the “best” set of independent variables, and the two major approaches to model selection are hypothesis testing and information criteria. In the context of change-point problems, both approaches have been applied to select the number of change-points, and their analytic and empirical properties have been investigated by many researchers. One of the widely used information criteria is the Bayes Information Criterion (BIC) proposed by Schwarz (1978). This Schwarz criterion selects the model dimension by finding the Bayes solution that maximizes the posterior probability of the model, and Schwarz (1978) derived the following criterion by evaluating the leading terms of its asymptotic expansion: $\log L_j(\hat{\theta}_j) - \frac{1}{2} k_j \log n$, where $L_j(\theta_j)$ is the likelihood function of $\theta_j$ for the model with dimension $k_j$, $\hat{\theta}_j$ is the maximum likelihood estimator of $\theta_j$, and $n$ is the sample size. As with information criteria in general, the Schwarz criterion has two parts, the log of the maximized likelihood function and a penalty function that penalizes the model dimension, and the method selects the model that maximizes this criterion. Note that its validity is established in Schwarz (1978) for “the case of independent, identically distributed observations, and linear models”.
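As a minimal sketch of how a criterion of this form is applied, the Python snippet below scores each candidate model by its maximized log-likelihood minus half its dimension times log n and returns the maximizer. The function names and the candidate values are hypothetical and serve only as an illustration.

```python
import math

def schwarz_score(max_loglik, dim, n):
    """Maximized log-likelihood minus the penalty (dim / 2) * log(n)."""
    return max_loglik - 0.5 * dim * math.log(n)

def select_by_schwarz(candidates, n):
    """Return the label of the candidate with the largest score.

    candidates: iterable of (label, maximized log-likelihood, dimension).
    """
    best = max(candidates, key=lambda c: schwarz_score(c[1], c[2], n))
    return best[0]

# Hypothetical values: the larger model gains one unit of log-likelihood
# but pays for three extra parameters.
candidates = [("small", -150.0, 2), ("large", -149.0, 5)]
chosen = select_by_schwarz(candidates, n=100)
```

With these made-up numbers the gain in log-likelihood is too small to offset the extra penalty, so the smaller model is chosen.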
Yao (1988) studied the problem of selecting the number of change-points in the means of normally distributed random variables, where the total number of unknown parameters for the model with change-points is . For the number of change-points estimated by minimizing with , Yao (1988) proved its consistency. Lee (1997) considered a similar type of criterion to select the number of change-points in a sequence of random variables from an exponential family distribution. Under some mild conditions on the spacings of successive change-points, Lee (1997) proved the consistency of the number of change-points estimated by the Schwarz type criterion whose penalty term is greater than for some . Zhang and Siegmund (2007) noted that the use of the Schwarz criterion “is not theoretically justified” in their situation due to irregularities in the likelihood function and proposed a modified BIC, derived as an asymptotic approximation of the Bayes factor, to determine the number of change-points in the means of normally distributed random variables. For other types of modifications and applications for detecting mean changes, see Ninomiya (2005), Pan and Chen (2006), and Hannart and Naveau (2012).
In the context of segmented line regression, similar approaches have been proposed to select the number of change-points. Kim et al. (2000) proposed a permutation test to select the number of change-points in the segmented line regression model where segments are assumed to be continuous at the change-points, called the joinpoint regression model in their paper. Kim et al. (2009) considered the traditional BIC, where is the residual sum of squares for the model with change-points, and compared its performance with those of the permutation test procedure of Kim et al. (2000) and the method based on generalized cross validation used in MARS of Friedman (1991). Note that the penalty term of is chosen based on unknown parameters for the joinpoint regression model with change-points. Liu et al. (1997) considered a general segmented line regression model allowing a discontinuity at the change-point and non-Gaussian errors, proposed a penalty term of higher order than that of BIC, and proved the consistency of the dimension selected by minimizing their criterion: where for the model with change-points and covariates and and are positive constants. Two Bayesian model selection methods, one based on the Bayes factor and one on a Bayesian version of BIC, were developed in Tiwari et al. (2005), who investigated their empirical properties via simulations and compared their performance with that of the permutation procedure of Kim et al. (2000). Martinez-Beneito et al. (2011) also proposed a Bayesian model selection method that provides posterior probabilities and is flexible enough to handle Poisson count data.
This paper is motivated by empirical results in which the traditional BIC showed a tendency to over-estimate the number of change-points (see Table 1 of Kim et al., 2009, and Table 1 of Zhang and Siegmund, 2007). When the argument of Zhang and Siegmund (2007) is applied to segmented line regression, the penalty of the modified BIC is harsher than that of the traditional BIC, asymptotically corresponding to one additional unknown parameter per segment under some conditions. This motivated us to consider a BIC type criterion whose penalty is for the segmented line regression model without the continuity constraint and for the model with the continuity constraint. Note that for segmented line regression with change-points, the number of unknown parameters is for the model without the continuity constraint and for the model with the continuity constraint. Let for penalty coefficients, . Then the traditional BICs are BIC3 for the unconstrained model and BIC2 for the constrained model.
Our interest in this paper is in the asymptotic behavior of , a simple model selection criterion whose penalty term has the same order as that of the traditional BIC, and we focus on the asymptotic properties of BIC4 for the model without the continuity constraint and BIC3 for the model with the continuity constraint. In Section 2, we formally introduce the unconstrained model and prove the consistency of the model dimension selected by BIC4 for the unconstrained model with Gaussian errors. This result provides a consistent model selection criterion that imposes an asymptotically milder penalty than MIC, the criterion of Liu et al. (1997), does. In Section 3, we present the results of a simulation study in which we compare the performance of BIC4 with those of BIC3 and MIC. Section 4 includes empirical results and a discussion of the constrained case, where the segments are constrained to be continuous at the change-points. Further discussion is presented in Section 5.
Selection methods and consistency: Unconstrained model
Suppose that we observe and consider a segmented line regression model such that where is the unknown number of change-points, the ’s are unknown change-points with and , and the are independent .
Let us consider the Schwarz type criterion (1) defined above and estimate as where is a pre-determined maximum number of change-points. For the segmented line regression model
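The selection step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the criterion is assumed to take the form log(RSS(k)/n) + c k log(n)/n, each segment is fitted by its own least-squares line, change-points are restricted to fall between consecutive observations, and the minimum segment length `min_len` (at least 2), the dynamic-programming search, and all function names are hypothetical choices made for this sketch. The residual sum of squares is assumed to be strictly positive.

```python
import math

def prefix_sums(values):
    """Cumulative sums with a leading zero, so sum(values[i:j]) = p[j] - p[i]."""
    out = [0.0]
    for v in values:
        out.append(out[-1] + v)
    return out

def min_rss_by_k(x, y, max_k, min_len=3):
    """Smallest total RSS for k = 0..max_k change-points (no continuity constraint).

    x must be sorted with distinct values; min_len >= 2 is an assumed
    minimum number of observations per segment.
    """
    n = len(x)
    sx, sy = prefix_sums(x), prefix_sums(y)
    sxx = prefix_sums([a * a for a in x])
    sxy = prefix_sums([a * b for a, b in zip(x, y)])
    syy = prefix_sums([b * b for b in y])

    def seg_rss(i, j):
        # RSS of the least-squares line through observations i..j-1.
        m = j - i
        mx, my = (sx[j] - sx[i]) / m, (sy[j] - sy[i]) / m
        cxx = sxx[j] - sxx[i] - m * mx * mx
        cxy = sxy[j] - sxy[i] - m * mx * my
        cyy = syy[j] - syy[i] - m * my * my
        return max(cyy - cxy * cxy / cxx, 0.0)

    inf = float("inf")
    # best[k][j]: smallest RSS for the first j observations with k change-points.
    best = [[inf] * (n + 1) for _ in range(max_k + 1)]
    for j in range(min_len, n + 1):
        best[0][j] = seg_rss(0, j)
    for k in range(1, max_k + 1):
        for j in range((k + 1) * min_len, n + 1):
            best[k][j] = min(
                best[k - 1][i] + seg_rss(i, j)
                for i in range(k * min_len, j - min_len + 1)
            )
    return [best[k][n] for k in range(max_k + 1)]

def bic_c(rss, n, k, c):
    """Assumed criterion form: log(RSS / n) + c * k * log(n) / n."""
    return math.log(rss / n) + c * k * math.log(n) / n

def select_k(x, y, max_k, c, min_len=3):
    """Number of change-points in 0..max_k that minimizes the assumed criterion."""
    n = len(x)
    rss = min_rss_by_k(x, y, max_k, min_len)
    scores = [bic_c(r, n, k, c) for k, r in enumerate(rss)]
    return min(range(max_k + 1), key=scores.__getitem__)

# Deterministic toy data: one jump after x = 30 plus an alternating +/-0.5 perturbation.
x = [float(i) for i in range(1, 61)]
y = [(2 + 0.5 * v if v <= 30 else 40 - 0.3 * v) + (0.5 if int(v) % 2 else -0.5)
     for v in x]
k_hat = select_k(x, y, max_k=4, c=4)
```

The toy data use a deterministic perturbation rather than random noise so that the outcome does not depend on a random seed; with Gaussian errors the selected number of change-points would vary from sample to sample.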
Simulations: Unconstrained model
This section summarizes simulations conducted to study the asymptotic behavior of BIC4, whose consistency is proved in Section 2, and to compare its performance with those of BIC3 (the traditional BIC) and MIC, whose consistency is established in Liu et al. (1997). For each case of the model parameters chosen, 300 replications of data sets are simulated, and , and are estimated where is the true number of change-points and is the estimate obtained by each
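A simulation summary of this kind can be sketched as below. The three estimated quantities are not legible in this excerpt; the sketch assumes they are the proportions of replications in which the estimated number of change-points falls below, equals, or exceeds the true number. The function name and the example estimates are hypothetical.

```python
from collections import Counter

def selection_frequencies(estimates, true_k):
    """Proportions of replications that under-fit, match, or over-fit true_k."""
    n_rep = len(estimates)
    counts = Counter(
        "under" if k_hat < true_k else "over" if k_hat > true_k else "correct"
        for k_hat in estimates
    )
    return {key: counts[key] / n_rep for key in ("under", "correct", "over")}

# Hypothetical estimates from 10 replications with true_k = 2.
estimates = [2, 2, 1, 2, 3, 2, 2, 2, 4, 2]
freqs = selection_frequencies(estimates, true_k=2)
```

In a full study the list of estimates would come from applying each selection criterion to every simulated data set.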
Selection methods in constrained model
Recall the segmented line regression model introduced in (2): When the segments are assumed to be continuous at the change-points, for , and this model can also be represented as where .
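One common way to write a segmented line that is continuous at its change-points is the hinge form, a baseline line plus one slope change per change-point. The Python sketch below assumes that form (the displayed equation is not legible in this excerpt) and checks numerically that it agrees with segment-wise lines joined at the change-points. All parameter names (`beta0`, `beta1`, `deltas`, `taus`) and values are hypothetical.

```python
def hinge_mean(x, beta0, beta1, deltas, taus):
    """Mean from the hinge form beta0 + beta1*x + sum_j delta_j * max(x - tau_j, 0)."""
    return beta0 + beta1 * x + sum(d * max(x - t, 0.0) for d, t in zip(deltas, taus))

def continuous_segments(beta0, beta1, deltas, taus):
    """Segment intercepts and slopes implied by the hinge parameters."""
    intercepts, slopes = [beta0], [beta1]
    for d, t in zip(deltas, taus):
        # A slope change of d at t shifts the intercept by -d * t,
        # which keeps adjacent lines joined at t.
        intercepts.append(intercepts[-1] - d * t)
        slopes.append(slopes[-1] + d)
    return intercepts, slopes

def piecewise_mean(x, intercepts, slopes, taus):
    """Mean from segment-wise lines: segment j applies for taus[j-1] < x <= taus[j]."""
    j = sum(1 for t in taus if x > t)
    return intercepts[j] + slopes[j] * x

# Hypothetical parameters: two change-points at 3 and 6.
beta0, beta1, deltas, taus = 1.0, 0.5, [-1.0, 0.75], [3.0, 6.0]
intercepts, slopes = continuous_segments(beta0, beta1, deltas, taus)
grid = [0.0, 1.5, 3.0, 4.5, 6.0, 8.0]
max_gap = max(
    abs(hinge_mean(v, beta0, beta1, deltas, taus)
        - piecewise_mean(v, intercepts, slopes, taus))
    for v in grid
)
```

Under this parameterization each additional change-point adds two unknowns (its location and its slope change) rather than three, which is the reason the continuity constraint lowers the parameter count.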
For this segmented line regression model with the continuity constraint, called a joinpoint regression model in Kim et al. (2000), the total number of unknown parameters is for
Discussion
In this paper, we considered information-based selection criteria to estimate the number of change-points in segmented line regression, and proved the consistency of the number of change-points estimated by BIC4 for the unconstrained model with normal errors. The simulation results summarized in Tables 1, 2, and 3 for the unconstrained model support the theoretical results that (i) the under-fitting probability of converges to zero for any (Theorem 1) and (ii) the over-fitting
Acknowledgments
A part of H-J. Kim’s research was conducted during her visit to the National Cancer Institute. J. Kim’s research was supported by a research grant from Inha University.
References
Ninomiya (2005). Information criterion for Gaussian change-point model. Statist. Probab. Lett.
Pan and Chen (2006). Application of modified information criterion to multiple change point problems. J. Multivariate Anal.
Yao (1988). Estimating the number of change-points via Schwarz criterion. Statist. Probab. Lett.
Friedman (1991). Multivariate adaptive regression splines. Ann. Statist.
Hannart and Naveau (2012). An improved Bayesian information criterion for multiple change-point models. J. Amer. Statist. Assoc.
Kim et al. (2000). Permutation tests for joinpoint regression with applications to cancer rates. Stat. Med.