Piecewise autoregression for general integer-valued time series
Introduction
We consider an integer-valued process whose conditional mean is a function (see below) of the whole information up to time and of an unknown parameter that belongs to a compact subset. Inference in the cases where the parameter is constant or the distribution of is known has been studied by many authors in several directions; see, for instance, Fokianos et al. (2009), Fokianos and Tjøstheim (2011, 2012), Davis and Liu (2016) and Douc et al. (2017), among others, for some recent works. We consider here a more general setting where the parameter is piecewise constant (multiple change-point problem) and the distribution of is unknown. We refer to Franke et al. (2012), Kang and Lee (2014), Doukhan and Kengne (2015), Leung et al. (2017) and the references therein for some tests for change-point detection in integer-valued time series.
Let be a trajectory generated as in model (1.1) and assume that the parameter is piecewise constant. Also, assume that , and such that is generated from the th stationary regime ; i.e., it is a trajectory of the process (which are not actually observed for ; see Section 2 for some details) satisfying: where is the σ-field generated by the whole information up to time and is a measurable non-negative function assumed to be known up to the parameter . is the number of segments (or regimes) of the model; the th segment corresponds to and depends on the parameter . are the change-point locations; by convention, and . To ensure the identifiability of the change-point locations, it is reasonable to assume that for . The case corresponds to the model without change. In the sequel, we assume that the random variables , have the same (up to the parameter ) distribution and denote by the distribution of . For instance, for an INGARCH representation, we have where , . The parameter vector of the th regime is . Therefore, is a compact subset of such that for all , . For all , we assume that ; hence, there exists a sequence of non-negative real numbers such that . Then, for any . For instance, if the distribution is Poisson, negative binomial or binary, then we get, respectively, a Poisson, negative binomial or binary INGARCH process; see some examples in Section 4.
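As an illustration of the piecewise INGARCH setting described above, the following Python sketch simulates a count series whose conditional mean follows an INGARCH(1,1) recursion with parameters that change at a given location. The Poisson conditional distribution, the function names (`sample_poisson`, `simulate_piecewise_ingarch`), the parameter values and the single break at 500 are assumptions made for this example only; they are not taken from the paper, whose simulations are run in R.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_piecewise_ingarch(n, breaks, params, seed=0):
    """Simulate n counts; params[j] = (alpha0, alpha, beta) holds on segment j.

    On each segment: lambda_t = alpha0 + alpha * lambda_{t-1} + beta * Y_{t-1}.
    The recursion is carried across breaks, so the start of a new segment
    still depends on the past generated under the previous regime.
    """
    rng = random.Random(seed)
    edges = [0, *breaks, n]
    alpha0, alpha, beta = params[0]
    # Stationary mean of the first regime (requires alpha + beta < 1).
    lam_prev = alpha0 / (1.0 - alpha - beta)
    y_prev = sample_poisson(rng, lam_prev)
    y, lam = [], []
    for j in range(len(edges) - 1):
        alpha0, alpha, beta = params[j]
        for _ in range(edges[j], edges[j + 1]):
            lam_t = alpha0 + alpha * lam_prev + beta * y_prev
            y_t = sample_poisson(rng, lam_t)
            y.append(y_t)
            lam.append(lam_t)
            lam_prev, y_prev = lam_t, y_t
    return y, lam


# Hypothetical scenario: one break at t = 500, only the intercept changes.
y, lam = simulate_piecewise_ingarch(
    n=1000, breaks=[500], params=[(1.0, 0.2, 0.3), (3.0, 0.2, 0.3)]
)
print(sum(y[:500]) / 500, sum(y[500:]) / 500)
```

With these illustrative values the stationary means of the two regimes are alpha0 / (1 - alpha - beta) = 2 and 6, so the two printed empirical means should be clearly separated.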
Our main interest is the estimation of the unknown parameters in model (1.2). This can be viewed as a classical model selection problem. Assume that the observations are generated from (1.2). Let be an upper bound on the number of segments (note that ). Denote by the set of partitions of into at most contiguous segments. Set a generic element of segments in . Consider the collection where, for a given , is the family of sequences that are piecewise constant on the partition . Any depends on the parameter , which collects the values of on each segment. Set . Denote by a generic element of , with partition and parameter . denotes the number of segments, also called the dimension of . The true model, with dimension , depends on a partition and the parameter .
For any , set and denote by the distribution of ; let be the probability density function of this distribution. For , let be the conditional distribution of . We consider the log-likelihood contrast conditioned on : . Thus, the minimal contrast estimator of on the collection is obtained by minimizing the contrast over ; that is, . The main model selection procedures take into account the model complexity and select the estimator such that minimizes the penalized criterion where is a penalty function, possibly data-dependent. We now address the following issues.
(i) Semi-parametric setting. Kashikar et al. (2013) have studied structural breaks in Poisson INAR processes using an MCMC approach with Gibbs sampling. Cleynen and Lebarbier (2014, 2017) have recently considered the change-point problem (1.2) with i.i.d. observations; in their works, the distribution is assumed to be known and could be Poisson, negative binomial or belong to the exponential family. From a practical viewpoint, we consider the case where is unknown and deal with the Poisson quasi-likelihood (see, for instance, Ahmad and Francq, 2016). So in the sequel, is the Poisson quasi-likelihood contrast and is the Poisson quasi-maximum likelihood estimator (PQMLE).
(ii) Multiple change-point problem from a non-asymptotic point of view. This question is tackled by a model selection approach. Numerous works have been devoted to this issue; see, among others, Lebarbier (2005), Arlot and Massart (2009), Cleynen and Lebarbier (2014, 2017) and Arlot et al. (2016).
In this (quasi-)log-likelihood framework, it is more usual to consider the Kullback–Leibler risk. For any , the Kullback–Leibler divergence between and is where denotes the expectation with respect to the true distribution of the observations. In the case where is the likelihood contrast, we get . The “ideal” partition (the one whose estimator is closest to according to the Kullback–Leibler risk) satisfies: The corresponding estimator , called the oracle, depends on the true sample distribution and cannot be computed in practice. The goal is to calibrate the penalty term such that the segmentation provides an estimator where the risk of is as close as possible to the risk of the oracle, namely such that for a non-negative constant , expected to be close to 1. This issue is addressed in the above-mentioned papers, and the results obtained rely heavily on the independence of the observations. In our setting, this seems to be a more difficult task, but we believe that the coupling method can be used as in Lerasle (2011) to overcome this difficulty. We leave this question as the topic of a different research project.
(iii) Multiple change-point problem from an asymptotic point of view. The aim here is to consistently estimate the parameters of the change-point model. This issue has been addressed by several authors using classical contrast/criteria optimization or binary/sequential segmentation/estimation; see, for instance, Bai and Perron (1998), Davis et al. (2008), Harchaoui and Lévy-Leduc (2010), Bardet et al. (2012), Davis and Yau (2013), Davis et al. (2016), Ma and Yau (2016), Yau and Zhao (2016), Inclán and Tiao (1994), Bai (1997), Fryzlewicz and Subba Rao (2014), Fryzlewicz (2014), among others, for some advances on this issue. These works and many other papers in the literature on the asymptotic study of the multiple change-point problem are often focused on continuous-valued time series; moreover, the case of a large class of semi-parametric models for discrete-valued time series (such as those discussed earlier) has not yet been addressed.
We consider (1.2) and derive a penalized contrast of type (1.3). We assume that there exists a partition of such that , where is the corresponding partition of obtained from . We provide sufficient conditions on the penalty , for which the estimators and are consistent; that is: where is the corresponding partition of obtained from .
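To make the Poisson quasi-likelihood contrast of item (i) concrete, here is a minimal Python sketch of the standard Poisson quasi log-likelihood (the sum of Y_t log λ_t − λ_t, with the log(Y_t!) term dropped since it does not involve the parameters), negated so that it is minimized. The INARCH(1) form of the conditional mean, the function names and the grid search are assumptions chosen for brevity; the paper's conditional mean is a general function of the past, and its estimator is not computed on a grid.

```python
import math


def poisson_quasi_contrast(y, alpha0, beta):
    """Negative Poisson quasi log-likelihood of y[1:], conditioned on y[0].

    Assumed conditional mean (INARCH(1)): lambda_t = alpha0 + beta * y[t-1],
    with alpha0 > 0 and beta >= 0 so that lambda_t stays positive.
    """
    total = 0.0
    for t in range(1, len(y)):
        lam = alpha0 + beta * y[t - 1]
        total += y[t] * math.log(lam) - lam
    return -total


def fit_on_grid(y, grid_alpha0, grid_beta):
    """Return the (alpha0, beta) pair of the grid that minimizes the contrast."""
    return min(
        ((a, b) for a in grid_alpha0 for b in grid_beta),
        key=lambda p: poisson_quasi_contrast(y, *p),
    )


# Hypothetical short count series, for illustration only.
counts = [2, 1, 3, 4, 2, 5, 3, 2, 4, 6]
print(poisson_quasi_contrast(counts, alpha0=1.0, beta=0.5))
print(fit_on_grid(counts, [0.5, 1.0, 2.0, 3.0], [0.0, 0.25, 0.5]))
```

The contrast only requires the conditional mean to be correctly specified, which is why it remains usable when the conditional distribution itself is unknown.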
The paper is organized as follows. In Section 2, we set out some notation and assumptions and define the Poisson QMLE. In Section 3, we derive the estimation procedure and provide the main results. Some simulation results are displayed in Section 4, whereas Section 5 focuses on applications to the US recession data and the daily number of trades in the stock of Technofirst. Section 6 is devoted to a summary and conclusion. The Supporting Information provides the proofs of the main results.
Section snippets
Notations and Poisson QMLE
We set the following classical Lipschitz-type condition on the function .
Assumption A
For any , the function is times continuously differentiable on and there exists a sequence of non-negative real numbers satisfying (or for ) such that for any , where denotes any vector or matrix norm.
In the whole paper, it is assumed that for , there exists a stationary and ergodic process
Estimation procedure and main results
In this section, we carry out the estimation of the number of breaks and the instants of breaks by using a penalized contrast. Some asymptotic studies are also reported.
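A minimal Python sketch of how a penalized contrast can be minimized over partitions by dynamic programming is given below. It is a deliberately simplified illustration and not the procedure of this section: each segment is fitted with a constant Poisson mean (for which the minimizer of the Poisson quasi-likelihood contrast is the segment average), and the penalty is assumed to be linear in the number of segments with a hypothetical weight `beta`. The function names and the `min_len` parameter are also assumptions.

```python
import math


def segment_cost(prefix, s, e):
    """Poisson quasi-likelihood contrast of y[s:e] at its fitted constant mean."""
    total = prefix[e] - prefix[s]
    if total == 0:
        return 0.0  # limit of -(total * log(mean) - total) as total -> 0
    mean = total / (e - s)
    return -(total * math.log(mean) - total)


def penalized_segmentation(y, beta, min_len=2):
    """Minimize sum of segment costs + beta * (number of segments).

    Returns the sorted list of estimated break locations (start indices of
    segments 2, 3, ...). Runs in O(n^2) time.
    """
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)
    best = [math.inf] * (n + 1)  # best[e]: optimal penalized cost of y[:e]
    best[0] = 0.0
    last = [0] * (n + 1)  # last[e]: start of the final segment of y[:e]
    for e in range(min_len, n + 1):
        for s in range(0, e - min_len + 1):
            if best[s] == math.inf:
                continue
            cand = best[s] + segment_cost(prefix, s, e) + beta
            if cand < best[e]:
                best[e], last[e] = cand, s
    breaks, e = [], n
    while e > 0:  # backtrack through the stored segment starts
        s = last[e]
        if s > 0:
            breaks.append(s)
        e = s
    return sorted(breaks)


# Toy series with one obvious level shift; beta = log(n) is a BIC-like choice.
series = [1] * 30 + [10] * 30
print(penalized_segmentation(series, beta=math.log(len(series))))
```

A larger `beta` favours fewer segments, which is the trade-off that the calibration of the penalty has to resolve.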
Some simulation results
In this section, we implement the procedure in the R software (available from CRAN). We restrict our attention to the estimation of the vector ; i.e., the number of segments and the instants of breaks . For the performance of the estimator of the parameter , we refer to the work of Ahmad and Francq (2016). For each process, we generate replications following the scenarios considered. The estimated number of segments is computed by using the criteria
Real data application
We apply our change-point procedure to two examples of real data series. To compute the estimator , the -penalty is used with (where ) and .
Summary and conclusion
This paper focuses on the multiple change-point problem in a general class of integer-valued time series. A penalized contrast estimator based on the Poisson quasi-maximum likelihood of the model is proposed. The theoretical study establishes the consistency of the proposed estimator. A data-driven procedure based on the slope heuristic is also proposed to calibrate the penalty term of the contrast. The simulation study based on three penalty procedures (BIC, and slope heuristic) displays
Acknowledgments
The authors are grateful to the Executive Editors, Co-Editors and the two anonymous Referees for many relevant suggestions and comments which helped to improve the contents of this article.
References (34)
- et al. On consistency of minimum description length model selection for piecewise autoregressions. J. Econometrics (2016)
- et al. On weak dependence conditions for Poisson autoregressions. Statist. Probab. Lett. (2012)
- et al. Correction to "On weak dependence conditions for Poisson autoregressions". Statist. Probab. Lett. (2013)
- et al. Log-linear Poisson autoregression. J. Multivariate Anal. (2011)
- Structural changes in autoregressive models for binary time series. J. Statist. Plann. Inference (2013)
- Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process. (2005)
- et al. Poisson QMLE of count time series models. J. Time Series Anal. (2016)
- et al. A kernel multiple change-point algorithm via model selection (2016)
- et al. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. (2009)
- Estimating multiple breaks one at a time. Econometric Theory (1997)
- Estimating and testing linear models with multiple structural changes. Econometrica
- Multiple breaks detection in general causal time series using penalized quasi-likelihood. Electron. J. Stat.
- Slope Heuristics: Overview and Implementation. RR-INRIA 7223
- Segmentation of the Poisson and negative binomial rate models: a penalized estimator. ESAIM Probab. Stat.
- Model selection for the segmentation of multiparameter exponential family distributions. Electron. J. Stat.
- Break detection for a class of nonlinear time series models. J. Time Series Anal.
- Theory and inference for a class of observation-driven models with application to time series of counts. Statist. Sinica
- 1
Supported by the Institute for advanced studies - IAS (CY Cergy Paris Université, France), the MME-DII center of excellence (ANR-11-LABEX-0023-01) and by the CEA-MITIC (Université Gaston Berger, Sénégal).
- 2
Developed within the ANR BREAKRISK, France: ANR-17-CE26-0001-01.