1 Theoretical properties

One of the main strengths of the proposed methodology, possibly due to the SDLL, is that it can work well both in a change point regime and in a frequent jump regime: In a change point regime, the minimum distance to the next change, \(\delta _i:=\min (\eta _i-\eta _{i-1},\eta _{i+1}-\eta _i)\), is reasonably large, while the magnitude of the change \(f^\prime _i\) is bounded from above and can be small (it may even tend to zero as \(T \rightarrow \infty\)). In a frequent jump regime, \(\delta _i\) is small (a setting related to outlier detection) and the corresponding jumps \(f_i^{\prime }\) necessarily need to be large to be detectable. In both situations, an adaptation of Lemma 1 of Wang et al. (2018) shows that no consistent estimator of the change point locations exists when \(\sigma ^{-2} \min _{1 \le i \le N} (\delta _i\,(f_i^{\prime })^2) < \log (T)\).
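The detectability condition above is easy to evaluate numerically. The following sketch computes \(\sigma^{-2}\min_i (\delta_i (f_i')^2)\) for a piecewise-constant signal and compares it to \(\log T\); the change locations, jump sizes and noise level are illustrative, not taken from the paper.

```python
import numpy as np

def detectability(eta, jumps, T, sigma=1.0):
    """sigma^{-2} * min_i delta_i * (f'_i)^2 for change locations eta
    (sorted, 0 < eta_1 < ... < eta_N < T) with jump magnitudes f'_i."""
    bounds = np.concatenate(([0], eta, [T]))
    # delta_i: minimum distance to the neighbouring change points
    delta = np.minimum(np.diff(bounds)[:-1], np.diff(bounds)[1:])
    return np.min(delta * np.asarray(jumps) ** 2) / sigma ** 2

# Illustrative configuration (hypothetical values)
T = 1000
eta = np.array([200, 500, 800])
jumps = np.array([0.5, 1.0, 0.7])
snr = detectability(eta, jumps, T)
# Consistent estimation is impossible when snr < log(T)
print(snr, np.log(T))
```

Here the binding constraint comes from the first change, whose small jump outweighs its long segment: the minimum is taken over the products \(\delta_i (f_i')^2\), not over distances and jumps separately.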

While WBS2.SDLL is shown numerically to perform well in both regimes, the paper does not provide a theoretical underpinning for this good behaviour, in the sense that only a linear-time change point setting with \(\delta _T:=\min _i\delta _i\) of the same order as the sample size T is considered: Such an assumption is not necessary for consistent change point detection and, moreover, it excludes models such as extreme.teeth (ET) and extreme.extreme.teeth (EET), which can reasonably be considered as belonging to the frequent jump regime with \(\delta _T \le 5\). In the future, it will be very exciting to see which theoretical framework helps us to better understand the performance of statistical procedures that aim at handling both regimes simultaneously.

In addition, the best currently available results for the localisation rate attained by WBS, as well as the requirement on the magnitude of changes for their detection, are sub-optimal when \(\delta _T/T \rightarrow 0\) (see Appendix A of Cho and Kirch 2019). Baranowski et al. (2019) and Wang et al. (2018) suggest modifications of WBS that alleviate the sub-optimality at the cost of introducing additional tuning parameters, such as a threshold or an upper bound on the length of random intervals. However, even in these papers, the assumptions are formulated in terms of \(\min _i\delta _i\,\min _i (f_i^{\prime })^2\), which does not reflect that the strength of multiscale procedures lies in their ability to handle data sets containing both small changes with long distances to neighbouring change points and large changes with shorter distances (see, e.g., the mix model). Cho and Kirch (2019) consider multiscale change point situations by working with \(\min _i (\delta _i\,(f_i^{\prime })^2)\) in the theoretical investigation of a more systematic moving sum (MOSUM)-type procedure for candidate generation.
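The gap between the two formulations can be made concrete with a small numerical illustration in the spirit of the mix model; the distances and jump sizes below are hypothetical.

```python
import numpy as np

# Hypothetical 'mix'-style configuration: one small change with long distances
# to its neighbours, one large change with short distances.
delta = np.array([500.0, 10.0])  # distances delta_i to neighbouring change points
jump = np.array([0.3, 3.0])      # corresponding jump magnitudes f'_i

# Separated formulation: min_i delta_i * min_i (f'_i)^2 (approx. 0.9)
separated = delta.min() * (jump ** 2).min()
# Multiscale formulation: min_i (delta_i * (f'_i)^2) (approx. 45)
multiscale = (delta * jump ** 2).min()
print(separated, multiscale)
```

The separated quantity pairs the shortest distance with the smallest jump, although they belong to different change points, and is therefore far more pessimistic than the multiscale quantity that governs the information-theoretic bound of Sect. 1.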

2 SDLL with alternative candidate generation methods

As already pointed out by the author, both components of the proposed algorithm, i.e., candidate generation and model selection, can be used in combination with other methods. For example, in Cho and Kirch (2019), a version of WBS2 has been adopted as a candidate generation method for the localised pruning method proposed for model selection. We will now show that deterministic candidate generation methods, such as the multiscale MOSUM procedure (Chan and Chen 2017; Cho and Kirch 2019), can be used with SDLL. Our first tentative attempt at generating a complete solution path of candidates, with a reasonable measure of importance attached, is described in Sect. 3 below. Based on some initial simulation results reported in Table 1, we conclude that deterministic candidate generation methods can be a good alternative and that this approach merits further research. Such a deterministic method will always yield the same result when applied to the same data set, whereas WBS-based methods can produce different outcomes in different runs (as observed in Cho and Kirch 2019 on array comparative genomic hybridization data sets). In particular, WBS-based results are reproducible only if the seed of the random number generator is also reported. In Section 4.1 of the present paper, the use of a ‘median’ of several runs is proposed to mitigate this problem, which clearly comes at the cost of additional computation time.

3 MOSUM-candidate generation and some simulations

Many of the methods included in the comparative simulation studies of the present paper have been designed for the change point regime, with their default parameters chosen accordingly, e.g., to save computation time. For example, the algorithm referred to as ‘MOSUM’ in the present paper, implemented in the R package mosum (Meier et al. 2019a), has a tuning parameter that relates to the smallest \(\delta _T\) permitted, and its default value is set at 10, which we consider a reasonable lower bound for a change point problem. Also, the default choice of the parameter \(\alpha \in [0, 1]\), which stems from change point testing and sets a threshold for candidate generation in the algorithm, is somewhat conservative (\(\alpha = 0.1\)) and not very meaningful in the frequent jump regime. Moving away from the change point regime, we set the minimum bandwidth as small as possible in generating the bandwidth set \({\mathcal{G}}\), and also set a more liberal threshold with \(\alpha = 0.9\). With these choices, MOSUM shows much better performance than that reported in the present paper; see Table 1 below.

Additionally, we explore the possibility of deterministic candidate generation based on moving sum statistics for a given set of bandwidth pairs \((G_l, G_r) \in{\mathcal{G}} \times{\mathcal{G}}\):

$$\begin{aligned}{{\tilde{M}}}_k(G_l, G_r; X) = \sqrt{\frac{G_l G_r}{G_l + G_r}} \left( \frac{1}{G_l}\sum _{t = k - G_l + 1}^k X_t - \frac{1}{G_r} \sum _{t = k + 1}^{k + G_r} X_t \right) . \end{aligned}$$

At each scale \((G_l, G_r)\), we identify all \({{\widetilde{k}}}\) which locally maximise \(\vert{{\tilde{M}}}_k(G_l, G_r; X) \vert\) within \(({{\widetilde{k}}} - G_l,{{\widetilde{k}}} + G_r)\), denote the collection of such \({{\widetilde{k}}}\) by \({\mathcal{K}}(G_l, G_r)\), and set \(M_k(G_l, G_r; X) ={{\tilde{M}}}_k(G_l, G_r; X) \cdot{\mathbb{I}}\{k \in ({{\widetilde{k}}} - G_l,{{\widetilde{k}}} + G_r), \,{{\widetilde{k}}} \in{\mathcal{K}}(G_l, G_r)\}\). We aggregate the MOSUM statistics generated at multiple scales as \(V(k) = \sum _{(G_l, G_r)} M_k(G_l, G_r; X)\), and then generate a solution path as in Algorithm 1, which is complete if the scale (1, 1) is included.
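The statistic \({{\tilde{M}}}_k\), its local maximisation, the masking step and the aggregation into \(V(k)\) can be sketched in Python as follows. This is a tentative illustration of the formulas above, not the implementation used for Table 1; the toy data and bandwidth set are ours.

```python
import numpy as np

def mosum_stat(x, G_l, G_r):
    """Asymmetric moving sum statistic: for each admissible k, the scaled
    difference between the mean over (k - G_l, k] and the mean over (k, k + G_r]."""
    T = len(x)
    scale = np.sqrt(G_l * G_r / (G_l + G_r))
    M = np.zeros(T)
    for k in range(G_l - 1, T - G_r):
        left = x[k - G_l + 1:k + 1].mean()
        right = x[k + 1:k + G_r + 1].mean()
        M[k] = scale * (left - right)
    return M

def masked_stat(x, G_l, G_r):
    """Keep the statistic only on windows around points that maximise its
    absolute value locally within (k - G_l, k + G_r); zero elsewhere."""
    M = mosum_stat(x, G_l, G_r)
    absM = np.abs(M)
    keep = np.zeros(len(x), dtype=bool)
    for k in range(len(x)):
        lo, hi = max(0, k - G_l + 1), k + G_r
        if absM[k] > 0 and absM[k] == absM[lo:hi].max():
            keep[lo:hi] = True
    return np.where(keep, M, 0.0)

def aggregate(x, bandwidths):
    """V(k): sum of the masked statistics over all bandwidth pairs."""
    return sum(masked_stat(x, G_l, G_r) for G_l, G_r in bandwidths)

# Toy signal: a single jump of size 3 between indices 49 and 50.
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(50), 3.0 * np.ones(50)]) + 0.5 * rng.standard_normal(100)
V = aggregate(x, [(1, 1), (5, 5), (10, 10)])
print(int(np.argmax(np.abs(V))))  # a pronounced peak of |V| near the true change
```

Note that at scale (1, 1) every point is its own local maximiser, so the statistic at this scale survives the masking everywhere; this is what makes the resulting solution path complete.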

[Algorithm 1]

Referring to the methodology combining Algorithm 1 with SDLL as MOSUM.SDLL, Table 1 shows the results from applying WBS2.SDLL, MOSUM.SDLL (both with \(\lambda = 0.9\)) and MOSUM (with the aforementioned choice of parameters) to ET and EET, summarised over 1000 realisations. All methods perform better for EET than for ET, since the signal-to-noise ratio \(\sigma ^{-2} \min _i \delta _i\,(f_i^\prime )^2\) is greater for EET (see also Sect. 1 above).

As already mentioned, MOSUM adapted for the frequent jump regime works considerably better than the default version calibrated for the change point regime. While being more conservative than the SDLL-based methods for ET, MOSUM still outperforms the others in terms of the absolute and squared error measures; overall, it returns reasonably good estimators at a fraction of the computation time. MOSUM.SDLL shows that deterministic candidate generation provides a promising alternative to WBS2: It performs slightly worse than WBS2.SDLL in identifying the correct number of change points (\(N = 199\)), but the mean squared error of \({{\widehat{f}}}\) indicates that MOSUM.SDLL may return estimators of better localisation accuracy.

Table 1 Simulation results as in Table 2 of Fryzlewicz (2020)