1 Introduction

The anthropogenic increase in atmospheric carbon dioxide (CO\(_2\)) is considered the primary driver of climate change; hence, a more comprehensive understanding of the global carbon cycle, in which freshwaters play an important role, is essential. The primary inputs of CO\(_2\) into river systems are the in situ respiration of organic material and the influx of soil CO\(_2\) from terrestrial and wetland environments through the decomposition of organic material and root respiration, while the principal sink of aqueous CO\(_2\) is photosynthesis (Richey 2013). Many of the world's rivers are supersaturated with CO\(_2\), resulting in a large outgassing flux that has recently been identified as a significant component of the global carbon budget (Cole et al. 2007; Butman and Raymond 2011; Wang et al. 2017).

A measure of the capacity for CO\(_2\) exchange between freshwater and atmosphere, reflecting whether a river acts as a sink or source of atmospheric CO\(_2\), is the excess partial pressure of carbon dioxide (EpCO\(_2\)) (Raymond et al. 1997). EpCO\(_2\) is highly dynamic in low-order rivers; it varies spatially and seasonally due to differences in land use, water discharge, temperature, and biological respiration and photosynthesis rates (Raymond et al. 1997; Wang et al. 2017). The ratio of in situ respiration to photosynthesis may also change over the course of a day, resulting in large EpCO\(_2\) fluctuations between day and night due to changes in temperature and solar radiation (Reiman and Xu 2019; Wang et al. 2017). Therefore, sub-daily measurements across different seasonal periods in such systems are necessary to evaluate the variability in EpCO\(_2\) across different timescales (Dawson et al. 2009; Waldron et al. 2007; Wang et al. 2017). A wavelet analysis of 15-min resolution EpCO\(_2\) data in a low-order stream of the river Dee in Scotland showed that the major contributor to the total EpCO\(_2\) variance is intra-day, rather than inter-day, variability, and that this intra-day variability does not appear constant over time (Elayouty et al. 2016). Establishing whether and how these daily patterns vary over time in relation to underlying hydrological and biological conditions poses statistical challenges due to the large volume of noisy data, the persistent and dynamic correlation between observations, and the varying relationships between the process drivers over different timescales.

By segmenting the high-resolution EpCO\(_2\) series from the river Dee into natural consecutive daily intervals, the data for each day can be viewed as realisations of a stochastic functional time series (FTS) \(\{X_k, k \in \mathbb {Z}\}\). Each \(X_k\) in the series is a continuous smooth function \(X_k(t), t \in \mathcal {T}\), with k denoting the day as the discrete time parameter and t being the intra-day time parameter defined on the continuous domain \(\mathcal {T}\). The main goal of the paper is thus to develop and evaluate a methodology that enables us to investigate whether and how the sources of variability among the daily functions of EpCO\(_2\) change over time, and possibly to reveal the hydrological and/or biological drivers of these changes.

Functional data analysis (FDA) has recently proven to be an appropriate tool for the analysis of high-frequency data represented in the form of continuous functions (Ramsay and Silverman 1997; Wang et al. 2015; Ullah and Finch 2013), and functional principal component analysis (FPCA) is a key dimension reduction technique in FDA that can be used to explore the covariance structure in a set of functions. However, like traditional time series, the data in a FTS are temporally dependent. It is also common for such environmental time series to be non-stationary over long time periods (Halliday et al. 2012; Horváth et al. 2014). Consequently, traditional FPCA may fail to capture the varying covariance structure appropriately and hence yield inefficient inference through the dimensionality reduction. To accommodate temporal correlation and non-stationarity in multivariate time series, several PCA techniques are available in both the time domain (Ku et al. 1995; Miller and Bowman 2012; Rodrigues 2007; Vanhatalo et al. 2017; Melnikov et al. 2016) and the frequency domain (Brillinger 1981; Ombao and Ringo 2006; Pena and Yohai 2016). For the PCA of temporally correlated functional data, Jaimungal and Eddie (2007) first modelled the functions' basis coefficients over time using vector auto-regressive (VAR) models and then applied standard PCA to the coefficients' residuals, which are less correlated. This, however, can lead to the loss of valuable information about the temporal patterns present in the data. Alternatively, Hormann et al. (2015) extended the theory of dynamic PCA in the frequency domain, founded by Brillinger (1981), to the context of functional data. In this approach, the FPCs are obtained from the spectral density, which contains information on the covariances at all time lags. This enables the approximation of the curves, using orthogonal PCs that involve lagged observations, based on information available at nearby observations.
Panaretos and Tavakoli (2013) proposed a similar idea for extending dynamic PCA to FTS, in which the reconstructed curves are based on a stochastic integration with respect to an orthogonal-increment functional stochastic process. Neither of these recent developments accounts for non-stationarity in the covariance structure over time in a FTS. Yan et al. (2018), however, considered the application of traditional FPCA to longitudinal data on a series of moving time windows of the functional domain to extract window-specific FPC scores under some smoothing constraints. The FPC scores obtained within each window are used in dynamic prediction assuming independence between functions, which may not be a realistic assumption in FTS. More recently, Dubey and Muller (2019) extended FPCA to the case where observations are metric-valued functions, such as time-varying probability distributions or time-dynamic networks. However, because the proposed object FPCs rely on an auto-covariance metric measuring only the association between the functions at lag 0, they can miss the time dependence between the objects.

In this paper, we develop a nonparametric time–frequency-domain FPCA procedure (and associated inference) that investigates and statistically assesses changes over time in the variability structure of a non-stationary FTS, taking into account the potentially valuable information contained in past and future observed functions. This version of FPCA allows the principal modes of variability to vary over time simultaneously with changes in the covariance structure of the FTS. This, in turn, enables optimal reconstruction of the curves from a low-dimensional set of time-varying FPCs that better explains the temporally varying covariance structure of the series. Based on these time-varying FPCs, a nonparametric inference procedure is proposed to test for the significance of changes in the auto-covariance structure. Although this time-varying FPCA has been motivated here by a highly dynamic series of daily aqueous CO\(_2\) profiles, whose dynamics and modes of variability we wish to study, it can be applied to any dynamic FTS.

The rest of the paper is organised as follows. Section 2 gives the motivation for the proposed methodology and the transformation of 15-min raw time series of EpCO\(_2\) into a FTS of daily EpCO\(_2\) profiles. Section 3 describes an empirical estimation procedure of a smooth auto-covariance structure. Then, Sect. 4 presents the methodological and practical implementation of the proposed time-varying FPCs to the daily profiles of EpCO\(_2\). Section 5 evaluates the changes in these components over time, where a bootstrap-based inference procedure is proposed for the detection of changes in the auto-covariance structure. In Sect. 6, the performance of the time-varying FPCs is assessed numerically using an extensive simulation study under different scenarios of complete and incomplete FTS. Finally, the results are discussed and concluding remarks are drawn in Sect. 7. Online supplementary material includes technical definitions for FTS and frequency-domain methods in Appendix A, with additional results and figures for the proposed methodology in Appendix B.

2 EpCO\(_2\) Data from Discrete to Functional Time Series

As a preliminary analysis of the high-resolution EpCO\(_2\) data highlighted varying smooth daily patterns of the process over time, the drivers of such variability in EpCO\(_2\) over the span of a day, and how these diel profiles vary across seasons and different hydrological conditions, are of specific interest to geochemists. To investigate this, the discrete (15-min) EpCO\(_2\) data can be converted into smooth daily profiles and then explored using FTS and FPCA approaches, which are introduced here through the application. Elayouty et al. (2016) found that EpCO\(_2\) exhibits a seasonal pattern, reaching low levels in summer due to autotrophic processes and the lack of carbon and organic inputs associated with lower discharge. To avoid such seasonality effects on the mean level dominating the variability in the data, a smooth global mean was first estimated and subtracted from the data before further analysis.

Given the smoothness of the EpCO\(_2\) process, the 15-min de-trended data over the span of each day are used to derive daily (smooth) functions using a basis function expansion. This constitutes a FTS so that each functional observation \(X_k(t)\) on day k can be expressed by:

$$\begin{aligned} X_k(t)=\sum _{j=1}^p Z_{kj}\psi _j(t)=\varvec{\psi }(t)^\top \mathbf {Z}_k, \end{aligned}$$
(1)

where \(\varvec{\psi }(t)\) is the vector of basis functions \((\psi _1(t),\ldots ,\psi _p(t))^\top \) and \(\mathbf {Z}_k\) is the vector of corresponding coefficients \((Z_{k1},\ldots ,Z_{kp})^\top \) to be estimated for the kth function. The most popular choices of basis systems to smooth environmental data include B-splines and Fourier series. Here, a cubic B-spline basis is used, which is characterised by its flexibility in not forcing a strongly cyclic variation. A large dimension, \(p=15\), is specified for the basis to capture the function characteristics, together with a second-order derivative roughness penalty. The smoothing parameter controlling the curvature relative to the model fit is set equal to 1, chosen by the restricted maximum likelihood method, which tends to be more robust to data dependence. A sample of the resulting 1095 daily EpCO\(_2\) curves is displayed in Fig. 1.
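For readers who wish to reproduce this smoothing step, a minimal sketch in Python is given below. This is not the authors' implementation: the B-spline basis is built from the Cox–de Boor recursion, and a second-order *difference* penalty (the standard P-spline surrogate) with a fixed smoothing parameter stands in for the derivative penalty and REML choice described above; the grid, knot placement and `smooth_day` name are our own illustrative choices.

```python
import numpy as np

def bspline_basis(t, knots, degree=3):
    """Evaluate all B-spline basis functions at points t (Cox-de Boor recursion)."""
    t = np.asarray(t, dtype=float)
    # order-0 (piecewise-constant) splines; close the last non-empty interval
    B = np.zeros((len(t), len(knots) - 1))
    for i in range(len(knots) - 1):
        left, right = knots[i], knots[i + 1]
        if right == knots[-1] and left < right:
            B[:, i] = (t >= left) & (t <= right)
        else:
            B[:, i] = (t >= left) & (t < right)
    for d in range(1, degree + 1):
        Bn = np.zeros((len(t), B.shape[1] - 1))
        for i in range(B.shape[1] - 1):
            term = np.zeros(len(t))
            if knots[i + d] > knots[i]:
                term += (t - knots[i]) / (knots[i + d] - knots[i]) * B[:, i]
            if knots[i + d + 1] > knots[i + 1]:
                term += (knots[i + d + 1] - t) / (knots[i + d + 1] - knots[i + 1]) * B[:, i + 1]
            Bn[:, i] = term
        B = Bn
    return B  # shape (len(t), len(knots) - degree - 1)

# p = 15 cubic basis functions on a 24-hour day: 3 + 13 + 3 = 19 knots
knots = np.concatenate([[0.0] * 3, np.linspace(0.0, 24.0, 13), [24.0] * 3])
tgrid = np.linspace(0.0, 24.0, 96)          # 15-min sampling within one day
Phi = bspline_basis(tgrid, knots)           # 96 x 15 design matrix

def smooth_day(y, Phi, lam=0.1):
    """Penalised least-squares coefficients Z_k for one day's observations y,
    using a second-order difference penalty as a P-spline stand-in for the
    derivative roughness penalty used in the paper."""
    D2 = np.diff(np.eye(Phi.shape[1]), n=2, axis=0)
    return np.linalg.solve(Phi.T @ Phi + lam * D2.T @ D2, Phi.T @ y)
```

Applying `smooth_day` to each day's 96 de-trended readings yields the coefficient vectors \(\mathbf {Z}_k\) of Eq. (1).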

Fig. 1
figure 1

Sample of daily (de-trended) EpCO\(_2\) smooth curves in the different years

3 Estimation of Smooth Covariance Structure for FTS

Let \(X_k(t),~ k=1,\dots ,N, ~t \in \mathcal {T}\) be the sequence of serially correlated random continuous daily curves of de-trended EpCO\(_2\) drawn from the FTS process \((X_k: k \in \mathbb {Z})\). To study how the directions of variability in this FTS vary over time, we introduce a smoothly time-varying version of the frequency-domain FPCs of Hormann et al. (2015). The proposed technique relies on performing frequency-domain FPCA at each time k using a time-varying (local) spectral density operator, whose construction is based on the whole family of lag-h smoothed covariance operators of \(X_k\). A covariance operator is the functional analogue of the auto-covariance of a scalar time series. Following the basis expansion in (1), the local lag-h covariance matrix that corresponds to the lag-h covariance operator at day k is given by \(V^X_{k,h}=V^{\mathbf {Z}}_{k,h}\varvec{\Psi }\), where \(V^{\mathbf {Z}}_{k,h}=\mathbb {E}\Big [\mathbf {Z}_{k} \mathbf {Z}_{k+h}^\top \Big ]\) is the lag-h auto-covariance matrix of the coefficients \((\mathbf {Z}_k)\) at time k and \(\varvec{\Psi }:=(\langle \psi _i, \psi _j \rangle = \int _\mathcal {T}\psi _i(t)\psi _j(t)dt: 1 \le i, j \le p)\) is a \(p \times p\) symmetric matrix of inner products of the basis functions, evaluated by numerical integration where necessary for the chosen basis system; see Appendix A in the supplementary material. This implies that the main interest lies in estimating the lag-h covariances of the basis expansion coefficients, \(V^{\mathbf {Z}}_{k,h} ~\forall ~ k,h \in \mathbb {Z}\).
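The Gram matrix \(\varvec{\Psi }\) can be sketched by numerical integration as follows. For transparency of the check, a small Fourier basis on \([0,24]\), for which the inner products are known in closed form, stands in for the paper's B-splines; the grid size is an arbitrary illustrative choice.

```python
import numpy as np

# Gram matrix Psi of inner products <psi_i, psi_j> by trapezoidal integration;
# a toy 5-function Fourier basis on [0, 24] stands in for the cubic B-splines.
tg = np.linspace(0.0, 24.0, 2001)
cols = [np.ones_like(tg)]
for j in (1, 2):
    cols += [np.sin(2 * np.pi * j * tg / 24), np.cos(2 * np.pi * j * tg / 24)]
Phi = np.column_stack(cols)                 # 2001 x 5 evaluation matrix

dt = tg[1] - tg[0]
w = np.full(tg.size, dt)                    # trapezoid quadrature weights
w[0] = w[-1] = dt / 2
Psi = (Phi * w[:, None]).T @ Phi            # 5 x 5 Gram matrix
```

Given \(\varvec{\Psi }\) and the coefficient covariances \(V^{\mathbf {Z}}_{k,h}\), the matrix \(V^X_{k,h}=V^{\mathbf {Z}}_{k,h}\varvec{\Psi }\) follows by substituting the basis expansion (1) into the covariance.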

To estimate these lag-h covariance matrices locally at each time point (day) k, we suggest using the Nadaraya–Watson covariance estimator. Due to the limited number of replicates at each time point (one functional observation per day), the sequence of lag-h covariances is smoothed over time using a weighting kernel and the sample smooth lag-h auto-covariance matrices of the coefficients \(\mathbf {Z}_k\) at each day k are computed as follows:

$$\begin{aligned} \hat{V}^{\mathbf {Z}}_{k,h}&=\frac{1}{\sum _{k'=1}^{N-h} w_{k,k'}}\sum _{k'=1}^{N-h}w_{k,k'}\mathbf {Z}_{k'} \mathbf {Z}_{k'+h}^{\top },&h\ge 0 \nonumber \\ \hat{V}^{\mathbf {Z}}_{k,h}&=(\hat{V}^{\mathbf {Z}}_{k,-h})^{\top },&h< 0, \end{aligned}$$
(2)

where \(w_{k,k'}\) is the weight assigned to the pair of observations \(k'\) and \(k'+h\), based on a chosen kernel w(.). The weighting kernel is a monotonically decreasing function of the distance \(|k-k'|\), regardless of lag h, ensuring that higher weights are assigned to observations \(k'\) near the target point k so that they contribute most to the estimate. Since the auto-correlation function between daily curves was found to decay exponentially with h, we have chosen to use the Gaussian density kernel:

$$\begin{aligned} w(k-k',s)=\text {exp}\Bigg (-\frac{1}{2}\Big (\frac{k-k'}{s}\Big )^2\Bigg ), \end{aligned}$$
(3)

where s is the parameter controlling the kernel’s width and the smoothness of the estimated covariance. For the choice of the smoothing parameter s, a sensitivity analysis was performed and \(s= 20\) seems to offer a good approximation for the original data. The sensitivity analysis indicated that using a smaller s tends to over-fit the curves, while a larger s overlooks the local variability in the curves; see Figure S.1. Note that as the smoothing parameter s tends to infinity, \(\hat{V}^X_{k,h}\) tends towards the overall lag-h covariance of the data \(\hat{V}^X_{h}\).
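The estimator in (2)–(3) can be sketched compactly; this is an illustrative reimplementation, not the authors' code, with the function name our own.

```python
import numpy as np

def local_lag_cov(Z, k, h, s=20.0):
    """Nadaraya-Watson estimate of the lag-h covariance of the basis
    coefficients at day k (Eqs. 2-3), using a Gaussian kernel of width s.
    Z is the N x p matrix of daily basis coefficients."""
    N = Z.shape[0]
    if h < 0:                                    # V_{k,-h} = V_{k,h}^T
        return local_lag_cov(Z, k, -h, s).T
    idx = np.arange(N - h)
    w = np.exp(-0.5 * ((k - idx) / s) ** 2)      # weights decay in |k - k'|
    # weighted sum of outer products Z_{k'} Z_{k'+h}^T, normalised by sum(w)
    return np.einsum('i,ij,ik->jk', w, Z[idx], Z[idx + h]) / w.sum()
```

As noted in the text, letting \(s\rightarrow \infty \) makes all weights equal, so the local estimate reduces to the overall sample lag-h covariance.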

4 Time-Varying Frequency-Domain FPCA

Based on the local lag-h covariances \((V^X_{k,h}:k,h \in \mathbb {Z})\), we define the matrix corresponding to the time-varying spectral density operator of the FTS \(X_k\) at time k and frequency \(\theta \) by:

$$\begin{aligned} F^{X}_{k,\theta }=\frac{1}{2\pi }\sum _{h \in \mathbb {Z}} V_{k,h}^{X} \mathrm {exp}(-\mathrm {i}h\theta )=\frac{1}{2\pi }\Big \{\sum _{h \in \mathbb {Z}} V_{k,h}^{\mathbf {Z}} \mathrm {exp}(-\mathrm {i}h\theta ) \Big \}\varvec{\Psi }^{\top }=F^{\mathbf {Z}}_{k,\theta }\varvec{\Psi }^{\top }, \end{aligned}$$
(4)

where \(\mathrm {i}\) denotes the imaginary unit. After estimating the local auto-covariances \(\hat{V}^{\mathbf {Z}}_{k,h}\) as described above, \(F^{X}_{k,\theta }\) is estimated empirically at each time point k using the lag-window estimator with weight function \(\omega \) as follows:

$$\begin{aligned} \hat{F}^{X}_{k,\theta }=\frac{1}{2\pi }\sum _{|h|\le D}\omega \Big (\frac{h}{D}\Big )\hat{V}^{\mathbf {Z}}_{k,h}\text {exp}(-\mathrm {i}h\theta )\varvec{\Psi }^{\top }, ~~~ k=1,\ldots ,N-D;\theta \in [-\pi ,\pi ]. \end{aligned}$$
(5)

Here, we used the Bartlett weight kernel \(\omega (h/D)=1 - |h/D|\) with bandwidth \(D={\lfloor \sqrt{N}\rfloor }=33\), which controls the extent of data observed through the window for spectral density estimation. For more details about the choices of \(\omega \), refer to Politis (2011). For the choice of D, a more sophisticated calibration could yield better results; however, in such hydrological systems, correlation between observations is not expected to extend beyond a month.
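The lag-window estimator (5) for the coefficient series (i.e. without the Gram factor \(\varvec{\Psi }\)) can be sketched as follows; again this is an illustrative reimplementation with our own function name, recomputing the local lag covariances internally so the block is self-contained.

```python
import numpy as np

def local_spec_density(Z, k, theta, D, s=20.0):
    """Bartlett lag-window estimate of the local spectral density of the
    coefficient series at day k and frequency theta (Eq. 5 without Psi)."""
    N, p = Z.shape
    F = np.zeros((p, p), dtype=complex)
    for h in range(-(D - 1), D):
        idx = np.arange(N - abs(h))
        w = np.exp(-0.5 * ((k - idx) / s) ** 2)          # local kernel weights
        V = np.einsum('i,ij,ik->jk', w, Z[idx], Z[idx + abs(h)]) / w.sum()
        if h < 0:                                        # V_{k,-h} = V_{k,h}^T
            V = V.T
        F += (1.0 - abs(h) / D) * V * np.exp(-1j * h * theta)
    return F / (2.0 * np.pi)
```

By construction the estimate is Hermitian at every \(\theta \), and integrating it over \([-\pi ,\pi ]\) returns the local lag-0 covariance, since only the \(h=0\) term survives the integration.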

An eigen-analysis is then performed on the local spectral densities \(\hat{F}^{X}_{k,\theta }\) to produce smoothly time-varying frequency-domain FPCs at each time point k. The eigen-decomposition yields the eigenvalues \(\hat{\lambda }_{k,m}(\theta )\) with eigenvectors \(\hat{\varphi }_{k,m}(\theta )\), \(\forall m =1,\dots ,p\), such that \(\hat{\lambda }_{k,m}(\theta )\) is the mth largest eigenvalue of \(\hat{F}_{k,\theta }^{X}\). However, the spectral density matrix \(\hat{F}_{k,\theta }^{X}\) is not necessarily Hermitian, since the basis system chosen here to approximate the functional data is not orthogonal due to penalisation, which may result in non-real eigenvalues. To ensure real eigenvalues, we perform the eigen-analysis on \(\varvec{\Psi }^{1/2}\hat{F}^{\mathbf {Z}}_{k,\theta }\varvec{\Psi }^{1/2\top }\) instead of \(\hat{F}^{X}_{k,\theta }\) given by (5), where \(\varvec{\Psi }^{1/2}\) is the Cholesky factor of \(\varvec{\Psi }\). This yields the eigenvalues \(\hat{\lambda }_{k,m}(\theta )\) and the eigenvectors \(\hat{\varphi }_{k,m}^{\star }(\theta )=\varvec{\Psi }^{1/2}\hat{\varphi }_{k,m}(\theta )\). The frequency-domain eigenfunctions \(\hat{\varphi }_{k,m}(\theta )\) of \(\hat{F}^{X}_{k,\theta }\) can then be obtained as \(\varvec{\psi }(t)^\top \varvec{\Psi }^{-1/2}\hat{\varphi }^\star _{k,m}(\theta )\), from which the time-varying FPC filters \(\phi _{kml}(t)\) are estimated via the inverse Fourier transform as follows:

$$\begin{aligned} \hat{\phi }_{kml}(t)= \frac{\varvec{\psi }(t)^\top }{2\pi }\int _{-\pi }^{\pi } \hat{\varphi }_{k,m}(\theta )\mathrm {exp}(\mathrm {i}l\theta )d\theta = \varvec{\psi }(t)^{\top }\hat{\varvec{b}}_{kml}, \end{aligned}$$
(6)

for a set of lags \(l=-L,\ldots ,L\) at each time \(k=1,\ldots ,N-D\) (\(D > L\)). By definition, \(\sum _{l\in \mathbb {Z}}\Vert \hat{\phi }_{kml}(t)\Vert ^2=1\) at each time point k. To guarantee a good approximation uniformly across time points, the maximum lag L is chosen such that the average \(\frac{1}{N-D}\sum _{k=1}^{N-D}\sum _{l=-L}^L\Vert \hat{\phi }_{kml}(t)\Vert ^2 \ge 1-\epsilon \), for some small \(\epsilon \). Here, L is set equal to 30, for which \(\epsilon < 0.01\). Note that the eigenvectors \(\hat{\varphi }_{k,m}(\theta )\) do not have a closed form, and so the time-domain functional filters \(\hat{\phi }_{kml}(t)\) are obtained via numerical integration; see Hormann et al. (2015).
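The eigen-analysis and numerical inverse Fourier transform of Eq. (6) can be sketched on a toy, known coefficient spectral density (Hermitian at every \(\theta \)); the matrices \(D_0, D_1, C\) and the simple phase-alignment step are our own illustrative choices, the latter being one crude resolution of the eigenvector identification issue discussed by Hormann et al. (2015).

```python
import numpy as np

p = 3
Psi = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.5, 0.2],
                [0.0, 0.2, 1.0]])             # toy SPD Gram matrix
Lc = np.linalg.cholesky(Psi)                  # Psi = Lc Lc^T

D0 = np.diag([3.0, 2.0, 1.0])                 # lag-0 level (well separated)
D1 = np.diag([0.3, 0.2, 0.1])                 # symmetric lag-1 part
C = np.array([[0.0, 0.1, 0.0],
              [-0.1, 0.0, 0.0],
              [0.0, 0.0, 0.0]])               # antisymmetric lag-1 part

def F_Z(theta):
    """Toy coefficient spectral density; Hermitian for every theta."""
    return (D0 + D1 * np.cos(theta) + 1j * C * np.sin(theta)) / (2 * np.pi)

Lmax = 30
thetas = np.linspace(-np.pi, np.pi, 257)
U = np.zeros((thetas.size, p), dtype=complex)
for i, th in enumerate(thetas):
    M = Lc.T @ F_Z(th) @ Lc                   # Hermitian; same spectrum as F^X
    vals, vecs = np.linalg.eigh(M)            # ascending eigenvalues
    u = vecs[:, -1]                           # leading eigenvector
    u = u * np.exp(-1j * np.angle(u[0]))      # simple phase alignment
    U[i] = u

# filters b_l = (1/2pi) int u(theta) e^{il theta} dtheta (trapezoid rule)
dth = thetas[1] - thetas[0]
wq = np.full(thetas.size, dth)
wq[0] = wq[-1] = dth / 2
b = {l: (wq[:, None] * U * np.exp(1j * l * thetas)[:, None]).sum(axis=0) / (2 * np.pi)
     for l in range(-Lmax, Lmax + 1)}
```

The vectors `b[l]` are coefficient representations of the filters; functional filters follow by applying \(\varvec{\Psi }^{-1/2}\) and the basis \(\varvec{\psi }(t)^\top \). By Parseval's identity, \(\sum _l\Vert b_l\Vert ^2\) should be close to 1 when the truncation lag is large enough, mirroring the criterion for choosing L above.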

Similar to the frequency-domain FPCs by Hormann et al. (2015), the score of the mth FPC at time k is obtained by filtering the original series using the estimated FPCs as follows:

$$\begin{aligned} \hat{Y}_{m,k}= \sum _{l=-L}^{L} \int _{t \in \mathcal {T}}X_{k-l}(t) \hat{\phi }_{kml}(t)dt =\sum _{l=-L}^{L}\mathbf {Z}_{k-l}^{\top }\varvec{\Psi }\hat{\varvec{b}}_{kml}. \end{aligned}$$
(7)

To compute the scores for \(1 \le k \le L\), we set \(X_{-L+1}=\cdots =X_0=\mathbb {E}[X_k]\), which may create bias at the start of the observation period. Using the first q time-varying FPCs, \(q\le p\), and the corresponding scores, we can thus approximate the original daily curves \(X_k(t)\) of EpCO\(_2\) by:

$$\begin{aligned} \hat{X}_k^q(t)=\sum _{m=1}^q\sum _{l=-L}^L\hat{Y}_{m,k+l}\hat{\phi }_{kml}(t),\quad k=1,\ldots ,N-D. \end{aligned}$$
(8)
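The scoring (7), reconstruction (8) and the NMSE used below for assessment can be sketched for a single component as follows; this is an illustrative reimplementation at the coefficient level, with zero-padding at the edges standing in for the mean-imputation described above (the mean is zero after de-trending).

```python
import numpy as np

def fpc_scores(Z, Psi, b):
    """Scores Y_{m,k} = sum_l Z_{k-l}^T Psi b_l (Eq. 7) for one component m.
    b has shape (2L+1, p), rows ordered l = -L,...,L; Z is zero-padded."""
    N, p = Z.shape
    L = (b.shape[0] - 1) // 2
    Zp = np.vstack([np.zeros((L, p)), Z, np.zeros((L, p))])
    return np.array([sum(Zp[L + k - l] @ Psi @ b[L + l] for l in range(-L, L + 1))
                     for k in range(N)])

def fpc_reconstruct(Y, b):
    """Coefficients of X-hat_k = sum_l Y_{k+l} phi_l (Eq. 8, one component)."""
    N = len(Y)
    L = (b.shape[0] - 1) // 2
    Yp = np.concatenate([np.zeros(L), Y, np.zeros(L)])
    return np.array([sum(Yp[L + k + l] * b[L + l] for l in range(-L, L + 1))
                     for k in range(N)])

def nmse(Z, Zhat, Psi):
    """Normalised mean squared error (Eq. 9), using ||X_k||^2 = Z_k^T Psi Z_k."""
    num = sum((z - zh) @ Psi @ (z - zh) for z, zh in zip(Z, Zhat))
    den = sum(z @ Psi @ z for z in Z)
    return num / den
```

A useful sanity check is the static special case: with a single lag-0 filter equal to the leading eigenvector of the sample covariance and an identity Gram matrix, the procedure reduces to classical PCA, so the NMSE equals one minus the proportion of variance of the leading eigenvalue.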

The asymptotic properties of the (Nadaraya–Watson) smooth covariance estimator have been studied, and convergence is guaranteed at a suitable rate as the degree of smoothing tends to zero (Petersen et al. 2019). These results extend approximately to the local spectral density estimator, given the consistency of \(\hat{F}^X_\theta \); see Hormann et al. (2015). The complexity of the eigen-decomposition of this spectral density as a function of its individual components prevents a simple derivation of expressions for the bias and variance of the eigenvalues and eigenfunctions. However, if the true spectral density at each time k has a unique eigen-decomposition, then the convergence of the estimated local spectral density at least ensures the convergence of the estimated local eigenvalues and eigenfunctions to their true values, given an appropriate amount of smoothing. Based on this and the asymptotic properties of dynamic FPCs discussed in Hormann et al. (2015), when smoothed appropriately, \(\mathrm {E}\Vert X_k\Vert ^2\approx \sum _{m=1}^\infty \int _{-\pi }^{\pi }\lambda _{k,m}(\theta )d\theta \) for all \(k=1,\dots ,N-D\). Thus, the proportion of variance explained by the first q time-varying FPCs, \(\sum _k\sum _{m\le q}\int _{-\pi }^{\pi }\lambda _{k,m}(\theta )d\theta /\sum _k\mathrm {E}\Vert X_k\Vert ^2\), can be estimated as 1 minus the normalised mean square error (NMSE) given by:

$$\begin{aligned} \text {NMSE}(q)=\sum _{k=1}^{N-D}\Vert X_k(t)-\hat{X}_k^q(t)\Vert ^2\Big /\sum _{k=1}^{N-D}\Vert X_k(t)\Vert ^2. \end{aligned}$$
(9)

This NMSE is also considered a measure for the information loss resulting from reducing the data dimensionality to dimension q.

Figure 2 shows that, using one component, the time-varying version yields a better approximation of the original mean pattern of the EpCO\(_2\) data as well as the within-day variability. That is, this one time-varying FPC takes over the roles of the subsequent ones and accounts for 94% of the variability at \(s=20\), versus 74% for the first dynamic FPC. More components can be used in the approximation, but one is enough here for demonstration purposes. These results imply that time-varying dynamic FPCA captures the complexity of the data variability with a smaller number of components than dynamic FPCA, which does not account for the varying time-dependence structure. Comparing the scores of the first dynamic and time-varying FPCs, we found that they are similar over time except in periods where the time-varying FPC seems to account for more variability in the series; see Figure S.2. Those periods, characterised by higher EpCO\(_2\) volatility, are mostly observed between April and October, when the diel pattern is more pronounced due to higher temperatures, longer daylight and increased autotrophic processes in summer relative to winter. These differences in EpCO\(_2\) diel patterns are not driven solely by seasonality but are also regulated by the system hydrology (Reiman and Xu 2019).

Fig. 2
figure 2

a 10 successive daily curves of de-trended EpCO\(_2\) and the corresponding reconstructions based on b the first dynamic FPC and c the first time-varying FPC with \(s =20\)

5 Evaluation of the Temporal Variations in EpCO\(_2\)

It appears from the analysis above that the dependence structure in the FTS of EpCO\(_2\) changes throughout time. For a better understanding of the drivers of those EpCO\(_2\) diel variations, a K-means clustering of the diel patterns based on the coefficients of the three central functional filters of the first time-varying FPC is performed and results are discussed in Sect. 5.1. Next, a method of inference along with a graphical output based on the frequency-domain local eigenvalues of particular FPCs is presented in Sect. 5.2 to empirically detect changes in the auto-covariance structure of the EpCO\(_2\) FTS.

5.1 K-Means Clustering

The results of K-means clustering of the central functional filters’ coefficients of the first frequency-domain FPC highlight the presence of three main clusters governing the variability in the EpCO\(_2\) diel patterns and the differences in the clustering structure between the 3 years; see Fig. 3. To study the differences in the covariance structure between clusters, the average impact of consecutive scores (\(\hat{Y}_{1,k-1},\hat{Y}_{1,k}, \hat{Y}_{1,k+1}\)) on the kth daily curve of EpCO\(_2\) in each of the clusters is investigated in Fig. 4. The purple cluster appears to underlie the less pronounced diel patterns that mostly prevail in autumn and winter where a change in the average EpCO\(_2\) at day \(k-1\) is likely to have a delayed impact on the night/dark levels of day k. The blue cluster governs periods with persisting sharp diel patterns across adjacent days. These diurnal patterns are more common in spring and summer where a change in EpCO\(_2\) at day \(k-1\) results in a rapid change over the daylight hours of the following day. The turquoise cluster includes the periods with more volatile and less correlated diel patterns, where the diel pattern is more likely to be driven by the mean level of EpCO\(_2\) on the day.

Fig. 3
figure 3

Class membership of EpCO\(_2\) daily curves obtained using K-means clustering based on the corresponding coefficients of the three central functional filters of the first FPC. The solid curve is the corresponding daily averaged EpCO\(_2\)

Fig. 4
figure 4

The functional mean \(\mu (t)\) of the de-trended EpCO\(_2\) daily curves (solid line) and \(\mu (t)+\sum _{l=-1}^1\delta _l\bar{\hat{\phi }}_{1l}(t)\) with \((\delta _{-1},\delta _0,\delta _1)=\) \((-1,1,1)\)(top) and \((1,-1,-1)\) (bottom) for the three clusters

We can thus conclude that the EpCO\(_2\) diel pattern is very likely driven by the autotrophic processes that increase and become more persistent in summer due to higher temperatures, longer daylight and increased solar radiation. However, in a small-order river, hydrological activity and discharge events represent another important factor affecting this diurnal activity in different ways, which coincides with the conclusion of Wang et al. (2017). On the one hand, discharge events can transport soil CO\(_2\) to the river, which may contribute to rising biological activity and hence riverine CO\(_2\); on the other hand, EpCO\(_2\) may decline due to the dilution effect of sustained or large discharge events (Reiman and Xu 2019). The above results indicate that the system is coherent over time except for some periods (in turquoise), which can be referred to as transitional periods. This degree of coherence between the diel patterns of EpCO\(_2\) also varies over time. For instance, the system is less variable and changes are slower in winter, possibly due to the low levels of nutrients and low temperature, whereas biological activity accompanied by soil run-off is likely to be accelerated in summer, as suggested by the relatively stronger correlation between days in summer. These results vary to some extent between the years, depending on the underlying conditions of water temperature and hydrological regime. To verify changes over time in the system's correlation structure, we propose the following inference procedure.

5.2 Inference Method

To detect changes in the auto-covariance structure of a FTS, we propose an approach similar to that suggested for multivariate time series by Miller and Bowman (2012), but based on the spectral density eigenvalues corresponding to the leading time-varying FPCs. The general null hypothesis of interest here is \(H_0\): \((X_k: k \in \mathbb {Z})\) is a weakly stationary FTS, versus the alternative hypothesis \(H_A\): \((X_k: k \in \mathbb {Z})\) is a locally stationary FTS. Since the trend has been removed from the data as described in Sect. 2, the null hypothesis becomes equivalent to testing that the FTS has a covariance structure, and hence a spectral density, that do not vary over time (i.e. \(V_{k,h}^X=V_h^X~\forall k,h\) and \(F_{k,\theta }^X=F_\theta ^X~ \forall k,\theta \)). This hypothesis can be investigated empirically by evaluating whether or not the observed changes over time in the dynamic FPCs are consistent with sampling variation. We therefore formulate the test statistic for evaluating whether the whole spectrum of a particular frequency-domain FPC m changes over time (i.e. \(H_0\): \(\Lambda _{m,k}=\Lambda _m~\forall ~k\) versus \(H_A\): \(\Lambda _{m,k}\ne \Lambda _m~\text {for at least one } k\)) as follows:

$$\begin{aligned} T_m=\sum _{k=1}^{N-D} |\hat{\Lambda }_{mk}-\hat{\Lambda }_{m}|, \end{aligned}$$
(10)

where \(\hat{\Lambda }_{mk}=\int _{-\pi }^{\pi }\hat{\lambda }_{mk}(\theta )d\theta \) and \(\hat{\Lambda }_{m}=\int _{-\pi }^{\pi }\hat{\lambda }_{m}(\theta )d\theta \). Note that \(\hat{\lambda }_{m}(\theta )\) is the mth eigenvalue of the stationary spectral density \(\hat{F}_{\theta }^{X}\) of the fully observed time series at frequency \(\theta \).

Kokoszka and Jouzdani (2020) have derived asymptotic results for the frequency-domain eigenvalues of a stationary FTS, such that \(\sqrt{N/D} |\hat{\Lambda }_m-\Lambda _m|\xrightarrow {d} N(0,\sigma _m^2)\). However, it is difficult to modify these results to suit the local nature of the covariance structure and spectral density presented here and therefore a resampling-based inference approach is used. The reference distribution is constructed in a way that illustrates the expected behaviour of a particular eigenvalue over time if the null hypothesis, that the auto-covariance structure does not vary over time, is true. For FTS like EpCO\(_2\), subsequent daily curves are auto-correlated, and hence, the reference distribution is constructed using parametric bootstrapping. This is done by simulating FTS from a stationary model fitted to the observed data that incorporates the correlation via a known covariance structure that does not change over time.

Exploratory analysis of the EpCO\(_2\) daily curves indicated that a functional AR(1) provides a reasonable choice for modelling the correlation in these data. Following from this, the distribution of the test statistic under the null hypothesis, that the underlying process follows a stationary functional AR(1), is constructed by simulating 200 FTSs from the functional AR(1) fitted to the EpCO\(_2\) data. Practically, this simulation is performed in a finite dimension p using the basis functions \(\{\psi _j\}_{j=1}^p\) such that the basis coefficients, \(\mathbf {Z}_k=(\langle X_k,\psi _1\rangle ,\ldots ,\langle X_k, \psi _p\rangle )\), are generated from a VAR(1) of the form \( \mathbf {Z}_{k}=\mathfrak {R}\mathbf {Z}_{k-1}+\tilde{\varvec{\varepsilon }}_{k}\), fitted to the basis coefficients of the de-trended EpCO\(_2\) daily functions, where \(\mathfrak {R}\) is the matrix of auto-regressive parameters estimated using least squares and \(\tilde{\varvec{\varepsilon }}_k \sim \text {MVN}(\mathbf {0},\varvec{\Sigma })\) are independent and identically distributed noise with a covariance structure that does not change over time. For each simulated FTS, having the same number of curves as the original data, the smooth covariance structure and spectral density are estimated at each time point using a Gaussian weighting kernel with \(s=20\). The sequences of local eigenvalues associated with the leading time-varying FPCs can then be used to evaluate whether the power spectra of those PCs, and hence the variability structure in the data, vary over time. For illustration, only the first component responsible for maximum variability is considered here. The first largest eigenvalues of local spectral densities are estimated and summed across all \(\theta \in (-\pi ,\pi )\) at each time point. 
Figure 5 displays the sequence of integrated first eigenvalues \(\hat{\Lambda }_{1k}\) along with a corresponding point-wise 95% variability band, which indicates that the power spectrum of the first frequency-domain PC varies over time, with higher values observed in summer and considerably lower values during winter. To protect more effectively against falsely rejecting the null hypothesis of stationarity as a result of multiple testing, we compute the overall test statistic, given by (10), based on the eigenvalue of the first component. This test produces a p-value less than 0.001, highlighting a significant change in the variability structure over time, with the most drastic intra-day fluctuations observed in summer. Although the above inference procedure is presented in terms of the EpCO\(_2\) application, it can be applied widely in other settings; note, however, that the construction of the reference distribution will change with the choice of an appropriate model for the correlation structure of the FTS.

Fig. 5

Plot of the first time-varying dynamic FPC eigenvalues obtained at each time point averaged over all frequencies \(\theta \in (-\pi ,\pi )\) along with the corresponding 95% variability band. The blue horizontal line represents the corresponding eigenvalue of the first stationary dynamic FPC

6 Simulation Studies

To evaluate the performance of the proposed time-varying dynamic FPCs, we adopt two simulation settings of increasing complexity. The first setting involves densely sampled FTS under a variety of non-stationary data-generating conditions with different intensities of time dependence within the series. The second setting involves sparsely sampled FTS with different levels of sparsity. Together, the two settings show that the proposed FPCs require the original FTS to be neither stationary nor complete (free of missing data). For each simulated FTS, the first q dynamic FPCs and their time-varying analogues are estimated and used to recover the simulated FTS. The performance of these approximations is compared using the NMSE in (9): the smaller the NMSE, the better the approximation.
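For concreteness, one common form of such a normalised error can be sketched as below; the exact definition is the paper's Eq. (9), and the normalisation by total variability about the functional mean used here is an assumption.

```python
import numpy as np

def nmse(X, X_hat):
    """Normalised MSE between curves (rows) and their reconstructions.
    Normalising by the total variability about the functional mean is one
    common convention; the paper's exact form is its Eq. (9)."""
    return np.sum((X - X_hat) ** 2) / np.sum((X - X.mean(axis=0)) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 24))                     # 100 toy curves on a 24-point grid
X_mean = np.tile(X.mean(axis=0), (100, 1))         # mean-only "reconstruction"
```

Under this convention a perfect reconstruction gives NMSE 0, while reconstructing every curve by the functional mean gives NMSE 1, so values below 1 indicate that the retained PCs capture genuine curve-to-curve variability.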

Simulation of non-stationary (complete) FTS We start by generating a FTS \(\{X_k\}\) from a stationary functional AR(1) with a correlation structure that mimics that of the EpCO\(_2\) data application as described in Sect. 5.2. Next, we estimate the traditional time-domain eigenvalues \(\varvec{\lambda }=(\lambda _1,\ldots ,\lambda _p)\) and eigenfunctions \(E_1(t),\ldots ,E_p(t)\) of this simulated FTS, ignoring the time dependence in the series. Extending the findings of Mardia et al. (1979) to the functional context, we can thus construct a functional process \(\mathcal {X}_k\) by:

$$\begin{aligned} \mathcal {X}_k(t) \approx \bar{\mathcal {X}}(t) + \sum _{m=1}^p S_{mk}E_m(t), \quad t \in \mathcal {T}, \end{aligned}$$
(11)

where \(\bar{\mathcal {X}}(t)\) is the functional mean of \(\mathcal {X}_1(t),\ldots ,\mathcal {X}_N(t)\), \(S_{mk}\) is the score of the mth PC for the kth observation and \(E_m(t)\) is the mth eigenfunction of the data. Based on this result, we can simulate a locally stationary FTS whose power spectrum varies smoothly over a grid of, say \(B=20\), time blocks by multiplying \(\varvec{\lambda }\) by b/10 to obtain \(\varvec{\lambda }_b\) for \(b=1,\dots ,B\). Within each block b, the scores for each PC are generated from a VAR(1) process with auto-regression matrix \(\mathfrak {R}^\star \) and MVN(\(\mathbf {0},\varvec{\Sigma }_b\)) noise, where \(\varvec{\Sigma }_b = \text {diag}(\varvec{\lambda }_b)\). The matrix \(\mathfrak {R}^\star \) is set equal to \(\kappa G/\Vert G\Vert \), where \(\kappa \) denotes a pre-specified dependence coefficient and G is a \(p \times p\) matrix with mutually independent elements \(G_{m,m'} \sim N(0, \text {exp}\{-(m+m')\})\), \(1 \le m, m' \le p\). Multiplying these simulated scores by the eigenfunctions \(E_1(t),\ldots ,E_p(t)\) yields a block-stationary process \(\mathcal {X}_k\) of 400 observations.
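A sketch of this generating scheme follows; the base eigenvalues are hypothetical, and Fourier-type columns stand in for the estimated eigenfunctions \(E_1(t),\ldots,E_p(t)\).

```python
import numpy as np

rng = np.random.default_rng(3)
p, B, n_b, T = 5, 20, 20, 24                 # PCs, blocks, curves per block, grid points
lam = np.array([5.0, 3.0, 2.0, 1.0, 0.5])    # hypothetical base eigenvalues

# R* = kappa * G / ||G||, with G_{m,m'} ~ N(0, exp{-(m+m')}) mutually independent
kappa = 0.6
m = np.arange(1, p + 1)
G = rng.normal(size=(p, p)) * np.exp(-(m[:, None] + m[None, :]) / 2)
R_star = kappa * G / np.linalg.norm(G)

# Fourier-type columns standing in for the estimated eigenfunctions
t = np.linspace(0, 1, T)
E = np.column_stack([np.sqrt(2 / T) * np.cos(np.pi * (j + 1) * t) for j in range(p)])

scores, z = [], np.zeros(p)
for b in range(1, B + 1):
    Sigma_b = np.diag(lam * b / 10)          # noise variance scaled smoothly per block
    for _ in range(n_b):
        z = R_star @ z + rng.multivariate_normal(np.zeros(p), Sigma_b)
        scores.append(z)
scores = np.array(scores)                    # (B * n_b, p) = (400, p) score series
X = scores @ E.T                             # block-stationary FTS: 400 curves on the grid
```

Because \(\varvec{\Sigma}_b\) grows with the block index b, curves late in the series are markedly more variable than early ones, giving the non-stationary power spectrum the method is designed to track.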

The above simulation is repeated 200 times for \(\kappa = 0.1,0.3,0.6,0.9\), reflecting weak to strong dependence structures in the data. For each choice of \(\kappa \), the simulated FTSs are approximated by the first \(q=1,2,3\) stationary and time-varying dynamic FPCs at different values of the weighting kernel smoothing parameter s; we considered \(s=20,40,100\), corresponding to low to high levels of smoothness of the estimated covariance structure. Figure 6, Figure S.3 and Table S.1 in the supplementary material show the mean and standard deviation of the NMSE computed for each scenario. In almost all scenarios, the time-varying FPCs outperform the stationary ones, with lower mean NMSE between the simulated series and their reconstructed versions. The NMSE of both versions of the FPCs decreases, and the approximation improves, as the dependence in the series increases; the differences between the two reconstructions become negligible as the number of PCs and the serial dependence increase. Although the variance of the NMSE increases with serial dependence, it is systematically lower for the time-varying version.

Fig. 6

Box plots of NMSE between the simulated curves and their recovered versions using 1, 2 and 3 (from left to right) dynamic (red) and time-varying (smooth) dynamic FPCs with smoothing parameter \(s=100\) (olive), \(s=40\) (blue) and \(s=20\) (purple), based on the results from 200 non-stationary simulation runs with \(\kappa =0.1\) (top) and \(\kappa =0.9\) (bottom). Results for \(\kappa =0.3\) and \(\kappa =0.6\) are in the supplementary material

We have also used the above simulation to compare empirically the time-varying and time-invariant frequency-domain eigenvalues to the true underlying eigenvalues of the generating process. This is done by calculating the mean squared error, \(\frac{1}{N-D}\sum _{k=1}^{N-D}\Vert \Lambda _{km}-\hat{\Lambda }_{km}\Vert ^2\), between the mth true eigenvalue and its corresponding estimate from the eigen-analysis of the empirically estimated spectral densities, summed over all \(\theta \in [-\pi ,\pi ]\) and aggregated across the whole grid of time points k. By expressing each eigenfunction \(E_m(t)\) in terms of the same basis functions used to simulate the original FTS, as \(\varvec{\psi }(t)^{\top }\mathbf {c}_m\), the true frequency-domain eigenvalues of \(\mathcal {X}_k\) for each block b can be obtained via the eigen-decomposition of the spectral density matrix:

$$\begin{aligned} F_b^{\mathcal {X}}(\theta )=\frac{1}{2\pi }\varvec{\Psi }^{1/2}\mathbf {C}(\mathbf {I}_{15}-\mathfrak {R}^\star e^{-\text {i}\theta })^{-1}\varvec{\Sigma }_b(\mathbf {I}_{15}-\mathfrak {R}^{\star \top } e^{\text {i}\theta })^{-1}\mathbf {C}^{\top }\varvec{\Psi }^{1/2\top }, \end{aligned}$$
(12)

where \(\mathbf {C}\) is a \(p \times p\) matrix whose mth column is the vector \(\mathbf {c}_m\) and \(\mathbf {I}_{15}\) is the identity matrix of order 15. Results in Figure S.4 of the supplementary material indicate that, for non-stationary FTS, the estimated time-varying frequency-domain eigenvalues lie closer to the true values than their stationary counterparts do. Results also show that \(s=40\) provides the best accuracy among the values considered, which highlights the importance of the choice of smoothing parameter for estimation accuracy.
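For a VAR(1) score process, the spectral density inside (12) has the standard closed form used below; the coefficients R and Sigma here are illustrative, and the basis factors \(\varvec{\Psi }^{1/2}\mathbf {C}\) are omitted for simplicity.

```python
import numpy as np

def var1_spectral_density(R, Sigma, theta):
    """Spectral density of a VAR(1), Z_k = R Z_{k-1} + eps_k, at frequency theta:
    F(theta) = (2*pi)^{-1} A(theta)^{-1} Sigma A(theta)^{-H}, A(theta) = I - R e^{-i theta}."""
    p = R.shape[0]
    A_inv = np.linalg.inv(np.eye(p) - R * np.exp(-1j * theta))
    return A_inv @ Sigma @ A_inv.conj().T / (2 * np.pi)

# illustrative coefficients (not those of the simulation study)
R = 0.5 * np.eye(2)
Sigma = np.diag([2.0, 1.0])

thetas = np.linspace(-np.pi, np.pi, 201)
# leading eigenvalue of the Hermitian spectral density at each frequency
lead = [np.linalg.eigvalsh(var1_spectral_density(R, Sigma, th))[-1] for th in thetas]
# aggregate across frequencies, as done for the eigenvalue sequences in the text
total = np.sum(lead) * (thetas[1] - thetas[0])
```

Eigen-decomposing \(F(\theta)\) frequency by frequency and then summing, as in this sketch, is exactly how the true eigenvalues of each block are aggregated for comparison with their estimates.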

Simulation of non-stationary (sparse) FTS Here, we generate a sequence of \(N=400\) functional observations from \( X_{k+1}(t)=\Upsilon _k(X_{k}(t))+\varepsilon _{k+1}(t)\), where \(\Upsilon _k(.)\) is an auto-regressive operator whose Hilbert-Schmidt norm \(\Vert \Upsilon _k(.)\Vert _{\mathcal {S}}\) varies smoothly over time between 0.1 and 0.9. To obtain realisations of the curves at a finite set of time points and to produce sparsity in the data, this simulation is performed in a discrete setting by assuming that the operator \(\Upsilon _k\) has a Gaussian kernel \( \Upsilon _k(t,s) = \Vert \Upsilon _k\Vert _{\mathcal {S}} \mathrm {exp}\Big (-\frac{t^2+s^2}{2}\Big )\). To ensure the smoothness of these densely simulated functions, the innovations \(\varepsilon _{k}\) are generated from a smooth Gaussian process with a Matérn covariance function. This simulation is repeated 200 times. Then, for each FTS, a random collection of curves constituting \(\nu =\) 10% and 35% of the total number of curves in the series is made incomplete: for each randomly chosen curve, a block of \(\tau =\) 35%, 60% or 100% of the raw data, representing the realisations of the curve at a set of finite time points, is made missing. Thus, for each FTS, different levels of sparsity are obtained using combinations of \(\nu \) and \(\tau \). The estimated stationary and time-varying dynamic FPCs are then used to reconstruct the original series, and the NMSE is calculated. As frequency-domain FPCA requires a complete FTS, a naive imputation of the missing values is used to provide initial values. Results in Figures S.5–S.7 indicate that, under all missing-data patterns, the missing data were better recovered using the time-varying FPCs with \(s=20\) than with the stationary FPCs.
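A sketch of this sparse-data generator for one \((\nu, \tau)\) combination is given below; the Matérn(3/2) smoothness, the smooth sinusoidal path for the operator norm, and the discretisation of the operator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 400, 30
t = np.linspace(0, 1, T)

# Matern(3/2) covariance for the smooth Gaussian-process innovations (rho assumed)
d = np.abs(t[:, None] - t[None, :])
rho = 0.3
C = (1 + np.sqrt(3) * d / rho) * np.exp(-np.sqrt(3) * d / rho)

# Gaussian kernel of the auto-regressive operator, discretised on the grid
K = np.exp(-(t[:, None] ** 2 + t[None, :] ** 2) / 2)
K /= np.linalg.norm(K)                        # crude unit-norm scaling of the kernel

X = np.zeros((N, T))
for k in range(1, N):
    nrm = 0.1 + 0.8 * np.sin(np.pi * k / N)   # illustrative smooth path in [0.1, 0.9]
    X[k] = nrm * (K @ X[k - 1]) / T + rng.multivariate_normal(np.zeros(T), C)

# sparsify: nu = 10% of curves each lose a contiguous block of tau = 35% of points
blk = int(0.35 * T)
for k in rng.choice(N, int(0.10 * N), replace=False):
    start = rng.integers(0, T - blk + 1)
    X[k, start:start + blk] = np.nan
```

The resulting array mixes complete and partially observed curves, mimicking the sparsity patterns under which the stationary and time-varying FPC reconstructions are compared.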

7 Discussion and Conclusions

FDA has proven to be an appropriate tool for extracting the most important characteristics of high-dimensional data and reducing the data dimensionality. A common situation in environmental applications is when functional data are observed sequentially in time and typically exhibit auto-correlation and/or non-stationarity over time. Traditional FPCA ignores information carried by nearby observations, which may lead to inappropriate dimension reduction. In this paper, we propose a time-varying version of frequency-domain FPCA that accounts for both time dependence and covariance non-stationarity in FTS. The motivating application is the analysis of daily curves of excess partial pressure of CO\(_2\) derived from 15-min sensor records collected over 3 years in a small catchment of the river Dee in Scotland.

The proposed time-varying FPCA relies on the estimation of a local spectral density based on a smooth covariance structure. The method presented here uses kernel-based smoothing to estimate the lag-h covariances at each time point. Although a Gaussian weighting kernel has been used here, alternative weight functions can be chosen to adapt the smoothing to the nature of the time series under study. Smoothing the covariance structure in this way allows the identification, for non-stationary FTS, of time-varying principal modes of variability whose scores are not cross-correlated at short lags. The time-varying FPCs have shown their ability to approximate the original curves of the highly dynamic EpCO\(_2\) system using a smaller number of leading PCs, by taking into account the time-varying covariance structure. According to the simulation study, the time-varying FPCs outperform their stationary counterparts in reconstructing the original curves, with lower NMSE, both in the absence and in the presence of missing data in non-stationary FTS. The simulation study also indicates that the differences become more substantial as the temporal dependence between curves decreases and the system becomes more dynamic. We have also observed that, given a reasonable degree of smoothness, the estimated eigenvalues of the local spectral densities provide good estimates of the corresponding true ones. These results are mainly empirical; future work should include a rigorous investigation of the theoretical properties of the estimators discussed in Sect. 4, which have so far been examined only through simulation.

We have also proposed a bootstrap inference procedure to detect changes in the covariance structure of a FTS based on the time-varying frequency-domain FPCs. Although this procedure has been illustrated in detail using the EpCO\(_2\) data, it can be applied to other FTS given a suitable null stationary model from which to simulate the reference distribution. The proposed test detected significant changes in the covariance and variability structure of the series of EpCO\(_2\) daily curves over time. The test identified time periods where there are shifts in the system dynamics, demonstrating the need for non-stationary functional PCA to explain the changing variability structure in such complex systems. The time-varying FPCs highlighted that changes in the EpCO\(_2\) diel patterns are driven not solely by seasonality but also by changes in the system hydrology, depending on the strength and extent of correlation in the system across days. Evaluating these temporal changes in the diel patterns of EpCO\(_2\) is important for a better understanding of the carbon cycle and for providing reliable estimates of carbon budgets in freshwater systems.

Possible extensions include multivariate time-varying dynamic functional principal component analysis, following the idea of Happ and Greven (2018). Such an approach would target multivariate functional data, where correlation exists not only between the curves of one FTS but also across different FTS, not necessarily defined on the same domain or with the same number of dimensions. This, in turn, could help to better understand the dynamics of EpCO\(_2\) in relation to variability in other series, such as the flow velocity in the river (Long et al. 2015). Another challenge currently under consideration is reducing the computational cost of applying the time-varying FPCs to longer FTS.

Environmental data often exhibit dependence and non-stationarity over both time and space. King et al. (2018) proposed a new approach to study the variability of air pollution over time and across space using FDA techniques. In their method, the yearly profiles at a given location are treated as a FTS and modelled as a function of: (1) an annual mean level, (2) a linear combination of smooth annual trends with site-specific coefficients that vary over time and (3) an annual specific residual effect. The second component is based on the eigenfunctions of the pooled covariance, obtained by ignoring the dependence across space and years, with coefficients that vary over time and space according to a space–time covariance model. This motivates the development of FPCs that appropriately reduce the data dimensionality while accounting for dependence and non-stationarity in both time and space.

8 Supplementary Material

The supplementary material available online contains an appendix with technical definitions and details of FTS and frequency-domain methods, in addition to further results and figures from the simulation studies and the application to EpCO\(_2\) data. The R code necessary for the computation of the time-varying frequency-domain FPCs is also available. The R package freqdom by Hormann and Kidzinski (2017) is a dependency for this work; new and modified functions are available in the folder smfreqdom.