1 Introduction

Coronavirus disease 2019 (COVID-19) has spread rapidly to many other countries since its emergence in China in December 2019 (see Sivakumar 2020; Wang et al. 2020; Zhou et al. 2020). The effective allocation of medical resources requires the derivation of predictive techniques describing the spatiotemporal dynamics of COVID-19 (see, e.g., Du et al. 2020; Khan and Atangana 2020; Nishiura et al. 2020; Remuzzi and Remuzzi 2020, just to mention a few). Epidemiological models can contribute to the analysis of the causes, dynamics, and spread of this pandemic (see, e.g., Huppert and Katriel 2013; Keeling and Rohani 2008; Laaroussi et al. 2018, and the references therein). Short-term forecasts can be obtained by adopting the framework of compartmental SIR (susceptible-infectious-recovered) models, based on ordinary differential equations (see, e.g., Angulo et al. 2013; Elhia et al. 2014; Ji et al. 2012; Kermack and McKendrick 1927; Kuznetsov and Piccardi 1994; Milner and Zhao 2008; Pathak et al. 2010; Tornatore et al. 2005; Yu et al. 2009; Zhang et al. 2008). An extensive literature is available, covering different versions of compartmental models, such as susceptible-infectious-recovered-susceptible (SIRS) models (Dushoff et al. 2004), and delay differential equations (see Beretta et al. 2001; McCluskey 2010; Sekiguchi and Ishiwata 2010). Spatial extensions, based on reaction-diffusion models reflecting the spread of infectious disease over a spatial region, can be found, for instance, in Guin and Mandal (2014) and Webb (1981). SEIRD (susceptible, exposed, infected, recovered, deceased) models, incorporating the spatial spread of the disease with inhomogeneous diffusion terms, have also been analyzed (see Roques and Bonnefon 2016 and Roques et al. 2011). Stochastic versions of SIR-type models aim to address several limitations of the deterministic formulations, regarding uncertainty in the observations and the hidden dynamical epidemic process. Markov chain SIR based modelling (see Anderson and Britton 2000; Xu et al. 2007), and some recent stochastic formulations involving complex networks (see Volz 2008; Zhou et al. 2006) or drug-resistant influenza (see Chao et al. 2012), constitute some alternatives. A Bayesian hierarchical statistical SIRS model framework is adopted in Aalen et al. (2008); Abboud et al. (2019); Anderson and Britton (2000); Fleming and Harrington (1991), taking into account the observation error in the counts, and uncertainty in the parameter space. Beyond SIR modeling, the multivariate survival analysis approach offers a suitable modelling framework regarding the random infection, incubation and recovery periods affecting the containment of COVID-19 (see, e.g., Bolker and Grenfell 1996; Keeling et al. 1997; Pak et al. 2020; Wasiur et al. 2019).

In a first stage, most of the models referred to above were adapted and applied to approximate the space/time evolution of COVID-19 incidence and mortality. That is the case, for instance, of the three models presented in Roosa et al. (2020), which were validated with outbreaks of diseases other than COVID-19. Alternative SEIR-type models, involving stochastic components, are formulated in Kucharski et al. (2020). A revised SEIR model has also been proposed in Zhang et al. (2020) (see also He et al. 2020). A \(\theta \)-SEIHRD model, able to estimate the number of cases, deaths, and needed hospital beds, is introduced in Ivorra et al. (2020); it is adapted to COVID-19, and based on the Be-CoDiS model (see Ivorra et al. 2015). Due to the low quality of the available records, and the hidden sample information, the most remarkable feature in this research area is the balance between complexity and identifiability of model parameters. Recently, an attempt to simplify modelling strategies applied to COVID-19 data analysis is presented in Ramosa et al. (2020), in terms of a \(\theta \)-SEIHQRD model. Mitigation of undersampling is proposed in Langousis and Carsteanu (2020), based on re-scaling of summary statistics characterizing sample properties of the pandemic process, useful for comparisons between countries with similar levels of health care.

Nowadays, machine learning (ML) models have established themselves as serious contenders to classical statistical models in the area of forecasting. Research started in the eighties with the development of the neural network model, and was subsequently extended to alternative models, such as support vector machines, decision trees, and others (see, e.g., Alpaydin 2004; Blanquero et al. 2020; Hastie et al. 2001; Mohammady et al. 2021). In general, curve regression techniques based on a function basis, usually in the space of square-integrable functions with respect to a suitable probability measure, allow short- and long-term forecasting. Thus, depending on the choice of the function basis, and the probability measure selected, particle and field views can be combined. Note that classical stochastic diffusion models offer a particle rather than a field view (see, e.g., Malesios et al. 2016).

Linear regression, multilayer perceptron and vector autoregression methods have been applied in Sujath et al. (2020a, 2020b) to predict COVID-19 spread, anticipating the potential patterns of COVID-19 effects (see also Section 2 of Sujath et al. (2020a), on related work). Early-stage detection of COVID-19 is addressed in Barstugan et al. (2020), applying machine learning strategies to Computed Tomography images. Chien and Chen (2020) evaluate the association between meteorological factors and COVID-19 spread. They concluded that average temperature, minimum relative humidity, and precipitation were the best predictors, displaying possible non-linear correlations with COVID-19 variables. These conclusions are crucial in the subsequent machine learning regression based analysis.

This paper presents a multiple objective space-time forecasting approach, where curve trigonometric log-regression is combined with multivariate time series spatial residual analysis. In our curve regression model fitting, we are interested in reflecting the cyclical behavior of COVID-19 mortality induced by the hardening or relaxation of the containment measures adopted to mitigate the increase of infections and mortality. The trigonometric basis (sines and cosines) is therefore selected in our spatially heterogeneous curve log-regression model fitting. The ratio between the expected minimized empirical risk and the corresponding expected value of the quadratic loss function at such a minimizer is considered for model selection (see, e.g., Chapelle et al. 2002). Note that this selection procedure enforces agreement between the expected minimum empirical risk and the corresponding expected theoretical loss function value.

The penalization factor proposed in Chapelle et al. (2002), applied to our choice of the truncation parameter, leads to the dimension of the subspace where our curve regression estimator is approximated at any spatial location. This model selection procedure is asymptotically equivalent to the Akaike correction factor. A robust modification of the Akaike information criterion can be found, for example, in Agostinelli (2001). As an alternative, one can consider a cross-validation criterion for selecting the best subset of explanatory variables (see Takano and Miyashiro 2020, where a mixed-integer optimization approach is proposed in this context).

Beyond asymptotic analysis, model selection from finite sample sizes constitutes a challenging topic in our approach. To address this problem, a bootstrap estimator of the ratio between the expected quadratic loss function and the expected training quadratic error, based on different sets of explanatory variables, is implemented. Bootstrap confidence intervals are also provided for the spatial mean of the curve regression predictor, and for the expected training errors of the curve regression and of the multivariate time-series residual predictor. The bootstrap approximation of the probability distribution of these statistics is also computed.
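To fix ideas, the finite-sample ratio estimation can be sketched as follows: for a least-squares fit on synthetic data, the ratio between held-out quadratic loss and training quadratic error is bootstrapped, and a percentile confidence interval is reported. The design, sample size and number of resamples below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ratio_ci(y, design, n_boot=500, alpha=0.05):
    """Percentile bootstrap CI for the ratio E[loss]/E[training error]
    of a least-squares fit (illustrative sketch, not the paper's exact
    resampling scheme)."""
    T = len(y)
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, T, T)            # resample time points
        oob = np.setdiff1d(np.arange(T), idx)  # held-out points
        beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        train_err = np.mean((y[idx] - design[idx] @ beta) ** 2)
        test_err = np.mean((y[oob] - design[oob] @ beta) ** 2)
        ratios[b] = test_err / train_err
    lo, hi = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# toy example: noisy sinusoid fitted with a two-column trigonometric design
t = np.arange(100)
X = np.column_stack([np.cos(2 * np.pi * t / 100), np.sin(2 * np.pi * t / 100)])
y = X @ np.array([1.0, -0.5]) + 0.1 * rng.standard_normal(100)
lo, hi = bootstrap_ratio_ci(y, X)
```

For a well-specified model, the interval concentrates near one; ratios substantially above one signal overfitting of the training sample.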

In our multivariate time series analysis of the regression residuals, classical and Bayesian componentwise estimation of the spatial linear correlation is achieved. The presented multiple objective forecasting approach is applied to the spatiotemporal analysis of COVID-19 mortality in the first wave affecting the Spanish Communities, from March 8, 2020 until May 13, 2020. Our results show a remarkable qualitative agreement with the reported epidemiological data.

The spatiotemporal approach presented in this paper fuses generalized random field theory with our multiple-objective space-time forecasting, based on nonlinear parametric regression and Bayesian analysis of the spatiotemporal correlation structure. Regarding the site-specific or specificatory knowledge bases (see Christakos et al. 2002), several information sources can be incorporated in our approach into the description of the hidden epidemic process. Particularly, we distinguish here between hard-data, or hard measurements, providing a satisfactory level of accuracy for practical purposes, and soft-data, displaying a non-negligible amount of uncertainty. In this second data category, we include missing or imperfect observations, categorical data, and fuzzy inputs (see also Christakos 2000, 2002; Christakos and Hristopulos 1998, and the references therein). In this paper, we consider hard-data sets given by numerical values of our count process at the Spanish Communities analyzed. Our soft-data sample complements hard measurements, in terms of interpolated, smoothed, and spatially projected data. Particularly, spatial correlations between regions are incorporated in terms of soft-data. Additional information about the continuous functional nature of the underlying space-time COVID-19 mortality process is also reflected in our soft-data set. This information supports the implementation of the proposed estimation methodology in the framework of Functional Data Analysis (FDA) techniques.

As commented before, recent advances in spatiotemporal mapping of epidemiological data incorporate ML regression models to improve the understanding of general or core knowledge bases. Thus, model fitting is achieved according to epidemiological system laws, population dynamics, and theoretical space-time dependence models (see Christakos 2008, and the references therein). See also Barstugan et al. (2020); Chien and Chen (2020) and Sujath et al. (2020a) in the hard-data context. It is well known that the limited availability of hard-data affects space-time analysis. Hence, the incorporation of soft-data into ML regression models can help this analysis, providing a global view of the available sample information (see, e.g., Christakos et al. 2002). Particularly, in our empirical comparative analysis, involving ML regression models and our approach, input hard- and soft-data information is incorporated. Cross-validation, bootstrap confidence intervals and probability density estimation support our comparative study. Specifically, random k-fold (\(k=5,10\)) cross-validation first evaluates the performance of the compared regression models from hard- and soft-data, in terms of Symmetric Mean Absolute Percentage Errors (SMAPEs). Bootstrap confidence intervals and probability density estimation of the spatially averaged SMAPEs approximate the distributional characteristics of the random k-fold cross-validation errors. Thus, a complete picture of SMAPEs supports our evaluation of the predictive ability of the regression models tested, from the analyzed hard- and soft-data sets.
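The SMAPE criterion and the random k-fold scheme can be sketched as follows, with ordinary least squares standing in for the compared ML regressors; all function names and data here are illustrative.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, in percent."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

def kfold_smape(X, y, k=5, seed=0):
    """Spatially averaged random k-fold CV SMAPE of an OLS regression,
    standing in for the compared ML models."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        scores.append(smape(y[test], X[test] @ beta))
    return np.mean(scores)

# noiseless linear trend: the CV SMAPE should be essentially zero
t = np.arange(1, 51, dtype=float)
X = np.column_stack([np.ones(50), t])
y = 3.0 + 2.0 * t
err = kfold_smape(X, y, k=5)
```

Bootstrapping the fold-level scores (resampling `scores` with replacement) yields the confidence intervals and density estimates mentioned above.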

From the empirical comparative analysis carried out, we can conclude that the best overall performance in both the hard- and soft-data categories is displayed by the Radial Basis Function Neural Network (RBF), and Gaussian Processes (GP). Both approaches improve when soft-data are incorporated into the regression analysis. Slight differences are observed in the performance of Support Vector Regression (SVR) and Bayesian Neural Networks (BNN). The Multilayer Perceptron (MLP) outperforms the Generalized Regression Neural Network (GRNN), presenting better estimation results when hard-data are analyzed. The sample values and distributional characteristics of cross-validation SMAPEs for GRNN are similar to the ones obtained in trigonometric curve regression, when spatial residual analysis is achieved in terms of empirical second-order moments. Note that GRNN is also favored by the soft-data category. In this category, BNN and our approach show very similar performance, when trigonometric regression is combined with Bayesian multivariate time series residual prediction. Indeed, slightly better bootstrap distributional characteristics of our approach with respect to BNN are observed in the soft-data category.

The outline of the paper is the following. The modeling approach is introduced in Sect. 2. Section 3 describes the multiple objective forecasting methodology. This methodology is applied to the spatiotemporal statistical analysis of COVID-19 mortality in Spain in Sect. 4. The empirical comparative study with ML regression models is given in Sect. 5. Conclusions about our data-driven model ranking can be found in Sect. 6. In the Supplementary Material, a brief introduction to our implementation of ML models from hard- and soft-data is provided. Additional numerical estimation results, based on the complete sample, are also displayed; particularly, the observed and predicted cumulative mortality cases, and the log-risk curves.

2 Data model

Let \((\varOmega ,{\mathcal {A}},{\mathcal {P}})\) be the basic probability space. Consider \(H=L^{2}({\mathbb {R}}^{d}),\) \(d\ge 2,\) the space of square-integrable functions on \({\mathbb {R}}^{d},\) to be the underlying real separable Hilbert space. In the following, we denote by \({\mathcal {B}}^{d}\) the Borel \(\sigma \)-algebra in \({\mathbb {R}}^{d},\) \(d\ge 1.\)

Let \(X=\{ X_{t}({\mathbf {z}}),\ {\mathbf {z}}\in {\mathbb {R}}^{d}, \ t\in {\mathbb {R}}_{+}\}\) be our spatiotemporal input hard-data process on \((\varOmega ,{\mathcal {A}},{\mathcal {P}}),\) satisfying \(E\left[ \Vert X_{t}(\cdot )\Vert _{H}^{2}\right] <\infty ,\) for any time \(t\in {\mathbb {R}}_{+}.\) The input soft-data process over any spatial bounded set \(D\in {\mathcal {B}}^{d}\) is then defined as

$$\begin{aligned} \left\{ X_{t}(h)= \int _{D}X_{t}({\mathbf {z}})h({\mathbf {z}})d{\mathbf {z}},\ h\in {\mathcal {C}}_{0}^{\infty }(D),\ t\in {\mathbb {R}}_{+}\right\} , \end{aligned}$$
(1)

where \({\mathcal {C}}_{0}^{\infty }(D)\) denotes the space of infinite differentiable functions, with compact support contained in D. For each bounded set \(D\in {\mathcal {B}}^{d},\) define

$$\begin{aligned} \varLambda =\{\varLambda _{t}(h)=\exp \left( X_{t}(h)\right) ,\ h\in {\mathcal {C}}_{0}^{\infty }(D),\ t\in {\mathbb {R}}_{+}\}. \end{aligned}$$

Assume that, for any finite positive interval \({\mathcal {T}}\in {\mathcal {B}},\) and bounded set \(D\in {\mathcal {B}}^{d},\)

$$\begin{aligned} {\mathcal {I}}_{{\mathcal {T}}}(h)&\underset{{\mathcal {L}}^{2}(\varOmega ,{\mathcal {A}},P)}{=}&\int _{{\mathcal {T}}}\exp \left( X_{t}(h)\right) dt<\infty ,\quad \forall h\in {\mathcal {C}}_{0}^{\infty }(D), \end{aligned}$$
(2)

where \(\underset{{\mathcal {L}}^{2}(\varOmega ,{\mathcal {A}},P)}{=}\) denotes the identity in the second-order moment sense. Let \(\{N_{h}:(\varOmega ,{\mathcal {A}},{\mathcal {P}})\times {\mathcal {B}}\longrightarrow {\mathbb {N}}, \ h\in H\}\) be a family of random counting measures. Given the observation \(\left\{ x_{t}(h),\ t\in {\mathcal {T}}\right\} \) at the finite temporal interval \({\mathcal {T}}\in {\mathcal {B}}\) of the input soft-data process over the spatial h-window in D,  the conditional probability distribution of the number of random events \(N_{h}({\mathcal {T}})\) that occur in \({\mathcal {T}} \in {\mathcal {B}}\) is a Poisson probability distribution with mean \(\int _{{\mathcal {T}}}\exp \left( x_{t}(h)\right) dt,\) for every \(h\in {\mathcal {C}}_{0}^{\infty }(D)\) and \(D\in {\mathcal {B}}^{d}.\) We refer to \({\mathcal {I}}_{{\mathcal {T}}}(h)\) as the generalized cumulative mortality risk random process over the interval \({\mathcal {T}}.\) Hence, the input hard-data process \(X=\{ X_{t}({\mathbf {z}}),\ {\mathbf {z}}\in {\mathbb {R}}^{d}, \ t\in {\mathbb {R}}_{+}\}\) defines the spatiotemporal mortality log-risk process.
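The conditional Poisson structure above can be illustrated numerically: given a sampled path of the log-risk process, counts are drawn with mean given by a time-discretized version of the integral in (2). The sinusoidal path below is a toy stand-in for a realization of \(X_{t}(h)\); grid size and repetitions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_counts(x, dt, n_rep=1):
    """Given a sampled log-risk path x_t on a grid with spacing dt, draw
    Poisson counts with conditional mean ~ integral of exp(x_t) dt, as in
    eq. (2) (Riemann-sum approximation of the integral)."""
    mean = np.sum(np.exp(x)) * dt
    return rng.poisson(mean, size=n_rep)

# toy log-risk path on [0, 1], standing in for X_t(h)
t = np.linspace(0.0, 1.0, 200)
x = 0.5 * np.sin(2 * np.pi * t)
counts = simulate_counts(x, dt=t[1] - t[0], n_rep=10000)
```

Averaging many conditional draws recovers the discretized cumulative risk, consistent with the interpretation of \({\mathcal {I}}_{{\mathcal {T}}}(h)\) as the generalized cumulative mortality risk.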

From the sample values of our input soft-data process, the following observation model is considered in the curve regression model fitting

$$\begin{aligned}&\ln \left( \varLambda _{t} \right) (\psi _{p,\varpi _{p}})= g_{t}(\psi _{p,\varpi _{p}}, \varvec{\theta }(p) )+\varepsilon _{t}(\psi _{p,\varpi _{p}})\nonumber \\&\quad = \left\langle g_{t}(\cdot ,\varvec{\theta }(p) ), \psi _{p,\varpi _{p}}(\cdot )\right\rangle _{H}\nonumber \\&\qquad + \left\langle \varepsilon _{t}(\cdot ),\psi _{p,\varpi _{p}}(\cdot )\right\rangle _{H},\ t\in {\mathbb {R}}_{+},\ p=1,\dots ,P, \end{aligned}$$
(3)

where

$$\begin{aligned} g_{t}(\psi _{p,\varpi _{p}}, \varvec{\theta }(p) )= & {} \int _{{\mathcal {D}}_{p}}g_{t}({\mathbf {z}}, \varvec{\theta }(p)) \psi _{p,\varpi _{p}}({\mathbf {z}}) d{\mathbf {z}}\nonumber \\ \left\langle f,g\right\rangle _{H}= & {} \int _{{\mathbb {R}}^{d}}f({\mathbf {z}})g({\mathbf {z}})d{\mathbf {z}}, \end{aligned}$$
(4)

with \(\{\psi _{p,\varpi _{p}},\ p=1,\dots ,P\}\subset H\) denoting a function family in H,  whose elements have respective compact supports \({\mathcal {D}}_{p},\) \(p=1,\dots ,P,\) defining the P small-areas where the counts are aggregated, and satisfying suitable regularity conditions. For each \(p=1,\dots ,P,\) the vector \(\varpi _{p}\) contains the center and bandwidth parameters, defining the window selected in the analysis of the small-area p. For each \(p\in \{1,\dots ,P\},\) \(\varvec{\theta }(p) =(\theta ^{1}(p),\ldots ,\theta ^{q}(p))\in \varTheta \) represents the unknown parameter vector to be estimated at the p region, and \(\varTheta \) is the open set defining the parameter space, whose closure \(\varTheta ^{c}\) is a compact set in \({\mathbb {R}}^{q}.\) We assume that \(g_{t}\) is of the form (see, e.g., Ivanov et al. 2015)

$$\begin{aligned} g_{t}(\varvec{\theta }(p))=\sum _{k=1}^{N} \left( A_{k}(p)\cos (\varphi _{k}(p) t)+B_{k}(p)\sin (\varphi _{k}(p)t)\right) , \ p=1,\dots ,P,\quad t\in {\mathbb {R}}_{+}, \end{aligned}$$
(5)

whose spatial-dependent parameters are given by the temporal scalings \(\left( \varphi _{1}(\cdot ),\dots , \varphi _{N}(\cdot )\right) ,\) and the Fourier coefficients \(\left( A_{1}(\cdot ), B_{1}(\cdot ),\ldots , A_{N}(\cdot ), B_{N}(\cdot )\right) .\) For simplicity, we will consider that the scaling parameters \(\varphi _{k},\) \(k=1,\dots ,N,\) are known, and fixed over the P spatial regions. Also, \(C_{k}^{2}(\cdot )= A_{k}^{2}(\cdot )+B_{k}^{2}(\cdot )>0,\) for \(k=1,\ldots ,N,\) where N denotes the truncation parameter, which will be selected according to the penalization factor proposed in Chapelle et al. (2002), as we explain in more detail in Sect. 3. Thus,

$$\begin{aligned} \varvec{\theta }(p)= & {} (A_{1}(p),B_{1}(p),\ldots ,A_{N}(p),B_{N}(p)),\quad p=1,\dots , P. \end{aligned}$$
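A minimal numerical sketch of the trigonometric parameterization in (5): with known frequencies, the Fourier coefficients \((A_{k},B_{k})\) at a region enter linearly and can be recovered by least squares. The frequency grid (harmonics of a 265-day period) and the coefficient values below are illustrative assumptions.

```python
import numpy as np

def trig_design(t, freqs):
    """T x 2N design matrix of cos/sin pairs at known frequencies,
    as in eq. (5)."""
    cols = []
    for w in freqs:
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    return np.column_stack(cols)

def fit_theta(y, t, freqs):
    """Least-squares estimate of theta(p) = (A_1, B_1, ..., A_N, B_N)."""
    Phi = trig_design(t, freqs)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# recover known coefficients from a noiseless two-harmonic signal
t = np.arange(1, 266, dtype=float)
freqs = [2 * np.pi * k / 265 for k in (1, 2)]
y = 1.5 * np.cos(freqs[0] * t) - 0.7 * np.sin(freqs[1] * t)
theta = fit_theta(y, t, freqs)
```

Over a complete period on an equally spaced grid the cos/sin columns are orthogonal, so the least-squares fit returns the generating coefficients exactly.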

To analyze the spatial correlation between regions, a multivariate autoregressive model is considered for prediction of the regression residual term at each region \(p\in \{1,\dots , P\}.\) Particularly, for any \(T\ge 2,\) \(\varepsilon _{t}\) in equation (3) is assumed to satisfy the state equation, for \(p=1,\dots , P,\)

$$\begin{aligned} \varepsilon _{t}(\psi _{p,\varpi _{p}})=\sum _{q=1}^{P}\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})\varepsilon _{t-1}(\psi _{q,\varpi _{q}}) +\nu _{t}(\psi _{p,\varpi _{p}}), \end{aligned}$$
(6)

where, for any \(t\in {\mathbb {R}}_{+},\) and \(p,q=1,\dots , P,\)

$$\begin{aligned} \varepsilon _{t}(\psi _{p,\varpi _{p}})= & {} \int _{{\mathcal {D}}_{p}}\varepsilon _{t}({\mathbf {z}})\psi _{p,\varpi _{p}}({\mathbf {z}})d{\mathbf {z}}\\ \nu _{t}(\psi _{p,\varpi _{p}})= & {} \int _{{\mathcal {D}}_{p}}\nu _{t}({\mathbf {z}})\psi _{p,\varpi _{p}}({\mathbf {z}})d{\mathbf {z}}\\ \rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})= & {} \int _{{\mathcal {D}}_{p}\times {\mathcal {D}}_{p}}\rho ({\mathbf {z}},{\mathbf {y}})\psi _{p,\varpi _{p}}({\mathbf {z}}) \psi _{q,\varpi _{q}}({\mathbf {y}})d{\mathbf {y}}d{\mathbf {z}}. \end{aligned}$$

Here, \(\left( \nu _{t}(\psi _{p,\varpi _{p}}),\ p=1,\dots , P\right) ,\) \(t\in {\mathbb {R}}_{+},\) are assumed to be independent zero–mean Gaussian P–dimensional vectors. For \(p,q\in \{1,\dots P\},\) the projection \(\rho (\psi _{p,\varpi _{p}})(\psi _{q,\varpi _{q}})\) then keeps the temporal linear autocorrelation at each spatial region for \(p=q,\) and the temporal linear cross-correlation between regions for \(p\ne q\) of the regression error \(\{\varepsilon _{t}(\cdot ),\ t\in {\mathbb {R}}_{+}\}\) (see, Bosq 2000).
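Equation (6) is a P-variate autoregression of order one on the projected residuals. A small simulation, with an assumed stable autocorrelation matrix \(\rho\) (illustrative values), shows the state equation and a least-squares recovery of \(\rho\) from lagged residual pairs.

```python
import numpy as np

rng = np.random.default_rng(2)

P, T = 3, 400
# assumed spatial autocorrelation matrix rho[p, q]; kept stable so that
# the VAR(1) has a stationary solution
rho = np.array([[0.5, 0.2, 0.0],
                [0.1, 0.4, 0.2],
                [0.0, 0.2, 0.5]])

# simulate eps_t = rho @ eps_{t-1} + nu_t, with nu_t ~ N(0, 0.1^2 I),
# mirroring the state equation (6)
eps = np.zeros((T, P))
for t in range(1, T):
    eps[t] = rho @ eps[t - 1] + 0.1 * rng.standard_normal(P)

# least-squares estimate of rho from the lagged pairs (eps_{t-1}, eps_t)
X, Y = eps[:-1], eps[1:]
rho_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

one_step = rho_hat @ eps[-1]   # one-step-ahead residual prediction
```

The diagonal of `rho_hat` carries the temporal autocorrelation within each region, and the off-diagonal entries the cross-correlation between regions, as described above.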

3 Implementation of the curve regression model and spatial residual analysis

Let \({\mathcal {D}}_{1},\dots ,{\mathcal {D}}_{P}\) be the small-areas, where the counts are aggregated, and \(\{\psi _{p,\varpi _{p}},\ \varpi _{p}=(c_{p},\rho _{p}),\ p=1,\dots ,P\}\subset H\) be the functions with respective compact supports \({\mathcal {D}}_{1},\dots ,{\mathcal {D}}_{P}.\) Particularly, we denote by \(c_{p},\) \(p=1,\dots ,P,\) the centers respectively allocated at the regions \({\mathcal {D}}_{1},\dots ,{\mathcal {D}}_{P},\) and by \(\rho _{1},\dots ,\rho _{P},\) the bandwidth parameters providing the associated window sizes.

In practice, from the observation model (3), to find \(g_ {t}\) in (5) minimizing the expected quadratic loss function, or expected risk, we look for the minimizer \(\widehat{\varvec{\theta }}_{T}(p)\) of the empirical regression risk

$$\begin{aligned}&L_{T}(\widehat{\varvec{\theta }}_{T}(p))=\inf _{\varvec{\theta }(p) \in \varTheta ^{c}} L_{T}(\varvec{\theta }(p)) \nonumber \\&\quad =\inf _{\varvec{\theta }(p) \in \varTheta ^{c} }\frac{1}{T}\sum _{t=1}^{T}\left| \ln \left( \varLambda _{t}\right) (\psi _{p,\varpi _{p}})-g_{t}(\varvec{\theta }(p))\right| ^{2}. \end{aligned}$$
(7)

Truncation parameter N is then selected to control the ratio between the expected quadratic loss function at \(\widehat{\varvec{\theta }}_{T}(p),\) and the expected value of the minimized empirical risk, from the identity

$$\begin{aligned}&E\left[ \ln \left( \varLambda _{t} \right) (\psi _{p,\varpi _{p}})- g_{t}(\psi _{p,\varpi _{p}}, \widehat{\varvec{\theta }}_{T}(p))\right] ^{2}\nonumber \\&=E\left[ L_{T}(\widehat{\varvec{\theta }}_{T}(p))\right] \left( 1-\frac{N}{T}\right) ^{-1}\left( 1+\frac{\sum _{i=1}^{N}1/\lambda _{i}}{T}\right) , \end{aligned}$$
(8)

where, for \(i=1,\dots ,N,\) \(1/\lambda _{i}\) denotes the inverse of the ith eigenvalue of the matrix \(\varPhi ^{T}\varPhi ,\) with \(\varPhi \) being a \(T\times N\) matrix, whose elements are the values of the N trigonometric basis functions selected at the time points \(t=1,\dots ,T.\) Parameter N should be such that \(N\ll T.\) Note that, asymptotically, when \(N\rightarrow \infty ,\) \(\varPhi ^{T}\varPhi \) goes to the identity matrix, and for \(i=1,\dots ,N,\) \(1/\lambda _{i}\sim 1.\) In equation (8), we have considered the minimized empirical risk

$$\begin{aligned} L_{T}(\widehat{\varvec{\theta }}_{T}(p))=\frac{1}{T}\widetilde{{\mathcal {R}}}^{T}(p)\left( I_{T\times T}-\varPhi \left( \varPhi ^{T}\varPhi \right) ^{-1}\varPhi ^{T}\right) \widetilde{{\mathcal {R}}}(p), \end{aligned}$$
(9)

for each spatial region \(p=1,\dots , P,\) where

$$\begin{aligned} \widetilde{{\mathcal {R}}}(p)=\left( \sum _{k=N+1}^{\infty } \left( A_{k}(p)\cos (\varphi _{k} t)+B_{k}(p)\sin (\varphi _{k}t)\right) + \varepsilon _{t}(\psi _{p,\varpi _{p}}),\ t=1,\dots ,T\right) . \end{aligned}$$
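The correction on the right-hand side of (8) can be computed directly from the design matrix. The sketch below applies it to choose the truncation order by minimizing the penalized empirical risk over candidate values; the frequency grid and synthetic data are illustrative assumptions.

```python
import numpy as np

def penalization_factor(Phi, T):
    """Correction factor (1 - N/T)^{-1} (1 + (sum_i 1/lambda_i)/T) of
    eq. (8), with lambda_i the eigenvalues of Phi'Phi and N the number
    of basis functions (columns of Phi)."""
    N = Phi.shape[1]
    lam = np.linalg.eigvalsh(Phi.T @ Phi)
    return (1.0 + np.sum(1.0 / lam) / T) / (1.0 - N / T)

def trig_matrix(t, n_freq, period):
    """Cos/sin design at the first n_freq harmonics of the given period."""
    freqs = [2.0 * np.pi * k / period for k in range(1, n_freq + 1)]
    return np.column_stack([f(w * t) for w in freqs for f in (np.cos, np.sin)])

def select_truncation(y, t, max_n, period):
    """Pick the truncation order minimizing the penalized empirical risk."""
    scores = []
    for n in range(1, max_n + 1):
        Phi = trig_matrix(t, n, period)
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        risk = np.mean((y - Phi @ theta) ** 2)
        scores.append(risk * penalization_factor(Phi, len(t)))
    return 1 + int(np.argmin(scores))

rng = np.random.default_rng(4)
t = np.arange(1.0, 266.0)
signal = trig_matrix(t, 2, 265) @ np.array([1.5, 0.0, 0.0, -0.7])
y = signal + 0.2 * rng.standard_normal(t.size)
n_star = select_truncation(y, t, max_n=5, period=265)
```

Orders too small to capture a signal harmonic inflate the empirical risk, while larger orders are discouraged by the penalization factor.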

Our regression predictor is then computed, for any \(t\in {\mathbb {R}}_{+},\) from the identity

$$\begin{aligned} \widehat{\ln \left( \varLambda _{t} \right) }(\psi _{p,\varpi _{p}})=g_{t}(\widehat{\varvec{\theta }}_{T}(p)),\quad p=1,\dots ,P \end{aligned}$$
(10)

(see Theorem 1 in Ivanov et al. (2015) about conditions for the weak-consistency of (10)).

The regression residuals

$$\begin{aligned} {\mathbf {Y}}=\left\{ Y_{t}(\psi _{p,\varpi _{p}})= \ln \left( \varLambda _{t}\right) (\psi _{p,\varpi _{p}})- g_{t}(\widehat{\varvec{\theta }}_{T}(p)),\ t=1,\dots ,T,\ p=1,\dots , P\right\} , \end{aligned}$$

and the empirical nuclear autocovariance and cross-covariance operators

$$\begin{aligned}&{\widehat{R}}_{0,T}^{{\mathbf {Y}}}(\psi _{p,\varpi _{p}})(\psi _{q,\varpi _{q}})\nonumber \\&\quad =\frac{1}{T}\sum _{t=1}^{T}Y_{t}(\psi _{p,\varpi _{p}})Y_{t}(\psi _{q,\varpi _{q}}) \nonumber \\&{\widehat{R}}_{1,T}^{{\mathbf {Y}}}(\psi _{p,\varpi _{p}})(\psi _{q,\varpi _{q}})\nonumber \\&\quad =\frac{1}{T-1}\sum _{t=1}^{T-1}Y_{t}(\psi _{q,\varpi _{q}})Y_{t+1}(\psi _{p,\varpi _{p}}),\ p,q=1,\dots ,P,\nonumber \\ \end{aligned}$$
(11)

will be considered in the estimation of the spatial linear residual correlation (see Bosq 2000). A truncation parameter k(T) is also considered here to regularize this ill-posed estimation problem. Particularly, k(T) must satisfy \(k(T)\rightarrow \infty ,\) \(k(T)/T\rightarrow 0,\) \(T\rightarrow \infty .\) A suitable choice of k(T) also ensures strong-consistency of the estimator

$$\begin{aligned}&{\widehat{\rho }}_{k(T)}(\psi _{p,\varpi _{p}})(\psi _{q,\varpi _{q}})\nonumber \\&\quad = \sum _{k,l=1}^{k(T)}\frac{\left\langle \psi _{p,\varpi _{p}},\phi _{k,T} \right\rangle _{H}\left\langle \psi _{q,\varpi _{q}}, \phi _{l,T} \right\rangle _{H}}{\lambda _{k,T}({\widehat{R}}_{0,T}^{{\mathbf {Y}}})} {\widehat{R}}_{1,T}^{{\mathbf {Y}}}(\phi _{k,T})(\phi _{l,T}), \nonumber \\ \end{aligned}$$
(12)

for \(p,q=1,\dots ,P\) (see Bosq 2000). Here,

$$\begin{aligned} {\widehat{R}}_{0,T}^{{\mathbf {Y}}}= & {} \sum _{k=1}^{T}\lambda _{k,T}({\widehat{R}}_{0,T}^{{\mathbf {Y}}})[\phi _{k,T}\otimes \phi _{k,T}], \end{aligned}$$
(13)

where \(\{\lambda _{k,T}({\widehat{R}}_{0,T}^{{\mathbf {Y}}}),\ k=1,\dots ,T\}\) and \(\left\{ \phi _{k,T}, \ k=1,\dots ,T\right\} \) denote the empirical eigenvalues and eigenvectors of \({\widehat{R}}_{0,T}^{{\mathbf {Y}}},\) respectively. Particularly, we consider \(k(T)=\ln (T)\) (see Bosq 2000). The classical plug-in predictor is then computed, for each \(p=1,\dots ,P,\) as

$$\begin{aligned} {\widehat{Y}}_{t}^{k(T)}(\psi _{p,\varpi _{p}})=\sum _{q=1}^{P}{\widehat{\rho }}_{k(T)}(\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}}) Y_{t-1}(\psi _{q,\varpi _{q}}),\quad t\ge 1. \end{aligned}$$
(14)
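In the finite-dimensional projection onto the P regional windows, the operators in (11) become P x P matrices, and the truncated estimator (12) amounts to applying \({\widehat{R}}_{1,T}^{{\mathbf {Y}}}\) to the spectrally inverted, rank-k(T) truncation of \({\widehat{R}}_{0,T}^{{\mathbf {Y}}}\). A sketch on simulated residuals follows; the autocorrelation matrix, sample size and k(T) are assumed for illustration.

```python
import numpy as np

def truncated_rho(Y, kT):
    """Truncated estimator (12): the lag-one cross-covariance composed
    with the rank-kT spectral inverse of the autocovariance (eq. (11))."""
    T = Y.shape[0]
    R0 = Y.T @ Y / T                      # empirical autocovariance
    R1 = Y[1:].T @ Y[:-1] / (T - 1)       # empirical lag-one cross-covariance
    lam, phi = np.linalg.eigh(R0)         # ascending eigenpairs
    lam, phi = lam[::-1][:kT], phi[:, ::-1][:, :kT]  # keep leading kT modes
    return R1 @ (phi @ np.diag(1.0 / lam) @ phi.T)

rng = np.random.default_rng(5)
rho = np.array([[0.5, 0.2, 0.0],
                [0.1, 0.4, 0.2],
                [0.0, 0.2, 0.5]])
T, P = 2000, 3
Y = np.zeros((T, P))
for s in range(1, T):
    Y[s] = rho @ Y[s - 1] + 0.1 * rng.standard_normal(P)

rho_hat = truncated_rho(Y, kT=3)   # here kT = P retains all modes
Y_pred = rho_hat @ Y[-1]           # plug-in predictor, as in eq. (14)
```

With all modes retained the estimator reduces to \(R_{1}R_{0}^{-1}\); smaller k(T) discards low-variance directions, which is what regularizes the inversion.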

Under the Gaussian distribution of \(\nu _{t},\) the likelihood function defining the objective function in the Bayesian estimation of \(\rho \) from (6) is given, for each \(p=1,\dots ,P,\) by

$$\begin{aligned}&{\widetilde{L}}_p(\varepsilon _{1p},\dots ,\varepsilon _{Tp},\varepsilon _{0q},\dots ,\varepsilon _{(T-1)q}\nonumber \\&\qquad \rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}}),q=1,\dots ,P)\nonumber \\&\quad = \frac{\exp \left( -\frac{1}{2\sigma ^2_{p}}\sum _{t=1}^T\left( \varepsilon _{t}(\psi _{p,\varpi _{p}})-\sum _{q=1}^{P}\varepsilon _{t-1}(\psi _{q,\varpi _{q}})\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}}) \right) ^2 \right) }{\left( \sigma _{p}\sqrt{2\pi }\right) ^T}\nonumber \\&\qquad \times \prod _{q=1}^{P} \left[ \rho (\psi _{q,\varpi _{q}}) (\psi _{p,\varpi _{p}})\right] ^{a_{pq}-1}\nonumber \\&\qquad \left( 1-\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})\right) ^{b_{pq}-1}\nonumber \\&\qquad \times \frac{{\mathbb {I}}_{\{0<\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})<1\}}}{{\mathbb {B}}(a_{pq}, b_{pq})}\nonumber \\&\quad = \frac{1}{\left( \sigma _{p}\sqrt{2\pi }\right) ^T} \exp \left( -\frac{1}{2\sigma ^2_{p}}\sum _{t=1}^T\left[ \nu _{t}(\psi _{p,\varpi _{p}})\right] ^2 \right) \nonumber \\&\qquad \times \prod _{q=1}^{P} \left[ \rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})\right] ^{a_{pq}-1}\nonumber \\&\qquad \left( 1-\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})\right) ^{b_{pq}-1}\nonumber \\&\qquad \times \frac{{\mathbb {I}}_{\{0<\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})<1\}}}{{\mathbb {B}}(a_{pq}, b_{pq})},\nonumber \\ \end{aligned}$$
(15)

where, for each \(p=1,\dots ,P,\) the beta probability distributions with shape parameters \(a_{pq}\) and \( b_{pq},\) \(q=1,\dots ,P,\) respectively define the prior probability distributions of the independent random variables \(\{\rho (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}}), \ q=1,\dots ,P\}.\) Here, for each \(p=1,\dots ,P,\) \(\varepsilon _{tp}=\varepsilon _{t}(\psi _{p,\varpi _{p}})=\left\langle \varepsilon _{t},\psi _{p,\varpi _{p}}\right\rangle _{H},\) and \(\sigma _{p} = \sqrt{E[\varepsilon _t(\psi _{p,\varpi _{p}})]^2},\) for \(t=0,\dots ,T.\) As before, \(\psi _{p,\varpi _{p}}\) weights the spatial sample information about the p small-area, for \(p=1,\dots ,P.\) As usual, \({\mathbb {I}}_{0<\cdot <1}\) denotes the indicator function on the interval (0, 1),  and \({\mathbb {B}}(a_{pq}, b_{pq})\) is the beta function,

$$\begin{aligned} {\mathbb {B}}(a_{pq}, b_{pq})=\frac{\varGamma (a_{pq})\varGamma (b_{pq})}{\varGamma (a_{pq}+b_{pq})}. \end{aligned}$$

From (15), the Bayesian predictor is obtained, for \(p= 1,\dots ,P,\) as

$$\begin{aligned} {\widetilde{\varepsilon }}_{t}(\psi _{p,\varpi _{p}})=\sum _{q=1}^{P}{\widetilde{\rho }} (\psi _{q,\varpi _{q}})(\psi _{p,\varpi _{p}})\varepsilon _{t-1}(\psi _{q,\varpi _{q}}),\quad t\ge 1, \end{aligned}$$
(16)

with \(\left( {\widetilde{\rho }} (\psi _{1,\varpi _{1}})(\psi _{p,\varpi _{p}}),\dots ,{\widetilde{\rho }} (\psi _{P,\varpi _{P}})(\psi _{p,\varpi _{p}}) \right) \) being computed by maximizing (15), to find the posterior mode (see Bosq and Ruiz-Medina 2014, where Bayesian estimation is introduced in an infinite-dimensional framework). We refer to (16) as the Bayesian plug-in predictor of the residual mortality log-risk process at the p small area, for \(p= 1,\dots ,P.\) In practice, equation (15) is approximated from the computed values of the regression residual process.
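Computing the posterior mode for the p-th row of the autocorrelation projections is thus a box-constrained optimization problem: Gaussian log-likelihood of the autoregressive residuals plus independent beta log-priors on (0, 1). The sketch below uses `scipy.optimize.minimize` instead of the hybrid genetic algorithm of the paper; the beta shape parameters, noise level and simulated data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def posterior_mode(eps, p, a, b, sigma):
    """Maximize (15) in the row (rho(psi_q)(psi_p), q=1,...,P): Gaussian
    log-likelihood plus independent Beta(a, b) log-priors, each entry
    constrained to (0, 1)."""
    X, y = eps[:-1], eps[1:, p]
    P = eps.shape[1]

    def neg_log_posterior(r):
        resid = y - X @ r
        log_prior = np.sum((a - 1.0) * np.log(r) + (b - 1.0) * np.log1p(-r))
        return np.sum(resid ** 2) / (2.0 * sigma ** 2) - log_prior

    r0 = np.full(P, a / (a + b))                 # start at the prior mean
    res = minimize(neg_log_posterior, r0,
                   bounds=[(1e-6, 1.0 - 1e-6)] * P)
    return res.x

rng = np.random.default_rng(6)
rho_true = np.array([[0.4, 0.2, 0.1],
                     [0.1, 0.4, 0.2],
                     [0.1, 0.2, 0.4]])
T = 2000
eps = np.zeros((T, 3))
for s in range(1, T):
    eps[s] = rho_true @ eps[s - 1] + 0.1 * rng.standard_normal(3)

mode_row0 = posterior_mode(eps, p=0, a=14.0, b=13.0, sigma=0.1)
```

With a long residual record the likelihood dominates and the mode lands near the generating coefficients; with short records the beta prior pulls the estimates toward its own mode.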

4 Statistical analysis of COVID-19 mortality

Our analysis is based on daily records of COVID-19 mortality reported by the Carlos III Health Institute, from March 8 to May 13, 2020, at the 17 Spanish Communities. We first describe the main steps of the proposed estimation algorithm, referring to the inputs and outputs at the different stages.

  1. Step 1

    Daily records of COVID-19 mortality are accumulated over the entire period at every Spanish Community. The resulting cumulative step curves are interpolated at 265 temporal nodes, and smoothed with cubic B-splines. Their derivatives and logarithmic transforms are then computed.

  2. Step 2

    Our soft-input-data process is obtained from the spatial projection of the outputs in Step 1 onto the compactly supported basis \(\{\psi _{p,\varpi _{p}},\ p=1,\dots ,P=17\}.\) We choose the tensorial product of Daubechies wavelet bases. Here, for \(p=1,\dots ,17,\) \(\varpi _{p}=(N(p), j(p),{\mathbf {k}}(p)),\) whose components respectively provide the order of the Daubechies wavelet functions, the resolution level, and the vector of spatial displacements, according to the area occupied by each Spanish community (see, e.g., Daubechies 1988).

  3. Step 3

    The choice \(N=6\) in (8) corresponds to a value of 1.1304 for the ratio between the mean quadratic loss function and the expected minimized empirical risk. Hence, 12 coefficients must be estimated. Note that the eigenvalues in (8) are computed from the trigonometric basis.

  4. Step 4

    Under \(N=6\) in Step 3, the least-squares estimates of the 12 Fourier coefficients are computed from (7), in terms of the soft-input-data process obtained as output in Step 2.

  5. Step 5

    The regression residuals are then calculated from Step 4.

  6. Step 6

    The auto- and cross-covariance operators in (11) are computed from the outputs of Step 5. The residual spatial linear correlation matrix is then obtained from (12). The truncation scheme \(k(T)=\ln (T)\) has been adopted, with \(T=265.\)

  7. Step 7

    The residual predictor (14) is computed from Step 6.

  8. Step 8

    100 bootstrap samples are generated from the empirical autocorrelation projections. The fitted bootstrap prior suggests a scaled Beta probability density with shape parameters 14 and 13.

  9. Step 9

    Assuming a Gaussian scenario for our log-regression residuals, our constrained nonlinear multivariate objective function (15) is computed from the prior proposed in Step 8.

  10. Step 10

    To maximize the objective function computed in Step 9, we implement a hybrid genetic algorithm, built from the MatLab 'gaoptimset' function with the 'HybridFcn' option, which passes a handle to a function that continues the optimization after the genetic algorithm terminates. This function applies a quasi-Newton method in the optimization procedure, involving an estimate of the inverse Hessian matrix.

  11. Step 11

    The soft-data based Bayesian predictor (16) of the residual COVID-19 mortality log-risk is finally computed from the outputs of Step 10.

  12. Step 12

    Our multiple objective space-time predictor is obtained from Steps 4 and 11, by adding the regression and residual predictors, and applying the inverse spatial wavelet transform.
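As a minimal illustration of the preprocessing in Step 1 (the cumulative-curve smoothing and log-intensity computation), the following sketch uses synthetic Poisson counts in place of the actual Carlos III Health Institute records, and omits the spatial wavelet projection of Step 2; the smoothing parameter is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Hypothetical daily death counts for one region (illustrative only;
# the real inputs are the Carlos III Health Institute records).
rng = np.random.default_rng(0)
daily_deaths = rng.poisson(lam=30, size=67)          # 67 observed days

# Step 1: accumulate, interpolate at 265 temporal nodes, smooth with a
# cubic B-spline, then take the derivative (intensity) and its logarithm.
cumulative = np.cumsum(daily_deaths)
t_obs = np.linspace(0.0, 1.0, cumulative.size)
t_nodes = np.linspace(0.0, 1.0, 265)                 # 265 temporal nodes

tck = splrep(t_obs, cumulative, k=3, s=5.0)          # cubic smoothing spline
smoothed = splev(t_nodes, tck)                       # smoothed cumulative curve
intensity = np.maximum(splev(t_nodes, tck, der=1), 1e-8)  # mortality risk curve
log_intensity = np.log(intensity)                    # mortality log-risk curve
```

Repeating this per Community yields the 17 curves projected in Step 2.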

Tables 1 and 2 below display the parameter estimates \({\widehat{A}}_{k}(\cdot )\) and \({\widehat{B}}_{k}(\cdot ),\) \(k=1,\dots ,6,\) where \(\varphi _{k}=\frac{2\pi }{265}\) has been considered, for \(k=1,\dots , N=6.\) In these tables and below, the following Spanish Community (SC) codes appear: C1 for Andalucía; C2 for Aragón; C3 for Asturias; C4 for Islas Baleares; C5 for Canarias; C6 for Cantabria; C7 for Castilla La Mancha; C8 for Castilla y León; C9 for Cataluña; C10 for Comunidad Valenciana; C11 for Extremadura; C12 for Galicia; C13 for Comunidad de Madrid; C14 for Murcia; C15 for Navarra; C16 for País Vasco, and C17 for La Rioja.

Table 1 Regression parameter estimates \({\widehat{A}}_k(\cdot ),\) \(k = 1, \ldots , 6, \) at the 17 Spanish Communities
Table 2 Regression parameter estimates \({\widehat{B}}_k(\cdot ),\) \(k = 1, \ldots , 6, \) at the 17 Spanish Communities

Bootstrap curve confidence intervals at confidence level \(1-\alpha =0.95\), based on 1000 bootstrap samples, are computed for the spatial mean, over the 17 Spanish Communities, of the curve regression predictors. Their construction is based on the bias corrected and accelerated percentile method (\({\mathcal {I}}_{1}\)); Normal approximated interval with bootstrapped bias and standard error (\({\mathcal {I}}_{2}\)); basic percentile method (\({\mathcal {I}}_{3}\)), and bias corrected percentile method (\({\mathcal {I}}_{4}\)) (see Fig. 1). The minimized regression empirical risk values \(L_{265}(\widehat{\varvec{\theta }}_{265}(p)),\) \(p=1,\dots ,17,\) are displayed in Table 3.
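The percentile-type (\({\mathcal {I}}_{3}\)) and normal-approximation (\({\mathcal {I}}_{2}\)) interval constructions can be sketched as follows; this is a simplified Python illustration for a scalar statistic, not the implementation used here, and the bias-corrected and BCa variants are omitted:

```python
import numpy as np

def bootstrap_cis(sample, stat=np.mean, B=1000, alpha=0.05, seed=0):
    """Basic percentile and normal-approximation bootstrap confidence
    intervals for a scalar statistic (a sketch of the I3- and I2-type
    constructions; BCa and bias-corrected percentile are omitted)."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(sample)
    boot = np.array([stat(rng.choice(sample, size=sample.size, replace=True))
                     for _ in range(B)])
    # Percentile interval: empirical quantiles of the bootstrap replicates.
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    percentile_ci = (lo, hi)
    # Normal interval with bootstrapped bias and standard error.
    bias = boot.mean() - theta_hat
    se = boot.std(ddof=1)
    z = 1.959963984540054        # 97.5% standard normal quantile
    normal_ci = (theta_hat - bias - z * se, theta_hat - bias + z * se)
    return percentile_ci, normal_ci
```

In the paper these constructions are applied pointwise along the curves, producing the confidence bands of Fig. 1.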

Fig. 1
figure 1

At the top, COVID-19 mortality mean cumulative curve in Spain, from March 8 to May 13, 2020 (continuous red line, 265 temporal nodes), and bootstrap curve confidence intervals, on the left-hand side, \({\mathcal {I}}_{1}\) (dashed blue lines) and \({\mathcal {I}}_{2}\) (dashed magenta lines), and on the right-hand side, \({\mathcal {I}}_{3}\) (dashed green lines) and \({\mathcal {I}}_{4}\) (dashed yellow lines). Plots at the center and bottom reflect the same information respectively referred to the mean intensity (spatially averaged COVID-19 mortality risk curve) and log-intensity (spatially averaged COVID-19 mortality log-risk curve) curves in Spain. All the bootstrap confidence intervals are computed at confidence level \(1-\alpha =0.95,\) from 1000 bootstrap samples

Table 3 Computed values \(L_{265}(\widehat{\varvec{\theta }}_{265}(p)),\) \(p=1,\dots ,17\)

Figure 2 at the top displays the 1000 bootstrap sample values

$$\begin{aligned} \overline{L_{265}}(\omega _{i})=\frac{1}{P} \sum _{p=1}^{P}L_{265}(\omega _{i},\widehat{\varvec{\theta }}_{265}(p)), \quad \omega _{i}\in \varOmega , \ i=1,\dots ,1000, \end{aligned}$$

of the spatially averaged minimized empirical quadratic risk in the trigonometric regression. Note that the sample mean of these values is \(\overline{{\overline{L}}}= 0.0262,\) showing a good performance of the least-squares regression predictor, in agreement with the value \(T(265,12)= 1.1304\) obtained. The bootstrap histogram and the corresponding approximation of the probability density function, computed from \(\overline{L_{265}}(\omega _{i}),\) \(i=1,\dots , 1000,\) are also plotted at the bottom of Fig. 2.

Fig. 2
figure 2

1000 bootstrap samples of the spatially averaged minimum empirical regression risk (SAMERR) have been generated. The corresponding sample values are displayed at the top. The bootstrap histogram can be found at the bottom-left-hand side. The bootstrap probability density is plotted at the bottom-right-hand side

Bootstrap confidence intervals for \(\overline{L_{265}}\) have also been computed at level \(1-\alpha =0.95,\) from 1000 and 10000 bootstrap samples. Table 4 displays these intervals respectively based on the bias corrected and accelerated percentile method (\({\mathcal {I}}_{1}\)); Normal approximated interval with bootstrapped bias and standard error (\({\mathcal {I}}_{2}\)); basic percentile method (\({\mathcal {I}}_{3}\)); bias corrected percentile method (\({\mathcal {I}}_{4}\)), and Student-based confidence interval (\({\mathcal {I}}_{5}\)).

Table 4 Bootstrap confidence intervals for \(\overline{L_{265}}\) (confidence level \(1-\alpha =0.95\))

The classical and Bayesian plug-in predictors of the residual COVID-19 mortality log-risk process at each one of the Spanish Communities are respectively computed from equations (14) and (16) for \(P=17\).

Given the empirical spectral characteristics observed in the regularized approximation \({\widehat{\rho }}_{k(T)}\) of \(\rho \) in (12), obtained from the singular value decomposition of the empirical operators in (11), our choice of the prior for the projections of \(\rho \) has been a Beta prior, scaled by the factor 1/3, with hyperparameters \(a_{pq}= 14\) and \(b_{pq}=13,\) for \(p,q=1,\dots , 17.\) The suitability of this data-driven choice, regarding the localization of the mode and the tail thickness, is illustrated in Fig. 3. Specifically, in the right plot of Fig. 3, both the scaled Beta probability density with shape parameters 14 and 13 (red-square line), and the probability density fitted from the generated bootstrap samples, based on the empirical projections of \(\rho \) (blue-square line), are displayed. Note that the observed range of the empirical projections of \(\rho \) is well fitted, as one can see from the left plot in Fig. 3.
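The scaled Beta prior can be inspected numerically; a brief sketch, assuming scipy's parameterization, where the `scale` argument rescales the support of a Beta(14, 13) variable to (0, 1/3):

```python
import numpy as np
from scipy import stats

# Prior for the projections of rho: Beta(14, 13) scaled by the factor 1/3,
# i.e. supported on (0, 1/3).
prior = stats.beta(a=14, b=13, scale=1/3)

# Mode of Beta(a, b) is (a-1)/(a+b-2); rescaling by 1/3 gives 13/75 ~ 0.173.
mode = (14 - 1) / (14 + 13 - 2) / 3
mean = prior.mean()                          # (1/3) * 14/27 = 14/81
x = np.linspace(1e-6, 1/3 - 1e-6, 200)
density = prior.pdf(x)                       # prior density over its support
```

The moderately large, nearly symmetric shape parameters give a unimodal prior with light tails, matching the empirical behavior described above.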

Fig. 3
figure 3

At the left-hand side, empirical projections of the autocorrelation operator \(\rho ,\) reflecting temporal autocorrelation and cross-correlation between the 17 Spanish Communities analyzed. At the right-hand side, the considered prior probability density (red squares), of a Beta distributed random variable with shape parameters 14 and 13 scaled by the factor 1/3, is compared with the bootstrap fitting of an empirical prior (blue squares)

Bootstrap confidence intervals \({\mathcal {I}}_{1},\dots ,{\mathcal {I}}_{5}\) at level \(1-\alpha =0.95\), for the expected training standard error of the multivariate time series classical and Bayesian residual COVID-19 mortality log-risk predictors, based on 1000 bootstrap samples, are displayed in Table 5:

Table 5 Bootstrap confidence intervals for the expected training standard error of the classical and Bayesian residual COVID-19 mortality log-risk predictors (\(1-\alpha =0.95\))

Maps plotted in Fig. 4 show the observed spatiotemporal evolution of COVID-19 mortality risk, and its prediction, from the fitted curve trigonometric regression model, and the subsequent classical and Bayesian time series analysis.

Fig. 4
figure 4

COVID-19 mortality risk maps, from March 8 to May 13, 2020. Observed (left-hand side) and estimated (right-hand side) maps, computed from trigonometric regression, combined with classical (first row) and Bayesian (second row) residual predictors

5 An empirical comparative study

The ML regression models introduced in the Supplementary Material are applied to COVID-19 mortality analysis, and compared, via random k-fold cross-validation and bootstrap estimators, with the multiple objective space-time forecasting approach presented here. We distinguish two categories, respectively corresponding to the strong-sense (hard-data) and weak-sense (soft-data) definitions of our data set. Random k-fold (\(k=5,10\)) cross-validation, in terms of Symmetric Mean Absolute Percentage Errors (SMAPEs), evaluates the performance of the compared regression models from hard- and soft-data. Bootstrap confidence intervals, and probability density estimates of the spatially averaged SMAPEs, are also computed. Section 6 provides a data-driven model classification, based on SMAPEs, in the two categories analyzed, from random k-fold cross-validation and the bootstrap estimation procedures applied.

5.1 Results from random k-fold cross-validation

After interpolation and cubic B-spline smoothing of our original data set, the logarithmic transform and linear scaling are applied. We hold out the first ten points and the last three of each COVID-19 mortality log-risk curve as an out-of-sample set. Our approach is implemented in the second category, from soft-data. In this implementation, we consider \(N=6,\) adopting the model selection criterion given in equation (8) (see Chapelle et al. 2002). In the multivariate time series classical and Bayesian prediction, our choice of \(k(T)=k(265)=8\) provides a balance between \(k(T)=[\ln (T)]^{-}=[\ln (265)]^{-}=5,\) in agreement with the separation and decay velocity of the empirical eigenvalues of the autocovariance operator, and the parameter value \(k(T)=9,\) controlling model complexity according to the sample size \(T=265.\) The random fluctuations observed in the k(T) empirical projections of the spatial autocorrelation matrix \(\rho \) are also well fitted by our choice of the shape hyperparameters characterizing the prior Beta probability density.

Model fitting is evaluated in terms of the Symmetric Mean Absolute Percentage Errors (SMAPEs), given, for \(P=17\) and \(T=265,\) by

$$\begin{aligned} \frac{1}{T}\sum _{t=1}^{T}\frac{\left| \widehat{\ln (\varLambda _{t})}(\psi _{p,\varpi _{p}})-\ln (\varLambda _{t})(\psi _{p,\varpi _{p}})\right| }{\left( \left| \ln (\varLambda _{t})(\psi _{p,\varpi _{p}})\right| + \left| \widehat{\ln (\varLambda _{t})}(\psi _{p,\varpi _{p}})\right| \right) /2},\quad p=1,\dots ,P. \end{aligned}$$
(17)
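For a single area p, equation (17) can be computed directly; a minimal sketch, with the predicted and observed log-risk projections passed as plain arrays:

```python
import numpy as np

def smape(pred, obs):
    """Symmetric Mean Absolute Percentage Error as in equation (17):
    mean over time of |pred - obs| / ((|obs| + |pred|) / 2)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.mean(np.abs(pred - obs) / ((np.abs(obs) + np.abs(pred)) / 2))
```

The symmetric denominator bounds each term by 2, which makes the errors comparable across Communities with very different mortality levels.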

We have computed the mean of the SMAPEs obtained at each one of the k iterations of the random k-fold cross-validation procedure. This validation technique consists of randomly splitting the functional sample into training and validation samples at each one of the k iterations. Model fitting is performed on the training sample, and the target outputs are defined from the validation or testing sample. By running each model ten times and averaging the SMAPEs, we remove the fluctuations due to the random initial weights (for the MLP and BNN models), and the differences in the parameter estimation in all methods, due to the random sample splitting in the random k-fold cross-validation procedure.
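The random splitting underlying each iteration can be sketched as follows (illustrative Python, not the actual implementation):

```python
import numpy as np

def random_kfold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k folds; at iteration i, fold i
    is the validation set and the remaining folds form the training set
    (a sketch of random k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)            # random sample splitting
    folds = np.array_split(perm, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Re-running with different seeds, and averaging the resulting SMAPEs, corresponds to the ten-run averaging described above.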

The random 10-fold cross-validation SMAPEs, averaged over ten runs, are displayed in Table 6 for the six ML techniques tested, GRNN, MLP, SVR, BNN, RBF, and GP, when hard-data are considered (see also Table 3 of the Supplementary Material for the random 5-fold cross-validation results). Table 7 provides the corresponding ten-run based random 10-fold cross-validation results for the soft-data category (see also Table 4 of the Supplementary Material for the random 5-fold cross-validation results). The corresponding cross-validation results of the presented approach, from soft-data, are displayed in Table 8.

ML model hyperparameter selection has been achieved by applying random k-fold cross-validation (\(k=5,10\)), choosing from a suitable set of candidates. Specifically, the optimal numbers of hidden (NH) nodes in the implementation of MLP and BNN have been selected from the candidate sets [0, 1, 3, 5, 7, 9] and [1, 3, 5, 7, 9], respectively. The random cross-validation results in both cases, \(k=5,10,\) lead to the same choice of the optimal NH value, namely, NH\(=1\) for MLP, and NH\(=5\) for BNN. The latter displays slight differences with respect to the values NH\(=3,7\) in the random 10-fold cross-validation implementation. In the same way, we have selected the respective spread \(\beta \) and bandwidth h parameters in the RBF and GRNN procedures. Thus, after applying random k-fold cross-validation, with \(k=5,10,\) the optimal values \(\beta =2.5\) and \(h=0.05\) are obtained from the candidate sets [2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20] and [0.05, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7], respectively (see Supplementary Material). For SVR, better performance from hard-data is observed with the linear kernel; in its implementation, automatic hyperparameter optimization via the fitrsvm MatLab function is applied. From the soft-data category, the best option corresponds to the Gaussian kernel based nonlinear SVR model fitting (applying the same automatic hyperparameter optimization option in the argument of the fitrsvm MatLab function). In the implementation of GP, we follow the same tuning procedure for model selection. In this case, for both categories, we have selected Bayesian cross-validation optimization (in the hyperparameter optimization argument of the fitrgp MatLab function).
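The bandwidth tuning described for GRNN can be illustrated with a hypothetical Gaussian-kernel Nadaraya-Watson smoother standing in for the MatLab GRNN; the candidate set is taken from the text, while the smoother itself and the error criterion are assumptions for illustration:

```python
import numpy as np

def nw_predict(x_train, y_train, x_test, h):
    """Gaussian-kernel Nadaraya-Watson regression (a GRNN-like smoother;
    hypothetical stand-in for the MatLab implementation)."""
    w = np.exp(-0.5 * ((x_test[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def select_bandwidth(x, y, candidates, k=5, seed=0):
    """Pick the bandwidth minimizing mean validation error over random k folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(x.size)
    folds = np.array_split(perm, k)
    scores = []
    for h in candidates:
        errs = []
        for i in range(k):
            val = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            pred = nw_predict(x[tr], y[tr], x[val], h)
            errs.append(np.mean(np.abs(pred - y[val])))
        scores.append(np.mean(errs))
    return candidates[int(np.argmin(scores))]
```

The same grid-search pattern applies to the RBF spread \(\beta \) and the NH candidates for MLP and BNN.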

In all the results displayed, the SMAPE–MEAN (M.) and SMAPE–TOTAL (T.) have been computed as performance measures for comparing the ML models tested and our approach.

Table 6 Hard-data category. Averaged SMAPEs, based on 10 runs of random 10-fold cross-validation
Table 7 Soft-data category. Averaged SMAPEs, based on 10 runs of random 10-fold cross-validation
Table 8 Our approach. Averaged SMAPEs, based on 10 runs of random 10-fold cross-validation, for testing trigonometric regression combined with Classical (C.) and Bayesian (B.) residual analysis

5.2 Bootstrap based classification results

For the ML regression models tested in the hard- and soft-data categories, bootstrap confidence intervals (at the \(1-\alpha =0.95\) confidence level) for the spatially averaged SMAPEs, based on 1000 bootstrap samples, are constructed. Our approach requires the soft-data information to be incorporated. As before, the computed bootstrap confidence intervals \({\mathcal {I}}_{i},\) \(i=1,\dots ,5,\) are respectively based on the bias corrected and accelerated percentile method (\({\mathcal {I}}_{1}\)); Normal approximated interval with bootstrapped bias and standard error (\({\mathcal {I}}_{2}\)); basic percentile method (\({\mathcal {I}}_{3}\)); bias corrected percentile method (\({\mathcal {I}}_{4}\)), and Student-based confidence interval (\({\mathcal {I}}_{5}\)) (see Tables 9 and 10). The bootstrap histograms and probability densities of the spatially averaged SMAPEs are displayed in Figs. 5 and 6, for the hard-data category, and in Figs. 7, 8 and 9, for the soft-data category. The data-driven performance-based model classification results obtained are discussed in Sect. 6.

Table 9 Hard-data category. Bootstrap confidence intervals (\(1-\alpha =0.95\)) for the spatially averaged SMAPEs from 1000 bootstrap samples (\(T=265,\) \(P=17\))
Table 10 Soft-data category. Bootstrap confidence intervals (\(1-\alpha =0.95\)) for the spatially averaged SMAPEs from 1000 bootstrap samples (\(T=265,\) \(P=17\))
Fig. 5
figure 5

Hard-data category. From 1000 bootstrap samples, spatially averaged SMAPEs histograms and probability densities are plotted, for GRNN (top), MLP (center), and linear SVR (bottom)

Fig. 6
figure 6

Hard-data category. From 1000 bootstrap samples, spatially averaged SMAPEs histograms and probability densities are plotted, for BNN (top), RBF (center), and GP (bottom)

Fig. 7
figure 7

Soft-data category. From 1000 bootstrap samples, spatially averaged SMAPEs histograms and probability densities are plotted, for GRNN (top), MLP (center) and non-linear SVR (bottom)

Fig. 8
figure 8

Soft-data category. From 1000 bootstrap samples, spatially averaged SMAPEs histograms and probability densities are plotted, for BNN (top), RBF (center) and GP (bottom)

Fig. 9
figure 9

Soft-data category. From 1000 bootstrap samples, spatially averaged SMAPEs histograms and probability densities are plotted, for trigonometric regression, combined with empirical-moment based classical (top), and Bayesian (bottom) residual prediction

6 Final comments

One can observe the agreement between the respective performance-based model classification results obtained from random k-fold cross-validation and bootstrap estimation in Sects. 5.1 and 5.2. In the hard-data category, the best performance is displayed by RBF and GP. Similar bootstrapping characteristics are observed for BNN and SVR, with slightly larger values of the spatially averaged SMAPEs, reflected in the location of the mode in the histograms and probability densities displayed in Figs. 5 and 6. These four regression methodologies show a similar degree of variability in the spatially averaged SMAPEs sample values. A higher variability than for RBF, GP, BNN and SVR is displayed by the bootstrap sample values of the spatially averaged SMAPEs in the MLP validation. The MLP bootstrapped mode is also slightly shifted to the right. The worst performance corresponds to GRNN (see also Table 6). In the soft-data category, where our approach is incorporated into the empirical comparative study, almost the same empirical ML model ranking holds. Some differences are found in the bootstrap confidence intervals, and in the computed histograms and probability densities. For instance, GRNN seems to be favored by the soft-data category, while MLP displays worse performance in this category. Hence, smaller differences between GRNN and MLP are displayed in the soft-data category. A slight improvement of BNN relative to SVR is observed in the soft-data category, preserving almost the same performance. RBF and GP display better performance in the soft-data category, with RBF being slightly superior to GP in this category (see Table 10 and Fig. 8). The trigonometric regression, and multivariate time series residual prediction approach based on the empirical moments, displays similar results to GRNN, with the slightly better performance of GRNN observed in the bootstrap intervals and histogram/probability density (see Figs. 7 and 9). However, as shown in Figs. 8 and 9, the trigonometric regression and Bayesian residual prediction presents almost the same performance as BNN, with slightly better probability distribution features of our approach with respect to BNN (see also the bootstrap intervals). Our approach is less affected by the random splitting of the sample in the implementation of the random k-fold cross-validation procedure, since a dynamical spatial residual model is fitted in a second (objective) step. Thus, the proposed multivariate time series classical and Bayesian regression residual modeling fits the short-term spatial linear correlations displayed by the soft-data category. However, the price we pay for increasing model complexity is reflected in the resulting SMAPE-based random k-fold and bootstrap model classification results obtained.

The spatial component effect is reflected in Table 6 (hard-data), where spatial heterogeneities in the random 10-fold cross-validation SMAPEs are observed (see also Table 3 in the Supplementary Material), while Table 7 (see also Table 4 in the Supplementary Material) reveals the benefits obtained, from soft-data information, in some of the ML regression models tested. In particular, in this category, possible spatial linear correlations are incorporated into the analysis, in terms of soft-data.