Elsevier

Spatial Statistics

Volume 44, August 2021, 100524

Copula-based multiple indicator kriging for non-Gaussian random fields

https://doi.org/10.1016/j.spasta.2021.100524

Abstract

In spatial statistics, the kriging predictor is the best linear predictor at unsampled locations, but it is not the optimal predictor for non-Gaussian processes. In this paper, we introduce a copula-based multiple indicator kriging model for non-Gaussian spatial data that thresholds the spatial observations at a given set of quantile values. The proposed copula model allows for flexible marginal distributions while modeling the spatial dependence via copulas. We show that the covariances required by kriging are directly linked to the chosen copula function. We then develop a semiparametric estimation procedure. The proposed method provides the entire predictive distribution function at a new location, and thus allows for both point and interval predictions. In simulation studies, the proposed method shows better predictive performance than the commonly used variogram approach and Gaussian kriging. We illustrate our methods on precipitation data from Spain for November 2019 and on a dataset of heavy metal concentrations in topsoil along the river Meuse, and we obtain probability exceedance maps.

Introduction

Spatial data is actively generated in various research areas, including environmental, earth, ecological, and hydrological sciences, and it often displays non-Gaussianity; examples include data on wind speed (Zhu and Genton, 2012), precipitation (Marchenko and Genton, 2010), heavy metal concentrations in soil (Lin et al., 2010), and groundwater pollution (Arslan, 2012). An important goal in geostatistics is to predict the process at unsampled spatial locations from the observed data. Gaussian random fields are extensively used in the analysis of spatial data because they are fully characterized by a mean and a covariance structure. The classical geostatistical tool, kriging, is the best linear unbiased predictor, but it is optimal (in the mean-squared-error sense) only when the process is Gaussian (Cressie, 1993). However, the assumption of Gaussianity is rarely met in practice. Hence, better predictors are needed for non-Gaussian data, e.g., skewed, heavy-tailed, or categorical data. The need to make accurate predictions from observed data and to quantify the associated prediction uncertainty is common to many scientific disciplines.

As a motivating application, we consider precipitation data from the European Climate Assessment & Dataset project (Klein Tank et al., 2002), publicly available for research at https://www.ecad.eu. We study the average daily precipitation intensities in millimeters (mm) for November 2019 at 169 meteorological stations in Spain. The stations are sparsely located, and it is of great interest to predict precipitation at new locations, quantify the associated uncertainty, and create probability exceedance maps for high precipitation intensities, which can help reveal climate patterns. However, the marginal distribution of precipitation data is usually far from Gaussian (Allcroft and Glasbey, 2003, Johns et al., 2003): it is typically positively skewed and may contain outliers. Hence, it is important to build flexible geostatistical models without distributional assumptions that make accurate predictions and explain the spatial variation in precipitation.
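An exceedance map follows directly from a predicted conditional CDF: for a level $c$, the probability at $s_0$ is $P\{Z(s_0) > c \mid Z(s_1), \ldots, Z(s_n)\} = 1 - F(c \mid \text{data})$. The Python sketch below illustrates this under stated assumptions: the predicted CDF is available at $K$ increasing thresholds for each of $m$ grid locations, and linear interpolation between thresholds is acceptable. The function name `exceedance_probability` is hypothetical and not taken from the paper.

```python
import numpy as np

def exceedance_probability(thresholds, cdf_grid, c):
    """Return P{Z(s0) > c | data} = 1 - F(c | data) for each prediction location.

    thresholds: increasing thresholds z_1 < ... < z_K.
    cdf_grid:   (m, K) array of predicted conditional CDF values at those
                thresholds for m locations (assumed already in [0, 1] and
                nondecreasing along each row).
    Linear interpolation in c between thresholds is an illustrative choice;
    np.interp holds the end values constant outside [z_1, z_K].
    """
    thresholds = np.asarray(thresholds, dtype=float)
    cdf_grid = np.atleast_2d(np.asarray(cdf_grid, dtype=float))
    return np.array([1.0 - np.interp(c, thresholds, row) for row in cdf_grid])
```

Evaluating this over every node of a prediction grid for a fixed high level $c$ produces one exceedance map. A finer set of thresholds near $c$ reduces the dependence on the interpolation rule.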

Various approaches have been considered for modeling non-Gaussian geostatistical data, e.g., trans-Gaussian random fields (Cressie, 1993, De Oliveira et al., 1997), scale-mixing Gaussian random fields (Fonseca and Steel, 2011), multiple indicator kriging (Journel and Alabert, 1989), and skew Gaussian processes (Zhang and El-Shaarawi, 2010, Rimstad and Omre, 2014). An alternative to indicator kriging is disjunctive kriging (Matheron, 1976), which transforms the data to a standard normal distribution using Hermite polynomials. A standard approach in geostatistics is to find a nonlinear transformation that makes it possible to fit Gaussian processes; however, transformations do not guarantee normality and may change the features of the process (Changyong et al., 2014). In particular, logarithm and square-root transformations induce a link between the mean and the covariance of the data on the untransformed scale, resulting in nonstationarity (Wallin and Bolin, 2015). Of the approaches mentioned above, multiple indicator kriging offers a nonlinear, nonparametric solution to spatial interpolation without any assumptions on the marginal distribution (Journel, 1983). Indicator kriging has previously been used to study grade estimation in minerals (Badel et al., 2011), soil or groundwater contamination (Juang and Lee, 2000, Van Meirvenne and Goovaerts, 2001, Goovaerts et al., 2005), and the spatial variation and mapping of precipitation (Haberlandt, 2007). The indicator approach applies an indicator transform to the random field, thresholding it at several quantile levels to create sets of indicator variables. We formulate the spatial prediction problem as indicator cokriging. The proposed procedure can account for asymmetric spatial continuity across the quantiles of the distribution, since the spatial correlation of low values may differ significantly from that of high values, which Gaussian-type modeling techniques cannot accommodate.
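The indicator transform can be sketched in Python as follows. This minimal illustration assumes that the thresholds are empirical quantiles of the observed data at user-chosen probability levels. `indicator_transform` is a hypothetical helper and does not come from the paper.

```python
import numpy as np

def indicator_transform(z, probs):
    """Threshold the observations at empirical quantiles.

    z:     observations Z(s_1), ..., Z(s_n).
    probs: sequence of probability levels p_1 < ... < p_K.
    Returns the thresholds z_k (empirical quantiles of z at probs[k]) and the
    n x K indicator matrix with entries 1[z_i <= z_k].
    """
    z = np.asarray(z, dtype=float)
    thresholds = np.quantile(z, probs)
    indicators = (z[:, None] <= thresholds[None, :]).astype(int)
    return thresholds, indicators
```

Because the thresholds increase, each row of the indicator matrix is nondecreasing across thresholds. Each column mean is approximately the corresponding probability level, and these indicator columns are the variables that cokriging combines.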

Typically, the spatial dependence structure of the underlying random field is modeled using variograms. However, variogram estimation is adversely affected if the data is not Gaussian. Moreover, a variogram describes only the average dependence between variables, not the dependence over the whole range of the distribution. Bárdossy (2006) proposed using copulas to model spatial variability, which circumvents these disadvantages of variograms. Copulas are multivariate distribution functions with uniformly distributed margins; they make it possible to model the dependence structure of random variables separately from their marginal distributions (Nelsen, 2006). Several studies have investigated copula models for spatial data (Kazianka and Pilz, 2010, Gräler and Pebesma, 2011, Krupskii et al., 2018). Copula models have been widely applied in settings where the usual Gaussian dependence is not appropriate, e.g., in finance for pricing and credit risk analysis (Cherubini et al., 2004), and in geostatistics for estimating groundwater quality parameters (Bárdossy and Li, 2008) and interpolating air temperature data (Alidoost et al., 2018). Copulas are also used to describe the dependence between extremes (Salvadori and De Michele, 2013, Gräler, 2014). While existing methods for multiple indicator kriging use variograms to describe spatial dependence, we propose to use copulas to model the dependence structure of the indicator variables and to use this structure for spatial interpolation. To do so, we derive the relationship between indicator covariances and copula functions and develop the corresponding estimation and prediction procedures.
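The link between indicator covariances and copulas follows from Sklar's theorem. Suppose the field is stationary with continuous marginal CDF $F$, and $C_h$ is the bivariate copula of $(Z(s), Z(s+h))$. With $u_k = F(z_k)$, we have $\operatorname{Cov}\{1[Z(s) \le z_k], 1[Z(s+h) \le z_l]\} = C_h(u_k, u_l) - u_k u_l$.

The Python sketch below evaluates this identity for a Gaussian copula whose correlation decays exponentially with distance. The copula family, the exponential correlation, the parameter name `range_`, and the helper names are illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u, v, rho):
    """Bivariate Gaussian copula C(u, v; rho) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); rho)."""
    if rho >= 1.0 - 1e-12:  # perfect dependence: upper Frechet bound
        return min(u, v)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return float(multivariate_normal(mean=np.zeros(2), cov=cov).cdf([norm.ppf(u), norm.ppf(v)]))

def indicator_covariance(u_k, u_l, h, range_):
    """Cov{1[Z(s) <= z_k], 1[Z(s + h) <= z_l]} = C_h(u_k, u_l) - u_k * u_l, u = F(z).

    Assumed spatial model: Gaussian copula with correlation exp(-h / range_).
    """
    rho = np.exp(-h / range_)
    return gaussian_copula(u_k, u_l, rho) - u_k * u_l
```

At distance zero the copula reaches the upper Fréchet bound, so the covariance reduces to $\min(u_k, u_l) - u_k u_l$. At large distances it tends to zero. Swapping in a non-Gaussian copula changes only `gaussian_copula`, which is the flexibility the copula formulation is meant to provide.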

One of the challenges in the spatial interpolation of non-Gaussian random fields is to obtain the entire predictive distribution at unsampled spatial locations, which is essential for probabilistic prediction (Gneiting et al., 2007, Gneiting and Katzfuss, 2014). Even if one is interested only in the center of the predictive distribution, non-Gaussian data can strongly affect parameter estimation and prediction if the model is not resistant to extreme observations. Therefore, it is important to build a flexible model that yields the complete predictive distribution for non-Gaussian processes and is robust to outliers.

In this article, we propose a new method for spatial probabilistic prediction of non-Gaussian random fields using copulas. We propose a copula-based multiple indicator kriging (CMIK) model and a semiparametric maximum pseudo-likelihood estimator. The proposed model makes no distributional assumptions on the marginal distribution of the random field, which increases the model’s flexibility. We describe the spatial dependence structure with copula functions by exploiting the relationship between the indicator covariances and copulas, which overcomes the disadvantages of variogram modeling. Our contribution is to formulate the spatial prediction problem for non-Gaussian processes as indicator cokriging using copulas. The proposed framework uses copulas to model the covariances of the indicator variables, which allows us to take advantage of the desirable properties of copula methods. Unlike other copula models for spatial data, we analyze multivariate binary data obtained by thresholding the dataset. Therefore, at each unsampled location, we predict the conditional CDF directly rather than the spatial observation itself. The proposed method is also flexible in the choice of copula and yields an unbiased kriging predictor. We can also compute conditional quantiles from the conditional distribution function, using the conditional median for point prediction and conditional tail quantiles to construct prediction intervals. We consider both Gaussian and non-Gaussian copulas to model the spatial dependence. We evaluate the predictive performance of our method through simulation studies and through applications to precipitation data and to data on heavy metal concentrations in soil. We show that the proposed methods are flexible enough to model non-Gaussian data and perform better than existing variogram-based multiple indicator kriging approaches and approaches based on Gaussianity assumptions.
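The step from a predicted conditional CDF to point and interval predictions can be sketched as follows. Kriged indicator values need not lie in $[0, 1]$ or be nondecreasing across thresholds, so a correction is applied first. The correction used here averages an upward and a downward pass. The quantile function is then interpolated with scipy's `PchipInterpolator`. Both choices are illustrative assumptions, and the helper names are hypothetical. The reference list includes Fritsch and Carlson (1980) on monotone piecewise cubic interpolation, but this excerpt does not show how the paper interpolates.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def correct_order_relations(cdf_hat):
    """Clip kriged CDF values to [0, 1] and make them nondecreasing by averaging
    an upward pass (running max from the left) and a downward pass
    (running min from the right)."""
    cdf = np.clip(np.asarray(cdf_hat, dtype=float), 0.0, 1.0)
    upward = np.maximum.accumulate(cdf)
    downward = np.minimum.accumulate(cdf[::-1])[::-1]
    return 0.5 * (upward + downward)

def conditional_quantile(thresholds, cdf_hat, p):
    """Invert the corrected conditional CDF at probability level(s) p using a
    monotone piecewise cubic (PCHIP) interpolant of the quantile function."""
    cdf = correct_order_relations(cdf_hat)
    cdf_unique, first = np.unique(cdf, return_index=True)  # drop flat steps
    z_unique = np.asarray(thresholds, dtype=float)[first]
    if cdf_unique.size < 2:
        raise ValueError("need at least two distinct CDF values to interpolate")
    quantile_fn = PchipInterpolator(cdf_unique, z_unique, extrapolate=False)
    # Levels outside the range of the kriged CDF are clamped (no tail model here).
    p = np.clip(p, cdf_unique[0], cdf_unique[-1])
    return quantile_fn(p)
```

For example, `conditional_quantile(thresholds, cdf_hat, 0.5)` gives the conditional median, and levels `[0.05, 0.95]` give a central 90% prediction interval. Probability levels outside the range of the kriged CDF values are clamped to the outermost thresholds, so quantiles further in the tails would need a separate tail model.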

The rest of our paper is organized as follows. In Section 2, we introduce the CMIK model framework and the spatial modeling of indicator covariances using copulas. We discuss copula estimation and describe the spatial dependence structure using copulas. We also present the spatial probabilistic prediction procedure, which predicts the conditional CDF at unsampled locations. In Section 3, we simulate non-Gaussian random fields and evaluate the performance of the proposed method for point and probabilistic predictions. In Section 4, we consider two applications of our methodology: first, to the precipitation dataset from Spain, and second, to the heavy metal dataset along the river Meuse. Finally, Section 5 contains the discussion and conclusions.

Section snippets

Copula-based multiple indicator kriging

Let $Z(s)$, $s \in \mathbb{R}^d$, $d \ge 1$, be a second-order stationary random field observed at locations $s_1, \ldots, s_n$. To predict $Z$ at an unsampled location $s_0$ for a non-Gaussian random field, we need to estimate the predictive distribution of $Z(s_0) \mid Z(s_1), \ldots, Z(s_n)$. We aim to predict the conditional CDF or conditional quantile function at a new location. The prediction of conditional quantiles provides a convenient way to construct point predictions, such as the median, which is more suitable for point prediction of skewed and
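A minimal sketch of the prediction step is written below as simple indicator cokriging. The kriged value of $1[Z(s_0) \le z_k]$ is $u_k + \lambda_k^\top(\mathbf{I} - \mathbf{u})$, where $\lambda_k = \Sigma^{-1} c_{0,k}$ and every covariance entry has the form $C_{ij}(u_k, u_l) - u_k u_l$. Several simplifications here are hypothetical: the marginal probabilities $u_k = F(z_k)$ are treated as known, the copula is Gaussian with exponential correlation, and the function names are illustrative. The paper's semiparametric estimator and exact kriging system may differ.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gauss_copula(u, v, rho):
    """Bivariate Gaussian copula; rho = 1 returns the upper Frechet bound min(u, v)."""
    if rho >= 1.0 - 1e-12:
        return min(u, v)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return float(multivariate_normal(mean=np.zeros(2), cov=cov).cdf([norm.ppf(u), norm.ppf(v)]))

def simple_indicator_cokriging(coords, z, s0, thresholds, probs, range_):
    """Kriged conditional CDF at s0, one value per threshold z_k.

    Hypothetical simplifications: probs[k] = F(z_k) is treated as known
    (simple cokriging), and the spatial copula is Gaussian with exponential
    correlation exp(-distance / range_).
    """
    coords = np.asarray(coords, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    u = np.asarray(probs, dtype=float)
    n, K = coords.shape[0], u.size

    def corr(a, b):
        return np.exp(-np.linalg.norm(a - b) / range_)

    def cov(u_k, u_l, rho):
        # Indicator covariance = copula value minus product of marginal probabilities
        return gauss_copula(u_k, u_l, rho) - u_k * u_l

    # Indicator data stacked location-major: entry i*K + k is 1[z_i <= z_k]
    ind = np.asarray(z, dtype=float)[:, None] <= np.asarray(thresholds, dtype=float)[None, :]
    resid = ind.astype(float).ravel() - np.tile(u, n)

    # Covariance matrix of all indicators across locations and thresholds
    sigma = np.empty((n * K, n * K))
    for i in range(n):
        for j in range(n):
            rho = corr(coords[i], coords[j])
            for k in range(K):
                for l in range(K):
                    sigma[i * K + k, j * K + l] = cov(u[k], u[l], rho)

    cdf0 = np.empty(K)
    for k in range(K):
        # Covariances between the target indicator at s0 and every data indicator
        c0 = np.array([cov(u[k], u[l], corr(s0, coords[j]))
                       for j in range(n) for l in range(K)])
        weights = np.linalg.solve(sigma, c0)
        cdf0[k] = u[k] + weights @ resid
    return cdf0
```

Kriging at an observed site reproduces that site's indicators. Far from all data, the prediction falls back to the marginal probabilities. The output still generally needs an order-relation correction before it can be read as a CDF.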

Simulation study

In this section, we simulate non-Gaussian random fields to evaluate the performance of the proposed multiple indicator kriging methods with spatial copulas. We simulate data from a trans-Gaussian random field called the Tukey g-and-h random field, which has very flexible marginal distributions (Xu and Genton, 2017). The random field is based on Tukey’s g-and-h transformation (Tukey, 1977), given by

$$
\tau_{g,h}(x) =
\begin{cases}
g^{-1}\{\exp(gx) - 1\}\exp(hx^2/2), & g \neq 0,\ h \ge 0,\\
x\exp(hx^2/2), & g = 0,\ h \ge 0,
\end{cases}
$$

which is a strictly monotone function of
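Below is a Python sketch of the transformation and of one way to simulate such a field. It assumes a zero-mean, unit-variance Gaussian field with exponential correlation and a location-scale form $\xi + \omega\,\tau_{g,h}\{Z(s)\}$. These choices, any parameter values, and the helper names are illustrative assumptions, not the simulation design of Section 3.

```python
import numpy as np

def tukey_gh(x, g, h):
    """Tukey g-and-h transformation tau_{g,h}(x); strictly increasing when h >= 0."""
    if h < 0:
        raise ValueError("h must be nonnegative")
    x = np.asarray(x, dtype=float)
    tail = np.exp(h * x**2 / 2.0)  # h controls tail heaviness
    if g == 0:
        return x * tail
    return np.expm1(g * x) / g * tail  # g controls skewness; expm1 is accurate near 0

def simulate_tukey_gh_field(coords, g, h, range_, xi=0.0, omega=1.0, seed=None):
    """Draw xi + omega * tau_{g,h}(Z(s)) at the given coordinates.

    Assumption: Z is a zero-mean, unit-variance Gaussian field with exponential
    correlation exp(-distance / range_), simulated by a Cholesky factorization.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-dist / range_)
    # Small diagonal jitter guards against round-off in the factorization
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(coords.shape[0]))
    z = chol @ rng.standard_normal(coords.shape[0])
    return xi + omega * tukey_gh(z, g, h)
```

With $g = h = 0$ the transformation is the identity, so the field stays Gaussian. Positive $g$ produces right skewness, and positive $h$ produces heavier tails. This is the kind of flexibility in the marginal distribution that the simulations are designed to probe.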

Application to precipitation data

In this section, we apply the proposed multiple indicator kriging methods with copulas to the precipitation dataset introduced in Section 1. The average daily precipitation (in millimeters) in Spain for November 2019 at 169 stations is plotted in Fig. 2a using the R package ‘ggmap’ (Kahle and Wickham, 2013). The stations are sparsely distributed over the map, and a primary goal is to predict the observations at unsampled locations and characterize the uncertainty. The histogram of the

Discussion

In this article, we build a flexible model for spatial probabilistic prediction of non-Gaussian spatial data that can be used in a wide range of applications. We introduce a copula-based multiple indicator kriging method and describe the spatial dependence of indicator variables using copulas. The proposed method is semiparametric, makes no explicit assumption on the marginal distribution of the random field, and is resistant to outliers. It provides a complete solution to the spatial

Acknowledgments

The research reported in this publication was supported by funding from King Abdullah University of Science and Technology (KAUST), Saudi Arabia, under award number OSR-2019-CRG7-3800. The precipitation dataset used in this research was taken from the European Climate Assessment & Dataset (ECA&D) project, available at https://www.ecad.eu. The research of Wang was partly supported by the IR/D program of the US National Science Foundation (NSF) and by NSF grant DMS-1712760. Any opinion,

References (56)

  • Rimstad, K., et al. Skew-Gaussian random fields. Spat. Stat. (2014).
  • Van Meirvenne, M., et al. Evaluating the probability of exceeding a site-specific soil cadmium contamination threshold. Geoderma (2001).
  • Agarwal, G., et al. Bivariate functional quantile envelopes with application to radiosonde wind data. Technometrics (2020).
  • Allcroft, D.J., et al. A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. J. R. Stat. Soc. Ser. C. Appl. Stat. (2003).
  • Bárdossy, A. Copula-based geostatistical models for groundwater quality parameters. Water Resour. Res. (2006).
  • Bárdossy, A., et al. Geostatistical interpolation using copulas. Water Resour. Res. (2008).
  • Burrough, P.A., et al. Principles of Geographical Information Systems (2015).
  • Carvalho, D., et al. An overview of multiple indicator kriging. Geostat. Lessons (2017).
  • Changyong, F., et al. Log-transformation and its implications for data analysis. Shanghai Arch. Psychiatry (2014).
  • Cherubini, U., et al. Copula Methods in Finance (2004).
  • Cressie, N.A. Statistics for Spatial Data (1993).
  • Dawid, A.P. Present position and potential developments: Some personal views: Statistical theory: The prequential approach. J. R. Stat. Soc. A (1984).
  • De Haan, L., et al. Extreme Value Theory: An Introduction (2007).
  • De Oliveira, V., et al. Bayesian prediction of transformed Gaussian random fields. J. Amer. Statist. Assoc. (1997).
  • Deutsch, C.V., et al. GSLib.
  • Diebold, F.X., et al. Evaluating density forecasts with applications to financial risk management. Internat. Econom. Rev. (1998).
  • Fonseca, T.C., et al. Non-Gaussian spatiotemporal modelling through scale mixing. Biometrika (2011).
  • Fritsch, F.N., et al. Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. (1980).