
Effective probability distribution approximation for the reconstruction of missing data

Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Spatially distributed processes can be modeled as random fields. The complex spatial dependence is then encoded in the joint probability density function. Knowledge of the joint probability density allows the prediction of missing data. While environmental data often exhibit significant deviations from Gaussian behavior (rainfall, wind speed, and earthquakes being characteristic examples), only a few non-Gaussian joint probability density functions admit explicit expressions. In addition, random field models are computationally costly for large datasets. We propose an “effective distribution” approach that is based on the product of univariate conditional probability density functions modified by local interactions. The effective densities involve local parameters that are estimated by means of kernel regression. Missing data are predicted by the median of an ensemble of simulated states generated from the effective distribution model. This model can capture non-Gaussian dependence and is applicable to large spatial datasets, since it does not require the storage and inversion of large covariance matrices. We compare the predictive performance of the effective distribution approach with that of classical geostatistical methods using Gaussian and non-Gaussian synthetic data. We also apply the effective distribution approach to the reconstruction of gaps in large raster data.
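The abstract names two computational ingredients: local parameters estimated by kernel regression, and a point prediction taken as the median of an ensemble of simulated states. The Python sketch below illustrates only these two steps under stated assumptions. It is not the authors' effective distribution model: it omits the local interactions and simulates each gap site independently. The Gaussian-kernel Nadaraya–Watson estimator, the fixed-shape Weibull marginal (a common choice for wind speed), and the names `nadaraya_watson`, `median_prediction`, `bandwidth`, `shape`, and `n_sims` are hypothetical choices made for illustration.

```python
import math

import numpy as np


def nadaraya_watson(coords_obs, values_obs, coords_query, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate at each query location.

    coords_obs: (n, d) observed sites; values_obs: (n,) observed values;
    coords_query: (m, d) sites where an estimate is wanted.
    """
    diff = coords_query[:, None, :] - coords_obs[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)  # (m, n) squared distances
    # Shift each row by its minimum before exponentiating: the nearest site
    # then always gets weight 1, which avoids 0/0 far from all observations.
    # The shift cancels in the weighted average, so the estimate is unchanged.
    sq_dist = sq_dist - sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return weights @ values_obs / weights.sum(axis=1)


def median_prediction(coords_obs, values_obs, coords_missing,
                      bandwidth=1.0, shape=2.0, n_sims=2000, seed=0):
    """Median of an ensemble of simulated values at the missing sites.

    Hypothetical local model (an assumption, not the paper's): each missing
    value is Weibull with a fixed shape and a scale matched to the
    kernel-estimated local mean via mean = scale * Gamma(1 + 1/shape).
    Sites are simulated independently, i.e. local interactions are ignored.
    """
    rng = np.random.default_rng(seed)
    local_mean = nadaraya_watson(coords_obs, values_obs, coords_missing, bandwidth)
    scale = local_mean / math.gamma(1.0 + 1.0 / shape)
    # Ensemble of simulated states: one row per realization, shape (n_sims, m).
    sims = scale * rng.weibull(shape, size=(n_sims, len(coords_missing)))
    return np.median(sims, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coords = rng.uniform(0.0, 10.0, size=(200, 2))
    values = 5.0 + 0.3 * coords[:, 0]  # synthetic positive field
    gaps = np.array([[2.0, 5.0], [8.0, 5.0]])
    print(median_prediction(coords, values, gaps))
```

With `shape=2.0` the Weibull marginal is right-skewed, so the ensemble median falls below the kernel-estimated local mean; under a Gaussian marginal the two would coincide.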


Figs. 1–10 (images not reproduced)

Author information

Correspondence to Dionissios T. Hristopulos or Anastassia Baxevani.

About this article

Cite this article

Hristopulos, D.T., Baxevani, A. Effective probability distribution approximation for the reconstruction of missing data. Stoch Environ Res Risk Assess 34, 235–249 (2020). https://doi.org/10.1007/s00477-020-01765-5

  • DOI: https://doi.org/10.1007/s00477-020-01765-5
