Abstract
This paper presents a novel and versatile framework for building ensemble spatial interpolation functions. As with all ensemble methods, the central idea is to assemble a voting scheme in which a set of weak interpolation functions is combined, through an aggregation function, to produce a strong ensemble response. In the presented scheme, voter interpolation functions are weak because each one deals with only a minimal portion of the sample data, extracted from the elements of a random spatial partition, while the ensemble as a whole uses as much of the available information as possible. The random partition scheme behaves as a bootstrapping strategy applied in a spatial context. Experiments show that the proposed framework can produce robust interpolation functions that both scale to large sample data sets and support uncertainty quantification, even when the weak voter interpolation functions are deterministic or highly data-consuming.
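The voting scheme summarized above can be illustrated with a minimal sketch. It is not the pyESI API: the function names `idw` and `esi_predict`, the randomly shifted regular grid (a simple stand-in for the random spatial partitions of the paper), inverse distance weighting as the weak voter, and the mean as aggregation function are all assumptions chosen for brevity.

```python
import numpy as np


def idw(train_xy, train_z, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting, used here as the (assumed) weak voter."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)
    return (w @ train_z) / w.sum(axis=1)


def esi_predict(sample_xy, sample_z, query_xy,
                n_partitions=50, cells_per_axis=4, seed=0):
    """Ensemble interpolation over randomly shifted grid partitions.

    Each partition is one voter: a query point is interpolated only from
    the samples that fall in the same cell. Voters with no samples in the
    query's cell abstain. Returns the mean vote, the standard deviation of
    the votes (a rough spread measure), and the number of votes received.
    """
    rng = np.random.default_rng(seed)
    lo = np.minimum(sample_xy.min(axis=0), query_xy.min(axis=0))
    hi = np.maximum(sample_xy.max(axis=0), query_xy.max(axis=0))
    size = (hi - lo) / cells_per_axis
    size = np.where(size > 0, size, 1.0)  # guard against a degenerate axis

    votes = np.full((n_partitions, len(query_xy)), np.nan)
    for p in range(n_partitions):
        # A random shift of the grid origin yields a new random partition.
        shift = rng.uniform(0.0, 1.0, size=sample_xy.shape[1]) * size
        s_cell = np.floor((sample_xy - lo + shift) / size).astype(int)
        q_cell = np.floor((query_xy - lo + shift) / size).astype(int)
        for cell in np.unique(q_cell, axis=0):
            q_mask = np.all(q_cell == cell, axis=1)
            s_mask = np.all(s_cell == cell, axis=1)
            if s_mask.any():
                votes[p, q_mask] = idw(sample_xy[s_mask], sample_z[s_mask],
                                       query_xy[q_mask])

    # Aggregate the votes, ignoring abstentions (NaN entries).
    counts = np.sum(~np.isnan(votes), axis=0)
    safe = np.maximum(counts, 1)
    mean = np.where(counts > 0, np.nansum(votes, axis=0) / safe, np.nan)
    var = np.nansum((votes - mean) ** 2, axis=0) / safe
    spread = np.where(counts > 0, np.sqrt(var), np.nan)
    return mean, spread, counts
```

Because every voter sees only the samples of one cell, the cost of each weak interpolation stays small even for large data sets, and the spread of the votes gives a simple, heuristic indication of local uncertainty.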
Data Availability
For further experimentation, data used in this work and a Python 3.x open-source library with algorithm implementations can be found here: https://github.com/alges/pyESI.
Notes
By the French mathematician Marie-Jean-Antoine Nicolas de Caritat, 1785.
\(\theta = [a_1^{\theta }, b_1^{\theta }] \times \cdots \times [a_d^{\theta }, b_d^{\theta}].\)
Sub-index \(({{{\mathcal {P}}}, {{\mathcal {M}}}})\) expresses that the interpolation is made using the information provided by positions and measurements.
Tree data structures are organized in levels: root node is at level 0, its child nodes are at level 1, and so on. The depth or height is the maximum level of a leaf (terminal) node in a tree (Preiss 1999).
This can be easily verified by noticing that each tree node has two children, so the number of nodes per level grows exponentially with base 2.
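As a worked version of this count, assume a binary tree of depth \(h\) in which every internal node has exactly two children and all leaves sit at level \(h\) (the symbols \(n_k\) and \(N\) are introduced here only for illustration). Writing \(n_k\) for the number of nodes at level \(k\) and \(N\) for the total number of nodes:

\[
n_0 = 1, \qquad n_{k+1} = 2\,n_k \;\Longrightarrow\; n_k = 2^k, \qquad N = \sum_{k=0}^{h} 2^k = 2^{h+1} - 1 .
\]

Under this assumption the tree has \(n_h = 2^h\) leaves; for example, \(h = 3\) gives 8 leaves and 15 nodes in total.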
References
Akima, H. (1978). A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Transactions on Mathematical Software (TOMS), 4(2), 148–159. https://doi.org/10.1145/355780.355786.
Battalgazy, N., & Madani, N. (2019). Categorization of mineral resources based on different geostatistical simulation algorithms: a case study from an iron ore deposit. Natural Resources Research. https://doi.org/10.1007/s11053-019-09474-9.
Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517. https://doi.org/10.1145/361002.361007.
Boisvert, J. B., & Deutsch, C. V. (2011). Programs for kriging and sequential Gaussian simulation with locally varying anisotropy using non-Euclidean distances. Computers and Geosciences. https://doi.org/10.1016/j.cageo.2010.03.021.
Breiman, L. (2001). Random forests. Machine Learning. https://doi.org/10.1023/A:1010933404324.
Burrough, P. A. (1986). Principles of geographical information systems for land resources assessment. https://doi.org/10.1097/00010694-198710000-00012.
Chan, P. K., & Stolfo, S. J. (1995). A comparative evaluation of voting and meta-learning on partitioned data. In Machine learning proceedings 1995, ICML’95, https://doi.org/10.1016/b978-1-55860-377-6.50020-7.
Chan, P. K., & Stolfo, S. J. (1997). On the accuracy of meta-learning for scalable data mining. Journal of Intelligent Information Systems. https://doi.org/10.1023/A:1008640732416.
Chilès, J. P., & Delfiner, P. (2012). Geostatistics: Modeling spatial uncertainty. 2nd edition. New York: Wiley. https://doi.org/10.1002/9781118136188.
Cohen, S., & Intrator, N. (2000). A hybrid projection based and radial basis function architecture. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/3-540-45014-9_14.
Cohen, S., & Intrator, N. (2002). A hybrid projection-based and radial basis function architecture: Initial values and global optimisation. Pattern Analysis and Applications. https://doi.org/10.1007/s100440200010.
Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1–3), 253–285.
Cressie, N. (2015). Statistics for spatial data. New York: Wiley.
Davies, M. M., & Van Der Laan, M. J. (2016). Optimal spatial prediction using ensemble machine learning. International Journal of Biostatistics. https://doi.org/10.1515/ijb-2014-0060.
Den Hertog, D., Kleijnen, J. P., & Siem, A. Y. (2006). The correct Kriging variance estimated by bootstrapping. Journal of the Operational Research Society. https://doi.org/10.1057/palgrave.jors.2601997.
Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees. Machine Learning. https://doi.org/10.1023/A:1007607513941.
Duin, R. P. (2002). The combining classifier: To train or not to train? Proceedings - International Conference on Pattern Recognition. https://doi.org/10.1109/icpr.2002.1048415.
Džeroski, S., & Ženko, B. (2004). Is combining classifiers with stacking better than selecting the best one? Machine Learning. https://doi.org/10.1023/B:MACH.0000015881.36452.6e.
Emery, X., & Arroyo, D. (2018). On a continuous spectral algorithm for simulating non-stationary Gaussian random fields. Stochastic Environmental Research and Risk Assessment. https://doi.org/10.1007/s00477-017-1402-3.
Emery, X., & Maleki, M. (2019). Geostatistics in the presence of geological boundaries: Application to mineral resources modeling. Ore Geology Reviews. https://doi.org/10.1016/j.oregeorev.2019.103124.
Evgeniou, T., Pontil, M., & Elisseeff, A. (2004). Leave one out error, stability, and generalization of voting combinations of classifiers. Machine Learning. https://doi.org/10.1023/B:MACH.0000019805.88351.60.
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133–3181.
Fouedjio, F., & Séguret, S. (2016). Predictive geological mapping using closed-form non-stationary covariance functions with locally varying anisotropy: Case study at El Teniente Mine (Chile). Natural Resources Research. https://doi.org/10.1007/s11053-016-9293-4.
Franco-Villoria, M., & Ignaccolo, R. (2017). Bootstrap based uncertainty bands for prediction in functional kriging. Spatial Statistics. https://doi.org/10.1016/j.spasta.2017.06.005.
Franke, R. (1982). Smooth interpolation of scattered data by local thin plate splines. Computers and Mathematics with Applications. https://doi.org/10.1016/0898-1221(82)90009-8.
Franke, R., & Nielson, G. M. (1991). Scattered data interpolation and applications: A tutorial and survey. In Geometric modeling, Springer (pp. 131–160). https://doi.org/10.1007/978-3-642-76404-2_6.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407.
Georganos, S., Grippa, T., Niang Gadiaga, A., Linard, C., Lennert, M., Vanhuysse, S., et al. (2019). Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto International. https://doi.org/10.1080/10106049.2019.1595177.
Gielsdorf, F., & Hillmann, T. (2012). Mathematics and statistics. In Kresse, W., & Danko, D. M. (eds.), Springer handbook of geographic information, Berlin: Springer (pp. 7–10). https://doi.org/10.1007/978-3-540-72680-7_2.
Guhaniyogi, R., & Banerjee, S. (2019). Multivariate spatial meta kriging. Statistics and Probability Letters. https://doi.org/10.1016/j.spl.2018.04.017.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of statistical learning. 2nd ed. Springer. https://doi.org/10.1007/978-0-387-84858-7.
Hengl, T., Heuvelink, G. B., Kempen, B., Leenaars, J. G., Walsh, M. G., Shepherd, K. D., et al. (2015). Mapping soil properties of Africa at 250 m resolution: Random forests significantly improve current predictions. PLoS ONE. https://doi.org/10.1371/journal.pone.0125814.
Hothorn, T., & Lausen, B. (2005). Bundling classifiers by bagging trees. Computational Statistics and Data Analysis. https://doi.org/10.1016/j.csda.2004.06.019.
Huang, Y. S., & Suen, C. Y. (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/34.368145.
Jacobs, R. A. (1995). Methods for combining experts’ probability assessments. Neural Computation, 7(5), 867–888. https://doi.org/10.1162/neco.1995.7.5.867.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation. https://doi.org/10.1162/neco.1991.3.1.79.
Jordan, M. I., & Xu, L. (1995). Convergence results for the EM approach to mixtures of experts architectures. Neural Networks. https://doi.org/10.1016/0893-6080(95)00014-3.
Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics (Vol. 600). London: Academic press.
Kleijnen, J. P. C. (2012). Simulation optimization via bootstrapped kriging: Tutorial. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.1860175.
Krcho, J. (1973). Morphometric analysis of relief on the basis of geometric aspect of field theory. Acta Geographica Universitatis Comenianae, Geographico-Physica, 1(1), 7–233.
Kuncheva, L. I. (2014). Combining pattern classifiers: methods and algorithms (2nd ed.). New York: Wiley. https://doi.org/10.1002/9781118914564.
Lakshminarayanan, B., Roy, D. M., & Teh, Y. W. (2014). Mondrian forests: Efficient online random forests. Advances in Neural Information Processing Systems, 4, 3140–3148.
Lakshminarayanan, B., Roy, D. M., & Teh, Y. W. (2016). Mondrian forests for large-scale regression when uncertainty matters. In Proceedings of the 19th international conference on artificial intelligence and statistics, AISTATS 2016.
Lantuéjoul, C. (2002). Geostatistical simulation. Berlin: Springer. https://doi.org/10.1007/978-3-662-04808-5.
Laslett, G. M., McBratney, A. B., Pahl, P. J., & Hutchinson, M. F. (1987). Comparison of several spatial prediction methods for soil pH. Journal of Soil Science, 38(2), 325–341. https://doi.org/10.1111/j.1365-2389.1987.tb02148.x.
Li, J., & Heap, A. D. (2014). Spatial interpolation methods applied in the environmental sciences: A review. Environmental Modelling & Software, 53, 173–189. https://doi.org/10.1016/j.envsoft.2013.12.008.
Li, J., Heap, A. D., Potter, A., & Daniell, J. J. (2011). Application of machine learning methods to spatial interpolation of environmental variables. Environmental Modelling and Software. https://doi.org/10.1016/j.envsoft.2011.07.004.
Liu, Y., Cao, G., Zhao, N., Mulligan, K., & Ye, X. (2018). Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach. Environmental Pollution. https://doi.org/10.1016/j.envpol.2017.12.070.
Matheron, G. (1965). Les variables régionalisées et leur estimation: une application de la théorie des fonctions aléatoires aux sciences de la nature. Masson et CIE.
McCauley, J. D., & Engel, B. A. (1997). Approximation of noisy bivariate traverse data for precision mapping. Transactions of the American Society of Agricultural Engineers, 40(1), 237–245. https://doi.org/10.13031/2013.21236.
Menafoglio, A., Gaetani, G., & Secchi, P. (2018). Random domain decompositions for object-oriented Kriging over complex domains. Stochastic Environmental Research and Risk Assessment, 32(12), 3421–3437. https://doi.org/10.1007/s00477-018-1596-z.
Mitáš, L., & Mitášová, H. (1988). General variational approach to the interpolation problem. Computers and Mathematics with Applications. https://doi.org/10.1016/0898-1221(88)90255-6.
Mitáš, L., & Mitášová, H. (1999). Finding appropriate interpolation methods for. Geographical information systems: Principles, techniques, management and applications, 1, 481–492.
Nwaila, G. T., Zhang, S. E., Frimmel, H. E., Manzi, M. S., Dohm, C., Durrheim, R. J., et al. (2020). Local and target exploration of conglomerate-hosted gold deposits using machine learning algorithms: A case study of the Witwatersrand gold ores, South Africa. Natural Resources Research. https://doi.org/10.1007/s11053-019-09498-1.
Orton, T. G., Pringle, M. J., Bishop, T. F., Menzies, N. W., & Dang, Y. P. (2020). Increment-averaged kriging for 3-D modelling and mapping soil properties: Combining machine learning and geostatistical methods. Geoderma. https://doi.org/10.1016/j.geoderma.2019.114094.
Philip, G. M., & Watson, D. F. (1987). Neighborhood discontinuities in bivariate interpolation of scattered observations. Mathematical Geology, 19(1), 69–74. https://doi.org/10.1007/BF01275435.
Preiss, B. R. (1999). Data structures and algorithms. New York: Wiley.
Re, M., & Valentini, G. (2012). Ensemble methods: A review. In M. J. Way, J. D. Scargle, K. M. Ali, & A. N. Srivastava (Eds.), Advances in machine learning and data mining for astronomy (pp. 563–593). New York: Taylor & Francis.
Reid, S., & Grudic, G. (2009). Regularized linear models in stacked generalization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-642-02326-2_12.
Roy, D. M., & Teh, Y. W. (2009). The Mondrian process. In Advances in neural information processing systems 21: Proceedings of the 2008 conference.
Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.
Sekulić, A., Kilibarda, M., Heuvelink, G. B., Nikolić, M., & Bajat, B. (2020). Random forest spatial interpolation. Remote Sensing. https://doi.org/10.3390/rs12101687.
Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM national conference, ACM 1968.
Sibson, R. (1981). A brief description of natural neighbour interpolation in interpreting multivariate data. New York: Wiley.
Sjöstedt-de Luna, S., & Young, A. (2003). The bootstrap and kriging prediction intervals. Scandinavian Journal of Statistics. https://doi.org/10.1111/1467-9469.00325.
Su, H., Shen, W., Wang, J., Ali, A., & Li, M. (2020). Machine learning and geostatistical approaches for estimating aboveground biomass in Chinese subtropical forests. Forest Ecosystems. https://doi.org/10.1186/s40663-020-00276-7.
Tibshirani, R. J., & Efron, B. (1993). An introduction to the bootstrap. Monographs on Statistics and Applied Probability, 57, 1–436.
Watson, D. F. (1992). Contouring: A guide to the analysis and display of spatial data. Amsterdam: Elsevier. https://doi.org/10.1016/0098-3004(93)90069-h.
Wilkinson, B., & Allen, M. (2004). Parallel programming: Techniques and applications using networked workstations and parallel computers (2nd ed.). New York: Prentice-Hall Inc.
Zhang, S. E., Nwaila, G. T., Tolmay, L., Frimmel, H. E., & Bourdeau, J. E. (2020). Integration of machine learning algorithms with Gompertz curves and kriging to estimate resources in gold deposits. Natural Resources Research. https://doi.org/10.1007/s11053-020-09750-z.
Acknowledgments
The authors acknowledge funding from the Chilean National Agency for Research and Development (ANID) PIA-Project AFB180004. The third author also acknowledges funding from ANID through grant CONICYT/FONDECYT/N°3180655. The authors also acknowledge the valuable support of Prof. Xavier Emery and all team members of the ALGES Lab at the AMTC and the Department of Mining Engineering at the University of Chile. Finally, the authors thank the three anonymous reviewers, whose comments and suggestions helped improve and clarify this manuscript.
Cite this article
Egaña, A., Navarro, F., Maleki, M. et al. Ensemble Spatial Interpolation: A New Approach to Natural or Anthropogenic Variable Assessment. Nat Resour Res 30, 3777–3793 (2021). https://doi.org/10.1007/s11053-021-09860-2
DOI: https://doi.org/10.1007/s11053-021-09860-2