Research paper
A fast edge-based two-stage direct sampling method

https://doi.org/10.1016/j.cageo.2021.104742

Highlights

  • Speeding up the direct sampling process using edge information.

  • At most one additional parameter is needed.

  • It can be easily combined with other existing acceleration methods.

Abstract

Direct sampling is an efficient information theory-based multipoint simulation method. Many approaches have been proposed to reduce its computational cost. In this paper, an edge-based two-stage strategy is proposed to achieve speed-up. The proposed method first performs a simulation on a coarse grid and then detects edge cells in the simulation grid. Next, only edge cells are simulated, and the remaining cells are assigned the average value of the neighbouring simulated and hard data cells. The proposed method needs only one easily interpreted parameter and can be combined with other existing acceleration methods for direct sampling, such as parallelization. Comparison experiments are performed on categorical, continuous and multivariate variables, including two-dimensional and three-dimensional examples. The experimental results show that the proposed method can reduce the simulation time by almost half with minimal loss in terms of pattern reproduction.

Introduction

Geostatistics is one of the most effective tools for modelling geographical phenomena. It takes the spatial association between objects into account, which makes it appropriate for analysing geographical processes. However, traditional geostatistics only uses pairs of points to model spatial processes. Modelling complex geometries requires considering the statistical relations among multiple objects simultaneously. Accordingly, multipoint geostatistics (MPS) was proposed for geological facies modelling in the 1990s (Guardiano and Srivastava, 1993; Journel, 1993). MPS has since been applied in a growing list of fields, such as fluid flow modelling (Hajizadeh et al., 2011; Okabe and Blunt, 2004), super-resolution mapping (Boucher, 2008; Mariethoz et al., 2011; Ge, 2013; Jha et al., 2013), classifying remotely sensed imagery (Ge and Bai, 2010; Tang et al., 2016), mapping fossil ice-wedge polygons (Meerschman et al., 2014), reconstructing sandstone (Xu et al., 2012), measuring hydraulic conductivity (Mahmud et al., 2014a), filling the gaps in a remotely sensed image (Yin et al., 2016), modelling fracture networks (Chugunova et al., 2017), reconstructing porous media (Zhang et al., 2015), and estimating mining resources (Hong et al., 2018), among others.

Multipoint geostatistics algorithms can be divided into three groups (Mariethoz and Caers, 2014). The first group contains normal equation-based algorithms, such as ENESIM (Guardiano and Srivastava, 1993), GENESIM (Hansen et al., 2016), SNESIM (Strebelle, 2001; Ma and Jafarpour, 2019) and IMPALA (Straubhaar et al., 2011; Straubhaar et al., 2013). These methods scan the training image to learn the probability distributions of different categories in terms of conditioning data. Most of these methods are confined to the simulation of categorical data. The second group includes pattern matching-based methods. These methods can perform simulations for either categorical or continuous data. Representative examples include FILTERSIM (Zhang et al., 2006; Wu et al., 2008), SIMPAT (Arpat and Caers, 2007) and RPAFSIM (Sharifzadehlari et al., 2018). These methods learn structure patterns from the training image, and then these patterns are pasted onto the simulation grid while taking conditioning data into account. The third group contains sampling- or information theory-based methods (Tahmasebi et al., 2012; Abdollahifard, 2016; Mahmud et al., 2014b; Hoffimann et al., 2017).

Direct sampling (DS) (Mariethoz et al., 2010; Abdollahifard and Faez, 2014) is a representative information theory-based MPS algorithm. It does not need to learn any patterns from training images and directly reproduces the patterns from training images in a stochastic way. The information theory involved in the method ensures that complex geometric structures and statistical characteristics are well reproduced in the simulation results. Compared with other methods, there are few limitations on the applications of DS. For example, it can be applied to categorical, continuous, or multivariate variables. Moreover, it only needs three parameters and is easy to implement. Therefore, DS has been widely used in various real-life applications (Mariethoz et al., 2012; Grijp and Minnitt, 2015; Talebi et al., 2019).
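The stochastic pattern reproduction described above can be illustrated with a minimal sketch of how DS might pick a value for one unsampled cell of a 2D categorical grid. This is not the authors' code. The function name `ds_simulate_cell`, the parameter names `n` (number of neighbours), `t` (distance threshold) and `f` (fraction of the training image scanned), the use of `np.nan` for unsampled cells, and the random-permutation scan order are all assumptions made for illustration.

```python
import numpy as np


def ds_simulate_cell(ti, sim, cell, n=8, t=0.05, f=0.5, rng=None):
    """Sketch of direct sampling for one unsampled cell (2D, categorical).

    ti   : 2D integer array, the training image.
    sim  : 2D float array, the simulation grid; np.nan marks unsampled cells.
    cell : (row, col) of the unsampled cell to simulate.
    """
    rng = np.random.default_rng() if rng is None else rng
    cell = np.asarray(cell)
    known = np.argwhere(~np.isnan(sim))
    if len(known) == 0:
        # No conditioning data yet: draw a training-image value at random.
        return ti[rng.integers(ti.shape[0]), rng.integers(ti.shape[1])]

    # Data event: lags and values of the n nearest informed cells.
    order = np.argsort(np.linalg.norm(known - cell, axis=1))[:n]
    lags = known[order] - cell
    values = sim[known[order, 0], known[order, 1]]

    best_value, best_dist = None, np.inf
    n_scan = max(1, int(f * ti.size))
    # Assumption: visit a random subset of training-image locations.
    for flat in rng.permutation(ti.size)[:n_scan]:
        i, j = divmod(int(flat), ti.shape[1])
        pi, pj = i + lags[:, 0], j + lags[:, 1]
        if (pi.min() < 0 or pj.min() < 0
                or pi.max() >= ti.shape[0] or pj.max() >= ti.shape[1]):
            continue  # the data event does not fit inside the training image
        # Categorical distance: fraction of mismatching nodes.
        dist = np.mean(ti[pi, pj] != values)
        if dist < best_dist:
            best_value, best_dist = ti[i, j], dist
        if dist <= t:
            break  # first pattern under the threshold is accepted
    if best_value is None:
        best_value = ti[rng.integers(ti.shape[0]), rng.integers(ti.shape[1])]
    return best_value
```

A full simulation would call such a function for every unsampled cell along a random path, writing each result back into `sim` so that it conditions later cells. The cost of the inner scan for every cell is what acceleration methods try to reduce.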

Many methods have been proposed to accelerate the DS simulation process, from two perspectives. The first takes advantage of the parallel computing capabilities of modern computers. A parallel simulation can be implemented at the realization level, path level and node level. For example, the simplest approach is using multiple threads to find comparable patterns for data events (Mariethoz et al., 2010). Mariethoz (2010) decoupled the sending and the receiving operations to avoid synchronization at the path level. Huang et al. (2013) used graphics processing units (GPUs) to accelerate DS at the node level.

The other perspective is to speed up the pattern searching process using different strategies. Because the most time-consuming component of DS is finding comparable patterns for each unsampled cell, some researchers have attempted to accelerate the search using heuristic information or machine learning technologies. For example, Abdollahifard and Faez (2014) and Abdollahifard (2016) proposed using gradient information in the training image to reduce the search space. Gravey and Mariethoz (2020) proposed using fast Fourier transforms to quickly compute mismatch maps to speed up the search process. Zuo et al. (2020) proposed grouping similar patterns and speeding up the pattern finding process using a clustering tree. Parallel computing and fast pattern finding can be used simultaneously to speed up a simulation. For example, the fast direct sampling method proposed by Abdollahifard (2016) and GPU-accelerated node-level parallel computing (Huang et al., 2013) can be coupled to perform fast simulations.

In addition to the above two approaches, DS can be accelerated from an edge representation perspective. For a simulation involving complex geometries, some cells are on the edge of the target geographical phenomenon. Generally, the variance is large on the edge and small in the interior area of the target geographical phenomenon. Moreover, edge pixels are essential in describing structural information. For example, many image compression algorithms use edge information to reduce the costs of storage and transmission (Liu et al., 2007; Mainberger et al., 2011; Sharma et al., 2019). In a simulation, if the edge cells are simulated first and the cell values in the interior area are assigned directly from nearby cells instead of via their own simulation, the computation time decreases because many cells lie in the homogeneous interior area. Taking the simulation of categorical data as an example, if the edge cells are simulated first, the other pixels can be directly assigned the nearest simulated category, and no further simulation is required. Therefore, the simulation time can be reduced, especially for training images with large homogeneous interior areas.

This strategy can save approximately half of the simulation time, as shown in the experimental section. As recently developed applications of DS concern high-resolution remotely sensed images, some DS simulations, for example, the downscaling of multispectral remotely sensed images (Oriani et al., 2020), require several days. In that work, the downscaling of a multispectral satellite image requires over 1500 CPU hours. In such applications, our approach can save considerable resources. Additionally, several authors have proposed approaches to reduce the computational cost of MPS in recent years (see the references above), highlighting the relevance of this problem. Some of these approaches require a time-consuming preprocessing step. For example, Zuo et al. (2020) reported that their method can save 7/8 of the simulation time, but it requires over 400 s to generate the cluster tree when using Fig. 2 as the training image. Accordingly, it is only useful when generating a large number of realizations. In contrast, the advantage of our method is that it does not require a preprocessing step and can save almost half of the simulation time even for a single simulation. Finally, compared to parallelization approaches that distribute the cost (and always in a suboptimal way according to Amdahl, 1967), our approach reduces the computational cost for results that are almost equivalent to those output by DS in terms of simulation quality. This presents clear advantages from both financial and environmental points of view.
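The two cost arguments in this paragraph can be made concrete with a short worked calculation. The symbols are assumptions introduced here for illustration: $\alpha$ is the parallelizable fraction of the work, $p$ the number of processors, $T$ the time of one plain DS realization in seconds, and $N$ the number of realizations. The value $T = 100$ s is hypothetical. The fractions 7/8 and 1/2 and the 400 s preprocessing time are the figures quoted above.

```latex
% Amdahl's law: speed-up with p processors is bounded by the serial fraction,
% and the total CPU cost is distributed rather than reduced.
\[
  S(p) = \frac{1}{(1-\alpha) + \alpha/p},
  \qquad
  \lim_{p \to \infty} S(p) = \frac{1}{1-\alpha}.
\]

% Break-even between a method with 400 s of preprocessing that saves 7/8
% of the simulation time and a method without preprocessing that saves 1/2.
\[
  400 + N\,\frac{T}{8} < N\,\frac{T}{2}
  \iff
  N > \frac{3200}{3T}.
\]

% Hypothetical example: T = 100 s gives N > 10.67, i.e. at least 11 realizations.
```

Under these assumptions, the preprocessing-based method only pays off once the number of realizations exceeds $3200/(3T)$, whereas a saving of about one half applies from the first realization.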

Based on the above strategy, this paper proposes a two-stage direct sampling (TSDS) method to accelerate simulations. It first performs simulations on a coarse subgrid, and then edge cells are labelled using the simulated cells. Next, simulations are performed on the edge cells, and the non-edge cells are assigned the value shared by neighbouring simulated cells. As non-edge cells do not need to be simulated, TSDS simulates fewer pixels during a simulation than DS. This in turn saves computational time at the cost of ignoring small spatial variations when simulating continuous variables. TSDS is compatible with other speed-up approaches, such as parallel computing and fast search strategies. Therefore, it can be combined with these approaches to further accelerate simulations.
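The labelling and assignment steps described above can be sketched for a 2D categorical grid as follows. This is a minimal illustration, not the authors' implementation. The function name `label_and_fill`, the square search window, its `radius` parameter and the use of `np.nan` for unsimulated cells are assumptions. The rule used here is that an unsimulated cell is an edge cell when the simulated cells in its window disagree, and otherwise it takes their shared value.

```python
import numpy as np


def label_and_fill(sim, radius):
    """Label edge cells and fill non-edge cells after the coarse stage.

    sim    : 2D float array; coarse-stage cells hold category values,
             all other cells are np.nan.
    radius : half-width of the square window (hypothetical parameter).

    Returns (filled, edge): `filled` has non-edge cells assigned, and
    `edge` is a boolean mask of cells left for the second-stage simulation.
    """
    filled = sim.copy()
    edge = np.zeros(sim.shape, dtype=bool)
    for i, j in np.argwhere(np.isnan(sim)):
        # Look only at the original simulated cells so fills do not cascade.
        window = sim[max(0, i - radius):i + radius + 1,
                     max(0, j - radius):j + radius + 1]
        informed = window[~np.isnan(window)]
        if informed.size == 0 or np.unique(informed).size > 1:
            edge[i, j] = True          # neighbours disagree: simulate later
        else:
            filled[i, j] = informed[0]  # neighbours agree: copy shared value
    return filled, edge
```

In a complete run, the cells flagged in `edge` would then be simulated with DS, so the number of pattern searches shrinks roughly in proportion to the share of homogeneous interior cells.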

Section snippets

Review of direct sampling

DS uses a training image to perform simulations. The training image and simulation grid are both two-dimensional (2D) or three-dimensional (3D) regular grids. A regular grid is constituted by unit cells, i.e., unit squares or unit cubes. Each cell u in the regular grid can be addressed by the coordinates (u_x, u_y) in 2D or (u_x, u_y, u_z) in 3D.

DS first defines a random path for all unsampled cells in the simulation grid. For each unsampled cell u, DS finds the n nearest conditioning cells in the hard

Experiments and discussion

To validate the efficiency of TSDS, TSDS and DS are compared on the simulation of categorical, continuous and multivariate variables using multiple training images. In addition, a 3D training image is used to demonstrate the performance of TSDS for 3D simulations. The probability of connectivity, a variogram chart, and an analysis of distance (ANODI) (Tan et al., 2014) are calculated to compare the simulation quality between TSDS and DS.
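As an illustration of one of these quality measures, the following sketch computes an experimental semivariogram along one grid axis using the standard definition, half the mean squared difference between cells separated by lag h. It is not the evaluation code used in the paper, and the function name `experimental_variogram` and its arguments are assumptions.

```python
import numpy as np


def experimental_variogram(grid, max_lag, axis=0):
    """Experimental semivariogram of a regular grid along one axis.

    Returns gamma(h) for h = 1..max_lag, where
    gamma(h) = 0.5 * mean((z(u + h) - z(u))**2).
    Requires max_lag < grid.shape[axis].
    """
    grid = np.asarray(grid, dtype=float)
    size = grid.shape[axis]
    gammas = []
    for h in range(1, max_lag + 1):
        head = np.take(grid, np.arange(h, size), axis=axis)
        tail = np.take(grid, np.arange(0, size - h), axis=axis)
        gammas.append(0.5 * np.mean((head - tail) ** 2))
    return np.array(gammas)
```

Comparing such curves for a training image, a DS realization and a TSDS realization gives one view of how well two-point statistics are preserved. For a categorical variable, the same function could be applied to an indicator grid of one category.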

Conclusion and future work

The traditional DS approach is an effective but computationally expensive method for geoscience modelling. It is necessary to speed up DS from both the financial and environmental points of view. In this paper, TSDS is proposed to speed up the simulations. TSDS first detects the edges on a coarse simulation grid. Then, it only simulates the edge cells in the finest simulation grid. Finally, the values associated with the remaining cells are assigned to the average of their nearest simulated

CRediT authorship contribution statement

Hexiang Bai: Conceptualization of this study, Methodology, Software. Gregoire Mariethoz: Conceptualization of this study, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The work is supported by the National Natural Science Foundation of China (Nos. 41871286, 62072294, 62072291).

Computer code availability

Name of code: Two-stage edge-based direct sampling.

Developer and contact address: Hexiang Bai, Shanxi University, No. 92, Wuchen road, Xiaodian district, Taiyuan, 030006, Shanxi, China.

Telephone number and e-mail: +86 351 7010566.

Year first available: 2020.

Hardware required: Any computer that can run Ubuntu 18.04.

Software required: Ubuntu 18.04, Makefile, Clang++ 6.0.6, libflann.

References (53)

  • Talebi H. et al.
    Joint simulation of compositional and categorical data via direct sampling technique - application to improve mineral resource confidence
    Comput. Geosci. (2019)

  • Tong X.-Y. et al.
    Land-cover classification with high-resolution remote sensing images using transferable deep models
    Remote Sens. Environ. (2020)

  • Xu Z. et al.
    Multiple-point statistics method based on array structure for 3D reconstruction of Fontainebleau sandstone
    J. Pet. Sci. Eng. (2012)

  • Abdollahifard M.J. et al.
    Fast direct sampling for multiple-point stochastic simulation
    Arab. J. Geosci. (2014)

  • Allard D. et al.
    An efficient maximum entropy approach for categorical variable prediction
    Eur. J. Soil Sci. (2011)

  • Amdahl G.M.
    Validity of the single processor approach to achieving large scale computing capabilities

  • Arpat G.B. et al.
    Conditional simulation with patterns
    Math. Geol. (2007)

  • Boucher A.
    Super resolution mapping with multiple point geostatistics

  • Chugunova T. et al.
    Explicit fracture network modelling: From multiple point statistics to dynamic simulation
    Math. Geosci. (2017)

  • Ge Y. et al.
    MPS-based information extraction method for remotely sensed imagery: a comparison of fusion methods
    Can. J. Remote Sens. (2010)

  • Gravey M. et al.
    Quicksampling v1.0: a robust and simplified pixel-based multiple-point simulation approach
    Geosci. Model Dev. (2020)

  • Grijp Y. et al.
    Application of direct sampling multi-point statistic and sequential gaussian simulation algorithms for modelling uncertainty in gold deposits
    J. South. Afr. Inst. Mining Metal. (2015)

  • Guardiano F. et al.
    Multivariate geostatistics: beyond bivariate moments

  • Hong J. et al.
    Multiple-point geostatistical simulation for mine evaluation with aeromagnetic data
    Exp. Geophys. (2018)

  • Hou H. et al.
    Cubic splines for image interpolation and digital filtering
    IEEE Trans. Acoust. Speech Signal Process. (1978)

  • Jha S.K. et al.
    Demonstration of a geostatistical approach to physically consistent downscaling of climate modeling simulations
    Water Resour. Res. (2013)