Efficient extraction of pore networks from massive tomograms via geometric domain decomposition

https://doi.org/10.1016/j.advwatres.2020.103734

Highlights

  • A geometric domain decomposition algorithm was developed for efficient extraction of pore networks from massive tomograms.

  • The developed algorithm gave a 7x speed-up when run in parallel mode and 50 percent lower RAM consumption when run in serial mode.

  • A massive 2048³ pore network was extracted from a Fontainebleau rock image to find the effective permeability using single-phase Stokes flow.

  • A resolution study on different porous rocks was performed to highlight the application of the developed algorithm in pore-scale modelling of porous media.

Abstract

Image-based modelling of porous media to study the transport and reaction processes has become an essential tool. The availability of increasingly large image datasets at high resolution creates a need to develop algorithms that can process massive images at a low computational cost. This study presents an efficient workflow to extract pore networks from large porous domains using a watershed segmentation with geometrical domain decomposition. The method subdivides a porous image into smaller overlapping subdomains and performs a watershed segmentation on each subdomain in parallel or serial modes of operation to save CPU time or RAM, respectively. The computational performance of the algorithm was analyzed on a large image and found to consume 50 percent less memory and up to 7 times less CPU time than the standard watershed implementation. Pore networks of four massive digital rock images were extracted, and the effective permeability predicted by the networks agreed well with previously investigated values, illustrating the accuracy of the method. An additional application of this method, taking advantage of the reduced computational cost, is the upgrading of low-resolution images. It was found that increasing the resolution of a coarse image leads to more accurate predictions by helping the watershed segmentation produce a more faithful pore network model. The developed algorithm is implemented in Python and included in the open source project PoreSpy. It uses highly optimized and efficient modules such as Dask and Numba to obtain the maximum performance. The domain decomposition approach used here will also lend itself well to processing on distributed memory clusters, enabling the processing of even larger porous domains.

Introduction

Image-based modelling of porous media to study transport and reaction processes has become a key tool thanks to the relentless increase in computational power (Gerke et al., 2020a,b; Parvan et al., 2020; Sadeghi et al., 2020; Santos et al., 2020; Sheng and Thompson, 2016). This has been accompanied and spurred on by the advancement of 3D imaging techniques, such as X-ray micro- and nano-computed tomography (CT) and focused ion beam scanning electron microscopy (FIB-SEM), that have enabled the acquisition of massive domains at resolutions better than a fraction of a micron. For instance, domains can be imaged with current CT techniques as large as 2000 voxels across and essentially unlimited in height since scans can be stitched together (Jackson et al., 2020; Soulaine et al., 2016; Wang et al., 2019). However, modelling transport processes, especially complex ones involving reactions and multiple coupled physics, using numerical techniques known collectively as direct numerical simulation (DNS) on such large domains strains even the most advanced modern workstations. One common remedy is to use pore network modeling (PNM), which is the most efficient of the pore-scale modelling approaches (Song et al., 2019) and whose importance has been well established in modelling transport processes in various types of porous media (Khan et al., 2020; Sadeghi et al., 2019; Thomson et al., 2018; Torayev et al., 2018).

The promise of using PNMs to efficiently model complex transport processes in large domains is currently limited by the network extraction step, which can take hours or more. There are three main approaches to network extraction and they all depend on image analysis, which can be resource and computationally intensive. The approaches are (1) maximal ball (Dong et al., 2008; Dong and Blunt, 2009), (2) skeletonization or medial axis thinning (Al-Raoush and Willson, 2005; Bakke and Øren, 1997), (3) watershed segmentation (Sheppard et al., 2005), as well as some hybrid techniques (Gerke et al., 2020; Miao et al., 2017). The present work focuses on the third approach, by adapting the SNOW algorithm developed by Gostick (2017). The main computational bottleneck of this algorithm is the watershed segmentation step, which requires a large amount of memory (RAM) and a significant amount of CPU time as the size of the 3D image approaches and exceeds 1000³ voxels. It should be stressed that the present work can be applied to any technique that uses a watershed segmentation since it is applied to the result of the watershed and is therefore agnostic to how the watershed was performed.
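To give a sense of scale, the back-of-the-envelope sketch below estimates the memory needed just to hold the arrays involved in a marker-based watershed on a cubic image. The set of arrays and their data types (a 1-byte boolean image, a 4-byte float distance transform, and 4-byte integer markers and labels) is an illustrative assumption rather than a measurement of the SNOW implementation, and any working memory used inside the watershed routine itself is ignored.

```python
def watershed_array_memory_gb(side, bytes_per_voxel=(1, 4, 4, 4)):
    """Estimate the memory (GB, with 1 GB = 1e9 bytes) for a cubic image.

    bytes_per_voxel lists the ASSUMED per-voxel cost of each array held in
    memory: boolean image (1), float32 distance transform (4),
    int32 markers (4) and int32 labels (4).
    """
    n_voxels = side ** 3
    return n_voxels * sum(bytes_per_voxel) / 1e9


# Memory grows with the cube of the side length, so doubling the side
# multiplies the requirement by eight.
for side in (500, 1000, 2048):
    print(f"{side}^3 voxels: {watershed_array_memory_gb(side):.1f} GB")
```

Under these assumptions a 1000³ image already needs about 13 GB for the arrays alone, which is consistent with the observation that images of this size begin to strain ordinary workstations.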

There have been a few recent attempts to accelerate the network extraction by employing parallelization. Rabbani et al. (2019) approached the problem by dividing the image into several subdomains and performing watershed segmentation on each subdomain individually. A pore network was then extracted from each subdomain and these networks were subsequently stitched together to get one large network. The stitching was done by analyzing the interaction of the throats at the intersection of two neighboring subdomains. Although their algorithm obtained an increase in computational efficiency, accuracy was sacrificed since the pore connections and throat radii at the intersections of two domains are handled based on arbitrary stitching rules. The final pore network was found to differ noticeably from the network extracted from a single large domain, and moreover, the result depended significantly on the number and size of subdomains used. Kohanpur and Valocchi (2020) developed a stitching method to join two neighbouring subdomains statistically. Similar to the approach of Rabbani et al. (2019), stitching networks “post-extraction” introduces errors in the predicted transport properties, and the relative error changes depending on the number of subdomains.

To remedy the limitations of stitching networks ‘post-extraction’, the present work focused on obtaining a watershed via parallelizing the watershed step, then performing network extraction on the entire image. A key factor to note is that the number of segmented regions in a large image is far fewer than the number of voxels. Therefore, if a segmented image can be obtained efficiently, the network extraction step can be applied to the segmentation without concern for efficiency. In general there are two approaches to parallelizing the watershed: redefining the inner workings of the algorithm to work in a truly parallel way, or dividing the image into chunks then applying an existing algorithm on each chunk individually (Kornilov and Safonov, 2018; Moga and Gabbouj, 1998a,b; Moga et al., 1998).

The first approach is quite challenging because, as pointed out in several reviews (Roerdink and Meijster, 2000; Romero-Zaliz and Reinoso-Gordo, 2018), the watershed is inherently iterative or sequential in nature, requiring multiple passes over the image. Typical performance gains for algorithms in this category are between 2-4x on 8 core machines (Moga and Gabbouj, 1998a,b; Nicolescu et al., 1999; Romero-Zaliz and Reinoso-Gordo, 2018; Wagner et al., 2009). The best performance gain on images of more than 1000³ voxels was achieved by Wagner et al. (2009), who reported an average speed-up of 5x on a machine with 8 cores, but conceded that the accuracy of the segmentation may not be maintained. Their speed-up was also inflated since they parallelized the immersion algorithm of Vincent and Soille (1991), which is faster than the marker-based segmentation of Meyer and Beucher (1990) and Beucher (1994) as no preprocessing steps to define markers are required. Attempts have also been made to develop parallelized algorithms for use on distributed-memory systems (Moga and Gabbouj, 1998a,b; Moga et al., 1998), but again the need to share information between nodes during the processing limits the performance gains. In general, intrinsically parallel algorithms achieve only moderate speed improvements, each implementation is specific to one type of watershed, and they involve sophisticated implementations that are not widely available for use by porous media researchers.

The other approach to parallelization is referred to as divide-and-conquer or geometric domain decomposition, where the image is divided into chunks and each is processed separately, followed by a post-processing step to recombine each chunk back into a single domain. This approach offers several useful advantages as it can be run on distributed-memory systems as well as shared-memory machines with ease. The former has the benefit that many nodes can be requisitioned and/or each node can have a large amount of RAM, for processing massive images. Of course, this requires access to specialized computational resources. Because no information is exchanged between processes, domain decomposition schemes can optionally be run asynchronously, meaning that a single shared-memory machine can be used to analyze large images by processing each chunk in series. This allows the use of normal desktop or even laptop computers on images that would otherwise be infeasibly large. Domain decomposition schemes can also be run in parallel on shared-memory high performance workstations, allowing a high degree of flexibility in how they are run. Another advantage is that the user is free to choose which watershed algorithm is applied to each domain. For instance, it has been shown that a marker-based watershed with carefully selected markers is necessary to obtain a valid network from porous media images (Gostick, 2017), so the ability to choose the algorithm and its settings is critical. As another scenario, if one had access to a distributed-memory system, each node could use an internally parallelized algorithm (of the type discussed above), and domain decomposition could be used to harness many nodes, significantly increasing the computational performance. The downside of the domain decomposition approach is that the accuracy of the segmentation is not assured, as it depends on the handling of interfaces between each subdomain, which is one of the major points addressed in this work.
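The chunking step described above can be sketched in a few lines. The function below is a minimal illustration, not the PoreSpy implementation: the name `chunk_slices` and the parameters `divs` and `overlap` are hypothetical. For each subdomain it returns a "padded" slice (the chunk plus an overlap margin, clipped at the image boundary) and a "core" slice (the part of the chunk that is kept once the margin is discarded).

```python
from itertools import product


def chunk_slices(shape, divs, overlap):
    """Return a list of (padded, core) slice tuples, one pair per subdomain.

    shape   : shape of the full image, e.g. (300, 300, 300)
    divs    : number of chunks along each axis (hypothetical parameter)
    overlap : margin, in voxels, added to each side of a chunk
    """
    per_axis = []
    for size, n in zip(shape, divs):
        # Chunk boundaries along this axis; the cores tile the axis exactly.
        edges = [i * size // n for i in range(n + 1)]
        axis = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            padded = slice(max(lo - overlap, 0), min(hi + overlap, size))
            axis.append((padded, slice(lo, hi)))
        per_axis.append(axis)
    chunks = []
    for combo in product(*per_axis):
        padded = tuple(p for p, _ in combo)
        core = tuple(c for _, c in combo)
        chunks.append((padded, core))
    return chunks


chunks = chunk_slices((300, 300, 300), divs=(2, 2, 2), overlap=20)
print(len(chunks))  # 8 subdomains
```

Because each entry is independent of the others, the resulting list can be looped over one chunk at a time (serial mode, limiting peak memory) or handed to a pool of workers (parallel mode, limiting wall time), using the same per-chunk function in both cases.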

This work presents a pore network extraction workflow that is based on parallelized watershed segmentation using domain decomposition. The proposed algorithm is conceptually simple, requiring only that each subdomain overlaps its neighbor by a sufficient amount to prevent edge artifacts, followed by relabeling of watershed basins by comparison of the overlapping regions. All of this is done in the pre- and post-processing steps, so the actual watershed function is untouched. Crucially, the proposed approach can extract a network identical to that obtained without domain decomposition, and the result is independent of the number of subdomains. The developed algorithm's computational performance was tested on a variety of large images and shown to be many times faster than the currently available approaches based on existing single-core implementations. The presented algorithm, by working on each subsection of the image separately, can also be applied in serial processing mode for cases where RAM is limited, and the effectiveness of this scheme was also explored. The proposed domain decomposition algorithm, therefore, offers the option of fast parallelized processing on high-performance workstations with many cores and ample RAM, or slower serial processing on lesser machines with limited cores and RAM. The possibility also exists for the present algorithm to be deployed on distributed-memory systems, though this aspect was not explored in the present work. An additional benefit of the present approach is that any watershed algorithm can be applied, allowing users to customize the scheme for their own workflow. Lastly, and most practically, the present algorithm is implemented using widely accessible open source Python tools such as the SciPy stack (Virtanen et al., 2020), Dask (Rocklin, 2015) and Numba (Lam et al., 2015). The algorithm is included in the open source project PoreSpy (Gostick et al., 2019), and the output of the code is also well aligned with the open source pore network modelling project OpenPNM (Gostick et al., 2016).
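The overlap-and-relabel idea can be illustrated with a deliberately simplified toy. In the sketch below, labeling runs of True values in a 1D mask stands in for the watershed (the swap is only for brevity; any function that returns integer labels could be substituted), and labels from neighboring chunks are merged wherever they coincide in the overlap. The function names are hypothetical and this is not the PoreSpy implementation.

```python
def label_runs(mask):
    """Label consecutive runs of True as 1, 2, 3, ... (0 is background).

    Stand-in for the watershed: it only needs to return integer labels.
    """
    labels, current, prev = [], 0, False
    for v in mask:
        if v and not prev:
            current += 1
        labels.append(current if v else 0)
        prev = v
    return labels


def label_in_chunks(mask, n_chunks, overlap):
    """Label overlapping chunks independently, then merge coinciding labels."""
    size = len(mask)
    edges = [i * size // n_chunks for i in range(n_chunks + 1)]
    parent = {}  # union-find forest over globally unique chunk labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    seen = [0] * size  # first label assigned to each position
    offset = 0         # keeps labels from different chunks distinct
    for lo, hi in zip(edges[:-1], edges[1:]):
        start, stop = max(lo - overlap, 0), min(hi + overlap, size)
        local = label_runs(mask[start:stop])
        for i, lab in enumerate(local, start=start):
            if lab == 0:
                continue
            lab += offset
            parent.setdefault(lab, lab)
            if seen[i]:
                # Position already labeled by an earlier chunk: same region.
                parent[find(lab)] = find(seen[i])
            else:
                seen[i] = lab
        offset += max(local, default=0)

    # Renumber the merged regions consecutively, left to right.
    new_ids, out = {}, []
    for lab in seen:
        if lab == 0:
            out.append(0)
        else:
            out.append(new_ids.setdefault(find(lab), len(new_ids) + 1))
    return out


mask = [c == "1" for c in "0111001111110001101111100"]
whole = label_runs(mask)
print(max(whole))                           # regions without decomposition
print(max(label_in_chunks(mask, 3, 2)))     # with overlap
print(max(label_in_chunks(mask, 3, 0)))     # without overlap
```

In this toy the chunked result with overlap matches the undecomposed labeling (4 regions), while setting the overlap to zero splits the two runs that cross chunk boundaries and yields 6 regions. Counting regions with and without decomposition in this way mirrors the kind of check used to confirm that no regions are created or lost at subdomain interfaces.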

Section snippets

X-ray micro-computed tomography Image Samples

To investigate the computational performance and any geometric domain decomposition artifacts, 13 different types of 3D datasets were used. These included X-ray μCT images of different types of rocks provided by Imperial College London (Dong, 2007; Dong and Blunt, 2009), random sphere packings generated using PoreSpy (Gostick et al., 2019), and massive images of Fontainebleau sandstone samples constructed by the Institute for Computational Physics at the University of

Algorithm validation

The validation of the domain decomposition algorithm was performed on five different types of 3D images (ID: 1-5) mentioned in Table 1. The number of segmented regions with and without domain decomposition was counted to ensure no regions were created or lost at the intersection of subdomains. The decomposition ratio, defined as the ratio of the shape of the original image to the shape of the subdomain, was varied from 2 to 4 to ensure algorithm robustness. At decomposition ratio 3 it was

Conclusions and future work

An efficient geometric domain decomposition algorithm was developed to extract pore networks from massive size images of porous domains. The algorithm subdivides the image into small subdomains with sufficient overlap, and these are further processed to get a watershed segmentation for use in the SNOW algorithm. Validation of the algorithm was performed on various types of porous materials and found to give identical results, in terms of number of watershed basins found, to that obtained

CRediT authorship contribution statement

Zohaib Atiq Khan: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing. Ali Elkamel: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing - original draft, Writing - review & editing. Jeff T Gostick: Conceptualization, Project administration, Resources, Software, Supervision, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to thank the University of Engineering and Technology Lahore, New Campus, Pakistan as well as the Natural Sciences and Engineering Research Council (NSERC) of Canada for their funding and support to make this research possible. Special thanks to the University of Waterloo, Canada for the Doctoral Thesis Completion Award 2020 awarded to Zohaib Atiq Khan to carry out this research.

References (59)

  • Alina N. Moga et al., Parallel marker-based image segmentation with watershed transformation, J. Parallel Distrib. Comput. (1998)
  • Alina N Moga et al., Parallel marker-based image segmentation with watershed transformation, J. Parallel Distrib. Comput. (1998)
  • B.P. Muljadi et al., The impact of porous media heterogeneity on non-Darcy flow behaviour from pore-scale simulation, Adv. Water Resour. (2016)
  • C. Pan et al., A high-performance lattice Boltzmann implementation to model flow in porous media, Comput. Phys. Commun. (2004)
  • A. Parvan et al., Insight into particle retention and clogging in porous media; a pore scale study using lattice Boltzmann method, Adv. Water Resour. (2020)
  • A. Rabbani et al., Pore network extraction using geometrical domain decomposition, Adv. Water Resour. (2019)
  • M.A. Sadeghi et al., Dispersion modeling in pore networks: a comparison of common pore-scale models and alternative approaches, J. Contam. Hydrol. (2020)
  • J.E. Santos et al., PoreFlow-Net: A 3D convolutional neural network to predict fluid flow through porous media, Adv. Water Resour. (2020)
  • S.M. Shah et al., Micro-computed tomography pore-scale study of flow in porous media: effect of voxel resolution, Adv. Water Resour. (2016)
  • Q. Sheng et al., A unified pore-network algorithm for dynamic two-phase flow, Adv. Water Resour. (2016)
  • T.G. Tranter et al., pytrax: a simple and efficient random walk implementation for calculating the directional tortuosity of images, SoftwareX (2019)
  • Y.Da Wang et al., Computations of permeability of large rock images by dual grid domain decomposition, Adv. Water Resour. (2019)
  • S. Bakke et al., 3-D pore-scale modelling of sandstones and flow simulations in the pore networks, SPE J. (1997)
  • S. Beucher, Watershed, hierarchical segmentation and waterfall algorithm
  • E.A. Codling et al., Random walk models in biology, J. R. Soc. Interface (2008)
  • T.P. de Carvalho et al., Pore-scale numerical investigation of pressure drop behaviour across open-cell metal foams, Transp. Porous Media (2017)
  • H. Dong, Micro-CT Imaging and Pore Network Extraction (2007)
  • H. Dong et al., Pore-network extraction from micro-computerized-tomography images, Phys. Rev. E (2009)
  • H. Dong et al., Pore network modelling on carbonate: a comparative study of different micro-CT network extraction methods
