
Computational feasibility of calculating the steady-state blood flow rate through the vasculature of the entire human body


Published 28 August 2020 © 2020 IOP Publishing Ltd
Citation: William P Donahue et al 2020 Biomed. Phys. Eng. Express 6 055026. DOI: 10.1088/2057-1976/abaf5d


Abstract

The human body contains approximately 20 billion individual blood vessels that deliver nutrients and oxygen to tissues. While blood flow is a well-developed field of research, no previous studies have calculated the blood flow rates through more than 5 million connected vessels. The goal of this study was to test if it is computationally feasible to calculate the blood flow rates through a vasculature equal in size to that of the human body. We designed and implemented a two-step algorithm to calculate the blood flow rates using principles of steady-state fluid dynamics. Steady-state fluid dynamics is an accurate approximation for the microvascular and venous structures in the human body. To determine the computational feasibility, we measured and evaluated the execution time, scalability, and memory usage to quantify the computational requirements. We demonstrated that it is computationally feasible to calculate the blood flow rate through 17 billion vessels in 6.5 hours using 256 compute nodes. The computational modeling of blood flow rate in entire organisms may find application in research on drug delivery, treatment of cancer metastases, and modulation of physiological performance.


1. Introduction

All mammalian life depends on a circulatory system to transport oxygen and nutrients to organs such as the central nervous system (CNS), liver, and skin. The circulatory system includes the heart, lungs, arteries, capillaries, and veins (Gray and Lewis 1918). To regulate the delivery of oxygen and nutrients, the CNS and local cellular environments monitor and modulate blood flow rates throughout the circulatory system (Strandgaard and Paulson 1984, Attwell et al 2010). Blood flow may be altered by disease processes (Ingvar and Franzén 1974), internal and external environmental conditions (Haddy and Scott 1968, Hauge et al 1983), and stress and injury (Marion et al 1991, Fassbender et al 2001). In the United States, cardiovascular and cerebrovascular diseases together resulted in approximately 775,000 deaths in 2015 (Murphy et al 2017) and are expected to cost $1 trillion per year by 2030 (Heidenreich et al 2011).

A great deal is known about the physics of blood flow. The canonical Navier-Stokes equations describe the time-dependent changes in fluid flow. To solve these equations computationally, the vascular geometry may be broken into a finite element mesh, resulting in a system of partial differential equations (Volker 2016), which is a well-understood and widely-used technique. Researchers have used this technique to simulate strokes (Tang et al 2005, Leng et al 2014) and stent safety and effectiveness (Conti et al 2017, Gomes et al 2017) to a high degree of detail. While it is theoretically possible to solve the three-dimensional (3-D) Navier-Stokes equations for all the vessels in the human body, simulations using these techniques are typically limited to fewer than 50 vessels (Moore et al 2006, Xiao et al 2013). The ability to model only a small section of vasculature necessitates the application of approximations and boundary conditions (Quarteroni et al 2016), which limits the ability to realistically model the blood flow (West et al 1997).

The Navier-Stokes equations can be simplified using a variety of assumptions that reduce the dimensionality, enabling the simulation of larger vascular networks. One popular choice is the one-dimensional (1-D) time-dependent approximation, in which the blood is transported axially along the vessel while radial symmetry of the flow is assumed (Quarteroni et al 2016). This approach has been used to model the effects of flow on blood vessels (Pan et al 2013, Hasan et al 2018) and large portions of the arterial tree (Formaggia et al 1999, Olufsen et al 2000, Sherwin et al 2003, Müller and Toro 2014), and to predict changes in blood flow due to different physiological conditions (Steele et al 2007). The largest vascular network simulated to date comprised 5 million vessels representing a mouse brain, simulated using 1-D techniques (Linninger et al 2019). Time-independent, or steady-state, models further reduce the dimensionality, allowing larger vascular networks to be simulated (Sutera and Skalak 1993). This is an attractive approach because it reduces the computational requirements and simplifies implementation (Pries and Secomb 1990, Roth and Kiani 1999). Biologically, blood flow is steady in the capillaries and veins of the human vascular network (Olufsen et al 2000, Pan et al 2013). This approach has been used to simulate blood flow through the microvasculature of the brain (Linninger et al 2013) and heart (Hyde et al 2014) and to determine the relative viscosity of blood in vessels (Pries et al 1996). Finally, lumped-parameter (or compartmental) modeling groups vessels together into compartments and models the transfer of blood between compartments using a simple parameterization (Zagzoule and Marc-Vergnes 1986, Quarteroni et al 2016). This approach is used extensively to model drug kinetics in different organs and tissues (Atkins 1974, McGowan et al 2017, Cherkaoui-Rbati et al 2017). With compartmental modeling, it is possible to model the distribution of blood throughout the entire human body, but it provides no information on individual vessels (Holzhütter et al 2012).

Quarteroni et al (2016) provided an excellent review of blood flow modeling techniques and described approaches for coupling compartment, 1-D, and 3-D Navier-Stokes models to simulate whole-body blood flow. To summarize the literature, blood flow can be simulated with high levels of detail for a few blood vessels or with low levels of detail for an entire organism. Previous studies performed detailed simulations in small numbers of vessels and then used simplified models to incorporate the influence of the other vessels in the organ or human body. Hitherto, it was unknown whether computing the blood flow rate through each and every vessel in the human body was feasible.

The goal of this work was to calculate the blood flow rates through 17 billion individual vessels, i.e., approximately the number in the human body (Herman 2016). To accomplish this task, we calculated the blood flow rates in a network of vessels that was modeled with fractal geometry. We used a simplified steady-state approach to calculate the blood flow rates. The network was geometrically partitioned using graph analysis to reduce the amount of inter-node communication necessary for the blood flow rate calculations, increasing the computational efficiency. We compared the blood flow rates calculated using our method to those of a previously-validated computational-fluid-dynamics (CFD) package to determine the accuracy of the model.

2. Methods

2.1. Modeling steady-state blood flow

2.1.1. Geometric model

The vascular geometry used in this study was constructed with a symmetric, fractal approach. Vessels were represented as rigid, cylindrical tubes that were connected at junctions to form a closed network. The term junction denotes any location where vessels meet, or the start and end points of the network, where boundary conditions are applied. The fractal model was conveniently scalable to billions of vessels and had only two boundary conditions, namely the blood pressure at the inlet and the blood flow rate at the outlet. The number of vessels (Nv) in a network was related to the number of bifurcations or generations, denoted by Ng, by

${N}_{v}=2\left({2}^{{N}_{g}}-1\right)$    (1)

In the model, each parent vessel branches into two child vessels. The smallest vessels in the network were 2.5 μm in radius and 55 μm in length (Lauwers et al 2008). Between each generation, the vessel radius was reduced by a factor of $2^{-1/3}$, following Murray's Law for bifurcation (Adam 2011), and the length was reduced by a factor of 0.8 to reduce vessel overlap for visualization (Donahue and Newhauser 2020). Figure 1 plots an illustrative example of a 2-dimensional network created using this algorithm. The algorithm used to construct this network geometry was described previously (Donahue and Newhauser 2020). Each vessel was modeled with a start location, end location, and inner radius, as well as junction indexes and a list of immediately connected vessels.
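
To make this construction concrete, the following C++ sketch computes the per-generation vessel dimensions and the total vessel count of the fractal model; the data structures and names are illustrative and are not the authors' released code.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Per-generation lumen dimensions of the symmetric fractal network: the
// radius shrinks by 2^(-1/3) (Murray's law) and the length by a factor of
// 0.8 at each bifurcation, terminating at 2.5 um x 55 um capillaries.
// Entry g (0-based) holds {radius, length} in millimeters for generation g+1.
std::vector<std::pair<double, double>> generationDimensions(int nGenerations) {
    const double capillaryRadiusMm = 2.5e-3;  // 2.5 um
    const double capillaryLengthMm = 55.0e-3; // 55 um
    std::vector<std::pair<double, double>> dims(nGenerations);
    double r = capillaryRadiusMm;
    double l = capillaryLengthMm;
    // Work upward from the smallest (last) generation toward the root.
    for (int g = nGenerations - 1; g >= 0; --g) {
        dims[g] = {r, l};
        r *= std::cbrt(2.0); // parent radius = child radius * 2^(1/3)
        l /= 0.8;            // parent length = child length / 0.8
    }
    return dims;
}

// Total vessel count of equation (1): N_v = 2(2^Ng - 1), counting the
// arterial and venous trees; e.g., 33 generations give ~17 billion vessels.
std::int64_t vesselCount(int nGenerations) {
    return 2 * ((std::int64_t{1} << nGenerations) - 1);
}
```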


Figure 1. A simple illustrative example of a 2-dimensional vascular network with six generations. The solid lines represent the arteries, while the dashed lines represent the veins. The circles depict junctions in the network.


2.1.2. Fluid dynamics model

At a point ${\boldsymbol{x}}=\left({x}_{1},{x}_{2},{x}_{3}\right)$ in an arbitrary 3-D coordinate system, the time-dependent form of the Navier-Stokes equations is

$\rho \dfrac{\partial {\boldsymbol{v}}}{\partial t}+\rho \left({\boldsymbol{v}}\cdot {\rm{\nabla }}\right){\boldsymbol{v}}-{\rm{\nabla }}\cdot {\boldsymbol{T}}={\boldsymbol{f}}$    (2)

and

${\rm{\nabla }}\cdot {\boldsymbol{v}}=0$    (3)

where ρ is the fluid density, ${\boldsymbol{v}}\left({\boldsymbol{x}},t\right)$ is the fluid velocity, ${\boldsymbol{T}}$ is the fluid stress tensor, and ${\boldsymbol{f}}\left({\boldsymbol{x}},t\right)$ represents volumetric forces acting on the fluid (Quarteroni et al 2016). Equation (3) describes the continuity of the blood flow through the vascular network, namely that the blood is incompressible.

In this work, we simplified the Navier-Stokes equations by assuming the flow to be steady, laminar, and fully developed. The solution gives the volumetric blood flow rate (Q) in a vessel of radius r and length L as

$Q=\dfrac{\pi {r}^{4}}{8\eta L}\left({P}_{\mathrm{in}}-{P}_{\mathrm{out}}\right)=C\left({P}_{\mathrm{in}}-{P}_{\mathrm{out}}\right)$    (4)

where η is the blood viscosity, C is the conductance of the vessel, and Pin and Pout are the blood pressures at the inlet and outlet of the vessel, respectively (Linninger et al 2013). Equation (4) is also known as Poiseuille's equation (Sutera and Skalak 1993). To specify the viscosity, η, of the blood, we used an empirical model for in-vitro blood viscosity adapted from Pries et al (1996) for a constant hematocrit of 0.45, or

Equation (5)

where r is the radius of the vessel in millimeters and the value of η0 was 35 mPa s (Shibeshi and Collins 2005).
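
As a minimal numerical illustration of equation (4), the C++ sketch below computes a vessel's conductance and flow rate. The function names and SI units are our own choices; the viscosity is passed in as a parameter because it depends on the vessel radius through equation (5).

```cpp
#include <cmath>

// Poiseuille conductance of a rigid cylindrical vessel (equation (4)):
// C = pi * r^4 / (8 * eta * L). SI units: r and L in meters, eta in Pa*s,
// so C is returned in m^3 s^-1 Pa^-1.
double conductance(double radius, double length, double viscosity) {
    const double pi = 3.14159265358979323846;
    return pi * std::pow(radius, 4) / (8.0 * viscosity * length);
}

// Volumetric flow rate Q = C * (P_in - P_out) for end pressures in Pa.
double flowRate(double radius, double length, double viscosity,
                double pIn, double pOut) {
    return conductance(radius, length, viscosity) * (pIn - pOut);
}
```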

Because blood is an incompressible fluid, the net blood flow rate at each junction must be zero, or, mathematically,

$\displaystyle \sum _{m=1}^{n}{Q}_{m}=0$    (6)

where n is the total number of vessels at the junction and Qm is the blood flow rate of the mth vessel at that junction (by convention the sign of Qm is positive if blood is entering the junction and negative if leaving). Using this constraint and equation (4), we constructed a system of linear equations that described the net flow rates at each junction to determine the corresponding fluid pressures. For example, in a network with 6 vessels (figure 2) the corresponding system of equations is

Equation (7)

where Pin is the inlet pressure and Qout is the outlet blood flow rate. We specified the boundary conditions Pin and Qout to ensure the problem is well-defined. These six equations are used to determine the six unknown pressures at the junctions. This system was cast in matrix form, or

${\boldsymbol{C}}\,{\boldsymbol{P}}={\boldsymbol{Q}}$    (8)


Figure 2. Illustration of a simple vascular geometry showing the geometric relationship between the blood vessels and junctions. Junctions are labeled with numbers and vessels with letters.


In this form, the 2-dimensional matrix mostly describes the conductance values of the vessels that connect junctions, the pressure vector contains the unknown pressures at each junction (e.g., the Pi values), and the vector on the right mostly specifies the net flow rate at each junction. For brevity, the 2-dimensional matrix will be referred to as the conductance matrix for the remainder of the manuscript. We solved this system of equations using iterative methods (van der Vorst 2003) to determine the pressures. With the pressures calculated at all the junctions in the network, we calculated the blood flow rate, Q, through each vessel using equation (4).
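
The assembly of this system can be sketched in C++ as follows, assuming a triplet (coordinate) sparse format, a Dirichlet row for the inlet pressure, and the prescribed outflow on the right-hand side; the names and boundary handling are illustrative, and the actual implementation (section 2.2.3) builds the matrix in compressed sparse row format across compute nodes.

```cpp
#include <cstdint>
#include <vector>

// Triplet (coordinate-format) entry of the sparse conductance matrix.
struct Triplet { std::int64_t row, col; double value; };

// A vessel reduced to its two junction indexes and its Poiseuille
// conductance C from equation (4).
struct VesselConn {
    std::int64_t j1, j2;
    double conductance;
};

// Flow conservation (equation (6)) at every junction contributes +C to the
// two diagonal entries and -C to the two off-diagonal entries per vessel.
// The inlet junction row is replaced by the Dirichlet condition P = P_in,
// and the prescribed outflow Q_out enters the outlet right-hand side.
void assembleSystem(const std::vector<VesselConn>& vessels,
                    std::int64_t inlet, double pIn,
                    std::int64_t outlet, double qOut,
                    std::int64_t nJunctions,
                    std::vector<Triplet>& A, std::vector<double>& rhs) {
    A.clear();
    rhs.assign(static_cast<std::size_t>(nJunctions), 0.0);
    for (const VesselConn& v : vessels) {
        if (v.j1 != inlet) {  // skip rows owned by the Dirichlet condition
            A.push_back({v.j1, v.j1, +v.conductance});
            A.push_back({v.j1, v.j2, -v.conductance});
        }
        if (v.j2 != inlet) {
            A.push_back({v.j2, v.j2, +v.conductance});
            A.push_back({v.j2, v.j1, -v.conductance});
        }
    }
    A.push_back({inlet, inlet, 1.0});  // Dirichlet row: P_inlet = P_in
    rhs[static_cast<std::size_t>(inlet)] = pIn;
    // With row j reading sum_k C_jk (P_j - P_k) = external inflow at j,
    // the outlet, where Q_out leaves the network, receives -Q_out.
    rhs[static_cast<std::size_t>(outlet)] -= qOut;
}
```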

2.2. Algorithms

2.2.1. General overview

We designed a two-step algorithm (figure 3) to calculate blood flow rate in a vascular network of arbitrary size. The first step partitioned the vascular network's geometry and created a computationally-efficient distribution of data using graph partitioning algorithms (see section 2.2.2). The second step calculated blood flow rates through the vessels using matrix inversion techniques (see section 2.2.3).


Figure 3. Flowchart mapping the two major steps and selected sub-steps of calculating the blood flow rates in a vascular network.


We called the first step Network Preprocessing and the second step Blood Flow Rate Calculations. We developed separate modules for each step to facilitate validation and testing. Additionally, this modular design simplifies future extensions by allowing additional modules to manipulate the vascular data prior to calculating the blood flow.

2.2.2. Preprocessing of the vessel network's geometry

Limitations on the memory available in a single compute node, as well as matrix solver requirements, motivated the first step of our algorithm. For networks with greater than 67 million vessels, the amount of memory required to describe the vessel network geometry and the conductance matrix exceeded the 64 GB available on each of our compute nodes. Our solution was to distribute the rows of the conductance matrix across multiple compute nodes. The matrix solver used in this study stipulated that rows of the conductance matrix be distributed in contiguous groups among the compute nodes (Falgout and Yang 2002), e.g., rows 1 to n on compute node 1, rows n + 1 to 2n on compute node 2, etc. To limit the volume of inter-node communication, the network geometry was partitioned to reduce the number of compute nodes on which a given junction appeared.

The algorithm that constructed the network (Donahue and Newhauser 2020) stored the vascular data sequentially in the order of generation number, NG. If the data was read and distributed among the compute nodes sequentially and contiguously, many junctions were referenced on multiple compute nodes, reducing computational efficiency (figure 4(a)). We used a graph partitioning algorithm (Karypis 1999) to group and sort the vessels to minimize the number of multi-partition junctions (figure 4(b)). A multi-partition junction is a junction where vessels belonging to multiple partitions meet (figure 4). In practice, each partition is assigned to a different compute node during the Blood Flow Rate Calculations. Thus, to construct a given row in the matrix representing a multi-partition junction, the conductances of the vessels that meet at that junction must be communicated to the compute node that contains that row of the matrix. To facilitate this communication, one of the partitions was selected as the master partition. The master partition decided which compute node was responsible for creating the multi-partition row in the conductance matrix, including collecting the conductance values. The lists of immediately connected vessel indexes and the junction indexes were renumbered to correspond with the new partitions. The newly partitioned vessel data was written to the data file for subsequent processing by the Blood Flow Rate Calculation algorithm.
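
To illustrate the partitioning step, the sketch below runs METIS-style k-way partitioning on a toy graph. The study used the parallel ParMETIS routines; the serial METIS_PartGraphKway entry point shown here is only meant to illustrate the CSR graph inputs and the per-vertex partition output.

```cpp
#include <metis.h>
#include <cstdio>
#include <vector>

// Partition a small undirected graph (CSR adjacency) into k parts with
// METIS' k-way routine; each undirected edge is listed once per endpoint.
int main() {
    // Toy graph: 4 vertices in a chain 0-1-2-3.
    std::vector<idx_t> xadj   = {0, 1, 3, 5, 6};    // CSR row pointers
    std::vector<idx_t> adjncy = {1, 0, 2, 1, 3, 2}; // neighbor lists
    idx_t nvtxs = 4, ncon = 1, nparts = 2, objval = 0;
    std::vector<idx_t> part(nvtxs, 0); // output: partition id per vertex

    int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                                     /*vwgt=*/nullptr, /*vsize=*/nullptr,
                                     /*adjwgt=*/nullptr, &nparts,
                                     /*tpwgts=*/nullptr, /*ubvec=*/nullptr,
                                     /*options=*/nullptr, &objval, part.data());
    if (status != METIS_OK) return 1;
    for (idx_t v = 0; v < nvtxs; ++v)
        std::printf("vertex %d -> partition %d\n", (int)v, (int)part[v]);
    return 0;
}
```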


Figure 4. Schematic illustration of the effects of graph partitioning on the data distribution. Panel (a) shows the distribution of data among the compute nodes after sequentially and contiguously importing the data. Panel (b) shows the distribution of data after the graph partitioning and renumbering of junction and vessel indexes. In both panels, the open circles represent multi-partition junctions where communication is necessary for the construction of the conductance matrix. The number of multi-partition junctions (NMPJ) is significantly reduced in Panel (b), greatly reducing the inter-node communication.


2.2.3. Blood flow rate calculations

Calculating the blood flow rates in the preprocessed network comprised four sub-tasks: importing the vessel data, constructing the conductance matrix, solving for the blood pressure at each junction, and calculating the blood flow rate through each individual vessel. To accomplish the first sub-task, each compute node used in the simulation was assigned one of the partitions created by the Network Preprocessing application. The compute node imported only the relevant junction indexes, boundary conditions, and vessel geometry from the data file.

We calculated the blood pressures at each junction by constructing the conductance matrix and solving for the unknown pressures. The second sub-task was to construct the conductance matrix (section 2.1.2) in compressed sparse row format. This required the communication of conductance values for vessels that met at a multi-partition junction. The third sub-task used an iterative Krylov method (van der Vorst 2003) to solve the system of linear equations. We applied a boundary condition at the network inlet of Pin = 133.3 Pa, which is the average blood pressure of the cardiac cycle (Aribisala et al 2014), and a boundary condition was assigned to the blood flow rate at the outlet using

${Q}_{{\rm{out}}}={2}^{{N}_{G}-1}\,\pi {r}_{c}^{2}\,{v}_{c}$    (9)

where ${Q}_{{\rm{out}}}$ is the outlet boundary condition, NG is the number of generations in the network, vc is the blood velocity in the smallest capillaries (of generation NG), and rc is the corresponding capillary radius. In our geometric model, the radius of the smallest capillary (rc) was 2.5 μm (Duvernoy et al 1981, Lauwers et al 2008, Cassot et al 2010). Ivanov et al (1981) measured the blood flow velocities in capillaries at approximately 1 mm s⁻¹, which we assigned to vc.
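
As a worked check of this boundary condition, assuming the symmetric network's ${2}^{{N}_{G}-1}$ terminal capillaries, a 33-generation network gives

${Q}_{{\rm{out}}}={2}^{32}\,\pi \,{\left(2.5\ \mu {\rm{m}}\right)}^{2}\,\left(1\ {\rm{mm}}\ {{\rm{s}}}^{-1}\right)\approx 8.4\times {10}^{13}\ \mu {{\rm{m}}}^{3}\,{{\rm{s}}}^{-1}\approx 84\ {\rm{ml}}\ {{\rm{s}}}^{-1},$

which is consistent with a resting cardiac output of roughly 5 l min⁻¹.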

To accomplish the final sub-task, the blood flow rate through each vessel was calculated according to the following procedure. First, the pressure for each of the multi-partition junctions was communicated to all the compute nodes sharing a given junction. Then, the blood flow rate of each vessel was calculated using equation (4). To optimize this procedure for computational speed and efficiency, we converted the problem into a matrix-vector multiplication operation.
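
One plausible realization of this final step is sketched below: a sparse matrix with one row per vessel, holding +Cv and −Cv in the columns of the vessel's two junctions, reproduces equation (4) for every vessel in a single sparse matrix-vector product. The CSR layout and names are illustrative rather than the study's actual data structures.

```cpp
#include <cstdint>
#include <vector>

// Compressed-sparse-row matrix: rowPtr has (n_rows + 1) entries.
struct CsrMatrix {
    std::vector<std::int64_t> rowPtr, colIdx;
    std::vector<double> values;
};

// Per-vessel flow rates as Q = M * P, where row v of M holds +C_v at the
// vessel's inlet-junction column and -C_v at its outlet-junction column,
// so each row evaluates Q_v = C_v * (P_in - P_out) (equation (4)).
std::vector<double> vesselFlows(const CsrMatrix& M,
                                const std::vector<double>& pressures) {
    std::vector<double> q(M.rowPtr.size() - 1, 0.0);
    for (std::size_t row = 0; row + 1 < M.rowPtr.size(); ++row)
        for (std::int64_t k = M.rowPtr[row]; k < M.rowPtr[row + 1]; ++k)
            q[row] += M.values[k] * pressures[M.colIdx[k]];
    return q;
}
```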

To verify that the calculations were free of blunders, each compute node also calculated the blood flow rate using the following secondary method. For the specific case of the symmetric vascular networks used in this study, under the assumption of steady-state flow, we calculated the blood flow rate (Q) of each vessel in the network according to

$Q\left(l\right)=\dfrac{{Q}_{{\rm{out}}}}{{2}^{l-1}}$    (10)

where l is the generation number of the vessel. This simple procedure allowed us to verify the calculations for networks with up to 17 billion vessels. Following this verification, the blood flow rates were written to persistent storage media.
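
As a worked example of this check: in a symmetric network, every bifurcation halves the flow, so the root vessel (l = 1) must carry ${Q}_{{\rm{out}}}$ and each terminal capillary (l = NG) must carry ${Q}_{{\rm{out}}}/{2}^{{N}_{G}-1}=\pi {r}_{c}^{2}{v}_{c}$, the single-capillary flow used to set the boundary condition in equation (9).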

2.3. Validation of steady-state blood flow calculations

2.3.1. Validation methodology

To assess the accuracy of the calculated blood flow rates (equation (4)), we compared the blood flow rates calculated by the steady-state model to those predicted by a commercial computational-fluid-dynamics (CFD) package (Autodesk CFD 2018, San Rafael, CA, USA). The commercial package used a finite-element model to calculate the blood flow using the Navier-Stokes equations. This package was previously validated (Alexa et al 2014, Autodesk 2015).

We used two different types of vessel network configurations to validate the algorithm. The first type was a healthy vascular network, in which all vessels were open (figure 1). For the healthy networks, we calculated the flow rates through networks with between 6 and 126 vessels (2 ≤ NG ≤ 6). The second type was a damaged network, in which a single vessel was completely closed, i.e., the vessel radius was set to zero. For the damaged networks, we used a 126-vessel network (NG = 6) and calculated the flow when each of three different vessels was closed (figure 5). The network sizes were limited by the computational expense of the CFD calculations.


Figure 5. Illustration of the vessels that were closed in the 126-vessel network (figure 1) to create the three different damaged-network configurations.


To perform the calculations in the CFD package, it was necessary to convert the geometry created by the fractal method (Donahue and Newhauser 2020) into a fully specified 3-D model. This was done using a commercial computer-aided design (CAD) package (Inventor 2018, Autodesk, San Rafael, CA, USA). First, we imported the bifurcation locations and vessel radii from our model geometry. Next, linearly-tapered vessels were constructed between junctions. Tapering the vessels avoided nonphysical sharp edges at the bifurcations in the CFD model. The initial radius of each vessel matched the radius of the corresponding generation in our model, while the final radius was that of the next generation. We chose this approach to simplify the implementation of the tapers. The vessels in the final generation had a constant radius of 2.5 μm. We converted the vessel network geometries manually. Closed vessels were completely removed from the geometric representation in the CAD model. In the CFD package, we specified the material properties of the blood and vessel walls using the built-in materials library. Automated tools, provided by the package, were used to compute the simulation mesh for each geometry. We used the default solver of the CFD package to perform the blood-flow calculations. The solver was set up to perform laminar calculations of the blood flow, in accordance with the assumptions of the steady-state model. Additionally, the Reynolds number was less than 100 for all networks simulated, implying that turbulent flow was not present. Table 1 lists the mesh generation and solver settings for the commercial package. We used the boundary conditions specified in the steady-state model (section 2.2.3) for the CFD package calculations.

Table 1. Settings used for mesh generation and solver in the CFD package. Settings not listed were set to the default values.

Mesh Generation
  Diagnostics: Automatic
  Minimum Refinement Length: 0.0025 mm
  Wall Layer
    Number of Layers: 3
    Layer Factor: 0.2
    Layer Gradation: Auto
  Advanced
    Resolution Factor: 0.9
    Edge Growth Rate: 2
    Minimum Points on Edge: 3
    Points on Longest Edge: 10
    Volume Growth Rate: 2
    Surface Growth Rate: 2
    Enhancement Growth Rate: 2
    Refinement Length: 0 or minimum
    Fluid Gap Elements: 3

Solver
  Convergence Control
    Instantaneous Convergence Curve Slope: × 10⁻¹³
    Time-Average Convergence Curve Slope: × 10⁻¹³
    Time-Average Convergence Curve Concavity: × 10⁻¹³
    Field Fluctuations: × 10⁻¹³
    Maximum Iterations: 1500
  Physics
    Compressibility: Incompressible
    Turbulence: Laminar
  Automatic Mesh Adaptation
    Number of Cycles: 3
    Growth Rate: 2
    Boundary Layer Growth: 1.1
    Refinement Limit: 0.001
    Resolution Factor: 0.9

We recorded the calculated flow rates from the steady-state model (equation (4)) and the CFD package. Our model calculated a single value of the blood flow rate through each vessel. For the CFD package, which calculated the distribution of blood flow rates throughout each vessel, we recorded the blood flow rate at the middle of each vessel. Theoretically, blood vessels directly downstream of a closed vessel have a zero flow rate. The iterative solvers used in both our algorithm and the CFD package reported negligible but non-zero values in these vessels that were caused by limitations in machine precision (e.g., rounding and truncation errors). We removed these artifacts by setting blood flow rates of less than 1 μm³ s⁻¹ to zero. This value, which is approximately 20,000 times smaller than the theoretical minimum flow rate in the network (1.9 × 10⁴ μm³ s⁻¹), is negligible. We computed the unsigned absolute ($\left|{{\rm{\Delta }}}_{\mathrm{abs}}\right|$) and relative ($\left|{{\rm{\Delta }}}_{\mathrm{rel}}\right|$) differences in blood flow reported by the two calculation methods.

2.3.2. Computational environment

Validation measurements were performed using a personal computer. The system had a 4-core 4.1 GHz CPU (Core i5-3570K, Intel Corporation, Santa Clara, CA, United States) and 32 GB of memory. The operating system was Windows 10 (Microsoft, Redmond, Washington, United States).

2.4. Computational feasibility measurements

2.4.1. Testing environment

We measured the performance of the algorithm on the QB2 cluster hosted by the Louisiana Optical Network Initiative. This shared resource has 504 compute nodes. Each compute node contains 64 GB of memory and two 10-core processors (2.8 GHz Ivy Bridge-EP E5-2680 Intel Xeon 64-bit). An Infiniband Interconnect (Pfister 2001) provides internode communication and access to persistent storage, comprising a 2.8-PB parallel file system. The file system uses a Lustre architecture (Schwan 2003), enabling up to 16 compute nodes to operate on a single file simultaneously.

2.4.2. Software environment

We compiled all software developed in this study with a commercial compiler (Intel C++ Compiler Cluster Edition v18.0, Intel Corporation, Santa Clara, CA, United States). The applications utilized shared-memory parallelism, distributed-memory parallelism, and linear algebra routines. We used OpenMP (Dagum and Menon 1998) for shared-memory parallelism, an implementation of the message passing interface (MPI) for distributed-memory communication (Intel MPI v18.0), and the Math Kernel Library (MKL v18.0) of optimized linear algebra routines. All three were included with the compiler.

All input and output (I/O) operations to persistent storage for our applications utilized the Hierarchical Data Format version 5 (HDF5) library (Folk et al 1999). Our applications used the parallel I/O capabilities of the HDF5 library and the Lustre file system to increase speed. We tuned the I/O operations following the suggestions of Howison et al (2010).
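
A minimal sketch of this parallel I/O pattern, assuming an MPI-enabled build of the HDF5 library, is shown below; the file name and dataset layout are illustrative, not those of the study's data files.

```cpp
#include <hdf5.h>
#include <mpi.h>

// Open an HDF5 file for parallel (MPI-IO) access and request collective
// data transfers, the access pattern described in the text.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);            // file-access props
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("network.h5", H5F_ACC_RDONLY, fapl);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);           // transfer props
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);       // collective reads

    // ... each rank would select the hyperslab holding its partition's
    // vessels and junctions, then call H5Dread with dxpl ...

    H5Pclose(dxpl);
    if (file >= 0) H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```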

Graph partitioning in the Network Preprocessing application utilized a k-way partitioning algorithm implemented in the parallel METIS library (ParMETIS version 4.0.3) (Karypis and Kumar 1998, Karypis 1999). We used the k-way partitioning routine on an edge graph of our vascular network. The ParMETIS library was compiled with support for large indexes (64-bit) and double-precision real (64-bit) values to accommodate networks with up to 2.3 × 1018 vessels.

Matrix solving was completed with the Loose Generalized Minimum Residual algorithm (LGMRES) (Baker et al 2005), a Krylov-based iterative solver. We used the BoomerAMG solver (Henson and Yang 2002) as a matrix preconditioner to accelerate the convergence of the iterative solution. Implementations of these algorithms were provided by the Hypre Solver Library (Falgout and Yang 2002, Stüben 2001). We compiled the library with support for OpenMP and large (64-bit) indexes. Table 2 lists the user-specified parameters and options for the matrix solver and preconditioner; a configuration sketch follows the table. These parameters were determined through an iterative tuning process.

Table 2. Configuration parameters for the LGMRES (Baker et al 2005) and BoomerAMG (Henson and Yang 2002) solvers used in this work.

LGMRES Matrix Solver Settings
  CoarsenType: 10
  StrongThreshold: 0.25
  MeasureType: 1
  KDim: 35
  Tol: × 10⁻⁶
  AbsTol: × 10⁻⁶
  MaxIter: 150

BoomerAMG Matrix Solver Settings
  KeepTranspose: True
  MaxLevels: 20
  MaxIter: 1
  Tol: 0
  RAP2: True
  InterpType: 7
  PMaxElmts: 6
  AggNumLevels: 2
  AggInterpType: 1
  AggP12MaxElmts: 6
  AggPMaxElmts: 6
  RelaxOrder: 1
  Cycle 1 NumSweeps: 1
  Cycle 2 NumSweeps: 2
  Cycle 3 NumSweeps: 2
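
The hedged C++ sketch below shows how several of the table 2 values could map onto hypre's ParCSR interface. It assumes the matrix and vectors were already assembled through hypre's IJ interface (not shown), and the unit mantissa on Tol is an assumption, since table 2 records only the exponent.

```cpp
#include <HYPRE_krylov.h>
#include <HYPRE_parcsr_ls.h>
#include <mpi.h>

// Configure LGMRES preconditioned with BoomerAMG using table 2 values.
void solvePressures(MPI_Comm comm, HYPRE_ParCSRMatrix A,
                    HYPRE_ParVector b, HYPRE_ParVector x) {
    HYPRE_Solver solver, precond;

    HYPRE_ParCSRLGMRESCreate(comm, &solver);
    HYPRE_ParCSRLGMRESSetKDim(solver, 35);       // KDim
    HYPRE_ParCSRLGMRESSetTol(solver, 1.0e-6);    // Tol (mantissa assumed)
    HYPRE_ParCSRLGMRESSetMaxIter(solver, 150);   // MaxIter

    HYPRE_BoomerAMGCreate(&precond);
    HYPRE_BoomerAMGSetCoarsenType(precond, 10);      // CoarsenType (HMIS)
    HYPRE_BoomerAMGSetStrongThreshold(precond, 0.25);
    HYPRE_BoomerAMGSetMeasureType(precond, 1);       // MeasureType
    HYPRE_BoomerAMGSetInterpType(precond, 7);        // InterpType
    HYPRE_BoomerAMGSetPMaxElmts(precond, 6);         // PMaxElmts
    HYPRE_BoomerAMGSetAggNumLevels(precond, 2);      // AggNumLevels
    HYPRE_BoomerAMGSetMaxLevels(precond, 20);        // MaxLevels
    HYPRE_BoomerAMGSetMaxIter(precond, 1);           // one cycle per call
    HYPRE_BoomerAMGSetTol(precond, 0.0);             // act as preconditioner

    HYPRE_ParCSRLGMRESSetPrecond(solver,
        (HYPRE_PtrToParSolverFcn) HYPRE_BoomerAMGSolve,
        (HYPRE_PtrToParSolverFcn) HYPRE_BoomerAMGSetup, precond);

    HYPRE_ParCSRLGMRESSetup(solver, A, b, x);
    HYPRE_ParCSRLGMRESSolve(solver, A, b, x);

    HYPRE_BoomerAMGDestroy(precond);
    HYPRE_ParCSRLGMRESDestroy(solver);
}
```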

2.4.3. Computation speed

One of the primary metrics for computational feasibility and utility is execution time. We measured the execution times to calculate the blood flow rate in networks containing between 14 vessels and 17 billion vessels (3 and 33 generations, respectively). Each calculation was repeated 5 times to quantify the variation in execution time caused by resource contention in the computational cluster. For large vessel networks (i.e. NG > 25 and NV > 67,000,000), the volume of data required to describe the network and perform the calculations exceeded the memory of a single compute node (64 GB). The number of compute nodes, Nc, used to perform the calculations was

${N}_{c}=\max \left(1,\,{2}^{{N}_{G}-25}\right)$    (11)

The Network Preprocessing application was only required for large networks (NG > 25). Computational execution times were recorded for the Blood Flow Rate Calculations and the Network Preprocessing independently. The total time, vessel-network-partitioning time, vessel-index-renumbering time, junction-renumbering time, and I/O time were recorded for the Network Preprocessing application. For the Blood Flow Rate Calculations, the time to complete the I/O operations, conductance matrix construction, conductance matrix solution, blood flow rate calculation, and total times were recorded.

2.4.4. Computational scalability

Weak and strong scaling are two metrics used to evaluate the scalability of an algorithm in high-performance computing (Gustafson 1988). Weak scaling characterizes the execution time as the number of processors increases while the workload per processor is fixed, e.g., as the number of processors doubles so does the total workload. Ideally, as the number of processors increases, the wall-clock execution time remains constant. We measured the weak scaling for both Network Preprocessing and Blood Flow Rate Calculations. For the Blood Flow Rate Calculations, the smallest vessel network used was NG = 25 on a single compute node; we then doubled the number of vessels and compute nodes in tandem up to NG = 33 on 256 compute nodes. Because Network Preprocessing was only used when more than one compute node was required, we tested its scalability in the range from NG = 26 on 2 compute nodes to NG = 33 on 256 compute nodes.

Strong scaling characterizes how adding more processors reduces the computation time for a fixed-size problem. This is typically measured using the speedup factor. Ideally, the speedup factor is equal to the ratio of the new number of processors to the original number used for the problem (Gustafson 1988). The strong scaling was determined for the Blood Flow Rate Calculations by constructing a fixed-size vessel network (NG = 25) and varying the number of compute nodes (1 ≤ Nc ≤ 256). Strong scaling of the Network Preprocessing considered a 26-generation network and 1 ≤ Nc ≤ 256. Strong scaling was quantified by calculating the speedup factor (S) according to

$S=\dfrac{{T}_{q}}{{T}_{n}}$    (12)

where Tq is the time it takes a reference number of processors, q, to complete the calculation and Tn is the time it takes n processors to perform the same calculation. For the Blood Flow Rate Calculations, q = 1. The Network Preprocessing had a reference number of processors of q = 2.
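
As a purely illustrative example with hypothetical timings: if the reference configuration (q = 1 node) completes a calculation in T1 = 100 min and n = 16 nodes complete it in T16 = 8 min, then S = 100/8 = 12.5, compared with an ideal speedup of n/q = 16.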

2.4.5. Memory requirements

Memory requirements for Network Preprocessing and Blood Flow Rate Calculations were collected separately using a heap memory profiling tool (Valgrind Massif version 3.13 (Nethercote and Seward 2007)). We measured the peak and average memory usage for vessel networks with 3, 13, 19, 25, 26, 28, 30, and 33 generations, which spanned the operational range of our two-step algorithm and compute cluster (1 ≤ Nc ≤ 256). For network sizes requiring two or more compute nodes, the memory used per node was recorded.

3. Results

3.1. Computational feasibility

The algorithm, described in section 2.2, successfully calculated the blood flow rates in networks containing up to 17 billion vessels. The algorithm completed all computations without major errors for all network sizes tested, demonstrating that the computations are feasible.

3.2. Validation of steady-state blood flow calculations

We compared the blood flow rates predicted by our model and the CFD package for small networks (6–126 vessels). Figure 6(a) plots the absolute difference in blood flow rate reported by our blood flow algorithm compared to the CFD package for the healthy networks. This figure shows that the distribution for all networks considered is centered around 2 × 10⁻⁸ mm³ s⁻¹, with a maximum error of less than 1 × 10⁻⁶ mm³ s⁻¹. Figure 6(b) plots the relative difference in blood flow rate between the CFD package and our model for healthy networks. The relative discrepancy was less than 0.85%. This provides confidence in our model's implementation and the accuracy of the calculated steady-state laminar blood flow rates. These data suggest that the errors are independent of the size of a healthy network.


Figure 6. Histogram of the number of vessels (NV) versus (a) unsigned absolute $(| {{\rm{\Delta }}}_{{\rm{abs}}}| )$ and (b) relative $\left(\left|{{\rm{\Delta }}}_{{\rm{rel}}}\right|\right)$ differences between the commercial CFD package and our model for healthy networks. The colors break down the data by the number of generations in each network.


Figure 7(a) plots the absolute difference between our blood flow algorithm and the CFD package for the damaged networks. Overall, our model predicted the blood flow rates with moderate accuracy. The maximum unsigned absolute difference for all three networks analyzed was less than 7 × 10⁻⁶ mm³ s⁻¹, while the average absolute difference was 5.3 × 10⁻⁷ mm³ s⁻¹. Figure 7(b) plots the relative difference between the CFD package and our model for damaged networks. Here, the relative difference was less than 13% and the root-mean-square error was 1.9%. This accuracy is reasonable considering the assumptions made in the steady-state model. These data also suggest that the size of the errors depends on the generation of the damaged vessel. The main source of the differences was that our model under-predicted the blood flow rate near the closed vessel, and the amount of under-prediction decreased with increasing distance from the closed vessel (figure 8). This was caused, in part, by the tapered vessels in the CFD model, as opposed to the constant-radius vessels in our model; the tapering resulted in a smaller average radius for each vessel and a decreased flow (see equation (4)). Second, the CFD package included more physical realism than our model, especially when modeling the effects of junctions and sharp turns on the blood flow rate.


Figure 7. Histogram of the number of vessels (NV) versus (a) unsigned absolute $(| {{\rm{\Delta }}}_{\mathrm{abs}}| )$ and (b) relative $(\left|{{\rm{\Delta }}}_{\mathrm{rel}}\right|)$ differences between the commercial CFD package and our model for damaged networks. The colors break down the data by generation of the removed vessel.


Figure 8. Relative difference $\left({{\rm{\Delta }}}_{\mathrm{rel}}\right)$ between our model and the CFD package calculations for the network with a vessel removed in the 4th generation.


3.3. Computational speed

We measured the execution times of the Network Preprocessing and Blood Flow Rate Calculation applications (figure 9). A 17-billion vessel network (i.e., approximately the number of vessels in the human body (Herman 2016)) required 6.5 wall-clock hours to complete using 256 compute nodes. A 67-million vessel network (i.e., the maximum size that would fit on a single compute node, and more vessels than in a whole mouse brain) took 1.8 wall-clock minutes on a single compute node. Finally, a 4-million vessel network (i.e., approximately the number of vessels in a mouse brain (Mayerich et al 2008)) required only 36 wall-clock seconds, also on a single compute node.


Figure 9. Plot of execution time in core-hours (T) versus the number of vessels in a network (NV). The arrows signify the number of compute nodes (Nc) that performed the calculations (Nc = 1 if omitted). Each compute node contained 20 processors. The error bars represent the range of compute times of the 5 timing runs. For context, the dashed lines represent literature estimates for the number of vessels in the human brain (Lauwers et al 2008) and the mouse brain (Mayerich et al 2008).


3.4. Computational scalability

To quantify the scalability of the algorithm, we measured both the weak (figure 10) and strong scaling (figure 11). The weak scaling data show that both applications scaled well when the number of processors was below 640 (32 compute nodes). The reasons each application performed poorly at Nc > 32 compute nodes differed. The Network Preprocessing was limited by the cluster architecture. Specifically, the application was restricted to 16 I/O channels, leading to increasing contention, and thus increasing execution time, as more processors were added. This explanation is supported by the strong scaling, which showed that the computational portions of the algorithm scaled well with the number of processors for the fixed problem size used.


Figure 10. Plots of the wall-clock time (T) versus the number of compute nodes (Nc) for Network Preprocessing (a) and Blood Flow Rate Calculations (b). The dark circles represent the overall scalability of Network Preprocessing. Arrows specify the number of vessels in the problem.


Figure 11. Plots of the speedup factor (S) versus the number of compute nodes (Nc) for Network Preprocessing (a) and Blood Flow Rate Calculations (b). The dashed line represents perfect scalability. The dark circles represent the overall scalability of Network Preprocessing (a) and Blood Flow Rate Calculations (b).


The scalability of the Blood Flow Rate Calculations is limited by the matrix solving algorithm. The scalability of the matrix solver appears to be limited by two primary causes, namely, inter-process communication that grows with the number of compute nodes, and memory traffic that grows with the problem size. In the weak scaling measurements (figure 10(b)), the problem size increased as the number of processors was increased, improving the solver scalability. However, a majority of the time was still spent on solving the matrix. This is because solving sparse matrices is limited by retrieving data from electronic memory for computation; as the problem size grew, more memory transfer was needed on each compute node. The sharp decrease in strong scaling of the matrix solver (figure 11(b)) could be attributed to the decreasing volume of arithmetic computations per node with an increasing number of compute nodes. As the workload of each compute node decreased, the matrix solver spent more time communicating with the other compute nodes, which limits the scalability of the algorithm.

3.5. Memory usage

Figure 12 plots the peak and average memory usage per node for the applications. The Network Preprocessing application remained well below the 62 GB available on each node. The Blood Flow Calculation required a peak of about 68 GB, which occurred during the matrix solving. This peak is above the available memory per node and could result in performance degradation due to the use of the hard disk as a virtual memory device.


Figure 12. Plot of memory usage (M) versus the number of vessels (NV). The dashed line marks the maximum available memory on a single compute node (64 GB).


4. Discussion

In this study, we developed a two-step algorithm to compute the blood flow rate through a vascular network corresponding in size to that of the human body. We validated the algorithm against a previously-validated computational-fluid-dynamics package. The computational characteristics of the algorithm, including execution time, scalability, and memory usage, were measured. We successfully demonstrated, for the first time, that whole-body blood-flow rate calculations are feasible with a contemporary high-performance computing system. The accuracy of the two-step algorithm was confirmed by comparison to a commercial CFD package to within 13% for both healthy and damaged vascular networks. This confirms that the two-step blood flow algorithm was implemented correctly for present intents and purposes. To simulate a network with 17 billion vessels, only 6.5 hours of wall-clock time was required using 256 compute nodes (5120 processors). The algorithm scaled moderately well as the number of processors increased, limited primarily by the matrix solver and file I/O operations. In both steps of the algorithm, the average memory consumption per node was 56 GB, i.e., less than the 64 GB available. Finally, the results of this work show that a network of 4 million vessels, approximately the size of a mouse brain's vasculature (Mayerich et al 2008), was computed in only 36 seconds on a single compute node.

The primary implication of this study is that new avenues of inquiry are possible that could elucidate blood-flow problems that were hitherto considered intractable. Several areas of medical research may find applications for this new capability, including the simulation and evaluation of injury, healing, disease, and drug delivery. In all of these applications, knowledge of the blood flow in individual vessels has the potential to improve the accuracy of physiological simulations, provide new insights, and, ultimately, to improve outcomes for patients. For example, many drugs are delivered to patients intravenously, and the bloodstream transports these drugs throughout the body. A whole-body model of vasculature could, in theory, enable more detailed localization of the drug within each organ than would be possible with compartmental modeling, for example. Our model may find utility for validating the findings of compartmental models, as reported by Hyde et al (2014), but in much larger and more complete vascular networks of whole organs. Another implication is that a simple whole-body model could potentially provide the boundary-condition estimates used in advanced blood-flow simulations. This would alleviate the need for cyclic boundary conditions or relax simplifying assumptions currently used. Finally, using a whole-body model of blood flow could facilitate a more complete interpretation of data from existing diagnostic imaging and nuclear medicine procedures. For example, blood flow plays a critical role in generating the signal in functional MRI (Glover 2011). This imaging procedure measures the changes in blood flow in the brain and correlates those changes with brain activity. Being able to model blood flow through the entire brain may enable researchers to draw more robust conclusions from their data by reducing the impact of confounding changes in blood flow caused by the connected nature of a vascular network.

There are four important characteristics of blood flow models, namely the size of the vascular domain simulated, biophysical realism, computational performance, and accuracy of the blood flow rate. In the previous literature, the largest network used in a blood flow simulation was approximately 5 million vessels, representing a mouse brain (Linninger et al 2019). Our model is capable of calculating the blood flow rates through networks containing up to 17 billion vessels, but with a much simpler geometry. Realism of a blood flow simulation can be broken down into geometric and physical categories. Our model is comparable in physical realism to many steady-state models of blood flow (Boas et al 2008, Reichold et al 2009, Linninger et al 2013, Hyde et al 2014), which are used for their simplicity and scalability. 3-D finite element simulations provide superior geometric and physical realism (Zagzoule and Marc-Vergnes 1986, Sherwin et al 2003, Pan et al 2013, Blanco et al 2014, Müller and Toro 2014). One important feature of finite element simulations is the modeling of the blood's momentum, enabling accurate modeling of the second-order effects of bifurcations on blood flow rate. Another important characteristic of time-dependent blood flow rates in the arterial network is cardiac pulsation. Our model intentionally omitted these features so that we could expand the performance envelope to 17 billion vessels, whereas previously reported time-dependent and time-independent models have reached at most 5 million vessels. Similarly, our geometry was less realistic than that modeled in the finite-element and steady-state studies previously performed. This was done to probe the performance envelope for problems with billions of vessels. Our work also revealed important insights regarding the computational-performance aspects of blood flow rate calculations. Our literature search found only two previous studies that provided any details on execution times (Pan et al 2013, Blanco et al 2014), and none reported memory requirements. Neither study provided sufficient details to allow a direct comparison of execution times with those from this study. One reason for this is that the previous studies used time-dependent models of the blood flow rate but neglected to report the number of iterations required to solve the system of partial differential equations. The work performed by our algorithm is similar to the inner-most loop of a time-dependent simulation, where a matrix must be constructed and solved. Thus, the timing data reported in our study could be used to estimate a lower bound on the compute time for whole-body blood flow simulations using time-dependent models. Finally, the accuracy of blood flow simulations is a well-studied topic. Many authors have compared their computational models to experimental results (Olufsen et al 2000, Sun et al 2012, Passerini et al 2013, Hyde et al 2014, Botar et al 2017). This differs from our computational approach to validation, making a direct comparison of results difficult. However, their results provide context for the accuracy of the advanced CFD technique we used. The maximum differences observed in our work were similar in magnitude to the 11% differences observed by Sun et al (2012) when they compared their 3-D Navier–Stokes model of blood flow rate to experimental results. Similarly, Passerini et al (2013) observed a 1% average difference between their experimental model and a 3-D Navier–Stokes model. This is on the order of the maximum relative difference seen for the healthy networks in this study. Our work is also comparable to the results published by Radaelli et al (2008), who compared 7 different models of blood flow through stents, each created by an independent research group. They reported a 5% difference in the predicted blood flow rate across all the models tested. This is similar to the RMS error observed between our model and the CFD package (1.9%). Taken together, these results lend confidence to the predictions of our model.

To our knowledge, this is the first study to systematically explore the computational requirements of calculating the blood flow rate through each vessel in the entire human body. Using the simplified vessel geometry and a steady-state blood flow model, we were able to provide an estimate of the minimum computational resources needed to perform such calculations. This information can be used to inform future research study designs. Another strength was the use of the Poiseuille equation to represent the blood flow. It minimized the computational complexity of the problem, making the calculations feasible and the results reproducible. A further strength of this work was the use of realistic lengths and radii for the vessel lumens in our vascular geometry. The smallest dimensions for the blood vessels in our model were taken from literature studying the morphometry of capillaries in the human brain (Lauwers et al 2008, Linninger et al 2013). For a 33-generation network, the largest vessels in the network have a radius of 5 mm and a length of 9 cm. These sizes are comparable to the radius and length of the ascending aorta, with dimensions of 15 mm and 6.7 cm, respectively (Gray and Lewis 1918). This provided a suitable test of the matrix solver's ability to resolve dimensions spanning 5 orders of magnitude. Finally, the modular design of our algorithm allows for the integration of more accurate geometries with the steady-state blood flow model. In theory, our algorithm could readily accommodate more realistic vessel geometries, such as those taken from imaging studies (Heinzer et al 2006, Mayerich et al 2011, Shih et al 2012).

This work contains several limitations. The first limitation is the use of the steady-state approximation. While its use was necessary to compute the blood flow through billions of individual vessels, it limits the potential accuracy of the blood flow algorithm. In the steady-state approximation, it is impossible to model the time-dependent pulsation of blood pressure and the resulting forces on the arterial vessels (Pan et al 2013). This effect can cause dynamic changes in blood flow rates through the vessels, as their elasticity changes the flow profile. This is especially true in the larger arteries, where the pressure variation is the most intense. In the capillary beds and venous vessels, the steady-state approximation is acceptable to first order because of the lower pressure variation with time (Pries and Secomb 1990, Raju 2019). In addition to this major simplification, the steady-state approximation neglects second-order effects such as blood acceleration, red blood cell skimming, and vessel compliance (Quarteroni et al 2016). All of these affect the accurate calculation of the blood flow in individual vessels of a network. Computational-fluid-dynamics algorithms implicitly or explicitly include these effects (Quarteroni et al 2016). To overcome this limitation in large vascular networks, such as those investigated in this work, it may be possible to implement 1-D time-dependent models, which have been developed extensively for smaller networks (Olufsen et al 2000, Steele et al 2007, Pan et al 2013, Müller and Toro 2014, Quarteroni et al 2016, Hasan et al 2018).

Another limitation of this work is that our vessel geometry comprised straight, rigid vessels connected with fixed bifurcation angles. This results in a uniform distribution of blood throughout the network, which significantly impacts the ability to accurately calculate blood flows in a real organ or body. The errors from this approximation are of the same character as those from the Poiseuille equation, which ignores many of the geometric properties of a vessel, e.g., curvature and elasticity. Additionally, to show the computational feasibility of the approach, we considered it more important to model the correct number of vessels than to model their geometry with high realism. This is because the computational workload scales with the number of objects modeled and is proportional to each object's complexity. This vessel network geometry is the only currently known geometry that can be scaled to billions of vessels, a requirement for charting the performance envelope at the whole-body scale. Ways to address these limitations were discussed extensively in a previous work (Donahue and Newhauser 2020). The differences in the accuracy of this geometry compared to a more advanced one will be addressed as part of a future research study, currently in progress in our laboratory.

An additional limitation of the validation was the modest size of the vascular networks, which limited our ability to confirm the accuracy of the model for large vascular networks. This was caused by the computational expense of the CFD package used in this study. In principle, this limitation could be overcome with additional research and computing power. This study has shown, for the first time, that it is computationally feasible to compute blood flow rates in a network with as many vessels as the human body with current computing technology. These results could enable improvements in several fields of research in the life sciences and may provide critical insights about the blood flow in whole organs and organisms.

Acknowledgments

This work was supported by funding from the Bella Bowman Foundation and the US Nuclear Regulatory Commission (NRC; award NRT-HQ-84-15-G-0017). Portions of this research were conducted with high performance computing resources provided by the Louisiana Optical Network Initiative and Louisiana State University (HPC@LSU). The authors would like to thank Drs. Lydia Wilson, Christopher Schneider, Xin Li, and Paul Maggi for helpful discussions. This manuscript is dedicated to the memory of Bella Bowman.
