L-Sweeps: A scalable, parallel preconditioner for the high-frequency Helmholtz equation

https://doi.org/10.1016/j.jcp.2020.109706

Highlights

  • First fast parallel solver for the high-frequency Helmholtz equation.

  • Optimal O(N/p) parallel scaling, up to logarithmic factors.

  • The scaling holds in a distributed memory environment.

  • Sweeping preconditioners on a checkerboard domain decomposition.

  • Numerical examples in 2D and 3D (including practical benchmark wave speeds) are considered and corroborate all claims.

Abstract

We present the first fast solver for the high-frequency Helmholtz equation that scales optimally in parallel for a single right-hand side. The L-sweeps approach achieves this scalability by departing from the usual propagation pattern, in which information flows in a 180 degree cone from interfaces in a layered decomposition. Instead, with L-sweeps, information propagates in 90 degree cones induced by a Cartesian domain decomposition (CDD). We extend the notion of accurate transmission conditions to CDDs and introduce a new sweeping strategy to efficiently track the wave fronts as they propagate through the CDD. The new approach decouples the subdomains at each wave front, so that they can be processed in parallel, resulting in better parallel scalability than previously demonstrated in the literature. The method has an overall O((N/p) log ω) empirical run-time for N = n^d total degrees-of-freedom in a d-dimensional problem, frequency ω, and p = O(n) processors. We introduce the algorithm and provide a complexity analysis for our parallel implementation of the solver. We corroborate all claims in several two- and three-dimensional numerical examples involving constant, smooth, and discontinuous wave speeds.

Introduction

In this manuscript we consider the Helmholtz equation with variable wave speed and constant density on an open domain Ωbulk with absorbing boundary conditions,

−Δu − ω²mu = f  in Ωbulk,    + A.B.C. on ∂Ωbulk,    (1)

where, for x ∈ Ωbulk, m(x) = 1/c²(x) is the squared slowness for the p-wave speed c, u is the solution wavefield, ω is the characteristic frequency, and f is the source density. For simplicity, we consider Ωbulk to be a square or a cube. While we consider only problems of this form in the ensuing developments, the method we propose is a framework that can readily be applied to other time-harmonic formulations that model more complex physics.

The Helmholtz equation arises in modeling many physical phenomena, including electromagnetic and subsurface wave propagation, and it is the cornerstone of many modern algorithmic pipelines of inverse problems, especially in exploration geophysics [87], [68]. Within those pipelines, the finest recoverable detail is determined by the highest frequency for which (1) can be solved. Thus, the efficient solution of time-harmonic wave equations at high-frequency is extremely important in scientific and industrial applications. Consequently, we consider discretizations of (1) in the high-frequency regime, which means that the characteristic mesh-size of the spatial discretization has to scale as 1/ω.
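To make this scaling concrete, the following minimal sketch illustrates how the number of grid points per dimension grows linearly with ω when the mesh size scales as 1/ω. The choices of ten points per wavelength, unit minimum wave speed, and unit domain are illustrative assumptions, not values taken from this paper.

```python
# Illustrative sketch: with a fixed number of grid points per wavelength, the mesh
# size h scales like 1/omega, so n grows linearly with omega and N = n^d like omega^d.
import numpy as np

def points_per_dimension(omega, points_per_wavelength=10, c_min=1.0, length=1.0):
    lambda_min = 2 * np.pi * c_min / omega      # shortest wavelength in the domain
    h = lambda_min / points_per_wavelength      # characteristic mesh size, h ~ 1/omega
    return int(np.ceil(length / h))

for omega in [50, 100, 200, 400]:
    n = points_per_dimension(omega)
    print(f"omega = {omega:4d}:  n = {n:5d},  N = n^2 = {n**2:>10d}")
```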

Solving the high-frequency Helmholtz equation is notoriously difficult, mainly due to two issues:

  • the accuracy of the numerical solution deteriorates as the frequency increases. To address this issue, usually called pollution error, one needs either to over-sample the discretization or to use specialized discretization schemes [44], [45], [60];

  • the spectrum of the resulting matrix deteriorates as the frequency increases [61], [74], thus rendering the numerical solution of the linear system prohibitively expensive for large frequencies.

These issues are a primary driver of current research in this area [29], [58], [82]. In this manuscript we focus on the second issue and, for clarity and conciseness in the development, we present our new method using a simple low-order finite difference discretization. However, to address the first issue, our method can easily be extended to discretizations that mitigate pollution error [30], [82], [93], increasing the overall cost only by polylogarithmic factors.

A number of methods for solving the resulting linear systems are available, including direct methods (e.g., [88]), domain decomposition methods (e.g., [84]), and preconditioned iterative methods, including multigrid and shifted-Laplacian methods (e.g., [72], [39]). We consider a solver or a preconditioner to be sequentially scalable if, up to logarithmic factors, it can compute a solution in O(N) run-time in a sequential computational environment, where N is the total number of degrees-of-freedom in the discrete problem. We consider a solver or a preconditioner to be parallel scalable if, up to logarithmic factors, it can compute a solution in O(N/p) run-time in a parallel computational environment with p = O(n) processors for a single right-hand side. Here, n is the number of degrees-of-freedom in one spatial direction, i.e., n = O(N^{1/d}), where d is the problem dimension. In this paper, we present what we believe to be the first parallel scalable preconditioner for the high-frequency Helmholtz equation.

Currently, no scalable direct method, sequential or parallel, is available for the high-frequency regime. Standard domain decomposition methods (DDMs) localize the problem to subdomains and transfer information between subdomains. Domain decomposition methods can be applied as a direct solver (e.g., in the context of optimal Schwarz methods [35], [33]), but the resulting solver is not scalable. Domain decomposition methods can yield scalable preconditioners, both in parallel and sequentially. However, the resulting preconditioned iterative solver is not scalable because it requires O(ω) iterations. This highlights an important aspect of constructing solvers based on preconditioning: not only must the preconditioner be scalable, but the resulting iterative method must also converge with limited growth in the number of iterations as ω increases. In this regard, we call a preconditioner effective if the resulting preconditioned system can be solved in O(log ω) iterations. Classical DDMs exhibit sub-optimal behavior for two reasons. First, artificial and spurious reflections are induced by imprecise information transfer between subdomains. Second, long-range wave-material interactions are not tracked consistently.

Sweeping preconditioners were introduced [26], [86], [16], [91], [32] to alleviate these drawbacks, while preserving the advantages of classical DDMs. Sweeping preconditioners make use of layered domain decompositions, accurate transmission conditions, and a layer-by-layer sweeping strategy. In particular, the layered domain decomposition provides scalability by controlling growth in computational cost and memory footprint. The accurate transmission conditions allow information to flow between the subdomains without numerical artifacts, e.g., artificial reflections. Finally, the sweeping strategy consistently tracks and propagates long-range wave-material interactions. The resulting approach can be interpreted as an approximate block-LU factorization, where the blocks correspond to the local problem in each subdomain. In particular, using sparse direct solvers on blocks that arise from sufficiently thin layers yields a method with quasi-linear (i.e., linear with poly-logarithmic factors) sequential complexity for a single right-hand side [26], [25], [92], [55], [77], [16], [86], [59]. In the presence of many right-hand sides, the layered domain decomposition allows for optimal parallelization via pipelining [95], [86].

Efforts to improve the performance of sweeping preconditioners have focused on obtaining better and more accurate factorizations [86], reducing the overall cost by restricting the unknowns to the interfaces [91], [92], and accelerating the computation on local subproblems (blocks in the approximate factorization) with compression or parallelization [67]. Several approaches aim specifically to sparsify those blocks, thus decreasing the sequential costs [55], [92]. However, while leveraging parallelism to accelerate the solve for local blocks decreases the run-times, it does not result in a parallel scalable solver for a single right-hand side.

The main bottleneck of current sweeping preconditioners is a lack of parallel scalability for a single right-hand side. The difficulty arises due to the inherently sequential nature of the Helmholtz problem, which is independent of the domain decomposition strategy. This sequential nature is dictated by the hyperbolic behavior of wave equations and is manifested in the need to accurately resolve the long-range interactions. The key to accurately resolving these interactions is a consistent information transfer between subdomains, which has precluded more general domain decompositions (i.e., beyond layered subdomains) and consequently inhibited parallelization.

In this work, we address these issues by departing from standard layered domain decompositions and introducing a Cartesian domain decomposition (CDD) along with a new sweeping strategy that is only viable on such domain decompositions. Our novel sweeping preconditioner consistently and efficiently tracks the wavefield across subdomains. The preconditioner can be interpreted as an approximate LU factorization, where parallelism arises because the diagonal blocks themselves have a block-diagonal structure. For a single right-hand side, the resulting algorithm has an O(N/p) run-time, up to logarithmic factors, where, independent of geometric dimension, p = O(n) is the number of processors. The new algorithm therefore provides the first parallel scalable solver for the Helmholtz equation at high-frequency for a single right-hand side.

Our approach can be applied to two- and three-dimensional problems. In two dimensions, the algorithm exhibits good weak parallel scalability. That is, as the frequency increases, and consequently the problem is refined, we refine the CDD so that the local problems in each subdomain have constant size. In three dimensions, we apply the same strategy as for the two-dimensional case in two of the spatial dimensions and extend the subdomains along the third spatial dimension, resulting in beam-shaped quasi-one-dimensional local problems. In either case, we employ off-the-shelf direct solvers to solve the local problems and obtain parallel complexities of O(n log ω) and O(n² log ω) in two and three dimensions, respectively. The 3D complexity can be reduced to O(n log ω) by employing existing parallel direct solvers [69], [52] for the quasi-one-dimensional problems in each subdomain, but we do not exploit these tools here.

The new algorithm therefore results in an O(n log ω) complexity for a single right-hand side, regardless of the geometric dimension. This new algorithm requires sweeps across the CDD in both cardinal and diagonal directions, and we can exploit parallelism in all directions orthogonal to the current sweep direction. While this new strategy increases parallelization, the sweeps are still inherently sequential and cannot be parallelized. Thus, we do not anticipate further reduction in complexity for a single right-hand side due exclusively to modification of the domain decomposition. In the presence of O(n) right-hand sides, it has been shown, however, that the sequential nature of the sweeps can be mitigated by pipelining multiple right-hand sides [95]. Using pipelining, solutions for all O(n) right-hand sides can be computed with O(n log ω) parallel complexity, i.e., the average parallel complexity per right-hand side is O(log ω).

As is customary in the development of new preconditioners or solution strategies, we focus on the parallel scalability of the algorithm and we assume that the accuracy of the solution stems from the chosen discretization. However, we give further exposition on achieving a parallel scalable and accurate algorithm in Section 6.

Our method is inspired by the method of polarized traces, which is well-established in the literature and, in contrast to other proposed preconditioners, has been proven applicable for problems with high-order discretizations [93], [78] and highly heterogeneous (and even discontinuous) wave speed distributions [92].

Standard linear algebra techniques such as nested dissection [40] and multi-frontal solvers [24], [89], coupled with H-matrices [9], have been applied to the Helmholtz problem [42], [19], [88], [1]. While these methods take advantage of compressed linear algebra to gain more efficiency (e.g., [6]), in the high-frequency regime they still suffer from the same sub-optimal asymptotic complexity as standard multi-frontal methods (e.g., [22], [2], [18]).

Multigrid methods (e.g., [50], [13], [28], [72], [62], [3]), once thought to be inefficient for the Helmholtz problem, have been successfully applied [14], [47], [76]. Approaches stemming from the complex-shifted Laplacian [28] can be advantageous if properly tuned. However, in general, they either require an expensive solver for the shifted problem or require a large number of iterations to reach convergence, depending on the scaling between the complex shift and the frequency [39]. Although these algorithms do not result in a lower computational complexity, they are highly parallelizable, resulting in low run-times.

Within the geophysical community, the analytic incomplete LU (AILU) method was explored in [65], [64]. A variant of Kaczmarz preconditioners [43] has been studied and applied to time-harmonic wave equations by [53]. Another class of methods, called hybrid direct-iterative methods, has been explored by [73]. Although these solvers have, in general, relatively low memory consumption, they tend to require many iterations to converge, thus hindering practical run-times.

Domain decomposition methods for solving partial differential equations (PDEs) have a long history [71], [54]. The first application of domain decomposition to the Helmholtz problem was proposed by [23], which inspired the development of various domain decomposition algorithms now classified as Schwarz algorithms. However, the convergence rate of such algorithms is strongly dependent on the boundary conditions prescribed at the interfaces between subdomains [35]. The subsequent introduction of the optimized Schwarz framework in [33], which uses optimized boundary conditions to obtain good convergence, has inspired several competing approaches, including, but not limited to, [34], [11], [37], [38], [36].

Absorbing boundary conditions for domain decomposition schemes for elliptic problems are introduced by [27] and the first application of such techniques to the Helmholtz problem traces back to the AILU factorization [31]. The sweeping preconditioner, introduced in [25], [26], was the first algorithm to show that those ideas could yield algorithms with quasi-linear complexity, leading to several related algorithms with similar claims such as the source transfer preconditioner [16], the rapidly converging domain decomposition [75] and its extensions [77], the double sweep preconditioner [86], [63], and the method of polarized traces [91]. For an extensive review on sweeping-type methods we direct the reader to [32].

To our knowledge, all domain decomposition methods and sweeping preconditioners have been based on layered domain decompositions, with one exception. In [51], a q×q CDD is considered in the context of the source transfer method. Their proposed strategy requires p=q2 processors to obtain the sub-optimal parallel complexity of O((N/p)logN), in both 2D and 3D. The primary difference between this method and our method is the sweeping strategy. The strategy employed in [51] is inspired by classical domain decomposition methods, while our method is inspired by sweeping preconditioners. Consequently, our method inherits a superior sweeping strategy.

We consider problems formulated within the framework of (1), where we choose to model absorbing boundary conditions using perfectly matched layers (PMLs) [7], [48]. To preserve the solution in all of Ωbulk, using a PML requires the domain to be extended, which in turn implies that the solution, material property, and source spaces must also be extended. The extended domain, Ωextended, contains all of Ωbulk, as illustrated in Fig. 1a, and a new boundary value problem is formulated on Ωextended,

−div(Λ∇u) − ω²mu = f  in Ωextended,    u = 0  on ∂Ωextended,    (2)

where, for brevity, we re-use the symbols u, m, and f to represent the extended solution wavefield, slowness, and source distribution. As is customary for PMLs in domains with varying wave speed, we assume that the source density, f, is extended by zero into Ωextended and that the squared slowness, m, is extended into Ωextended in a constant fashion along the normal direction of ∂Ωbulk, as illustrated in Fig. 1b. The PML-extended Helmholtz equation (2) reduces to (1) in Ωbulk by imposing that Λ is the identity matrix in Ωbulk. In the PML region, Ωextended\Ωbulk, Λ is a complex-valued diagonal matrix which depends on the PML formulation, and m and f are complex-valued functions obtained from imposing the PML. Details of the precise formulation used in our developments are provided in Appendix A. Equation (2) is effectively a complex-valued boundary-value problem in Ωextended with homogeneous Dirichlet boundary conditions. In the remainder of the discussion, we consider (2) to be the canonical problem and therefore, for simplicity of notation, denote Ωextended as Ω.

When (2) is discretized, we obtain the linear algebraic system

Au = f,    (3)

where A is the model-dependent system matrix, u is the solution vector, and f is the vector of the source density f. In this paper we restrict our discussion of absorbing boundary conditions to PMLs and of discretizations to finite-difference methods. These restrictions are merely to simplify the exposition: other transparent boundary conditions, such as absorbing layers or sponge layers, and other discretizations, such as higher-order finite differences and those derived from finite element methods, may be used in this framework. For 2D problems we use standard 5-point finite difference stencils and in 3D, we use 9-point stencils. As a result, the discretization has N global degrees-of-freedom and n = N^{1/2} and n = N^{1/3} degrees-of-freedom in each spatial dimension for two- and three-dimensional problems, respectively.
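For illustration, the sketch below assembles such a system matrix for a 2D problem with the 5-point stencil and homogeneous Dirichlet conditions. It is a minimal stand-in, not the paper's implementation, and it omits the PML stretching Λ of (2) (see Appendix A for the actual formulation).

```python
# Minimal sketch (not the paper's code): assemble A for  -Delta u - omega^2 m u = f
# on an n x n interior grid with the standard 5-point stencil and homogeneous
# Dirichlet boundary conditions; the PML stretching Lambda is omitted for brevity.
import numpy as np
import scipy.sparse as sp

def helmholtz_2d_5pt(n, omega, m):
    """m is an (n, n) array of squared slowness sampled at the interior grid points."""
    h = 1.0 / (n + 1)
    I = sp.identity(n, format="csr")
    D = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr") / h**2
    laplacian = sp.kron(I, D) + sp.kron(D, I)   # discrete -Delta (5-point stencil)
    return (laplacian - omega**2 * sp.diags(m.ravel())).tocsr()

A = helmholtz_2d_5pt(n=50, omega=20.0, m=np.ones((50, 50)))
print(A.shape, A.nnz)
```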

For the PMLs we use a cubic PML profile function. As we increase the frequency of the problem, we do not increase the width of the PML. Instead, we choose the PML width so that the number of wavelengths is constant in the PML region, and increase the absorption constant logarithmically with the frequency. This is motivated by an analysis of the PML [12], where this choice is rigorously justified for a more complex PML-profile. In our work, we employ the same strategy for the cubic PML-profile and obtain satisfactory results. Details on the construction of the linear system and the PMLs are given in Appendix A.
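As a rough illustration of these choices, the sketch below implements a cubic absorption profile whose width holds a fixed number of wavelengths and whose strength grows like log ω. The functional form, the constants, and the simple coordinate-stretching factor are assumptions made for illustration only; the precise profile used in this work is given in Appendix A.

```python
# Sketch of a cubic PML absorption profile with the scalings described above
# (assumed functional form and constants; the actual profile is in Appendix A).
import numpy as np

def pml_sigma(x, omega, n_wavelengths=1.0, C=10.0, a=0.0, b=1.0):
    """Cubic profile sigma(x) on [a, b]; nonzero only inside the PML collar."""
    width = n_wavelengths * 2 * np.pi / omega      # width ~ fixed number of wavelengths
    sigma_max = C * np.log(omega)                  # absorption constant ~ log(omega)
    depth = np.maximum(np.maximum(a + width - x, x - (b - width)), 0.0)
    return sigma_max * (depth / width) ** 3

# One common convention (assumed here) builds the complex diagonal entries of Lambda
# from stretching factors s(x) = 1 / (1 + i sigma(x) / omega).
x = np.linspace(0.0, 1.0, 201)
s = 1.0 / (1.0 + 1j * pml_sigma(x, omega=100.0) / 100.0)
```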

The method of polarized traces was introduced as a solver [91] and then as a preconditioner [94] for the linear system (3). At its core, the method of polarized traces spatially subdivides the discrete degrees-of-freedom in Ω into layers and computes an approximate solution to the global wavefield by sweeping over the layers and solving a local discrete half-space problem in each layer. Following [91], the solutions of the half-space problems are called polarized wavefields. In this work, we make extensive use of this concept and therefore provide a brief review, in the continuous setting, in this section. In Section 1.4, we present a similar treatment in the discrete context.

Consider the boundary-value problem (2) with a source density function f supported only in Ωbulk. Let Γ be an interface dividing Ω into two regions, Ω1 and Ω2, such that the support of f lies entirely within Ω1. For example, Γ could be a straight line (Fig. 2a) or an L-shaped line (Fig. 2b).

The wavefield in Ω2 can be computed from the representation formula

u(x) = ∫_Γ μ(y) G(x,y) ds_y − ∫_Γ λ(y) [n(y) · (Λ(y) ∇_y G(x,y))] ds_y,    x ∈ Ω2,    (4)

where

λ = u|_Γ    and    μ = [n · (Λ∇u)]|_Γ

are the Dirichlet and Neumann traces on Γ, and G(x,y) is the Green's function corresponding to the problem (2), i.e., for x ∈ Ω,

−div_y(Λ(y) ∇_y G(y,x)) − ω²m(y) G(y,x) = δ(x − y)  for y ∈ Ω,    G(y,x) = 0  for y ∈ ∂Ω.

This formula follows directly from the divergence theorem and properties of the Green's function, assuming that Λ is a diagonal matrix.

Equation (4) requires knowledge of the Dirichlet and Neumann traces of the solution almost everywhere on Γ. Thus, corners in Γ are admissible, as in the quarter-space problem, even though the normal is not uniquely defined. Using (4), the solution u can then be computed on Ω2. Extending equation (4) to all of Ω,

U(x) = ∫_Γ μ(y) G(x,y) ds_y − ∫_Γ λ(y) [n(y) · (Λ(y) ∇_y G(x,y))] ds_y,    x ∈ Ω,

it can be shown that U vanishes on Ω1,

U(x) = 0    for x ∈ Ω1.    (6)

Following the terminology of [91], we call (6) the annihilation condition, Γ the polarization interface, and U a polarized wavefield. A proof of the annihilation condition (6) is provided in Appendix B.

A discrete counterpart of the polarized wavefield U can be derived by constructing a discrete solution that satisfies the discrete analogue to the annihilation condition in (6). Consider the discretization points corresponding to degrees-of-freedom in (3), as well as an interface Γ which does not intersect with any discretization point, as illustrated in Fig. 3 for both the half-space and quarter-space subdomains. Given a discretization-dependent distance δ (e.g., δ=1 for a classical 5-point finite difference stencil in 2D), such an interface divides the degrees-of-freedom into four sets, as labeled in Fig. 3:

  1. Γ1, the set of all degrees-of-freedom physically contained in Ω1 and δ-adjacent to Γ;

  2. Ω1, the set of all degrees-of-freedom physically contained in Ω1, excluding Γ1;

  3. Γ2, the set of all degrees-of-freedom physically contained in Ω2 and δ-adjacent to Γ;

  4. Ω2, the set of all degrees-of-freedom physically contained in Ω2, excluding Γ2.

Upon reordering the degrees-of-freedom with respect to these four sets, the discrete system (3) can be rewritten as

\begin{pmatrix}
A_{\Omega_1\Omega_1} & A_{\Omega_1\Gamma_1} & 0 & 0 \\
A_{\Gamma_1\Omega_1} & A_{\Gamma_1\Gamma_1} & A_{\Gamma_1\Gamma_2} & 0 \\
0 & A_{\Gamma_2\Gamma_1} & A_{\Gamma_2\Gamma_2} & A_{\Gamma_2\Omega_2} \\
0 & 0 & A_{\Omega_2\Gamma_2} & A_{\Omega_2\Omega_2}
\end{pmatrix}
\begin{pmatrix} u_{\Omega_1} \\ u_{\Gamma_1} \\ u_{\Gamma_2} \\ u_{\Omega_2} \end{pmatrix}
=
\begin{pmatrix} f_{\Omega_1} \\ f_{\Gamma_1} \\ f_{\Gamma_2} \\ f_{\Omega_2} \end{pmatrix},    (7)

where AΓ1Ω1 denotes the slice of the matrix A generated from the rows corresponding to Γ1 and the columns corresponding to Ω1. The remaining submatrices are similarly generated. In the same way, the vector uΩ2 (and similarly all other vectors) denotes the slice of the vector u with respect to Ω2. As in the continuous case, we consider the case without sources in Ω2, i.e., fΩ2 = 0 and fΓ2 = 0. Then, due to the invertibility of A, it is easy to see that the solution U of the linear system

\begin{pmatrix}
A_{\Omega_1\Omega_1} & A_{\Omega_1\Gamma_1} & 0 & 0 \\
A_{\Gamma_1\Omega_1} & A_{\Gamma_1\Gamma_1} & A_{\Gamma_1\Gamma_2} & 0 \\
0 & A_{\Gamma_2\Gamma_1} & A_{\Gamma_2\Gamma_2} & A_{\Gamma_2\Omega_2} \\
0 & 0 & A_{\Omega_2\Gamma_2} & A_{\Omega_2\Omega_2}
\end{pmatrix}
U
=
\begin{pmatrix} 0 \\ A_{\Gamma_1\Gamma_2} u_{\Gamma_2} \\ -A_{\Gamma_2\Gamma_1} u_{\Gamma_1} \\ 0 \end{pmatrix}    (8)

satisfies

U = \begin{pmatrix} 0 \\ 0 \\ u_{\Gamma_2} \\ u_{\Omega_2} \end{pmatrix}.

Consequently, the discrete counterpart of the polarized wavefield U is

U =
\begin{pmatrix}
A_{\Omega_1\Omega_1} & A_{\Omega_1\Gamma_1} & 0 & 0 \\
A_{\Gamma_1\Omega_1} & A_{\Gamma_1\Gamma_1} & A_{\Gamma_1\Gamma_2} & 0 \\
0 & A_{\Gamma_2\Gamma_1} & A_{\Gamma_2\Gamma_2} & A_{\Gamma_2\Omega_2} \\
0 & 0 & A_{\Omega_2\Gamma_2} & A_{\Omega_2\Omega_2}
\end{pmatrix}^{-1}
\begin{pmatrix} 0 \\ A_{\Gamma_1\Gamma_2} u_{\Gamma_2} \\ -A_{\Gamma_2\Gamma_1} u_{\Gamma_1} \\ 0 \end{pmatrix}.

From the right-hand side of (8), one can easily see that knowledge of both uΓ1 and uΓ2 is required in order to compute U. By construction, these sets contain information about the discrete wavefield and its normal derivative in the vicinity of Γ. Thus, as with the continuous case, the discrete case also requires information about the Dirichlet and Neumann traces to compute the polarized wavefield U in Ω2. In fact, [91] demonstrates, using similar techniques, that a discrete counterpart of the representation formula (4) can be derived.
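The construction above can be checked numerically. The sketch below, a toy 1D analogue and not the paper's code, builds a small complex-shifted Helmholtz-type system, splits its unknowns into Ω1, Γ1, Γ2, and Ω2 around an interface with δ = 1, forms the right-hand side of (8) from the traces uΓ1 and uΓ2, and verifies that the resulting polarized wavefield U vanishes on Ω1 and Γ1 while reproducing u on Γ2 and Ω2.

```python
# Toy 1D check of the discrete annihilation property (not the paper's code).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, omega = 200, 40.0
h = 1.0 / (n + 1)
# complex-shifted 1D Helmholtz-type operator so that the system is invertible
A = (sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
     - (omega**2 + 5j * omega) * sp.identity(n)).tocsc()

f = np.zeros(n, dtype=complex)
f[20] = 1.0 / h                       # source supported well inside Omega_1
u = spla.spsolve(A, f)

k = 99                                # interface: Gamma_1 = {k}, Gamma_2 = {k+1}, delta = 1
b = np.zeros(n, dtype=complex)
b[k] = A[k, k + 1] * u[k + 1]         # +A_{Gamma_1 Gamma_2} u_{Gamma_2}
b[k + 1] = -A[k + 1, k] * u[k]        # -A_{Gamma_2 Gamma_1} u_{Gamma_1}
U = spla.spsolve(A, b)

print(np.abs(U[:k + 1]).max())               # ~ 0: annihilation on Omega_1 and Gamma_1
print(np.abs(U[k + 1:] - u[k + 1:]).max())   # ~ 0: U matches u on Gamma_2 and Omega_2
```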

Finally, our technique is easily extended to more general discretization techniques, as long as they allow for a reordering of the degrees-of-freedom into the block-tridiagonal system (7). Depending on the discretization, the selection of the sets Γ1 and Γ2 may need to change. Similar schemes have been applied to many different discretizations, such as high-order finite difference methods [95], finite element methods [90], enriched finite element methods [30], discontinuous Galerkin methods [79], and integral representations [93]. For example, for higher-order finite difference methods the stencils centered at the points in Ω2 (respectively Γ2, Γ1, or Ω1) cannot involve discretization points in Γ1 (respectively Ω1, Ω2, or Γ2). This is easily enforced by defining an appropriate δ, e.g., δ = 2 for a 9-point stencil with a 5×5 footprint in 2D. Similarly, in finite element or discontinuous Galerkin methods, the sparsity of the system matrices can be exploited in order to obtain suitable sets of degrees-of-freedom.

In Section 2, we introduce the algorithm to compute an approximate global solution u of the boundary value problem (2) with constant squared slowness m. We introduce the algorithm first on the continuous level in Section 2.1 and then extend it to the discrete level in Section 2.2. In Section 3, we show how the algorithm can be used to precondition the linear system (3). This opens the possibility of using the algorithm as part of a solver with optimal parallel scaling based on a preconditioned GMRES method. We conclude Section 3 with a complexity analysis of this solver with regard to computational and communication effort. In Section 4, we analyze the effects of heterogeneous wave speeds on the effectiveness of the preconditioner. In Section 5, we provide several numerical examples in two and three dimensions for constant and non-constant wave speeds to corroborate all claims. The paper concludes with a discussion of possible extensions of the method and a summary of the results. Additional proofs and implementation details are provided in the Appendices. Pseudo-code for the proposed algorithms is included as supplementary material, a C++ implementation of the 2D algorithm is available at [80], and experimental setups for the 2D examples are available at [81].

Section snippets

L-sweeps: reconstruction of wavefields

In this section, we introduce the algorithm to compute the global solution u of the boundary value problem (2). This solution is obtained from local solutions of local problems defined over a CDD. The concept of polarization introduced in Section 1 plays a crucial role in this procedure. We introduce the algorithm for constant wave speeds at the continuous and the discrete level in Sections 2.1 and 2.2. The procedure can be applied to problems with non-constant wave speeds in an analogous

Implementation and complexity

The procedure introduced in Section 2 produces an approximation of a global solution to (3). This approach therefore defines the approximate solution operator A˜⁻¹, such that A˜⁻¹(f) ≈ A⁻¹f. Thus, we use A˜⁻¹ to precondition (3),

A˜⁻¹(Au) = A˜⁻¹(f),    (15)

and use a Krylov subspace method, such as GMRES [70] or BiCG-stab [85], to solve (15).
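The sketch below shows how such an approximate inverse is typically wired into GMRES as a preconditioner callback. Here apply_approximate_inverse is a hypothetical placeholder, a crude ILU factorization of a 1D toy problem; it stands in for the action of A˜⁻¹ and does not implement the L-sweeps preconditioner proposed in this paper.

```python
# Sketch of wiring an approximate inverse into GMRES as a preconditioner.
# `apply_approximate_inverse` is a placeholder (ILU of a toy 1D problem), not L-sweeps.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, omega = 400, 30.0
h = 1.0 / (n + 1)
A = (sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
     - (omega**2 + 1j * omega) * sp.identity(n)).tocsc()
f = np.random.default_rng(0).standard_normal(n).astype(complex)

apply_approximate_inverse = spla.spilu(A).solve          # placeholder for  A~^{-1}
M = spla.LinearOperator(A.shape, matvec=apply_approximate_inverse, dtype=complex)

u, info = spla.gmres(A, f, M=M, restart=50, maxiter=200)
print(info, np.linalg.norm(A @ u - f) / np.linalg.norm(f))
```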

The resulting preconditioned iterative solver has the following properties:

  • the preconditioner A˜⁻¹ can be applied with optimal parallel complexity, O(N/p), and

  • Krylov

Heterogeneities

As developed in Section 2, the wavefield computed by the proposed algorithm can be interpreted as a sum of global solutions, each of which is induced by a global source density supported in a single subdomain only. We denote a source density supported in the subdomain Ωij by fij and the global wavefield induced by this source density as ufij. Accordingly, the global wavefield u can be written as

u = Σ_{i=1}^{q} Σ_{j=1}^{r} ufij.

To understand the effect of heterogeneities on the effectiveness of the

Numerical examples

In this section, we consider several numerical examples to corroborate the claims of this paper. All numerical examples are constructed using variations of a standard setup. The problems are posed on the unit square (d = 2) or the unit cube (d = 3) and the wave speed is scaled such that the squared slowness satisfies m(x) ∈ [m0, 1] for all x ∈ Ω, for some m0 > 0. For the characteristic frequency ω, the minimum wavelength is 2π/ω and the maximum wavelength is 2π/(√m0 ω). All problems are discretized using a uniform

Discussion

We have introduced a parallel scalable algorithm to solve the high-frequency Helmholtz equation. To keep the length of the paper reasonable, we have omitted some details, including the analysis of the continuum version of the algorithm in constant media, where the method essentially reduces to a direct solver, and the effect of pollution errors and the associated refined discretizations on the proposed algorithm. In this section we comment on both cases.

Summary

We have introduced the first solver for the high-frequency Helmholtz equation that scales as O((N/p)logω) in a distributed-memory parallel computational environment, for a single right-hand side. The new solver constructs a preconditioner based on a CDD. This approach reveals parallelism which is exploited to obtain the optimal parallel complexity. The performance of the preconditioner is similar to the well-established method of polarized traces, i.e., the number of iterations grows as O(logω

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors thank Total SA for support and for permission to release the example code. LD is also supported by AFOSR grant FA9550-17-1-0316. The BP model is provided courtesy of BP and Fréderic Billette. We thank NERSC for computation resources. MT thanks the Institute of Applied Mathematics at Graz University of Technology for hosting him for part of this research.

References (95)

  • L. Zepeda-Núñez et al., The method of polarized traces for the 2D Helmholtz equation, J. Comput. Phys. (2016)
  • P. Amestoy et al., Fast 3D frequency-domain full-waveform inversion with a parallel block low-rank multifrontal direct solver: application to OBC data from the North Sea, Geophysics (2016)
  • P.R. Amestoy et al., A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl. (2001)
  • D. Aruliah et al., Multigrid preconditioning for Krylov methods for time-harmonic Maxwell's equations in three dimensions, SIAM J. Sci. Comput. (2002)
  • A.V. Astaneh et al., A two-level domain decomposition method with accurate interface conditions for the Helmholtz problem, Int. J. Numer. Methods Eng. (2016)
  • G. Ballard et al., Minimizing communication in numerical linear algebra, SIAM J. Matrix Anal. Appl. (2011)
  • M. Bebendorf, Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems (2008)
  • F. Billette et al., The 2004 BP Velocity Benchmark (2005)
  • S. Boerm et al., Hierarchical Matrices (2006)
  • J. Bramble et al., Analysis of a finite PML approximation for the three dimensional time-harmonic Maxwell and acoustic scattering problems, Math. Comput. (2007)
  • A. Brandt et al., Wave-ray multigrid method for standing wave equations, Electron. Trans. Numer. Anal. (1997)
  • H. Calandra et al., An improved two-grid preconditioner for the solution of three-dimensional Helmholtz problems in heterogeneous media, Numer. Linear Algebra Appl. (2013)
  • T.F. Chan et al., Domain decomposition algorithms, Acta Numer. (1994)
  • Z. Chen et al., A source transfer domain decomposition method for Helmholtz equations in unbounded domain, SIAM J. Numer. Anal. (2013)
  • T.A. Davis, Algorithm 832: UMFPACK v4.3—an unsymmetric-pattern multifrontal method, ACM Trans. Math. Softw. (2004)
  • M.V. de Hoop et al., On 3D modeling of seismic wave propagation via a structured parallel multifrontal direct Helmholtz solver, Geophys. Prospect. (2011)
  • A. de La Bourdonnaye et al., A non-overlapping domain decomposition method for the exterior Helmholtz problem, Contemp. Math. (1998)
  • J. Demmel et al., Communication avoiding rank revealing QR factorization with column pivoting (2013)
  • J.W. Demmel et al., A supernodal approach to sparse partial pivoting, SIAM J. Matrix Anal. Appl. (1999)
  • B. Després, Décomposition de domaine et problème de Helmholtz, C. R. Séances Acad. Sci., Sér. 1 Math. (1990)
  • I.S. Duff et al., The multifrontal solution of indefinite sparse symmetric linear equations, ACM Trans. Math. Softw. (1983)
  • B. Engquist et al., Sweeping preconditioner for the Helmholtz equation: hierarchical matrix representation, Commun. Pure Appl. Math. (2011)
  • B. Engquist et al., Sweeping preconditioner for the Helmholtz equation: moving perfectly matched layers, Multiscale Model. Simul. (2011)
  • Y.A. Erlangga et al., A novel multigrid based preconditioner for heterogeneous Helmholtz problems, SIAM J. Sci. Comput. (2006)
  • O.G. Ernst et al., Why it is difficult to solve Helmholtz problems with classical iterative methods
  • J. Fang et al., Learning dominant wave directions for plane wave methods for high-frequency Helmholtz equations, Res. Math. Sci. (2017)
  • M. Gander et al., A class of iterative solvers for the Helmholtz equation: factorizations, sweeping preconditioners, source transfer, single layer potentials, polarized traces, and optimized Schwarz methods, SIAM Rev. (2019)
  • M.J. Gander, Optimized Schwarz methods, SIAM J. Numer. Anal. (2006)
  • M.J. Gander et al., Optimal interface conditions for an arbitrary decomposition into subdomains
  • M.J. Gander et al., Optimized Schwarz methods without overlap for the Helmholtz equation, SIAM J. Sci. Comput. (2002)
  • M.J. Gander et al., Optimized Schwarz method with two-sided transmission conditions in an unsymmetric domain decomposition
  • M.J. Gander et al., Domain decomposition methods for the Helmholtz equation: a numerical investigation
  • M.J. Gander et al., Optimized Schwarz methods with overlap for the Helmholtz equation
  • M.J. Gander et al., Applying GMRES to the Helmholtz equation with shifted Laplacian preconditioning: what is the largest shift for which wavenumber-independent convergence is guaranteed?, Numer. Math. (2015)
  • A. George, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal. (1973)
  • S. Ghanemi, A domain decomposition method for Helmholtz scattering problems
  • A. Gillman et al., A spectrally accurate direct solution technique for frequency-domain scattering problems with variable media, BIT Numer. Math. (2015)