1 Introduction

FFT-based solvers, as pioneered by Moulinec–Suquet [1, 2], enjoy great popularity for solving homogenization problems on complex microstructures. Their success rests essentially on three pillars. First, they profit from modern imaging techniques like micro-computed tomography, which naturally produce volumetric data on a regular grid. Second, current implementations of the fast Fourier transform [3] are extremely powerful, providing programmers of FFT-based methods with a cutting edge. Last but not least, FFT-based methods naturally support matrix-free solvers, even for strongly nonlinear mechanical problems, which can be decisive if memory occupancy is an issue. A recent overview of applications in micromechanics is given in Matouš et al. [4], including the use in FE-FFT schemes [5] and as precomputations for model-order reduction methods [6,7,8] and deep learning techniques [9, 10].

Progress in FFT-based methods has focused on three principal directions. First, the areas of applicability were extended, for instance to finite strain problems [11], crystal viscoplasticity [12] and damage [13]. Recently, applications to interface decohesion [14], the upscaling of micromechanics-based fatigue [15] and the homogenization of brittle fracture [16] were also established.

Second, the disadvantages of the original discretization method of Moulinec–Suquet were the focus of investigations. Fourier–Galerkin methods [17] as well as finite element [18,19,20], finite difference [21, 22] and finite volume discretizations [23] were investigated, some of them playing a crucial role when treating media with imperfections such as cracks and pores. Recently, a discretization in terms of B-splines was also introduced [24].

Last but not least, the basic (solution) scheme was superseded by more powerful numerical solution methods. Eyre–Milton [25] introduced a so-called polarization scheme, which was subsequently generalized by Michel–Moulinec–Suquet [26]. Monchiet–Bonnet [27] introduced a one-parameter family of polarization schemes and exhibited both the Eyre–Milton scheme and the augmented Lagrangian method of Michel–Moulinec–Suquet as particular instances, see also Moulinec–Silva [28]. Furthermore, they introduced a hybrid method which is essentially the average of the Eyre–Milton and the augmented Lagrangian operators. Schneider–Wicht–Böhlke [29] interpreted polarization schemes from the viewpoint of global optimization and identified them as Douglas–Rachford schemes [30], providing convergence guarantees and optimal parameter selection for strongly convex problems.

Zeman et al. [31] and Brisard–Dormieux [18] introduced Krylov subspace solvers for linear homogenization problems. Zeman et al. [31] proposed using the (linear) conjugate gradient method with great success. As the conjugate gradient method was applied to a non-symmetric operator, its successful application came as a surprise at that time (and triggered further research, see below). Brisard–Dormieux used a reformulation of the corrector equation in terms of the Hashin–Shtrikman variational principle, guaranteeing a symmetric, but possibly indefinite formulation. They proposed using the conjugate gradient method or MINRES and SYMMLQ, see Paige–Saunders [32].

As an alternative to the basic scheme and polarization-type methods, Newton solvers were proposed by Lahellec et al. [11]. Initially, the basic scheme was used as the linear solver. Only much later, Gélébart–Mondon-Cancel [33] and Kabel–Böhlke–Schneider [34] proposed Newton–Krylov solvers, ie, to use Krylov subspace solvers for computing the Newton increment. For a thorough study on line-search methods suitable in the FFT context and strategies for choosing the forcing term, we refer to Wicht–Schneider–Böhlke [35].

Kabel–Böhlke–Schneider [34] observed that the basic scheme may be interpreted as a gradient descent scheme, and connected the reference material to (the inverse of) a step-size. By this observation, numerical solution methods from large-scale optimization could enter FFT-based computational homogenization. The benefits of using fast gradient methods in the FFT-based context were first exploited in Schneider [36], see also Ernesti–Schneider–Böhlke [37] for an application of fast gradient methods to phase-field fracture. Using Quasi-Newton methods was pioneered by Shanthraj et al. [38], and further extended in Wicht–Schneider–Böhlke [35].

All of these solvers have their distinctive advantages. The basic scheme is most memory-efficient and extremely robust, but may become slow for highly contrasted problems. The polarization schemes fail to converge quickly for problems involving pores (see Schneider [39] for a counterexample) and rely upon a complexity-reduction trick to be efficient, see Schneider–Wicht–Böhlke [29]. Newton methods are fast and robust, but suffer from heavy memory demands. They are particularly useful if evaluating the nonlinear material law is very expensive compared to applying the tangent. The fast gradient methods combine fast convergence with a low memory footprint. However, they suffer from a delicate parameter selection. More precisely, to be efficient, they require the strong convexity constant to be known. However, determining the strong convexity constant for problems involving pores is extremely difficult, in general. Restarting techniques do not solve these problems. Indeed, even for problems where the strong convexity constant is known, restarting schemes require about twice the iteration count compared to the method with optimal parameter choice.

Thus, research was devoted to finding solution techniques that share the positive characteristics of the mentioned solution schemes without combining their disadvantages. Schneider [39] proposed the Barzilai–Borwein method as a general-purpose method for computational micromechanics. It may be interpreted as the basic scheme with adaptive time-stepping (or, equivalently, an adaptive reference material). Thus, it has a low memory footprint, but is competitive in terms of convergence speed when compared to the solvers mentioned above. However, the Barzilai–Borwein method is an intrinsically non-monotone method, ie, the residual is not monotonically decreasing from one iteration to the next. This behavior is unfamiliar, and may limit its applicability in industrial settings. It may happen that after the lunch break the residual is two orders of magnitude higher than before the lunch break.

Thus, it is desirable to seek alternatives to the Barzilai–Borwein scheme with guaranteed monotonicity properties.

1.1 Contributions

Nonlinear conjugate gradient methods were introduced by Fletcher–Reeves [40] as a generalization of the linear conjugate gradient scheme [41]. Nonlinear conjugate gradient methods were among the first general-purpose solution schemes for unconstrained optimization problems, and have been studied thoroughly, see Nocedal–Wright [42] for a comprehensive textbook treatment. They are based upon an iteratively updated search direction, involving a coefficient (the “\(\beta \)”) to be chosen wisely and a step-size which is typically determined by line search. For the \(\beta \)-coefficient, a number of different variants were proposed, the most popular being due to Fletcher–Reeves [40], Polak–Ribière–Polyak [43, 44], Hestenes–Stiefel [41] and Dai–Yuan [45].

On a formal level, nonlinear conjugate gradient methods and the heavy-ball scheme [46], a very powerful fast gradient scheme, are identical, see Polyak’s book [47]. However, the parameters are chosen in a different way. Indeed, for the heavy-ball method, the step-size and the \(\beta \)-coefficient are typically chosen as constants depending on the properties of the function (the Lipschitz constant of the gradient of the objective, and the strong monotonicity constant, if available [44, 48, 49]). In contrast, nonlinear conjugate gradient methods choose the \(\beta \)-coefficient depending on the current and the last iterate, in general, and determine the step-size by a line-search procedure. We refer to Sect. 2 for more details.

A naive application of nonlinear conjugate gradient methods in the FFT-based context does not lead to competitive schemes, essentially due to the intrinsic line search. (Another difficulty arising in computational micromechanics is the fact that the condensed incremental energy is typically not computed; only gradient information, ie, the stress, is evaluated.) In the nonlinear conjugate gradient community, this problem is well known. Indeed, Quasi-Newton methods of the Broyden class, see chapter 8 in Nocedal–Wright [42], may also be interpreted as generalizations of linear CG. However, they typically outperform nonlinear conjugate gradient methods because they asymptotically accept the proposed step-size; in contrast, the most efficient conjugate gradient methods are reported to require about two line-search steps per iteration. Sun–Zhang [50] pioneered using conjugate gradient methods without line search and proved convergence under mild conditions. However, the resulting step-sizes are too conservative to yield competitive methods. Indeed, under suitable conditions, they ensure a monotonic decrease of the objective value, which is known to be inefficient for fast gradient methods.

In this work, we connect nonlinear conjugate gradient methods to fast gradient methods by letting the step-size go to zero and studying the resulting dynamical system. Indeed, the Fletcher–Reeves CG permits such a limiting procedure, giving rise to Newtonian dynamics with nonlinear damping. More precisely, the damping is determined by the speed of decrease of the residual (the norm of the gradient). In theory, this damping could become negative, inducing an instability of the dynamical system. In our numerical experiments, however, this situation arose only rarely, whence we refrained from including proper safeguarding strategies in the manuscript.

We propose to combine the Fletcher–Reeves nonlinear CG with the step-size of the basic scheme, and detail its implementation in the FFT-based context, see Sect. 3. The method may be implemented on three strain-like fields, which is only one more than for the Barzilai–Borwein method. However, the resulting solver leads to a monotonic, and thus reliable, decrease of the residual. We demonstrate the usefulness of the proposed method as a general-purpose solver for FFT-based computational micromechanics in Sect. 4, comparing it to a multitude of state-of-the-art numerical solvers on problems of industrial scale.

2 The dynamics of nonlinear conjugate gradients

2.1 From linear to nonlinear conjugate gradient methods

Let V be a Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \), and let \(A\in L(V)\) be a bounded linear operator, which we assume self-adjoint and uniformly positive, ie, \(\langle Au,v\rangle = \langle Av,u\rangle \) holds for all \(u,v\in V\) and there is \(\mu >0\), s.t. \(\langle Au,u\rangle \ge \mu \Vert u\Vert ^2\) holds for all \(u\in V\), where \(\Vert u\Vert =\sqrt{\langle u,u\rangle }\) denotes the induced norm. For prescribed \(b\in V\), we are concerned with minimizing the quadratic objective

$$\begin{aligned} V \ni x \mapsto f(x) = \frac{1}{2} \langle x, A x\rangle - \langle b, x\rangle , \end{aligned}$$
(2.1)

whose unique minimizer \(x^*\) solves the linear equation \(A x^* = b\), ie, \(x^* = A^{-1}b\).

An ordered set of vectors \((d_0,d_1,d_2,\ldots )\) is called A-conjugate (or simply conjugate if A is fixed), if \(d_i\ne 0\) for all i and \(\langle d_i, A d_j\rangle = 0\) for all \(i\ne j\). Put differently, A-conjugacy is equivalent to orthogonality w.r.t. the A-weighted inner product \((u,v)\mapsto \langle u, Av \rangle \) on V.

For a collection \((d_0,d_1,d_2,\ldots )\) of A-conjugate vectors and an initial point \(x_0 \in V\), the conjugate direction method (Sect. 5.1 in Nocedal–Wright [42]) iteratively updates

$$\begin{aligned} x_{k+1} = x_k + \alpha _k d_k, \quad k=0,1,2,\ldots , \end{aligned}$$
(2.2)

where the step size \(\alpha _k\) is chosen by exact line search

$$\begin{aligned} \alpha _k = \text {argmin}_{\alpha \in \mathbb {R}} f(x_k + \alpha d_k), \end{aligned}$$
(2.3)

which can be computed explicitly for the quadratic objective (2.1) as

$$\begin{aligned} \alpha _k = - \frac{\langle g_k, d_k\rangle }{\langle d_k, A d_k\rangle }, \quad \text {where} \quad g_k = \nabla f(x_k) \equiv Ax_k - b. \end{aligned}$$
(2.4)

If V has finite dimension n, any n conjugate directions form a basis of V, and the conjugate direction method converges to \(x^*\) after at most n iterations. More generally, if the collection \(\{d_0,d_1,d_2,\ldots \}\) forms a complete basis of the Hilbert space V, the conjugate direction method converges.

The conjugate gradient method of Hestenes–Stiefel [41] is a conjugate direction method where the conjugate directions are generated by applying the Gram–Schmidt procedure to the negative gradients

$$\begin{aligned} d_k = -g_k + \sum _{i=0}^{k-1} \frac{\langle g_k, A d_i\rangle }{\langle d_i, A d_i\rangle } d_i, \quad g_k = \nabla f(x_k) \equiv Ax_k - b. \end{aligned}$$

The latter formula may be simplified, taking into account the update (2.2) and the conjugacy to arrive at

$$\begin{aligned} d_k = -g_k + \beta _{k-1} d_{k-1}, \quad \beta _{k-1} = \frac{\Vert g_k\Vert ^2}{\Vert g_{k-1}\Vert ^2}, \quad k=1,2,\ldots . \end{aligned}$$
(2.5)

We formulate this version of the linear conjugate gradient algorithm for later reference

$$\begin{aligned} \begin{aligned} g_k&= Ax_k - b\\ \beta _{k-1}&= \left\{ \begin{array}{cl} 0, &{} k=0, \\ \frac{\Vert g_k\Vert ^2}{\Vert g_{k-1}\Vert ^2} &{} k>0. \end{array} \right. \\ d_k&= -g_k + \beta _{k-1} d_{k-1}\\ \alpha _k&= - \frac{\langle g_k, d_k\rangle }{\langle d_k, A d_k\rangle }\\ x_{k+1}&= x_k + \alpha _k d_k \end{aligned} \end{aligned}$$
(2.6)

where we formally define \(d_{-1} = 0\). Fletcher–Reeves [40] extended this algorithm to general unconstrained nonlinear optimization problems

$$\begin{aligned} f(x) \rightarrow \min _{x\in V} \end{aligned}$$
(2.7)

in the form

$$\begin{aligned} \begin{aligned} d_k&= -\nabla f(x_k) + \beta _{k-1} d_{k-1},\\ x_{k+1}&= x_k + \alpha _k d_k, \end{aligned} \end{aligned}$$
(2.8)

where \(\alpha _k\) is determined by a line search procedure (2.3), and

$$\begin{aligned} \beta ^\text {FR}_k = \frac{\Vert \nabla f(x_{k+1})\Vert ^2}{\Vert \nabla f(x_{k})\Vert ^2}, \end{aligned}$$
(2.9)

where the superscript \(\text {FR}\) stands for Fletcher–Reeves. Subsequently, to account for inaccurate line-search procedures, a variety of other formulae for \(\beta _k\) were introduced; the most popular are the following.

  1.

    The formula introduced, independently, by Polak–Ribière [43] and Polyak [44],

    $$\begin{aligned} \beta ^\text {PRP}_k = \frac{\langle g_{k+1}, g_{k+1} - g_{k}\rangle }{\Vert g_{k}\Vert ^2}, \quad g_i = \nabla f(x_i). \end{aligned}$$
    (2.10)

    Due to a counter-example exhibiting divergence even for exact line search [51], Gilbert–Nocedal [52] introduced a modification

    $$\begin{aligned} \beta ^\text {PRP+}_k = \max \left\{ \beta ^\text {PRP}_k, 0 \right\} . \end{aligned}$$
    (2.11)
  2.

    The formula

    $$\begin{aligned} \beta ^\text {HS}_k = \frac{\langle g_{k+1}, g_{k+1} - g_{k}\rangle }{\langle d_k, g_{k+1} - g_{k} \rangle } \end{aligned}$$
    (2.12)

    which is the form of \(\beta _k\) in the original work of Hestenes–Stiefel [41]. This variant ensures that the vectors \(d_{k+1}\) and \(d_k\) are conjugate w.r.t. the averaged Hessian

    $$\begin{aligned} A_k = \int _0^1 \nabla ^2 f(x_k +s \alpha _k d_k )\, ds \end{aligned}$$

    between successive iterates.

  3.

    Dai–Yuan [45] introduced the formula

    $$\begin{aligned} \beta ^\text {DY}_k = \frac{\Vert g_{k+1} \Vert ^2}{\langle d_k, g_{k+1} - g_{k} \rangle }, \end{aligned}$$
    (2.13)

    which ensures that \(d_k\) is always a descent direction, ie, \(\langle d_k, g_k \rangle <0\) holds, provided f is convex (for any \(\alpha _k\)) [53], or, for general f, provided \(\alpha _k\) satisfies the Wolfe conditions [45].
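In code, the four variants above read as follows (a minimal sketch for gradients and directions stored as flat numpy arrays; in the FFT-based setting of Sect. 3, the Euclidean products below are replaced by the \(L^2\) inner product (3.3)):

```python
def beta_fr(g_new, g_old, d_old=None):        # Fletcher-Reeves (2.9)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp_plus(g_new, g_old, d_old=None):  # Polak-Ribiere-Polyak+ (2.11)
    return max((g_new @ (g_new - g_old)) / (g_old @ g_old), 0.0)

def beta_hs(g_new, g_old, d_old):             # Hestenes-Stiefel (2.12)
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def beta_dy(g_new, g_old, d_old):             # Dai-Yuan (2.13)
    return (g_new @ g_new) / (d_old @ (g_new - g_old))
```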

The different formulae for \(\beta _k\) coincide for the quadratic objective (2.1) under exact line search (2.4). For the line search, typically either the strong Wolfe conditions

$$\begin{aligned} \begin{aligned} f(x_k) - f(x_k + \alpha _k d_k)&\ge -\delta \alpha _k \langle g_k,d_k\rangle ,\\ \left| \langle \nabla f(x_k + \alpha _k d_k),d_k\rangle \right|&\le -\sigma \langle g_k, d_k\rangle , \end{aligned} \end{aligned}$$
(2.14)

or the Wolfe conditions

$$\begin{aligned} \begin{aligned} f(x_k) - f(x_k + \alpha _k d_k)&\ge -\delta \alpha _k \langle g_k,d_k\rangle ,\\ \langle \nabla f(x_k + \alpha _k d_k),d_k\rangle&\ge \sigma \langle g_k, d_k\rangle , \end{aligned} \end{aligned}$$
(2.15)

involving parameters \(0<\delta<\sigma <1\), are enforced. (See Sect. 3.1 in Nocedal–Wright [42] for details.) For general optimization problems and proper line-search conditions, convergence theorems may be established, see Dai–Yuan [45] and Hager–Zhang [54] for an overview. However, the results are typically of the form

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert g_k\Vert = 0 \quad \text {or} \quad \liminf _{k\rightarrow \infty } \Vert g_k\Vert = 0, \end{aligned}$$

ie, no explicit convergence rate may be deduced, in general.
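As an aside, checking the line-search conditions (2.15) and (2.14) for a trial step is straightforward; a minimal helper (our own, not from the paper) might read:

```python
def satisfies_wolfe(f, grad_f, x, d, alpha, delta=1e-4, sigma=0.9,
                    strong=False):
    """Check the Wolfe conditions (2.15), or the strong variant (2.14),
    for a trial step alpha along a descent direction d (0 < delta < sigma < 1)."""
    gd = grad_f(x) @ d                      # directional derivative, < 0
    armijo = f(x) - f(x + alpha * d) >= -delta * alpha * gd
    slope = grad_f(x + alpha * d) @ d
    curvature = (abs(slope) <= -sigma * gd) if strong else (slope >= sigma * gd)
    return armijo and curvature
```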

The biggest limitation of the nonlinear conjugate gradient methods is the line search involved in the update (2.8). This becomes apparent even for the quadratic case (2.6). In the presented form, two applications of A (ie, gradient evaluations of f) are required per iteration. Of course, introducing \(z_k = Ad_k\) and noticing \(g_{k+1} = g_k + \alpha _k z_k\), the standard implementation of linear CG (Sect. 5.1 in Nocedal–Wright [42]) may be obtained, which only requires one application of A per iteration. However, applying the nonlinear CG method to a quadratic function with exact line search again requires two applications of A per iteration. Thus, if we assume that the effort of applying A dominates the computational expense per iteration, the nonlinear CG is about a factor of two slower than linear CG applied to quadratic problems (2.1).
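The standard implementation just mentioned may be sketched as follows (a minimal version; A is assumed to be a symmetric positive definite numpy matrix):

```python
import numpy as np

def linear_cg(A, b, x0, tol=1e-10, maxit=1000):
    """Linear CG, cf. (2.6): by tracking z_k = A d_k and using the
    recurrence g_{k+1} = g_k + alpha_k z_k, only one application of A
    is required per iteration."""
    x = x0.copy()
    g = A @ x - b                      # gradient of the quadratic objective
    d = -g
    for _ in range(maxit):
        if np.linalg.norm(g) <= tol:
            break
        z = A @ d                      # the single A-application per iteration
        alpha = -(g @ d) / (d @ z)     # exact line search (2.4)
        x += alpha * d
        g_new = g + alpha * z          # recurrence avoids a second A-application
        beta = (g_new @ g_new) / (g @ g)   # coefficient (2.5)
        d = -g_new + beta * d
        g = g_new
    return x
```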

For general nonlinear optimization problems, state-of-the-art implementations of the nonlinear conjugate gradient methods require two gradient evaluations per line search (see the introduction of Dai–Kou [55]). In contrast, (limited memory) Quasi-Newton methods [56,57,58,59,60] asymptotically accept the suggested step size. Thus, they often outperform conjugate gradient methods in practice.

For these reasons, research effort has been invested in nonlinear conjugate gradient methods avoiding line search altogether. Sun–Zhang [50] have shown that coming close to the upper bound in the Armijo condition [the first inequality in the Wolfe conditions (2.15)] leads to convergence for the different variants of \(\beta _k\). Also, combinations of nonlinear CG and the Barzilai–Borwein method have been explored [55, 61].

In general, the Armijo condition ensures a monotonic decrease of the function value (provided \(d_k\) is a descent direction). We will see that this is not optimal.

2.2 The heavy-ball method versus nonlinear conjugate gradients

For minimizing general (differentiable) objective functions (2.7), the heavy-ball method [46] uses the iterative scheme

$$\begin{aligned} x_{k+1} = x_k - \alpha _k \nabla f(x_k) + \gamma _k (x_k - x_{k-1}), \end{aligned}$$
(2.16)

involving step-sizes \(\alpha _k\) and momentum coefficients \(\gamma _k\). The heavy-ball scheme (2.16) may be motivated, see Helmke–Moore [62], as a time-discretized version of the linearly damped Newtonian dynamics

$$\begin{aligned} \ddot{x} + b\,{\dot{x}} = -\nabla f(x) \end{aligned}$$
(2.17)

for a curve \(x:[0,\infty )\rightarrow V\) and damping parameter \(b\ge 0\). A finite difference discretization of the dynamics (2.17) with time-step size \(h>0\) reads

$$\begin{aligned} \frac{x_{k+1} - 2x_k + x_{k-1}}{h^2} + b \, \frac{x_k - x_{k-1}}{h} = -\nabla f(x_k), \end{aligned}$$

which may be rearranged into the scheme

$$\begin{aligned} x_{k+1} = x_k - {h^2} \nabla f(x_k) + (1-bh)(x_k - x_{k-1}), \end{aligned}$$
(2.18)

which has the form (2.16) with \(\alpha _k \equiv {h^2}\) and \(\gamma _k \equiv 1-bh\).
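In code, the scheme (2.16) might be sketched as follows (a minimal version; the gradient grad_f, the step-size alpha and the momentum gamma are assumed given):

```python
import numpy as np

def heavy_ball(grad_f, x0, alpha, gamma, maxit=1000, tol=1e-8):
    """Heavy-ball iteration (2.16) with constant step-size alpha = h^2
    and constant momentum gamma = 1 - b*h, cf. Eq. (2.18)."""
    x_old = x0.copy()
    x = x0 - alpha * grad_f(x0)    # plain gradient step to initialize
    for _ in range(maxit):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        x, x_old = x - alpha * g + gamma * (x - x_old), x
    return x
```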

Initially, Polyak investigated the local convergence of the scheme (2.16) (for strongly convex objective functions) by Lyapunov’s first method. However, the locally optimal parameters determined in this way fail to be stable for general strongly convex objectives with Lipschitz gradient, see Lessard–Recht–Packard [49] and Ghadimi–Feyzmahdavian–Johansson [48] for counterexamples.

A global convergence analysis was carried out based on Lyapunov’s second method, ie, in terms of suitable Lyapunov functions, see the pioneering work of Zavriev–Kostyuk [63]. More precisely, in the continuous setting, the total energy of the dynamical system (2.17)

$$\begin{aligned} V(x,p) = f(x) +\frac{1}{2} \Vert p\Vert ^2 \quad \text {with} \quad p={\dot{x}} \end{aligned}$$
(2.19)

may be used as a Lyapunov function, ie,

$$\begin{aligned} \frac{d}{dt}[V(x,{\dot{x}})] = -b\Vert {\dot{x}}\Vert ^2 \end{aligned}$$

holds. Put differently, if \(b\ge 0\), the dynamics is stable. For \(b>0\), the function V decreases strictly along the trajectories. Time-discretized variants of the Lyapunov function (2.19) were used [48, 63, 64] to establish global convergence of the heavy-ball method for particular choices of \(\alpha _k\) and \(\gamma _k\), and different classes of objective functions f.

Siegel [65] used Lyapunov-function arguments to show that, for \(\mu \)-strongly convex functions f, choosing \(b = 2\sqrt{\mu }\) leads to the estimate

$$\begin{aligned} f(x(t)) - f(x^*) \le 2 e^{-\sqrt{\mu } t} (f(x_0) - f(x^*)), \end{aligned}$$
(2.20)

where \(x^*\) denotes the minimizer of f.

It has long been known that nonlinear conjugate gradient methods (2.8) and heavy-ball schemes (2.16) coincide as numerical schemes. Indeed, for the conjugate gradient update (2.8)

$$\begin{aligned} d_k = -\nabla f(x_k) + \beta _{k-1} d_{k-1}, \quad x_{k+1} = x_k + \alpha _k d_k, \end{aligned}$$

combining these two equations yields

$$\begin{aligned} x_{k+1} = x_k - \alpha _k\nabla f(x_k) + \frac{\alpha _k \beta _{k-1}}{\alpha _{k-1}} (x_k - x_{k-1}), \end{aligned}$$

ie, it is of heavy-ball form (2.16) with \(\gamma _k = \frac{\alpha _k \beta _{k-1}}{\alpha _{k-1}}\).

Although they formally coincide, the way the numerical parameters are chosen is different. Also, both the respective interpretations and the convergence theories differ. For the heavy-ball method, the parameters \(\alpha _k\) and \(\gamma _k\) are chosen independently of k, but depending on the strong convexity constant of the objective function and the Lipschitz constant of its gradient. More precisely, the Lipschitz constant L and the strong monotonicity constant \(\mu \) of the gradient of f are required. Then, the convergence of the heavy-ball method is established by Lyapunov’s first or second method, yielding local [46, 49] or global [48, 64] convergence, respectively. The convergence results are typically very strong, involving q-linear convergence towards the minimum. Getting hold of the strong monotonicity constant \(\mu \) is difficult in general, and for the homogenization problems of Sect. 3 in particular. This constitutes the biggest limitation of the heavy-ball method.

In contrast, for nonlinear conjugate gradient methods, the parameters \(\alpha _k\) and \(\beta _k\) are chosen adaptively. The different formulae for \(\beta _k\) measure the local geometry of the function, and the step-size \(\alpha _k\) is chosen by a line-search procedure. Notice that the bounds entering the Wolfe conditions (2.15) and (2.14) also depend on local information about the function f. Convergence of the conjugate gradient method can typically be established for a large class of objective functions (see, for instance, Dai [66]), not necessarily restricted to convex ones. However, the convergence results are typically rather weak, for instance \(\Vert \nabla f(x_{k_n})\Vert \rightarrow 0\) along a subsequence. As already mentioned in the previous section, the biggest limitation of nonlinear conjugate gradient methods is their reliance upon line-search procedures.

2.3 A dynamical perspective on conjugate gradient methods

We have seen that heavy-ball methods work well for fixed step-size, but the momentum parameter needs tweaking. Fixing the step-size \(\alpha _k\equiv \alpha \) in the nonlinear conjugate gradient update (2.8) leads to \(\gamma _k = \beta _k\) in the heavy-ball scheme (2.16) and, thus, to the method

$$\begin{aligned} x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta _k (x_k - x_{k-1}). \end{aligned}$$

Reversing the arguments leading to Eq. (2.18) yields, for \(\alpha = h^2\)

$$\begin{aligned} \frac{x_{k+1} - 2x_k + x_{k-1}}{h^2} + \frac{1 - \beta _k}{h} \, \frac{x_k - x_{k-1}}{h} = - \nabla f(x_k). \end{aligned}$$

Thus, provided the limit of \(\frac{1 - \beta _k}{h}\) as \(h\rightarrow 0\) exists, we obtain the corresponding dynamical system

$$\begin{aligned} \ddot{x} + b(x,{\dot{x}})\,{\dot{x}} = -\nabla f(x) \end{aligned}$$
(2.21)

which is formally similar to Eq. (2.17), but with a state-dependent damping. However, not all formulae for \(\beta _k\) of Sect. 2.1 give rise to a dynamical system. (For instance, the Polak–Ribière–Polyak formula does not satisfy \(\beta _k^\text {PRP}\rightarrow 1\) as \(h\rightarrow 0\).) For the Fletcher–Reeves CG, \(\beta _k^\text {FR} = \frac{\Vert \nabla f(x_{k+1})\Vert ^2}{\Vert \nabla f(x_{k})\Vert ^2}\), we get

$$\begin{aligned} \frac{1 - \beta _k^\text {FR}}{h} = - \frac{1}{h} \, \frac{\Vert \nabla f(x_{k+1})\Vert ^2 - \Vert \nabla f(x_{k})\Vert ^2}{\Vert \nabla f(x_{k})\Vert ^2} \longrightarrow - \frac{\frac{d}{dt}\left[ \Vert \nabla f(x)\Vert ^2\right] }{\Vert \nabla f(x)\Vert ^2} \quad \text {as} \quad h\rightarrow 0, \end{aligned}$$

ie, the “damping” term in Eq. (2.21) becomes

$$\begin{aligned} b(x,{\dot{x}}) = -2 \frac{\langle \nabla f(x), \nabla ^2 f(x) {\dot{x}}\rangle }{\Vert \nabla f(x)\Vert ^2}. \end{aligned}$$
(2.22)

Thus, we arrive at the dynamical system

$$\begin{aligned} \ddot{x} -2 \frac{\langle \nabla f(x), \nabla ^2 f(x) {\dot{x}}\rangle }{\Vert \nabla f(x)\Vert ^2} \, {\dot{x}} = -\nabla f(x) \end{aligned}$$
(2.23)

with initial conditions \(x(0) = x_0\) and \({\dot{x}}(0) = -\nabla f(x_0)\). Several remarks are in order.

  1.

    The Fletcher–Reeves dynamical system is similar to the dynamical system associated to Nesterov’s fast gradient method [67, 68], as developed by Su–Boyd–Candès [69],

    $$\begin{aligned} \ddot{x} + \frac{3}{t}\,{\dot{x}} = -\nabla f(x), \quad t\ge 0, \end{aligned}$$
    (2.24)

    where \(\frac{3}{t}\) serves as a time-dependent damping. For convex f, a \(1/t^2\)-convergence rate of the objective along a trajectory of the ODE (2.24) may be established using Lyapunov-type arguments.

    To highlight the similarity with the Fletcher–Reeves ODE (2.23), we rewrite the latter in the form

    $$\begin{aligned} \ddot{x} + \frac{d}{dt} \left[ - \ln \Vert \nabla f(x)\Vert ^2\right] \, {\dot{x}} = -\nabla f(x) , \end{aligned}$$
    (2.25)

    whereas we may cast (2.24) in the form

    $$\begin{aligned} \ddot{x} + \frac{d}{dt} \left[ -\ln \frac{1}{t^3}\right] \,{\dot{x}} = -\nabla f(x), \quad t> 0. \end{aligned}$$
    (2.26)

    Thus, for both Eqs. (2.25) and (2.26), a logarithmic barrier function, see Sect. 4.2 in Nesterov [70], determines the damping coefficient, leading to a damping when approaching the critical set and \(t=+\infty \), respectively.

    Also, for both ODEs (2.25) and (2.26), the trajectories are invariant w.r.t. a homogeneous rescaling of time.

  2.

    The stability analysis of the dynamical system (2.17) using the Lyapunov function V from Eq. (2.19) may be repeated without change for the system (2.21), leading to

    $$\begin{aligned} \frac{d}{dt}[V(x,{\dot{x}})] = - b(x, {\dot{x}}) \Vert {\dot{x}}\Vert ^2. \end{aligned}$$
    (2.27)

    We see that the dynamical system is stable provided b is non-negative. In turn, b from Eq. (2.22) is non-negative if and only if

    $$\begin{aligned} 2 \langle \nabla f(x), \nabla ^2 f(x) {\dot{x}}\rangle \equiv \frac{d}{dt} \left[ \Vert \nabla f(x)\Vert ^2\right] \le 0 \end{aligned}$$

    holds. Put differently, as long as the residual \(\Vert \nabla f(x)\Vert ^2\) is non-increasing, the Fletcher–Reeves dynamical system (2.23) is stable. Clearly, due to the initial condition \((x(0), {\dot{x}}(0)) = (x_0, -\nabla f(x_0))\) with \(\nabla f(x_0)\ne 0\), the residual is decreasing for convex functions f and small times \(t>0\). Indeed,

    $$\begin{aligned} b(x_0, -\nabla f(x_0)) = 2 \frac{\langle \nabla f(x_0), \nabla ^2 f(x_0) \nabla f(x_0)\rangle }{\Vert \nabla f(x_0)\Vert ^2} \end{aligned}$$

    holds. Thus, \(b(x_0, -\nabla f(x_0))\) is non-negative if \(\nabla ^2 f(x_0)\) is positive semi-definite (which is true if f is convex).

    In our numerical experiments, b remained positive (except for very few iterates). However, we did not find a mathematical proof of this fact. Various safeguarding strategies (preventing b from becoming negative) never led to an improvement of the convergence behavior, whence we refrain from discussing them in the manuscript.

  3.

    In view of Siegel’s convergence result (2.20), fast convergence is to be expected if \(b \approx 2 \sqrt{\mu }\). The coefficient b may be considered as a feedback controller for the dynamical system (2.21). Its analysis is beyond the scope of this manuscript.

  4.

    For some other nonlinear conjugate gradient methods, corresponding dynamical systems may be derived as well. For instance, the Dai–Yuan method [45] leads to the dynamical system

    $$\begin{aligned} \ddot{x} + \frac{d}{dt} \left[ - \ln \left( - \langle \nabla f(x), {\dot{x}} \rangle \right) \right] \, {\dot{x}} = -\nabla f(x). \end{aligned}$$

    However, the latter ODE is degenerate, and we did not investigate it further.

  5.

    The linear conjugate gradient method was interpreted as a dynamical system with feedback control in Bhaya–Kaszkurewicz [71]. Their treatment is different from ours, as they consider discrete dynamical systems and discrete controls, and it does not directly generalize to nonlinear CG.
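To illustrate the remarks above, the following toy experiment (our construction, not from the paper) runs the constant step-size Fletcher–Reeves recursion on a random strongly convex quadratic and monitors the discrete damping \((1-\beta _k)/h\), which approximates \(b(x,{\dot{x}})\) from Eq. (2.22):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 20))
A = Q.T @ Q + np.eye(20)                 # symmetric positive definite Hessian
grad = lambda x: A @ x                   # f(x) = x^T A x / 2, minimizer x* = 0

h = 1.0 / np.sqrt(np.linalg.eigvalsh(A).max())   # alpha = h^2 = 1/L
x = rng.standard_normal(20)
g = grad(x)
d = -g                                   # initial direction, cf. (2.8)
for k in range(201):
    x = x + h ** 2 * d                   # constant step-size update
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient (2.9)
    if k % 50 == 0:                      # monitor residual and discrete damping
        print(k, np.linalg.norm(g_new), (1.0 - beta) / h)
    d = -g_new + beta * d
    g = g_new
```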

3 Application to micromechanical problems and implementation in the FFT framework

In a small-strain setting, suppose a heterogeneous, possibly nonlinear elastic energy \(w:Y\times {{\,\mathrm{\text {Sym}}\,}}(d) \rightarrow \mathbb {R}\) is given, where \(Y\subseteq \mathbb {R}^d\) denotes a rectangular cell in d dimensions and \({{\,\mathrm{\text {Sym}}\,}}(d)\) stands for the linear space of symmetric \(d\times d\)-matrices. Typically, w is differentiable in the second variable, and has jumps in the first, the spatial, variable. The latter framework also encompasses inelastic problems, as w may stand for the condensed incremental potential corresponding to a fixed time step, see Ortiz–Stainier [72] and Miehe [73].

For prescribed macroscopic \(E\in {{\,\mathrm{\text {Sym}}\,}}(d)\), we seek a periodic displacement field \(u:Y\rightarrow \mathbb {R}^d\) solving the balance of linear momentum on the microscale

$$\begin{aligned} \text {div }\sigma (E + \nabla ^s u) = 0, \end{aligned}$$
(3.1)

where \(\nabla ^s\) denotes the symmetrized gradient, \(\text {div }\) stands for the divergence, and \(\sigma \) is the stress operator derived from the potential w,

$$\begin{aligned} \sigma (\varepsilon ) = \frac{\partial w}{\partial \varepsilon }(\cdot ,\varepsilon ) \quad \text {for} \quad \varepsilon :Y\rightarrow {{\,\mathrm{\text {Sym}}\,}}(d). \end{aligned}$$

After solving Eq. (3.1) for u, the effective stress is computed by volume averaging

$$\begin{aligned} \Sigma = \langle \sigma (E + \nabla ^s u)\rangle _Y \equiv \frac{1}{|Y|} \int _Y \sigma (E + \nabla ^s u)\, dx. \end{aligned}$$

To apply nonlinear conjugate gradient methods, we recast the balance of linear momentum (3.1) as an optimization problem (2.7). We set \(V = H^1_\#(Y;\mathbb {R}^d)\), the first order \(L^2\)-Sobolev space of periodic vector fields with vanishing mean, and

$$\begin{aligned} f(u) = \langle w(E+\nabla ^s u)\rangle _Y, \quad u\in H^1_\#(Y;\mathbb {R}^d), \end{aligned}$$
(3.2)

which is well-defined under technical conditions, see Schneider [74]. We notice that the first order optimality condition of the functional (3.2) is precisely the balance of linear momentum (3.1).

For the Hilbert space \(V = H^1_\#(Y;\mathbb {R}^d)\), we fix the inner product

$$\begin{aligned} (u,v) \mapsto \langle u,v \rangle = \frac{1}{|Y|} \int _Y \nabla ^s u:\nabla ^s v \, dx \equiv \langle \nabla ^su : \nabla ^s v\rangle _Y, \end{aligned}$$
(3.3)

where : denotes the double contraction of tensor indices. Recall that, for a continuously differentiable function f on a Hilbert space, the differential Df(u), for \(u\in V\), is an element of the (continuous) dual Hilbert space, ie, \(Df(u) \in V'\). The gradient of f at u is defined implicitly via

$$\begin{aligned} \langle \nabla f(u),v \rangle = Df(u)[v] \quad \text {for all} \quad v\in V. \end{aligned}$$

Thus, \(\nabla f(u) \in V\), but is dependent on the chosen inner product on V. (In contrast, for two equivalent inner products on a fixed Hilbert space, the corresponding differentials are identical.) More explicitly, the gradient may be written as

$$\begin{aligned} \nabla f(u) = R Df(u), \end{aligned}$$

where \(R:V' \rightarrow V\) denotes the Riesz isomorphism, the inverse of the mapping \(V\rightarrow V'\), \(u\mapsto \langle u,\cdot \rangle \).

In the homogenization context (3.2), we have

$$\begin{aligned} Df(u)[v] = \langle \sigma (E+\nabla ^s u):\nabla ^sv \rangle _Y, \end{aligned}$$

which we may write more succinctly in the form

$$\begin{aligned} Df(u) = -\text {div }\sigma (E + \nabla ^s u) \end{aligned}$$

in terms of the “natural” pairing

$$\begin{aligned} V'\times V \rightarrow \mathbb {R}, \quad (g,u) \mapsto \frac{1}{|Y|} \int _Y g\cdot u\, dx. \end{aligned}$$

The Riesz map for the inner product (3.3), in turn, becomes \(R = -G\) in terms of the Green’s function \(G = (\text {div }\nabla ^s)^{-1}\), which is well-defined on \(H^1_\#(Y;\mathbb {R}^d)\) (due to the mean-value constraint).

With these identifications, for given initial vector \(u_0\) and \(d_{-1} = 0\), the nonlinear conjugate gradient update (2.8) becomes

$$\begin{aligned} \begin{aligned} d_k&= -G\text {div }\sigma (E + \nabla ^s u_k) + \beta _{k-1} d_{k-1},\\ u_{k+1}&= u_k + \alpha _k d_k. \end{aligned} \end{aligned}$$

For FFT-based methods, it is convenient to introduce the total strain \(\varepsilon _k = E + \nabla ^s u_k\) and \(D_k = \nabla ^s d_k\), and to rewrite the latter update in the form

$$\begin{aligned} \begin{aligned} D_k&= -\Gamma : \sigma (\varepsilon _k) + \beta _{k-1} D_{k-1},\\ \varepsilon _{k+1}&= \varepsilon _k + \alpha _k D_k, \end{aligned} \end{aligned}$$
(3.4)

using the operator \(\Gamma = \nabla ^s G\text {div }\), an orthogonal projector on \(L^2(Y;{{\,\mathrm{\text {Sym}}\,}}(d))\) (see Milton [75]). Upon choosing \(\beta _k\equiv 0\) in Eq. (3.4), we recover the basic scheme of Moulinec–Suquet

$$\begin{aligned} \varepsilon _{k+1} = \varepsilon _k - \alpha _k \Gamma : \sigma (\varepsilon _k) \end{aligned}$$

with variable step-size, confirming the observations in Schneider [74]. Specializing to the Fletcher–Reeves CG (2.9), we may write

$$\begin{aligned} \begin{aligned} G_k&= \Gamma : \sigma (\varepsilon _k)\\ \beta _{k-1}&= \frac{\Vert G_k\Vert ^2}{\Vert G_{k-1}\Vert ^2}\\ D_k&= - G_k + \beta _{k-1} D_{k-1},\\ \varepsilon _{k+1}&= \varepsilon _k + \alpha _k D_k. \end{aligned} \end{aligned}$$
(3.5)

Discretizing the balance of linear momentum (3.1), see Moulinec–Suquet [1, 2], Willot [22] or Schneider–Ospald–Kabel [21], leads to discretized systems of equations (with corresponding discretized \(\Gamma \)-operators) that may be solved using the nonlinear CG method (3.5). We summarize the implementation in Algorithm 1.

Algorithm 1 The Fletcher–Reeves nonlinear conjugate gradient method (3.5) for FFT-based computational micromechanics
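To make the update (3.5) concrete, the following is a minimal, self-contained sketch in the spirit of Algorithm 1, with the constant step-size \(\alpha = 2/(c_- + c_+)\) discussed in remark 1 below and the convergence criterion (4.1) of Sect. 4. To keep the \(\Gamma \)-operator to a few lines, it implements the scalar conductivity analogue on a periodic grid rather than the elasticity setting of this paper; all function names and the test geometry are ours, not the paper's.

```python
import numpy as np

def fr_cg_homogenize(kappa, E, tol=1e-5, maxit=500):
    """Fletcher-Reeves CG, Eq. (3.5), for periodic scalar conductivity
    homogenization -- an illustrative analogue of Algorithm 1."""
    d, N = E.shape[0], kappa.shape
    ax = tuple(range(1, d + 1))
    # constant step-size of the basic scheme, alpha = 2 / (c_- + c_+)
    alpha = 2.0 / (kappa.min() + kappa.max())
    # integer Fourier frequencies for the spectral Gamma-operator
    xi = np.stack(np.meshgrid(*(np.fft.fftfreq(n, 1.0 / n) for n in N),
                              indexing="ij"))
    xi2 = np.sum(xi ** 2, axis=0)
    xi2[(0,) * d] = 1.0                     # avoid division by zero at xi = 0

    def gamma(tau):
        # Gamma-operator: orthogonal projection onto mean-free gradient fields
        tau_h = np.fft.fftn(tau, axes=ax)
        g_h = xi * (np.sum(xi * tau_h, axis=0) / xi2)
        g_h[(slice(None),) + (0,) * d] = 0.0
        return np.real(np.fft.ifftn(g_h, axes=ax))

    eps = np.tile(E.reshape((d,) + (1,) * d), N)   # eps_0 = E
    D = np.zeros_like(eps)
    norm_G_old = 1.0
    for k in range(maxit):
        flux = kappa * eps                  # linear "material law" sigma(eps)
        G = gamma(flux)                     # residual field Gamma : sigma
        norm_G = np.sqrt(np.mean(np.sum(G ** 2, axis=0)))
        # convergence criterion (4.1): ||Gamma:sigma|| / ||<sigma>|| <= tol
        if norm_G <= tol * np.linalg.norm(flux.mean(axis=ax)):
            break
        beta = 0.0 if k == 0 else (norm_G / norm_G_old) ** 2
        D = -G + beta * D                   # direction update, cf. (3.5)
        eps = eps + alpha * D               # constant step-size update
        norm_G_old = norm_G
    return flux.mean(axis=ax)               # effective flux

# usage: 2D matrix with a square inclusion of tenfold conductivity
kappa = np.ones((64, 64))
kappa[16:48, 16:48] = 10.0
print(fr_cg_homogenize(kappa, np.array([1.0, 0.0])))
```

Note that, as stated above, only three strain-like fields (eps, G, D) are stored; the flux may overwrite the buffer of G in an optimized implementation.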

Several remarks are in order.

  1.

    To complete the description of the Algorithm 1, a step-size strategy needs to be prescribed. We use the step-size of the basic scheme (see Schneider [74] for the nonlinear convex case), ie, if \(c_-\) and \(c_+\) denote the smallest and largest eigenvalue of the material tangents \(\frac{\partial \sigma }{\partial \varepsilon }\), evaluated over all microscopic points and admissible strains, we set \(\alpha = \frac{2}{c_- + c_+}\).

  2.

    In the presented form, the algorithm can be implemented on three strain-like fields \((\varepsilon , G, D)\). In contrast to the Barzilai–Borwein scheme [39] and the fast gradient schemes [74], an additional field is necessary for evaluating \(\beta _k\).

  3.

    The other nonlinear conjugate gradient methods (2.10), (2.12) and (2.13) may be implemented in a similar way. However, they require storing (at least) one additional field, as the difference \(G_{k} - G_{k-1}\) enters the formulae for \(\beta _k\).

  4.

    If the memory footprint is decisive, reduced-memory implementations may be used, see for instance Grimm-Strele–Kabel [76].

  5.

    Incorporating mixed boundary conditions is straightforward, see Kabel–Fliegener–Schneider [77].

4 Numerical investigations

4.1 Setup and materials

The nonlinear conjugate gradient method, cf Alg. 1, was integrated into an in-house FFT-based computational micromechanics code (written in Python with Cython extensions). For all but the MMC example, we use the staggered-grid discretization [21]. For the MMC example, consistent with Schneider–Wicht–Böhlke [29], we rely upon the original spectral discretization of Moulinec–Suquet [1, 2, 78].

As the convergence criterion, we use

$$\begin{aligned} \frac{\Vert \Gamma :\sigma (\varepsilon ^k)\Vert }{\Vert \langle \sigma (\varepsilon ^k)\rangle _Y\Vert } \le \text{ tol } \end{aligned}$$
(4.1)

for prescribed tolerance \(\text {tol}\), see Schneider–Wicht–Böhlke [29], Sect. 5. (Notice \(\Sigma \equiv 0\) in Eq. (5.7) of [29] for strain-driven uniaxial extension.) For Algorithm 1, the quantity (4.1) is computed in line 18. We set \(\text {tol}=10^{-5}\) for all examples in this section, unless explicitly mentioned otherwise.

Affine extrapolation [2] is used to generate a good initial guess for the subsequent load step. The linear elastic computations were carried out on a laptop with 16 GB RAM and a dual-core Intel i7 CPU. For the inelastic computations, a desktop computer with a 6-core Intel i7 CPU and 32 GB RAM was utilized.

All material parameters used are collected in Table 1. The meaning of the inelastic parameters is specified in the respective subsection.

Table 1 Material parameters used in this article, supplemented by their source

4.2 A sandcore microstructure

Fig. 1 Sandcore microstructure and local solution field

As our first example, we consider a microstructure of inorganically bound sand for casting applications, which was already investigated in Schneider [39]. It consists of 64 sand grains with \(58.58\%\) volume fraction, held together by \(1.28\%\) binder, cf Fig. 1a. The geometry was generated by the algorithm described in the work of Schneider et al. [84] and resolved by \(256^3\) voxels. The material parameters used for the grains and the binder are given in Table 1.

We apply uniaxial extension in x-direction (for \(5\%\) axial strain), see Fig. 1b, where the norm of the strain field is shown. For more details about digital sand-core physics, we refer to the work of Ettemeyer et al. [85]. We compare several solution methods for this porous example. First, we compare the different conjugate gradient methods, cf Fig. 2a. The linear conjugate gradient method [31, 41] serves as our reference. We investigate the performance of the nonlinear conjugate gradient methods of Fletcher–Reeves (2.9), Dai–Yuan (2.13), Hestenes–Stiefel (2.12) and Polak–Ribière–Polyak (2.10) for constant step-size (chosen according to the basic scheme). If exact line search were used, all these conjugate gradient methods would lead to identical iterates.

Fig. 2 Comparison of different solvers for the sandcore microstructure, cf Fig. 1

From Fig. 2a we see that the Fletcher–Reeves and Dai–Yuan methods require about \(50\%\) more iterations than linear CG. The Dai–Yuan method is only slightly slower than Fletcher–Reeves, but exhibits a distinct zig-zagging from one iteration to the next. In contrast, the decrease of the residual is monotonic for the Fletcher–Reeves method.

The two other investigated nonlinear conjugate gradient methods, Hestenes–Stiefel and Polak–Ribière–Polyak, are much slower for this scenario. Actually, their convergence behavior is very close to that of the basic scheme in Fig. 2b. This is no coincidence, as we observe \(\beta ^\text {HS}_k\rightarrow 0\) and \(\beta ^\text {PRP}_k\rightarrow 0\) as \(k\rightarrow \infty \).

In addition to the conjugate gradient methods, other popular FFT-based solvers were applied to the problem at hand, see Fig. 2b. The Broyden–Fletcher–Goldfarb–Shanno algorithm [56,57,58,59] may be considered as another generalization of the linear CG method, as it reduces to the latter when applied to quadratic problems with exact line search. Here, we investigate the limited-memory version of BFGS, as introduced by Nocedal [60], see Wicht–Schneider–Böhlke [35] for details in the FFT-based context. For a depth of 4, L-BFGS exhibits a convergence behavior that almost coincides with that of Fletcher–Reeves CG, albeit at a much higher computational cost per iteration.

The Barzilai–Borwein method [86] is also investigated, as it serves as a reference solver in FFT-based computational micromechanics [39]. The Barzilai–Borwein method exhibits a strongly non-monotonic convergence behavior, as can also be seen in Fig. 2b. However, the method can be shown to exhibit local r-linear convergence [87] when applied to uniformly convex optimization problems. As we are only interested in the smallest k, s.t. the convergence criterion (4.1) is fulfilled, the non-monotonicity may be non-critical. Concerning the number of iterations to terminate, the Barzilai–Borwein scheme is almost identical to L-BFGS(4) and Fletcher–Reeves CG. The Anderson-accelerated basic scheme was pioneered by Shanthraj et al. [38] and Chen et al. [88]. It may be interpreted as a Quasi-Newton method of multi-secant type [89]. Applied to linear problems, it is closely related to GMRES, see Toth–Kelley [90]. For low accuracy (above \(10^{-3}\)), Anderson acceleration with depth 4 outperforms all other methods. This is to be expected, as the (generalized) minimal residual method represents the Krylov subspace solver with minimal residual [91]. (Recall that linear CG is the Krylov subspace solver with minimal “energy”.) However, for higher accuracy, the method slows down considerably. This complies with theoretical and numerical results [90], which show that an Anderson-accelerated gradient method does not improve the convergence rate, but only the constant, see Li–Li [92].

For comparison, we have also included Nesterov’s method with speed restart, as described in Schneider [74]. It represents the fastest accelerated gradient method for this kind of problem, compare Schneider [39], Sect. 3.2. Nesterov’s method is optimal in the sense that its convergence rate matches the worst-case convergence rate for optimizing uniformly convex functions with Lipschitz gradient [70]. However, the constant in front of the convergence estimate is about 4 times as high as the worst-case bound (which might also be too pessimistic). Furthermore, speed restarting is always slower than applying Nesterov’s method with explicitly known convexity constant. For the problem at hand, Nesterov’s method requires about 5 times as many iterations as linear CG. Last but not least, we report on the basic scheme, which fails to converge to the prescribed tolerance within 1000 iterations.

4.3 A planar short-fiber reinforced composite

Fig. 3 Fiber-reinforced microstructure and norm of the strain field upon different levels of monotonic uniaxial loading

We investigate a short-fiber reinforced microstructure and the associated effective mechanical response. We consider short glass fibers with a length of \(275\,\upmu \mathrm{m}\) and a diameter of \(10\,\upmu \mathrm{m}\), reinforcing a polymer by \(18\%\) volume fraction. The second-order fiber-orientation tensor reads \(A=\text {diag}(0.45,0.45,0.1)\). The periodic microstructure, cf Fig. 3a, has a size of \(1024\,\upmu \mathrm{m} \times 1024\,\upmu \mathrm{m} \times 128\,\upmu \mathrm{m}\) and was generated by the sequential addition and migration algorithm [93]. The resulting microstructure contains 1119 cylindrical fibers and was discretized by \(512 \times 512 \times 64\) voxels.

For the elastic parameters of E-glass and the polymer, cf Table 1, we solve a linear homogenization problem for prescribed uniaxial extension of \(5\%\) strain up to a specified tolerance of \(10^{-10}\). Compared to the example of the previous section, the example at hand is simpler, as the constituents are finitely contrasted. (The difficulties emerge for the nonlinear behavior, though.) We compare the performance of the various nonlinear CG methods to linear CG in Fig. 4a. Neither the Hestenes–Stiefel nor the Polak–Ribière–Polyak method was competitive, whence they are not included in the figure. The linear conjugate gradient method converges linearly, and quickly so. The Dai–Yuan and the Fletcher–Reeves CG are almost indistinguishable, also converging linearly, but at a slower rate than linear CG. Both require about \(50\%\) more iterations than linear CG. In contrast to the sandcore microstructure, the Dai–Yuan CG does not exhibit a zig-zagging convergence behavior. This observation may indicate that the Dai–Yuan method is more sensitive to the step-size than the Fletcher–Reeves CG. Indeed, due to the finite contrast, the step-size of the basic scheme (which we also use for the nonlinear CGs) is smaller than for the infinite-contrast case.

Fig. 4 Convergence behavior of different solvers for homogenizing the fiber-reinforced microstructure, Fig. 3a

The numerical performance of a variety of other solvers is shown in Fig. 4b. We see that the Fletcher–Reeves method lags behind three other methods in terms of convergence rate: the L-BFGS(4) method, the heavy-ball scheme and the Barzilai–Borwein solver all lead to approximately identical convergence rates. The heavy-ball scheme, however, starts to become unstable for a residual of \(10^{-9}\). This is certainly a problem of the implementation, see Ernesti–Schneider–Böhlke [37], and can be cured easily, but at the expense of storing an additional field. The fluctuations inherent to the Barzilai–Borwein method are smaller than for the infinite-contrast case of the previous section. Similar to the sandcore microstructure, the Anderson-accelerated basic scheme is superior to all other schemes up to a tolerance of \(10^{-3}\), but becomes dramatically slower for higher accuracy. The basic scheme is not competitive.

Fig. 5 Stress–strain diagrams for the inelastic examples in this paper. The elastic behavior is indicated by dots

After this warm-up, we turn our attention to the inelastic behavior. More precisely, following Wu et al. [82], we furnish the matrix with a \(J_2\)-elastoplastic material model and linear-exponential hardening

$$\begin{aligned} \sigma _0(p)=\sigma _Y + k_1 p + k_2(1-\exp (-mp)). \end{aligned}$$
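For reference, the hardening function and its derivative (the hardening modulus, as used by a standard return-mapping algorithm, which we assume here; the paper does not detail its implementation) might be coded as:

```python
import numpy as np

def sigma_0(p, sigma_Y, k1, k2, m):
    """Linear-exponential hardening: current yield stress as a function
    of the accumulated plastic strain p (parameters as in Table 1)."""
    return sigma_Y + k1 * p + k2 * (1.0 - np.exp(-m * p))

def hardening_modulus(p, k1, k2, m):
    """Derivative d sigma_0 / dp, entering the algorithmic tangent."""
    return k1 + k2 * m * np.exp(-m * p)
```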

We apply uniaxial extension in x-direction up to \(5\%\) strain, distributed into 50 equal-sized time steps. The stress-strain curves for matrix, fiber and the composite are shown in Fig. 5a. The local strain fields corresponding to \(1\%\), \(2\%\), \(3\%\), \(4\%\) and \(5\%\) applied strain are shown in Fig. 3. Up to \(1\%\) applied strain, cf Fig. 3b, the local strain field exhibits peaks only at fiber tips and at spots where fibers are closely packed. For higher levels of strain, large portions of the matrix become plastified, as is reflected in the high strain levels. For the inelastic problem at hand, we compare the basic scheme, Fletcher–Reeves CG, the Newton-CG method, L-BFGS(4) and the Barzilai–Borwein scheme, both in terms of iteration count and runtime, cf Fig. 6. Please keep in mind that the runtimes were recorded on the computer with 6 cores, see Sect. 4.1. For the forcing term in Newton-CG, we choose forcing-term choice 2 with \(\gamma _0 = 0.75\), see Wicht–Schneider–Böhlke [35].

Fig. 6 Iteration count and runtime per load step for the fiber-reinforced composite with exponential-linear hardening

Taking a look at the iteration count, cf Fig. 6a, we see that the iterations required for the basic scheme increase monotonically up to about \(2\%\) applied strain. For higher levels of loading, the iteration count decreases again. For the other solvers, the iteration counts are much smaller than for the basic scheme. Their respective iteration counts are on a similar level, with Newton-CG lagging behind slightly for strain levels exceeding \(3.5\%\).

A look at the runtime per step, cf Fig. 6b, enables us to further examine the performance of the solvers. As for the iteration counts, the basic scheme leads to much higher runtimes. The other solvers, however, operate on different levels. The limited-memory BFGS method clearly lags behind in terms of performance. Fletcher–Reeves CG and Newton-CG exhibit more or less the same runtimes, with Newton-CG having advantages for lower strain levels and Fletcher–Reeves CG being faster later on. The Barzilai–Borwein scheme appears to be faster, probably because of the lower computational cost per step. Indeed, the Barzilai–Borwein scheme is very close to the basic scheme in terms of computational cost per step, whereas Newton-CG and Fletcher–Reeves CG are similar in terms of these costs, as long as evaluating the nonlinear material law is about as expensive as applying the material tangent.

Note that we have also run the heavy-ball method for this problem. However, the scheme did not converge within 1000 iterations for the fourth load step, whence we consider the method clearly inferior.

Table 2 Average iteration count and total runtime for 50 load steps and the inelastic fiber-reinforced problem

To quantify our observations, both the average iteration count and the runtime are collected in Table 2. The basic scheme requires more than ten times the iteration count of the second-worst solver considered. Thus, for the problem at hand, the basic scheme is clearly not competitive. Notice, however, that the basic scheme features the lowest runtime per iteration.

In terms of runtime per iteration, the Barzilai–Borwein method and Newton-CG are comparable. The Barzilai–Borwein method requires fewer iterations than Newton-CG. This is caused by the delicate choice of the forcing term (the tolerance to which the linear system needs to be solved) in the Newton method, see Wicht–Schneider–Böhlke [35] for a discussion. The Fletcher–Reeves CG has a similar iteration count as the Barzilai–Borwein method, but features a higher runtime per iteration. All in all, it is \(20\%\) slower than the Barzilai–Borwein method. Still, it is slightly faster than Newton-CG, and is characterized by a much smaller memory footprint. Last but not least, we mention L-BFGS, which has a comparable iteration count, but is not competitive in terms of runtime. Indeed, its runtime per iteration is more than twice as high as for the basic scheme.

4.4 A metal-matrix composite

Fig. 7 Microstructure and accumulated plastic strain for increasing applied axial strain for the metal-matrix composite (MMC)

As our last inelastic example, we consider a metal-matrix composite (MMC). The microstructure containing spherical fillers is shown in Fig. 7a. The ceramic fillers are modeled elastically, whereas the aluminium matrix is governed by \(J_2\)-elastoplasticity with power-law hardening

$$\begin{aligned} \sigma _0(p) = \sigma _Y + k\, p^m. \end{aligned}$$
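Analogously to the previous example, this hardening law may be coded as follows (a sketch; the parameter values from Table 1 are not reproduced here):

```python
import numpy as np

def sigma_0_power(p, sigma_Y, k, m):
    """Power-law hardening: yield stress vs. accumulated plastic strain p.
    For typical exponents 0 < m < 1, the hardening modulus k*m*p**(m-1)
    is unbounded near p = 0 and tends to zero for large p, which makes
    this law numerically challenging."""
    return sigma_Y + k * np.power(p, m)
```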

The material parameters are listed in Table 1, following Segurado et al. [83]. The composite was subjected to uniaxial extension up to \(3\%\) strain, decomposed into 225 equidistant time steps. The stress-strain curves are shown in Fig. 5b. The accumulated plastic strain fields for increasing levels of applied strain are shown in Fig. 7b–d. Due to the strong nonlinearity induced by power-law hardening, this setup represents a challenging benchmark for FFT-based solution methods, and has been considered before in Schneider–Wicht–Böhlke [29] and Schneider [39]. To ensure compatibility with the results in the mentioned papers, the spectral discretization of Moulinec–Suquet [1, 2] was used.

Fig. 8 Iterations per load step (\(128^3\) voxels, left) and average iteration count for varying resolution (right)

The iteration count per load step for the basic scheme, the Barzilai–Borwein method and Fletcher–Reeves CG is shown on the left of Fig. 8. We see that the basic scheme performs quite well for this example, and the Fletcher–Reeves CG is on the same level as Barzilai–Borwein.

Table 3 Summary of FFT-based solvers for nonlinear and inelastic computational homogenization

We refrained from including too many solvers into the latter diagram, as many solvers operate on a similar level for this example (or fail admirably, rendering the diagram less expressive). Thus, we included a table documenting the mean iteration counts for various solvers and increasing resolution. The solvers are grouped into

  1.

    Gradient methods (basic scheme, Barzilai–Borwein),

  2.

    Polarization schemes (Eyre–Milton scheme [25], the augmented Lagrangian method of Michel–Moulinec–Suquet [26] and the Monchiet–Bonnet solver [27]),

  3.

    Fast gradient methods (Nesterov’s method with optimal step-size [49], the heavy-ball method with optimal step-size [49], Fletcher–Reeves CG (2.9)).

We notice that all schemes are stable w.r.t. mesh refinement except for Nesterov’s method and the heavy-ball scheme. This is a result of the lack of uniform convexity of the condensed energy functional governing von-Mises elastoplasticity with power-law hardening. Recall that the performance of Nesterov’s method and the heavy-ball scheme crucially relies upon knowing the strong convexity constant in question. We follow the approach of determining the strong convexity constant in terms of the eigenvalues of the tangent stiffness, for every iteration. As the mesh gets refined, the local strain concentrations increase, leading to a lowering of the convexity constant. A similar, but less severe, trend is observed for the Eyre–Milton scheme, where the average iteration count increases for \(128^3\) voxels. For the polarization schemes, the strong convexity constant is required as well. However, the two damped polarization solvers, the augmented Lagrangian and the Monchiet–Bonnet scheme, suffer less from mesh refinement.

Concerning the overall performance, the basic scheme is worst, requiring about \(150\%\) more iterations than the Barzilai–Borwein scheme. In turn, the polarization schemes outperform the Barzilai–Borwein scheme, or operate on roughly the same level. Recall from Schneider [39] that polarization schemes fail to perform well for infinitely contrasted media, but exhibit very good performance for finitely-contrasted media, as long as the complexity-reduction trick [29] is available.

From the category of fast gradient schemes, both Nesterov’s scheme and the heavy-ball solver fail to be competitive. The heavy-ball method becomes unstable, probably rooted in the lack of a global convergence guarantee for the scheme. Nesterov’s method is globally convergent (and faster than the basic scheme). However, it requires about twice as many iterations as the other competitive solvers. Fletcher–Reeves CG consistently outperforms the Barzilai–Borwein scheme in terms of required iterations, but lags behind the two better polarization schemes.

5 Conclusion

This work was devoted to studying nonlinear conjugate gradient methods in the context of FFT-based micromechanics. We showed that the Fletcher–Reeves nonlinear conjugate gradient method is capable of filling the role of a general-purpose solver for FFT-based computational micromechanics. Its monotonicity properties, which we demonstrated by computational examples, set it apart from the Barzilai–Borwein method. In other regards, the performance is similar. Of course, the price to pay for monotonicity is a slight increase in both memory footprint and computational effort. In this regard, we may insert the Fletcher–Reeves CG into Table 3, which compares the corresponding merits and downsides of the competitive solvers for FFT-based computational micromechanics.

Our focus has been on providing a useful and usable method; we did not dwell upon theoretical investigations. Exploring the Fletcher–Reeves dynamical system (2.23) appears alluring, in particular in terms of stability and basins of attraction. Also, the corresponding time-discrete system should be investigated.