1 Introduction

This work is devoted to convergence properties of the Broyden-like method for systems of equations in which some of the equations are linear. Among other results, it provides the first answer to the decades-old question of whether the Broyden-like matrices converge under the standard assumptions for q-superlinear convergence of the iterates, albeit only for a special case.

Given a smooth nonlinear mapping \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\), Broyden’s method [3] aims at finding \(\bar {u}\in \mathbb {R}^{n}\) with:

$$ F(\bar{u}) = 0. $$

It is a well-established member of the class of quasi-Newton methods and shares its local q-superlinear convergence, cf. [9, 14, 15, 21, 23]. The Broyden-like method generalizes Broyden’s method by allowing an additional parameter σk in the matrix update. It reads as follows.

Algorithm 1 (Broyden-like method) Given \(u^{0}\in \mathbb {R}^{n}\), an invertible \(B_{0}\in \mathbb {R}^{n\times n}\) and parameters σk with \(\inf _{k}\sigma _{k}=\sigma _{\min }>0\), iterate for k = 0,1,2,…: if F(uk) = 0, terminate with output u∗ := uk; otherwise set \(s^{k}:=-B_{k}^{-1}F(u^{k})\), \(u^{k+1}:=u^{k}+s^{k}\) and \(B_{k+1}:=B_{k}+\sigma _{k} F(u^{k+1})(s^{k})^{T}/{\lVert s^{k}\rVert }^{2}\).

For (σk) ≡ 1, we recover Broyden’s method. An appropriate choice of σk ensures that Bk+ 1 is invertible whenever Bk is invertible; in fact, by the Sherman-Morrison formula, all choices of σk except a single one maintain invertibility. The Broyden-like method is well known, cf. [22], [28, Section 6] and [16, Algorithm 1].
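To make the scheme concrete, the following sketch implements one iteration of Algorithm 1 together with the Sherman-Morrison scalar that governs invertibility of Bk+1 (the update is rank-one, so det Bk+1 equals det Bk times this scalar). The test problem and initial data are hypothetical; with σk ≡ 1, the loop performs Broyden's method.

```python
import numpy as np

def broyden_like_step(F, u, B, sigma):
    """One iteration of the Broyden-like method (Algorithm 1, sketch)."""
    s = np.linalg.solve(B, -F(u))            # quasi-Newton step s^k = -B_k^{-1} F(u^k)
    u_new = u + s                            # u^{k+1} = u^k + s^k
    # rank-one update B_{k+1} = B_k + sigma_k F(u^{k+1}) (s^k)^T / ||s^k||^2
    B_new = B + sigma * np.outer(F(u_new), s) / (s @ s)
    return u_new, s, B_new

def sherman_morrison_denominator(F, u_new, s, B, sigma):
    """det(B_{k+1})/det(B_k); B_{k+1} is invertible iff this scalar is nonzero."""
    return 1.0 + sigma * (s @ np.linalg.solve(B, F(u_new))) / (s @ s)

# hypothetical mixed system: F_1 nonlinear, F_2 affine
F = lambda u: np.array([u[0] ** 2 + u[1] - 3.0, u[0] + u[1] - 3.0])
u = np.array([1.5, 0.5])
B = np.array([[3.0, 1.0], [1.0, 1.0]])
for k in range(12):
    if np.linalg.norm(F(u)) < 1e-13:
        break                                # root found (up to rounding)
    u, s, B = broyden_like_step(F, u, B, sigma=1.0)
```

Exactly one value of σk makes the scalar vanish; every other choice keeps Bk+1 invertible, in line with the remark above.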

In this work, we consider Algorithm 1 for mixed linear-nonlinear systems of equations. That is, there exists J ⊂ {1,…,n} such that \(F_{j}(u)={a_{j}^{T}} u + b_{j}\), where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\), for all j ∈ J. In addition, we suppose that the initial matrix B0 agrees with the Jacobian of F in the rows that correspond to (some of) the affine components of F, i.e., \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J. For j ∉ J, the functions Fj can be nonlinear and \({B_{0}^{j}}\) is not restricted. This framework includes many practically relevant systems of equations. Also, it fits two standard suggestions for the choice of B0, which are to use \(B_{0}=F^{\prime }(u^{0})\) or a finite difference approximation of \(F^{\prime }(u^{0})\). In the following, we speak of exact initialization if \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J.

This article is divided into four parts. In the first part, we show that exact initialization ensures that the steps (sk)k≥ 1 stay in a subspace \({\mathcal {S}}\) and that they can be generated by applying Algorithm 1 to a lower-dimensional mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\), where d is the dimension of \({\mathcal {S}}\). This extends results from [18].

The second part is concerned with the consequences of the first part for the convergence of the Broyden-like matrices (Bk). We point out that it is still largely open whether (Bk) converges and that several renowned researchers have mentioned this issue in their works, cf. the survey articles [8, Example 5.3], [21, p. 117], [14, p. 306] and [2, p. 940]. The convergence of (Bk) is of interest, for example, because it is closely related to the rate of convergence of (uk), see, e.g., Lemmas 2 and 3. For invertible \(F^{\prime }(\bar {u})\), there is only one result available: it is established in [22, Theorem 5.7] and in [17] that if the sequence of steps (sk) is uniformly linearly independent, then (Bk) converges and \(\lim _{k\to \infty } B_{k}=F^{\prime }(\bar {u})\). We include the precise result as Theorem 4. Unfortunately, conditions that imply uniform linear independence of (sk) are unknown, and we are not aware of a single example, be it theoretical or numerical, in which (sk) actually is uniformly linearly independent. In the setting of this work, at any rate, (sk)k≥ 1 is confined to the subspace \({\mathcal {S}}\) and thus violates uniform linear independence. After extending the notion of uniform linear independence to subspaces, we generalize the above convergence result for (Bk) to the setting of this work, cf. Theorem 5. In doing so, we also obtain a formula for the limit of (Bk).

In the third part, we observe that if F has only one nonlinear component function and B0 is initialized exactly, then the generalized convergence result from the second part implies that (Bk) converges whenever the iterates (uk) converge, and this holds for regular and for singular \(F^{\prime }(\bar {u})\), cf. Corollary 2. Although the assumption of only one nonlinear component function is very restrictive, we stress that this is the first time that convergence of (Bk) has been shown for n > 1 and invertible \(F^{\prime }(\bar {u})\). We will also see that even though each Bk agrees with \(F^{\prime }(\bar {u})\) in n − 1 of n rows, the limit of (Bk) is generally not \(F^{\prime }(\bar {u})\).

We continue the third part by paying special attention to the case that σk = 1 for all k ≥ k0 and some k0 ≥ 0, i.e., Algorithm 1 turns into Broyden’s method. The result of the first part implies that in this case, Broyden’s method essentially reduces to the one-dimensional secant method. This yields a comprehensive characterization of the convergence of (uk) including a lower bound for its q-order, which in turn allows us to establish significantly stronger convergence properties of (Bk) than for the Broyden-like method, cf. Theorem 6. For affine F, we prove finite convergence if σk = 1 is selected at least once, cf. Theorem 7. The third part concludes with a brief application of the developed convergence theory to two examples from the literature.

In the last part, we verify the results from the third part in high-precision numerical experiments. Among other findings, we observe that if \(F^{\prime }(\bar {u})\) is invertible, then choosing \((\sigma _{k})_{k\geq k_{0}}\equiv 1\) for some k0 ≥ 0 leads to much faster convergence than, e.g., (σk) ≡ 0.99, whereas this is not the case if \(F^{\prime }(\bar {u})\) is not invertible.

The convergence theory of Broyden’s method and of specific versions of the Broyden-like method is developed in, e.g., [4, 12, 16, 22]. Besides the result mentioned above, there is only one further result available on the convergence of the Broyden(-like) matrices: in [19], it was recently shown for Broyden’s method that if \(F^{\prime }(\bar {u})\) is singular with some additional structure, then \(({\lVert B_{k+1}-B_{k}\rVert })\) converges q-linearly to zero under appropriate assumptions, so (Bk) converges.

For other quasi-Newton updates, convergence results are available. We are aware of results for the SR1 update [5, 11, 30], for the Powell-symmetric-Broyden update [26], for the DFP and the BFGS update [13], and for the convex Broyden class excluding the DFP update [29].

This paper is organized as follows. In Section 2, we collect preparatory results and we present the generalization of uniform linear independence that is useful for subspaces. In Section 3, we prove the subspace property of (sk)k≥ 1 and show that (sk)k≥ 1 can be obtained by applying Algorithm 1 to a suitable mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\). Section 4 contains the convergence results for the Broyden-like matrices and the application to examples from the literature. Section 5 presents numerical experiments and Section 6 summarizes.

Notation

We use \(\mathbb {N}=\{1,2,3,\ldots \}\). For \(n\in \mathbb {N}\) we set [n] := {1,2,…,n}, [n]0 := [n] ∪{0} and [0] := ∅. The Euclidean norm of \(v\in \mathbb {R}^{n}\) is \({\lVert v\rVert }\), while \({\lVert A\rVert }\) is the spectral norm if \(A\in \mathbb {R}^{m\times n}\). For \(A\in \mathbb {R}^{m\times n}\), Aj indicates the jth row of A, regarded as a row vector, whereas \(A^{i,j}\in \mathbb {R}\) is the usual notation for entries. The span of \(C\subset \mathbb {R}^{n}\) is indicated by 〈C〉. We will use tacitly that Algorithm 1 cannot generate a step sk satisfying sk = 0. For k ≥ 0, we define:

$$ E_{k}:= B_{k} - F^{\prime}(\bar{u}) \qquad\text{ and }\qquad \hat s^{k} := \frac{s^{k}}{{\left\|s^{k}\right\|}}, $$

where the first definition assumes that Algorithm 1 has generated (Bk) and (uk) with \(\lim _{k\to \infty }u^{k}=\bar {u}\) for some \(\bar {u}\) at which F is differentiable, while the second definition already makes sense if Algorithm 1 has generated sk. We employ the q-order of convergence and the r-order of convergence in this work. They are studied in, e.g., [25, Section 9].

2 Preliminaries

2.1 Convergence of the Broyden-like method

The main convergence result for Algorithm 1 reads as follows.

Theorem 1

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be differentiable in a neighborhood of \(\bar {u}\) with \(F(\bar {u})=0\) and let \({\lVert F^{\prime }(u)-F^{\prime }(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }^{\alpha }\) for all u from this neighborhood and constants L,α > 0. Let \(F^{\prime }(\bar {u})\) be invertible. If Algorithm 1 generates a sequence (uk) that satisfies \({\sum }_{k}{\lVert u^{k}-\bar {u}\rVert }^{\alpha }<\infty \), then there holds:

$$ \sum\limits_{k=0}^{\infty} \left( \frac{{\left\|u^{k+1}-\bar{u}\right\|}}{{\left\|u^{k}-\bar{u}\right\|}}\right)^{2} < \infty, $$
(1)

implying that (uk) converges q-superlinearly to \(\bar {u}\).

Moreover, there are δ,ε > 0 such that for every (u0,B0) with \({\lVert u^{0}-\bar {u}\rVert }\leq \delta \) and \({\lVert B_{0}-F^{\prime }(\bar {u})\rVert }\leq \varepsilon \), Algorithm 1 either terminates with output \(u^{\ast }=\bar {u}\) or it generates (uk) such that all Bk are invertible and \({\sum }_{k}{\lVert u^{k}-\bar {u}\rVert }^{\alpha }<\infty \).

Proof

This follows from [20, Theorem 1]. □

If we restrict attention to Broyden’s method instead of the Broyden-like method, then a stronger result is available, namely Gay’s theorem on 2n-step q-quadratic convergence [12, Theorem 3.1]. For mixed linear–nonlinear systems with exact initialization, this result has recently been improved.

Theorem 2

Let \(n\in \mathbb {N}\), d ∈ [n]0 and J := [n] ∖ [d]. Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) satisfy \(F_{j}(u)={a_{j}^{T}} u + b_{j}\) for all j ∈ J, where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\) for all j ∈ J. Let F be differentiable in a neighborhood of \(\bar {u}\) with \(F(\bar {u})=0\) and let \({\lVert F^{\prime }(u)-F^{\prime }(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }\) for all u from this neighborhood and a constant L > 0. Let \(F^{\prime }(\bar {u})\) be invertible. Then there are δ,ε > 0 and C > 0 such that for every (u0,B0) with \({\lVert u^{0}-\bar {u}\rVert }\leq \delta \), \({\lVert B_{0}-F^{\prime }(\bar {u})\rVert }\leq \varepsilon \), and \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J, Algorithm 1 with (σk) ≡ 1 either terminates with output \(u^{\ast }=\bar {u}\) or it generates (uk) that satisfies (1) and:

$$ {\left\|u^{k+2d}-\bar{u}\right\|}\leq C{\left\|u^{k}-\bar{u}\right\|}^{2} \qquad\forall k\geq 1. $$

In particular, (uk) converges q-superlinearly and with r-order at least \(2^{1/(2d)}\) to \(\bar {u}\), and all Bk are invertible.

Proof

See [18]. □

2.2 Convergence of the Broyden-like updates

If (uk) converges and the Broyden-like updates remain bounded, then \(F(\lim _{k\to \infty } u^{k})=0\).

Lemma 1

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be continuous at \(\bar {u}\). Let (uk) and (Bk) be generated by Algorithm 1. Suppose that \(u^{k}\to \bar {u}\) and \(\sup _{k\geq 0}{\lVert B_{k+1}-B_{k}\rVert }<\infty \). Then \(F(\bar {u})=0\).

Proof

From \(\sup _{k\geq 0}{\lVert B_{k+1}-B_{k}\rVert }<\infty \), we infer \(\sup _{k\geq 0}\frac {{\lVert F(u^{k+1})\rVert }}{{\lVert s^{k}\rVert }}<\infty \). The convergence of (uk) yields \(\lim _{k\to \infty }{\lVert s^{k}\rVert }=0\), so \(\lim _{k\to \infty }{\lVert F(u^{k})\rVert }=0\), whence \(F(\bar {u})=0\). □
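The first inference in this proof uses that the Broyden-like update is rank-one: since \(B_{k+1}-B_{k}=\sigma _{k}F(u^{k+1})(s^{k})^{T}/{\lVert s^{k}\rVert }^{2}\) and the spectral norm of an outer product abT is ∥a∥∥b∥, one has \({\lVert B_{k+1}-B_{k}\rVert }=\sigma _{k}{\lVert F(u^{k+1})\rVert }/{\lVert s^{k}\rVert }\) exactly. A quick numerical check of this identity with arbitrary (hypothetical) data:

```python
import numpy as np

sigma = 0.8
F_next = np.array([0.3, -1.2, 0.5])   # stands in for F(u^{k+1})
s = np.array([1.0, 2.0, -1.0])        # stands in for the step s^k

# rank-one Broyden-like update B_{k+1} - B_k
update = sigma * np.outer(F_next, s) / (s @ s)

lhs = np.linalg.norm(update, 2)                           # spectral norm
rhs = sigma * np.linalg.norm(F_next) / np.linalg.norm(s)  # sigma ||F|| / ||s||
```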

If (uk) and the Broyden-like matrices converge, then the convergence of (uk) is q-superlinear.

Lemma 2

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be differentiable at \(\bar {u}\) with \(F^{\prime }(\bar {u})\) invertible. Let (uk) and (Bk) be generated by Algorithm 1. Suppose that \(u^{k}\to \bar {u}\) and \({\lVert B_{k+1}-B_{k}\rVert }\to 0\) for \(k\to \infty \). Then (uk) converges q-superlinearly to \(\bar {u}\).

Proof

Due to the invertibility of \(F^{\prime }(\bar {u})\) and \(u^{k}\to \bar {u}\), there is C > 0 such that:

$$ \begin{array}{llll} {\left\|u^{k+1}-\bar{u}\right\|} & \leq C {\left\|F(u^{k+1})-F(\bar{u})\right\|} = \frac{C}{\sigma_{k}} {\left\|B_{k+1}-B_{k}\right\|}{\left\|s^{k}\right\|}\\ & \leq \frac{C}{\sigma_{\min}}{\left\|B_{k+1}-B_{k}\right\|}\left( {\left\|u^{k+1}-\bar{u}\right\|}+{\left\|u^{k}-\bar{u}\right\|}\right) \end{array} $$

for all k sufficiently large. Here, we also used that \(F(\bar {u})=0\) by Lemma 1. Subtracting \(\frac {C}{\sigma _{\min \limits }}{\lVert B_{k+1}-B_{k}\rVert }{\lVert u^{k+1}-\bar {u}\rVert }\) and taking the limit yields the claim. □

Next we show that convergence of (uk) with q-order at least γ > 1 implies convergence of \(({\lVert B_{k+1}-B_{k}\rVert })\) with r-order at least γ, cf. also [25, 9.1.8&9.2.7].

Lemma 3

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) and let (uk) and (Bk) be generated by Algorithm 1. Suppose that (uk) converges to some \(\bar {u}\) and that F satisfies \({\lVert F(u)-F(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }\) for all u in a neighborhood of \(\bar {u}\) and some constant L > 0. Let γ > 1.

  1.

    If \(F(\bar {u})=0\) and there is C > 0 such that for all k sufficiently large:

    $$ {\left\|u^{k+1}-\bar{u}\right\|}\leq C {\left\|u^{k}-\bar{u}\right\|}^{\gamma} $$
    (2)

    is satisfied, then there exists \(\hat C>0\) such that:

    $$ {\left\|B_{k+1}-B_{k}\right\|} \leq \hat C {\left\|u^{k}-\bar{u}\right\|}^{\gamma-1} $$
    (3)

    for all sufficiently large k.

  2.

    If \(C,\hat C>0\) exist such that (2) and (3) are satisfied for all sufficiently large k, then we have \(F(\bar {u})=0\) and \(\lim _{k\to \infty }\lVert {B_{k+1}-B_{k}}\rVert ^{\frac {1}{p^{k}}}=0\) for all p ∈ [1,γ). In particular, \({\sum }_{k}\lVert {B_{k+1}-B_{k}}\rVert <\infty \) and (Bk) converges.

Proof

  • Proof of 1: Since (2) implies q-superlinear convergence of (uk), we obtain from a well-known result of Dennis and Moré that \({\lVert u^{k}-\bar {u}\rVert }/{\lVert s^{k}\rVert }\to 1\) for \(k\to \infty \), cf. [7, Lemma 2.1]. The Lipschitz-type property of F at \(\bar {u}\), \(F(\bar {u})=0\) and (2) hence yield:

    $$ {\left\|B_{k+1}-B_{k}\right\|} = \sigma_{k}\frac{{\lVert F(u^{k+1})-F(\bar{u})\rVert}}{{\lVert s^{k}\rVert}} \leq \hat C{\left\|u^{k}-\bar{u}\right\|}^{\gamma-1} $$

    for all sufficiently large k and a constant \(\hat C>0\), which proves (3).

  • Proof of 2: Lemma 1 yields \(F(\bar {u})=0\) due to (3). To prove the remaining claims it suffices to establish that

    $$ \lim_{k\to\infty}\left( {\lVert B_{k+1}-B_{k}\rVert}^{\frac{1}{\gamma-1}}\right)^{\frac{1}{p^{k}}}=0 \qquad \forall p\in[1,\gamma). $$
    (4)

    As (uk) has q-order at least γ by (2), its r-order is also at least γ, cf. [25, 9.3.2], thus \(\lim _{k\to \infty }{\lVert u^{k}-\bar {u}\rVert }^{\frac {1}{p^{k}}}=0\) for all p ∈ [1,γ), so (4) follows from (3).

Remark 1

For Broyden’s method, it is unknown whether (2) holds for any γ > 1 if n > 1, cf. also [18]. For n = 1, it is known that (2) holds with γ equal to the golden mean [31]. In Theorem 6, we show that this result extends to arbitrary n provided F has n − 1 affine component functions and B0 is initialized exactly.
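The golden-mean order of the one-dimensional secant method is easy to observe numerically. The following sketch (hypothetical test function \(f(x)=x^{2}-2\)) estimates the q-order from consecutive errors; the estimates approach \(\varphi =\frac {1+\sqrt 5}{2}\approx 1.618\) before rounding takes over.

```python
import math

def secant(f, x0, x1, iters=7):
    """Secant method: the one-dimensional instance of Broyden's method."""
    xs = [x0, x1]
    for _ in range(iters):
        f0, f1 = f(xs[-2]), f(xs[-1])
        xs.append(xs[-1] - f1 * (xs[-1] - xs[-2]) / (f1 - f0))
    return xs

f = lambda x: x * x - 2.0
root = math.sqrt(2.0)
xs = secant(f, 1.0, 2.0)
errs = [abs(x - root) for x in xs]
# q-order estimates log(e_{k+1}/e_k)/log(e_k/e_{k-1}), skipping rounding-level errors
orders = [math.log(errs[k + 1] / errs[k]) / math.log(errs[k] / errs[k - 1])
          for k in range(1, len(errs) - 1) if errs[k + 1] > 1e-14]
```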

2.3 Uniform linear independence of dimension d

The following definition is the appropriate generalization of uniform linear independence for the purposes of this paper.

Definition 1

Let \(n\in \mathbb {N}\) and \(d\in \mathbb {N}\). The sequence of vectors \((s^{k})\subset \mathbb {R}^{n}\setminus \{0\}\) is called uniformly linearly independent of dimension d iff there exist constants \(m\in \mathbb {N}\) and ρ > 0 such that for every sufficiently large k the set:

$$ \bigl\{ s^{k}, s^{k+1}, \ldots, s^{k+m} \bigr\} $$

contains d vectors \(s^{k_{1}}, \ldots , s^{k_{d}}\) such that all singular values of the matrix:

$$ \begin{pmatrix} \frac{s^{k_{1}}}{{\lVert s^{k_{1}}\rVert}} & \frac{s^{k_{2}}}{{\lVert s^{k_{2}}\rVert}} & {\ldots} & \frac{s^{k_{d}}}{{\lVert s^{k_{d}}\rVert}} \end{pmatrix} \in\mathbb{R}^{n\times d} $$

are larger than ρ.

Remark 2

The usual notion of uniform linear independence, cf. [5, (AS.4)], is recovered for d = n. If d is not specified, then it is understood that d = n.
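For a given window of steps, the condition in Definition 1 can be tested directly: normalize the d selected steps, collect them as columns and compare the smallest singular value against ρ. A small sketch with hypothetical vectors:

```python
import numpy as np

def well_spread(steps, rho):
    """True iff the normalized steps, as columns of an n x d matrix,
    have all singular values larger than rho (the test in Definition 1)."""
    M = np.column_stack([s / np.linalg.norm(s) for s in steps])
    return np.linalg.svd(M, compute_uv=False).min() > rho

# two orthogonal directions in R^3: both singular values equal 1
good = well_spread([np.array([1.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0])], rho=0.5)
# two antiparallel directions: smallest singular value is 0
bad = well_spread([np.array([1.0, 1.0, 0.0]), np.array([-2.0, -2.0, 0.0])], rho=0.5)
```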

3 Behavior of the Broyden-like method on mixed systems

To conveniently state results for mixed linear–nonlinear systems of equations, we will use the following assumption.

Assumption 1

Let \(n\in \mathbb {N}\), d ∈ [n]0 and J := [n] ∖ [d]. Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) satisfy \(F_{j}(u)={a_{j}^{T}} u + b_{j}\) for all j ∈ J, where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\) for all j ∈ J. Let \(B_{0}\in \mathbb {R}^{n\times n}\) satisfy \({B_{0}^{j}}={a_{j}^{T}}\) for all j ∈ J and suppose that B0 is invertible.

Remark 3

Due to \({B_{0}^{j}}={a_{j}^{T}}\) for all j ∈ J and the invertibility of B0, Assumption 1 implies \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle })=n-d\), hence \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle }^{\perp })=d\).

The first result establishes basic properties of Algorithm 1 under Assumption 1. It generalizes [18, Lemma 2.1].

Lemma 4

Let Assumption 1 hold and let (uk), (sk) and (Bk) be generated by Algorithm 1. Then we have for each j ∈ J and all k ≥ 1 the identities \({B_{k}^{j}} = {a_{j}^{T}}\), Fj(uk) = 0, \({a_{j}^{T}} s^{k}=0\) and Bkaj = B1aj.

Proof

The proof of [18, Lemma 2.1] applies without changes. □

Under the assumptions of Lemma 4, the sequence (sk) necessarily violates uniform linear independence unless J = ∅.

Corollary 1

Any selection \(\{s^{k_{1}}, \ldots , s^{k_{d+1}}\}\) of d + 1 vectors from the sequence (sk)k≥ 1 of Lemma 4 is linearly dependent.

Proof

Lemma 4 yields \({a_{j}^{T}} s^{k}=0\) for all j ∈ J and all k ≥ 1, thus \(s^{k}\in {\langle \{a_{j}\}_{j\in J}\rangle }^{\perp }\) for all k ≥ 1. The claim follows from \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle }^{\perp })=d\). □
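The invariants of Lemma 4 and Corollary 1 are easy to observe numerically: with exact initialization, the rows j ∈ J of Bk never change, Fj(uk) vanishes for k ≥ 1, and every step after the first lies in the d-dimensional subspace orthogonal to all aj. A sketch on a hypothetical mixed system with n = 3, d = 1, J = {2,3} and σk ≡ 1:

```python
import numpy as np

# hypothetical mixed system: F_1 nonlinear, F_2 and F_3 affine with rows a_2, a_3
a2, b2 = np.array([1.0, 1.0, 0.0]), -2.0
a3, b3 = np.array([0.0, 1.0, 1.0]), -3.0
F = lambda u: np.array([u[0] ** 3 + u[1] - u[2] - 1.0, a2 @ u + b2, a3 @ u + b3])

u = np.array([1.5, 0.5, 2.5])
B = np.array([[3.0, 1.0, -1.0], a2, a3])   # exact initialization in rows 2 and 3
steps, mats = [], []
for k in range(8):
    if np.linalg.norm(F(u)) < 1e-12:
        break
    s = np.linalg.solve(B, -F(u))          # step of Algorithm 1
    u = u + s
    B = B + np.outer(F(u), s) / (s @ s)    # Broyden update (sigma_k = 1)
    steps.append(s)
    mats.append(B)
```

Afterwards, every Bk keeps the rows a2T and a3T, all steps s1, s2, … are collinear (Corollary 1 with d = 1), and the iterates converge to a root even though the steps are far from uniformly linearly independent.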

To conveniently state the next result, we introduce some notation.

Definition 2

Let Assumption 1 hold. We set \({\mathcal {A}}:={\langle \{a_{j}\}_{j\in J}\rangle }\) and \({\mathcal {S}}:={\mathcal {A}}^{\perp }\). Furthermore, we let \(\{{\mathfrak {s}}^{i}\}_{i\in [d]}\) be an orthonormal basis of \({\mathcal {S}}\) and we denote \(S:=\begin {pmatrix} {\mathfrak {s}}^{1} & {\ldots } & {\mathfrak {s}}^{d} \end {pmatrix}\in \mathbb {R}^{n\times d}\). For any matrix \(B\in \mathbb {R}^{n\times n}\), we denote:

$$ \widetilde B:=\begin{pmatrix}B^{1} \\ {\vdots} \\ B^{d}\end{pmatrix}\in\mathbb{R}^{d\times n} \qquad\text{ and similarly }\qquad \widetilde F(u):=\begin{pmatrix}F_{1}(u) \\ {\vdots} \\ F_{d}(u)\end{pmatrix}. $$
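In computations, the matrix S of Definition 2 can be obtained from the right singular vectors of the matrix that stacks the rows ajT: the singular vectors belonging to the zero singular values form an orthonormal basis of the null space \({\mathcal {S}}={\mathcal {A}}^{\perp }\). A sketch with a hypothetical row a1 = (1,1,1)T, so n = 3 and d = 2:

```python
import numpy as np

def orthonormal_basis_of_complement(A_rows):
    """Columns span the orthogonal complement of span{a_j} (the space S)."""
    A = np.atleast_2d(np.array(A_rows, dtype=float))    # (n-d) x n, rows a_j^T
    _, sv, Vt = np.linalg.svd(A)                        # full SVD, Vt is n x n
    rank = int((sv > 1e-12 * sv.max()).sum())
    return Vt[rank:].T                                  # n x (n - rank)

S = orthonormal_basis_of_complement([[1.0, 1.0, 1.0]])
```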

We show that under Assumption 1, the iterates (uk)k≥ 1 obtained by applying Algorithm 1 to F can also be generated by applying it to a lower-dimensional mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\). The following result extends [18, Theorem 2.3].

Theorem 3

Let Assumption 1 hold and let (uk), (Bk) and (σk) be generated by Algorithm 1, where each Bk is assumed to be invertible. Define:

$$ G:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}, \qquad G(w):=\widetilde F(u^{1} + S w) $$

as well as:

$$ C_{0}:=\widetilde B_{1} S\in\mathbb{R}^{d\times d}, \qquad w^{0}:=0\in\mathbb{R}^{d}, \qquad\text{ and }\qquad \tau_{k}:=\sigma_{k+1}\quad\forall k\geq 0. $$

Then the application of Algorithm 1 to G with initial guess (w0,C0) and updating sequence (τk) generates sequences (wk) and (Ck) with the following properties:

  1.

    Each Ck is invertible and for all k ≥ 1, there hold:

    $$ u^{k} = u^{1} + S w^{k-1}, \qquad \widetilde F(u^{k}) = G(w^{k-1}) \qquad\text{and}\qquad C_{k-1} = \widetilde B_{k} S. $$
    (5)
  2.

    The iterates (uk) converge to \(\bar {u}\in \mathbb {R}^{n}\) if and only if there is \(\bar w\in \mathbb {R}^{d}\) such that (wk) converges to \(\bar w\). If (uk) and (wk) converge to \(\bar {u}\) and \(\bar w\), respectively, then we have for all k ≥ 1:

    $$ \bar u = u^{1} + S\bar w \qquad\text{ and }\qquad {\left\|u^{k}-\bar{u}\right\|} = {\left\|w^{k-1}-\bar w\right\|}. $$
    (6)
  3.

    The matrices (Bk) converge to \(B\in \mathbb {R}^{n\times n}\) if and only if there is \(C\in \mathbb {R}^{d\times d}\) such that (Ck) converges to C. If (Bk) and (Ck) converge to B and C, respectively, then we have for all k ≥ 1:

    $$ C = \widetilde B S \qquad\text{ and }\qquad {\left\|C_{k}-C\right\|} = {\left\|B_{k}-B\right\|}. $$

Proof

  • Proof of 1: The proof of [18, Theorem 2.3], which is for (σk) ≡ 1, can be used almost verbatim.

  • Proof of 2: We will use several times that \({\lVert S v\rVert }={\lVert v\rVert }\) for all \(v\in \mathbb {R}^{d}\) because the columns of S are orthonormal. Let (uk) converge to \(\bar {u}\). From (5), it follows that \(u^{n}-u^{m} = S(w^{n-1}-w^{m-1})\) for all n,m ≥ 1, which implies that (wk) is a Cauchy sequence, hence convergent. Denoting the limit by \(\bar w\), we deduce from (5) that \(\bar {u} = u^{1} + S\bar w\), which in turn yields \({\lVert u^{k}-\bar {u}\rVert } = {\lVert S(w^{k-1}-\bar w)\rVert }\), hence \({\lVert u^{k}-\bar {u}\rVert } = {\lVert w^{k-1}-\bar w\rVert }\). If (wk) converges to \(\bar w\), then we can argue similarly.

  • Proof of 3: Let (Bk) converge to B. From (5), it follows that \({\lVert C_{n-1}-C_{m-1}\rVert }\leq {\lVert \widetilde B_{n}-\widetilde B_{m}\rVert }={\lVert B_{n}-B_{m}\rVert }\) for all n,m ≥ 1, where we used that \({\lVert S\rVert }=1\) and that \({B_{n}^{j}} - {B_{m}^{j}} = 0\) for all j ∈ J due to Lemma 4. This implies that (Ck) is a Cauchy sequence, hence convergent. Denoting the limit by C, we deduce from (5) that \(C = \widetilde B S\). Let now (Ck) converge to C. We denote by \(A\in \mathbb {R}^{n\times (n-d)}\) the matrix:

    $$ A := \begin{pmatrix} \mathfrak{a}^{1} & {\ldots} & \mathfrak{a}^{n-d} \end{pmatrix}, $$

    where \(\{\mathfrak {a}^{i}\}_{i\in [n-d]}\) is an orthonormal basis of \({\mathcal {A}}\). Furthermore, let \(\hat S\in \mathbb {R}^{n\times n}\) be given by \(\hat S:=\begin {pmatrix}S & A \end {pmatrix}\). Since \({B_{k}^{j}} S = {a_{j}^{T}} S = 0\) and BkA = B1A for all j ∈ J and all k ≥ 1 by Lemma 4, we infer that:

    $$ B_{k} \hat S = \begin{pmatrix} \begin{array}{ccc}\widetilde B_{k} S \\ 0 \end{array} \biggl\lvert\biggr. & B_{k} A \end{pmatrix} = \begin{pmatrix} \begin{array}{ccc}C_{k-1} \\ 0 \end{array} \biggl\lvert\biggr. & B_{1} A \end{pmatrix}, $$
    (7)

    where we also used the identity \(\widetilde B_{k} S = C_{k-1}\) from (5). Since \(\hat S \hat S^{T} = I\), it follows that:

    $$ B_{k} = \begin{pmatrix} \begin{array}{ccc}C_{k-1} \\ 0 \end{array} \biggl\lvert\biggr. & B_{1} A \end{pmatrix} \hat S^{T} $$

    for all k ≥ 1. Since (Ck) converges, we see that (Bk) converges, too. Denoting the limit of (Bk) by B, we conclude from (5) that \(C = \widetilde B S\) and from (7) that \({\lVert C_{k-1}-C\rVert }={\lVert (B_{k} - B)\hat S\rVert } ={\lVert B_{k} - B\rVert }\), where we used that \(\hat S\) is orthogonal.
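Theorem 3 can be verified numerically: run Algorithm 1 on F, then on the reduced mapping G with the data (w0, C0) defined above, and compare the two runs via (5). A sketch on a hypothetical 2 × 2 system with one affine row, using σk ≡ 1 (hence τk ≡ 1):

```python
import numpy as np

# hypothetical system: F_1 nonlinear, F_2(u) = a^T u + b affine (n = 2, d = 1)
a, b = np.array([1.0, 1.0]), -3.0
F = lambda u: np.array([u[0] ** 2 + u[1] - 3.0, a @ u + b])

def broyden(F, u, B, iters):
    """Algorithm 1 with sigma_k = 1; records iterates and matrices."""
    us, Bs = [u], [B]
    for _ in range(iters):
        s = np.linalg.solve(B, -F(u))
        u = u + s
        B = B + np.outer(F(u), s) / (s @ s)
        us.append(u)
        Bs.append(B)
    return us, Bs

u0, B0 = np.array([1.5, 0.5]), np.array([[3.0, 1.0], a])  # exact row 2
us, Bs = broyden(F, u0, B0, 6)

S = np.array([[1.0], [-1.0]]) / np.sqrt(2.0)   # orthonormal basis of {a}^perp
G = lambda w: F(us[1] + S @ w)[:1]             # reduced mapping G: R^1 -> R^1
w0, C0 = np.zeros(1), Bs[1][:1] @ S            # C_0 = B_1^1 S  (1 x 1)
ws, Cs = broyden(G, w0, C0, 5)
```

Up to rounding, the two runs satisfy \(u^{k}=u^{1}+Sw^{k-1}\) and \(C_{k-1}=\widetilde B_{k}S\) for all k ≥ 1, as stated in (5).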

Remark 4

Theorem 3 does not require invertibility of \(F^{\prime }(\bar {u})\), which allows us to derive results for singular \(F^{\prime }(\bar {u})\), too, cf. Theorems 6 and 7.

4 Convergence of the Broyden-like matrices

4.1 The general result

From [22, Theorem 5.7], we recall the following sufficient condition for convergence of (Bk) to \(F^{\prime }(\bar {u})\).

Theorem 4

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be strictly differentiable at \(\bar {u}\). Let (uk), (sk) and (Bk) be generated by Algorithm 1. Let (uk) converge to \(\bar {u}\) and let (sk) be uniformly linearly independent. Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(B=F^{\prime }(\bar {u})\). Moreover, we have \(F(\bar {u})=0\). If, in addition, \(F^{\prime }(\bar {u})\) is invertible, then (uk) converges q-superlinearly.

Proof

There are three differences compared with [22, Theorem 5.7]. The first is that we have replaced continuous differentiability of F by strict differentiability; it is easy to verify that the proof of [22, Theorem 5.7] still holds under this weaker assumption. The second and third differences are the added statements that \(F(\bar {u})=0\) and that (uk) converges q-superlinearly, which follow from Lemma 1 and Lemma 2, respectively. □

Corollary 1 shows that for mixed linear–nonlinear systems with exact initialization, the uniform linear independence required in Theorem 4 does not hold. The following result extends Theorem 4 to mixed systems. We recall that the matrix S is introduced in Definition 2.

Theorem 5

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\). Let Assumption 1 hold and let (uk), (sk) and (Bk) be generated by Algorithm 1, where each Bk is assumed to be invertible. Let (uk) converge to \(\bar {u}\) and suppose that \(w\mapsto \widetilde F(\bar {u}+Sw)\) is strictly differentiable at w = 0. Let (sk) be uniformly linearly independent of dimension d. Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(\widetilde B S = \widetilde F^{\prime }(\bar {u}) S\), Baj = B1aj and \(B^{j} = {a_{j}^{T}}=F_{j}^{\prime }(\bar {u})\) for all j ∈ J. Moreover, we have \(F(\bar {u})=0\). If \(\widetilde F^{\prime }(\bar {u}) S\) is invertible, then (uk) converges q-superlinearly. If F is strictly differentiable at \(\bar {u}\), then \(E:=\lim _{k\to \infty } E_{k}\) exists and satisfies \(E = E_{1}(I-SS^{T})\).

Proof

For d = n, we have J = ∅, \(\widetilde E=E\) and \(S\in \mathbb {R}^{n\times n}\) is orthogonal, so the result is equivalent to Theorem 4 and there is nothing to prove. For d < n, we begin by noting that Lemma 4 yields \({B_{k}^{j}} = {a_{j}^{T}}\) and Bkaj = B1aj for all j ∈ J and all k ≥ 1, which carries over to \(\lim _{k\to \infty } B_{k}\) if it exists. Next we show the existence of \(\lim _{k\to \infty } B_{k}\). By applying Theorem 3, we obtain sequences (Ck) and (wk) and a point \(\bar w\) as stated in that theorem. Part 3 of that theorem shows that for convergence of (Bk), it suffices to demonstrate the convergence of (Ck). Denoting \({s_{w}^{k}}:=w^{k+1}-w^{k}\), we now prove that \(({s_{w}^{k}})\subset \mathbb {R}^{d}\setminus \{0\}\) is uniformly linearly independent (of dimension d). Indeed, using (5), we have:

$$ \hat s^{k} = \frac{S s_{w}^{k-1}}{{\lVert s^{k}\rVert}} = \frac{S(w^{k}-w^{k-1})}{{\lVert S(w^{k}-w^{k-1})\rVert}} = \frac{S(w^{k}-w^{k-1})}{{\lVert w^{k}-w^{k-1}\rVert}}. $$

This implies that the matrix \(\hat S^{k}\) appearing in the definition of uniform linear independence of dimension d of (sk) and the matrix appearing in the definition of uniform linear independence of \(({s_{w}^{k}})\) have identical singular values, so the uniform linear independence of dimension d of (sk) implies the uniform linear independence of \(({s_{w}^{k}})\). The uniform linear independence of \(({s_{w}^{k}})\) and the results of Theorem 3 allow us to apply Theorem 4 to G, (wk), \(({s_{w}^{k}})\), and (Ck). This yields convergence of (Ck) to \(G^{\prime }(\bar w)=\widetilde F^{\prime }(\bar {u}) S\), which by part 3 of Theorem 3 implies \(\widetilde BS = \widetilde F^{\prime }(\bar {u}) S\). Since (Bk) converges, Lemma 1 supplies \(F(\bar {u})=0\) and Theorem 4 implies q-superlinear convergence of (wk), from which the q-superlinear convergence of (uk) follows by use of (6). If F is strictly differentiable at \(\bar {u}\), then the claims for B imply that E exists and satisfies \(\widetilde E S = 0\) as well as Ej = 0 and Eaj = E1aj for all j ∈ J. It is easy to see that these conditions are equivalent to \(E = E_{1}(I-SS^{T})\). □

Remark 5

  1.

    If F is strictly differentiable at \(\bar {u}\), then \(\widetilde F(\bar {u}+Sw)\) is strictly differentiable at w = 0. If \(F^{\prime }(\bar {u})\) is invertible, then \(\widetilde F^{\prime }(\bar {u})S\) is invertible.

  2.

    To illustrate the conditions obtained for B, let us consider the case that \({\mathcal {S}}=\{(s_{1},s_{2},\ldots ,s_{n})^{T}\in \mathbb {R}^{n}: s_{j}=0 \forall j>d\}\). In this case, we can use for S the first d columns of the n × n identity matrix. Thus, \(\widetilde {B} S\) consists of the entries Bi,j, i,j ∈ [d], and \(\widetilde B S = \widetilde F^{\prime }(\bar {u})S\) states that the first d × d block of B agrees with the respective block of \(F^{\prime }(\bar {u})\). From Baj = B1aj for all j ∈ J, we obtain in addition that the entries Bi,j, i ∈ [d], j ∈ [n] ∖ [d], are the same as in B1. If F is strictly differentiable at \(\bar {u}\), then this implies that Bi,j, i ∈ [d], j ∈ [n] ∖ [d], cannot equal the respective entries of \(F^{\prime }(\bar {u})\) if the rank of \((E_{0}^{i,j})_{i\in [d],j\in [n]\setminus [d]}\) is larger than one.

4.2 The special case d = 1

Sufficient conditions for uniform linear independence of \((s^{k})\subset \mathbb {R}^{n}\) are unknown for Broyden’s method if n > 1 (hence also for the more general Algorithm 1). However, any sequence \((s^{k})\subset \mathbb {R}^{n}\setminus \{0\}\) is uniformly linearly independent of dimension 1, hence Theorem 5 implies the following result.

Corollary 2

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\). Let Assumption 1 hold for d = 1 and let (uk), (sk) and (Bk) be generated by Algorithm 1, where each Bk is assumed to be invertible. Let (uk) converge to \(\bar {u}\) and suppose that \(t\mapsto F_{1}(\bar {u}+t\bar s)\) is strictly differentiable at t = 0, where \(\bar s:=S\). Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(B^{1} \bar s = F_{1}^{\prime }(\bar {u})(\bar s)\), \(B^{1} a_{j} = {B_{1}^{1}} a_{j}\) and \(B^{j} = {a_{j}^{T}}=F_{j}^{\prime }(\bar {u})\) for all j > 1. Moreover, we have \(F(\bar {u})=0\). If \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\), then (uk) converges q-superlinearly. If F1 is strictly differentiable at \(\bar {u}\), then \(E:=\lim _{k\to \infty } E_{k}\) exists and satisfies \(E^{1} = {E_{1}^{1}} (I-\bar s \bar s^{T})\) and Ej = 0 for all j > 1; in particular, (Bk) converges to \(F^{\prime }(\bar {u})\) iff \({E_{1}^{1}} a_{j} = 0\) for all j > 1.

Remark 6

Under the assumptions of Corollary 2, each Bk agrees with \(F^{\prime }(\bar {u})\) in all rows except the first and \(B:=\lim _{k\to \infty }B_{k}\) exists, yet B will usually be different from \(F^{\prime }(\bar {u})\) (provided \(F^{\prime }(\bar {u})\) exists). If, say, \(\bar s\) is the first canonical unit vector, then \(E^{1} = \begin {pmatrix}0 & E_{1}^{1,2} & {\ldots } & E_{1}^{1,n}\end {pmatrix}\); hence, E = 0 holds iff \(B_{1}^{1,j}=\left [F_{1}^{\prime }(\bar {u})\right ]_{j}\) for all j > 1, where \([F_{1}^{\prime }(\bar {u})]_{j}\) indicates the jth component of the vector \(F_{1}^{\prime }(\bar {u})\). This also shows that if \({\lVert E_{0}\rVert }\) is large, then \({\lVert E\rVert }\) will usually be large, too. The numerical results in Section 5 and our numerical experience from other work confirm that (Bk) will frequently not converge to \(F^{\prime }(\bar {u})\) and indicate that this also holds in more nonlinear settings.

We now focus on Broyden’s method, where (σk) ≡ 1. In fact, it is enough if σk = 1 for all k sufficiently large. For this case, we can strengthen the findings of Corollary 2 in several ways, for instance by providing orders of convergence for (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\). These results are derived by exploiting the fact that if σk = 1 for a \(k\in \mathbb {N}\), then sk+ 1 and thus uk+ 2 can also be generated by the one-dimensional secant method, cf. the proof of part 1 of Theorem 6. Correspondingly, let us first argue for the one-dimensional case.

Lemma 5

Let \(G:\mathbb {R}\rightarrow \mathbb {R}\). Let (wk), \(({s_{w}^{k}})\) and (Ck) be generated by Algorithm 1 applied to G, using an update sequence (τk) that satisfies:

$$ \lim_{k\to\infty}\frac{\tau_{k+1}}{\tau_{k}} = 1. $$

Let (wk) converge to \(\bar w\) with \(G(\bar w)=0\). For k ≥ 0, respectively k ≥ 1, define:

$$ {q_{k}^{G}} := \frac{{\lvert w^{k+1}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}} \qquad\text{ and }\qquad {Q_{k}^{G}} := \frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}}. $$

Then the following statements hold:

  1.

    Let G be differentiable at \(\bar w\) with \(G^{\prime }(\bar w)\neq 0\). Let \(\varphi :=\frac {1+\sqrt 5}{2}\) and suppose that:

    $$ \lim_{k\to\infty}\frac{{\lvert w^{k+1}-\bar w\rvert}}{{\lvert w^{k}-\bar w\rvert}^{\varphi}} $$
    (8)

    exists. Then we have:

    $$ \lim_{k\to\infty}\frac{{Q_{k}^{G}}}{q_{k-2}^{G}} = 1. $$

    If, in addition, \(\lim _{k\to \infty }\tau _{k}=1\) is satisfied, then there holds:

    $$ \lim_{k\to\infty}\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}^{\varphi}}={\lvert G^{\prime}(\bar w)\rvert}^{1-\varphi}. $$
  2.

    Let \(m_{0}\in \mathbb {N}\), κ ∈ (0,1) and \(\hat \kappa >0\). Let G be m0 + 1 times differentiable at \(\bar w\). Let \(G^{(m)}(\bar w)=0\) for all m ∈ [m0] and \(G^{(m_{0}+1)}(\bar w)\neq 0\). Suppose that:

    $$ \lim_{k\to\infty} {q_{k}^{G}} = \kappa \qquad\text{ and }\qquad \lim_{k\to\infty}\frac{{\lvert {s_{w}^{k}}\rvert}}{{\lvert w^{k}-\bar w\rvert}} = \hat\kappa $$

    are satisfied. Then we have:

    $$ \lim_{k\to\infty}{Q_{k}^{G}} = \kappa^{m_{0}}. $$

Proof

  • Proof of 1: Using \(G(\bar w)=0,\) we find:

    $$ \begin{array}{llll} \frac{\tau_{k-1}}{\tau_{k}}\cdot\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}} & = \frac{{\lvert G(w^{k+1})\rvert}{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}{\lvert G(w^{k})\rvert}}\\ & = \frac{{\lvert G^{\prime}(\bar{w})(w^{k+1}-\bar{w})+o({\lvert w^{k+1}-\bar{w}\rvert})\rvert}{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}{\lvert G^{\prime}(\bar{w})(w^{k}-\bar{w})+o({\lvert w^{k}-\bar{w}\rvert})\rvert}} \end{array} $$

    for all k ≥ 1. As (8) implies that (wk) converges q-superlinearly, a well-known lemma of Dennis and Moré, cf. [7, Lemma 2.1], yields \(\lim _{k\to \infty }\frac {{\lvert {s_{w}^{k}}\rvert }}{{\lvert w^{k}-\bar {w}\rvert }}=1\). Therefore, we have:

    $$ \begin{array}{llll} \lim_{k\to\infty}\frac{{Q_{k}^{G}}}{q_{k-2}^{G}} & = \lim_{k\to\infty} \frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}}\frac{{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k-1}-\bar{w}\rvert}}\\ & = \lim_{k\to\infty} \frac{{\lvert G^{\prime}(\bar{w})\rvert}{\lvert w^{k+1}-\bar{w}\rvert}{\lvert w^{k-1}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}{\lvert G^{\prime}(\bar{w})\rvert}{\lvert w^{k}-\bar{w}\rvert}} \frac{{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k-1}-\bar{w}\rvert}} \\ & = \lim_{k\to\infty} \frac{{\lvert w^{k+1}-\bar{w}\rvert}{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}^{2}}, \end{array} $$

    provided the latter limit exists. By applying (8) multiple times, we obtain:

    $$ \lim_{k\to\infty} \frac{{\lvert w^{k+1} - \bar{w}\rvert}{\lvert w^{k-2} - \bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}^{2}} = \lim_{k\to\infty} \mu^{\varphi-1-\frac{1}{\varphi}}{\lvert w^{k-1} - \bar{w}\rvert}^{\varphi^{2}-2\varphi+\frac{1}{\varphi}} = 1, $$

    where \(\mu \in [0,\infty )\) denotes the limit from (8) and where we used the identities \(\varphi ^{2}-2\varphi +\frac {1}{\varphi } = -\varphi +1+\frac {1}{\varphi } = \varphi -1-\frac {1}{\varphi }=0\) that follow from φ2φ − 1 = 0. Similar considerations show that:

    $$ \lim_{k\to\infty}\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}^{\varphi}}= \bar \mu \lim_{k\to\infty} \frac{{\lvert w^{k+1}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}}\cdot \frac{{\lvert w^{k-1}-\bar{w}\rvert}^{\varphi}}{{\lvert w^{k}-\bar{w}\rvert}^{\varphi}} = \bar \mu $$

    for \(\bar \mu :={\lvert G^{\prime }(\bar {w})\rvert }^{1-\varphi }\), where we used (8) to obtain the final equality.

  • Proof of 2: Let us prove the claim for m0 = 1; it is readily generalized to arbitrary m0 ≥ 1. Taylor expansion around \(\bar {w}\) together with \(G(\bar {w})=0\) implies

    $$ \begin{array}{llll} & \lim_{k\to\infty} \frac{{\lvert G(w^{k+1})\rvert}}{{\lvert G(w^{k})\rvert}}\\ & \enspace = \lim_{k\to\infty}\frac{{\lvert G^{\prime}(\bar{w})(w^{k+1}-\bar{w})+\frac{1}{2} G^{\prime\prime}(\bar{w})(w^{k+1}-\bar{w})^{2}+o({\lvert w^{k+1}-\bar{w}\rvert}^{2})\rvert}}{{\lvert G^{\prime}(\bar{w})(w^{k}-\bar{w})+\frac{1}{2} G^{\prime\prime}(\bar{w})(w^{k}-\bar{w})^{2}+o({\lvert w^{k}-\bar{w}\rvert}^{2})\rvert}}\\ & \enspace = \lim_{k\to\infty}\frac{{\lvert G^{\prime\prime}(\bar{w})\rvert}}{{\lvert G^{\prime\prime}(\bar{w})\rvert}}\cdot\frac{{\lvert w^{k+1}-\bar{w}\rvert}^{2}}{{\lvert w^{k}-\bar{w}\rvert}^{2}} = \kappa^{2} = \kappa^{m_{0}+1}. \end{array} $$

    By assumption, we have \(\hat \kappa =\lim _{k\to \infty }\frac {{\lvert {s_{w}^{k}}\rvert }}{{\lvert w^{k}-\bar {w}\rvert }}>0\), hence:

    $$ \lim_{k\to\infty}\frac{{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}} = \lim_{k\to\infty}\frac{\hat\kappa{\lvert w^{k-1}-\bar{w}\rvert}}{\hat\kappa{\lvert w^{k}-\bar{w}\rvert}} = \frac{1}{\kappa}. $$

    By definition, there holds for all k ≥ 1:

    $$ \frac{\tau_{k-1}}{\tau_{k}}\cdot {Q_{k}^{G}} = \frac{{\lvert G(w^{k+1})\rvert}}{{\lvert G(w^{k})\rvert}} \cdot \frac{{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}}. $$

    Taking the limit for \(k\to \infty \) yields the claim.
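As a toy computation matching the setting of part 2 with m0 = 1 (our own illustration, not an experiment from the paper), consider the secant method applied to G(w) = w², whose root 0 is a double root: the iterate quotients tend to the root κ = (√5 − 1)/2 of x² + x − 1 in (0,1), and the quotients of successive slope differences tend to κ^{m0} = κ.

```python
# Our own toy computation for the case m0 = 1: secant method on G(w) = w^2,
# where G'(0) = 0 and G''(0) != 0, so convergence is only q-linear.

G = lambda w: w * w

w = [1.0, 0.9]                       # two starting points
for _ in range(30):                  # plain secant iteration
    a, b = w[-2], w[-1]
    w.append(b - G(b) * (b - a) / (G(b) - G(a)))

# secant slopes C_k and the quotients appearing in Lemma 5
C = [(G(w[k]) - G(w[k - 1])) / (w[k] - w[k - 1]) for k in range(1, len(w))]
kappa = (5 ** 0.5 - 1) / 2                   # root of x^2 + x - 1 in (0,1)
q = w[-1] / w[-2]                            # iterate quotient -> kappa
Q = (C[-1] - C[-2]) / (C[-2] - C[-3])        # slope-difference quotient -> kappa
```

For G(w) = w² the reciprocals 1/w_k satisfy a Fibonacci-type recursion, which is why both quotients approach κ geometrically.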

We now provide a detailed description of the convergence behavior of Algorithm 1 with σk = 1 for all large k and d = 1, where F has n − 1 affine component functions F2,…,Fn. We first present a result for nonlinear F1 and then deal with affine F1.

Theorem 6

Let Assumption 1 hold for d = 1 and let (uk), (sk) and (Bk) be generated by Algorithm 1, with each Bk invertible. Suppose that σk = 1 for all k large enough and that (uk) converges to some \(\bar {u}\). Set \(\bar s:=S\) and define:

$$ q_{k} := \frac{{\left\|u^{k+1}-\bar{u}\right\|}}{{\left\|u^{k}-\bar{u}\right\|}} \qquad\text{ and }\qquad Q_{k} := \frac{{\left\|B_{k+1}-B_{k}\right\|}}{{\left\|B_{k}-B_{k-1}\right\|}} $$

for all k ≥ 0, respectively, k ≥ 1. Then the following statements hold:

  1.

    Let \(t\mapsto F_{1}(\bar {u}+t\bar s)\) be twice differentiable near t = 0 with \(t\mapsto F_{1}^{\prime \prime }(\bar {u} + t \bar s)(\bar s,\bar s)\) continuous at t = 0 and \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Then we have:

    $$ \limsup_{k\to\infty}\frac{{\left\|u^{k+1}-\bar{u}\right\|}}{{\left\|u^{k}-\bar{u}\right\|}^{\varphi}} \leq \left\lvert\frac{F_{1}^{\prime\prime}(\bar{u})(\bar s,\bar s)}{2 F_{1}^{\prime}(\bar{u})(\bar s)}\right\rvert^{\frac{1}{\varphi}}, $$
    (9)

    where \(\varphi :=\frac {1+\sqrt 5}{2}\). For all p ∈ [1,φ), there holds:

    $$ \lim_{k\to\infty}{\left\|B_{k+1}-B_{k}\right\|}^{\frac{1}{p^{k}}}=0. $$
    (10)

    If, in addition, \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\), then (9) holds with equality and \(\limsup \) replaced by \(\lim \), and we have:

    $$ \lim_{k\to\infty}\frac{{\left\|B_{k+1}-B_{k}\right\|}}{{\left\|B_{k}-B_{k-1}\right\|}^{\varphi}}=\left\lvert F_{1}^{\prime}(\bar{u})(\bar s)\right\rvert^{1-\varphi}\qquad\text{and}\qquad \lim_{k\to\infty}\frac{Q_{k}}{q_{k-2}} = 1. $$
    (11)
  2.

Let \(m_{0}\in \mathbb {N}\) and denote by κ the unique root of the polynomial \(x^{m_{0}+1}+x^{m_{0}}-1\) in (0,1). Let \(t\mapsto F_{1}(\bar {u}+t\bar s)\) be m0 + 1 times differentiable near t = 0 with its (m0 + 1)th derivative continuous at t = 0. If \(F_{1}^{(m)}(\bar {u})(\bar s,\ldots ,\bar s)=0\) for all m ∈ [m0] and \(F_{1}^{(m_{0}+1)}(\bar {u})(\bar s,\ldots ,\bar s)\neq 0\), then:

    $$ \lim_{k\to\infty} q_{k} = \kappa \qquad\text{ and }\qquad \lim_{k\to\infty} Q_{k} = \kappa^{m_{0}}. $$

Proof

  • Proof of 1: From Theorem 3, we obtain \(G:\mathbb {R}\rightarrow \mathbb {R}\), (wk), (Ck), and \(\bar w\) as stated in that theorem. We let \({s_{w}^{k}}:=w^{k+1}-w^{k}\) for all k ≥ 0. Due to \(C_{k} {s_{w}^{k}} ({s_{w}^{k}})^{T} / {\lvert {s_{w}^{k}}\rvert }^{2} = C_{k}\), we have \(C_{k+1}=(G(w^{k+1})-G(w^{k}))/{s_{w}^{k}}\) if σk = 1 and thus Algorithm 1 for G agrees with the one-dimensional secant method for all sufficiently large k. As \((G(w^{k+1})-G(w^{k}))/{s_{w}^{k}} \to G^{\prime }(\bar w)\) for \(k\to \infty \), we obtain the convergence of (Ck), thus \(G(\bar w)=0\) by Lemma 1. Furthermore, there holds \(G^{\prime }(\bar w) = \widetilde F^{\prime }(\bar {u}) S = F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Since (wk) converges to \(\bar w\) with \(G(\bar w)=0\) and \(G^{\prime }(\bar w)\neq 0\), classical results for the secant method, cf. [31, (6)], yield that if \(G^{\prime \prime }(\bar w)\neq 0\), then:

    $$ \lim_{k\to\infty}\frac{{\lvert w^{k}-\bar w\rvert}}{{\lvert w^{k-1}-\bar w\rvert}^{\varphi}} =\left\lvert\frac{G^{\prime\prime}(\bar w)}{2 G^{\prime}(\bar w)}\right\rvert^{\frac{1}{\varphi}}, $$

which by use of (5) is readily transformed into (9) with equality and \(\limsup \) replaced by \(\lim \). The inequality (9) itself follows similarly. The r-order (10) follows from Lemma 3 using that \(F(\bar {u})=0\) due to Corollary 2. Since \({Q_{k}^{G}}=Q_{k+1}\) and \(q_{k-2}^{G} = q_{k-1}\) by (5) and (6), Lemma 5, part 1 yields (11).

  • Proof of 2: We argue only for m0 = 1. It follows from Corollary 2 that \(F(\bar {u})=0\). It is a standard result for the one-dimensional secant method, cf. [10, Section 2.2.2], that \(\lim _{k\to \infty }{q_{k}^{G}} = \kappa \), hence \(\lim _{k\to \infty }q_{k} = \kappa \), too. The claim on (Qk) follows via \(({Q_{k}^{G}})\) from Lemma 5, part 2 if we can show that there is \(\hat \kappa >0\) such that:

    $$ \lim_{k\to\infty}\frac{{\lvert {s_{w}^{k}}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}} = \hat\kappa. $$

    Using \(G^{\prime }(\bar w)=0\), \(G^{\prime \prime }(\bar w)\neq 0\), and \(\lim _{k\to \infty } {q_{k}^{G}}=\kappa \), elementary considerations show that there is an index k0 such that \((w^{k}-\bar w)_{k\geq k_{0}}\) converges to zero without changing signs. For sufficiently large k, we thus have:

    $$ \left\lvert {s_{w}^{k}}\right\rvert = \left\lvert (w^{k+1} - \bar w) - (w^{k} - \bar w)\right\rvert = (1-{q_{k}^{G}})\left\lvert w^{k}-\bar w\right\rvert, $$

    hence, the desired limit exists with \(\hat \kappa =1-\kappa >0\).

Remark 7

  1.

    If \(F^{\prime }(\bar {u})\) is invertible, then \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Indeed, since \(\bar s\in {\mathcal {S}}\) and since \(F_{j}^{\prime }(\bar {u})={a_{j}^{T}}\in {\mathcal {A}} = {\mathcal {S}}^{\perp }\) for all j > 1, we have \(F_{j}^{\prime }(\bar {u})(\bar s)=0\) for all j > 1; hence, \(F_{1}^{\prime }(\bar {u})(\bar s)=0\) would imply \(F^{\prime }(\bar {u})(\bar s)=0\).

  2.

(9) and (10) show that (uk) has q-order no less than φ and \(({\lVert B_{k+1}-B_{k}\rVert })\) has r-order no less than φ. If \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\), then the additional part of statement 1 implies that both (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order and r-order φ, cf. [25, 9.3.3]. For (uk), the q-order φ improves the best available result, which is the 2-step q-quadratic convergence ensured by Theorem 2 for d = 1. Moreover, the example in Section 4.3.2 shows that if \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)=0\), then a q-order higher than φ is possible.

  3.

For m0 = 1, Theorem 6, part 2 is related to the results in [6, 19].

  4.

Corollary 2 is valid under the assumptions of Theorem 6, so in parts 1 and 2, we also have \(F(\bar {u})=0\) and B satisfies the conditions from that corollary.

In the affine setting, Algorithm 1 terminates after finitely many steps if the Jacobian is regular and σk = 1 for at least one k ≥ 1; if the Jacobian is singular but F has a root, the algorithm terminates at the first iterate. More precisely, we have the following result.

Theorem 7

Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be affine. Let Assumption 1 hold for d = 1 and let (uk), (sk) and (Bk) be generated by Algorithm 1, with each Bk invertible. Let F(u0)≠ 0. Then the following statements hold:

  1.

    Let \(F^{\prime }\) be invertible. Then F has a unique root \(\bar {u}\). If there is an index k ≥ 1 with σk = 1, then \(u^{k+1}=\bar {u}\) or \(u^{k+2}=\bar {u}\), hence the algorithm terminates in iteration k + 1 or k + 2 with output \(u^{\ast }=\bar {u}\). If the algorithm does not terminate with output \(u^{\ast } = \bar {u}\), then (uk) converges to \(\bar {u}\) and satisfies (1).

  2.

    Let \(F^{\prime }\) be singular. If F has a root, then F(u1) = 0. If F does not have a root, then the algorithm generates a diverging sequence (uk) such that F(uk) = (ω,0,…,0)T for all k ≥ 1 and some ω≠ 0.

Proof

  • Proof of 1: From [22, Theorem 3.2], we know that for affine F with invertible \(F^{\prime }\), Algorithm 1 converges q-superlinearly for any u0 if all Bk are invertible and the algorithm does not terminate with output \(u^{\ast }=\bar {u}\). (Since d = 1, it is also not difficult to establish this directly.) Theorem 1 now yields (1). Corollary 2 yields the convergence of (Bk). It remains to prove that if σk = 1 and F(uk+ 1)≠ 0, then F(uk+ 2) = 0. Since Fj(uk) = 0 for all j > 1 and all k ≥ 1 by Lemma 4, we have to show that F1(uk+ 2) = 0. As in the proof of Theorem 6, we use Theorem 3 to obtain \(\{w^{j}\}_{j=0}^{k+1}\) and \(\{C_{j}\}_{j=0}^{k+1}\) by applying Algorithm 1 to the affine function \(G:\mathbb {R}\rightarrow \mathbb {R}\), \(G(w):=F_{1}(u^{1}+w\bar s)\), where \(\bar s:=S\). In view of (5), we have to show that G(wk+ 1) = 0. From τk− 1 = σk = 1, it follows that \(C_{k} = (G(w^{k})-G(w^{k-1}))/(w^{k}-w^{k-1}) = G^{\prime }\). Using Ck(wk+ 1wk) = −G(wk), we find \(G(w^{k+1}) = G(w^{k}) + G^{\prime }\cdot (w^{k+1} - w^{k}) = G(w^{k})-G(w^{k})=0\), hence F(uk+ 2) = 0.

  • Proof of 2: Defining \(A:=F^{\prime }\), we note that A has rank n − 1 since \(A \bar s=0\) and since n − 1 rows of A agree with the invertible B0. Thus, \(A^{1}\) can be expressed as a linear combination of \(\{A^{j}\}_{j=2}^{n}\). Since F has a root and since Fj(u1) = 0 for all j > 1 by Lemma 4, it readily follows that F1(u1) = 0, whence F(u1) = 0. Now suppose that F does not have a root. By applying Theorem 3 again, we obtain that \(G^{\prime }=A\bar s = 0\); hence, G is constant, say \(G\equiv \omega \) for some \(\omega \in \mathbb {R}\). Since F has no root, we must have ω≠ 0. Since G is constant, there holds F1(uk) = G(wk− 1) = ω for all k ≥ 1. The sequence (uk) cannot be convergent because Corollary 2 would entail that the limit point is a root of F.

Remark 8

  1.

    The starting point u0 is arbitrary in Theorem 7.

  2.

The finite convergence in Theorem 7, part 1 is related to the 2n-step convergence of Broyden’s method for regular linear systems [12, 24]. Indeed, in the proof of Theorem 7, part 1, we can replace the computation for showing G(wk+ 1) = 0 by an application of the 2n-step convergence to G using that due to τk− 1 = 1, \(s_{w}^{k-1}\) and \({s_{w}^{k}}\) are the Broyden steps for the initial data (wk− 1,Ck− 1).

  3.

If in Theorem 7, part 1, Algorithm 1 does not terminate with \(u^{\ast }=\bar {u}\), then \(\lim _{k\to \infty } E_{k}\) exists and satisfies the conditions from Corollary 2.
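The finite termination of Theorem 7, part 1 can be observed directly in a small double-precision sketch (our own 2 × 2 toy data, not from the paper): for affine F(u) = Au with regular A, exact initialization in row 2 of B0, and σ1 = 1, the iterate u² or u³ must be the exact root, here the origin.

```python
# Our own 2x2 toy instance of Theorem 7, part 1 in double precision.

A = [[2.0, 1.0], [1.0, 3.0]]             # regular Jacobian
B = [[2.5, 0.7], [1.0, 3.0]]             # row 2 exact, row 1 perturbed
u = [1.0, 1.0]
sigma = [0.5, 1.0, 1.0, 1.0]             # sigma_1 = 1

def mv(M, x):                            # 2x2 matrix-vector product
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def solve2(M, r):                        # solve M s = r by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

hit = None                               # first k with F(u^k) = 0 (up to roundoff)
for k in range(4):
    F = mv(A, u)
    if max(abs(F[0]), abs(F[1])) < 1e-10:
        hit = k
        break
    s = solve2(B, [-F[0], -F[1]])        # Broyden-like step
    u = [u[0] + s[0], u[1] + s[1]]
    Fn = mv(A, u)                        # equals y_k - B_k s_k
    n2 = s[0] ** 2 + s[1] ** 2
    for i in range(2):                   # rank-one Broyden-like update
        for j in range(2):
            B[i][j] += sigma[k] * Fn[i] * s[j] / n2
```

In this run, u¹ and u² are not roots, and u³ solves the system up to roundoff, in line with the theorem.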

4.3 Application to two examples from the literature

We illustrate some of our findings on two examples from the literature. The second example also hints at two extensions.

4.3.1 An example by Dennis and Schnabel

In [9, Example 8.1.3] and [9, Lemma 8.2.7], it is shown that for:

$$ F:\mathbb{R}^{2}\rightarrow\mathbb{R}^{2}, \qquad F(u)=\begin{pmatrix} u_{1} + u_{2} - 3 \\ {u_{1}^{2}} + {u_{2}^{2}} - 9 \end{pmatrix} $$

with root \(\bar {u}=(0,3)^{T}\), the initial data:

$$ u^{0} = \begin{pmatrix} 1 \\ 5 \end{pmatrix} \qquad\text{ and }\qquad B_{0} = F^{\prime}(u^{0}) = \begin{pmatrix} 1 & 1 \\ 2 & 10 \end{pmatrix} $$

yields sequences (uk) and (Bk) with \(u^{k}\to \bar {u}\) for \(k\to \infty \) and:

$$ B_{1} = \begin{pmatrix} 1 & 1 \\ 0.375 & 8.625 \end{pmatrix}, \qquad B:=\lim_{k\to\infty} B_{k} = \begin{pmatrix} 1 & 1 \\1.5 & 7.5 \end{pmatrix}, \qquad F^{\prime}(\bar{u}) = \begin{pmatrix} 1 & 1 \\ 0 & 6 \end{pmatrix}. $$

The affine component F1 has coefficient vector a1 = (1,1)T, so \({\mathcal {S}}={\langle \{a_{1}\}\rangle }^{\perp }=\{t\bar s:t\in \mathbb {R}\}\) with \(\bar s:=\frac {1}{\sqrt 2} (1,-1)^{T}\). Theorem 3 yields that \((s^{k})_{k\geq 1}\subset {\mathcal {S}}\) and (F1(uk))k≥ 1 ≡ 0. Of course, this can also be verified directly, cf. also [9, Example 8.1.3 and Lemma 8.2.7]. In agreement with Theorem 5 and Corollary 2, there holds \(\widetilde B S = B^{2} \bar s = -3\sqrt {2} = \widetilde F^{\prime }(\bar {u})S \), \(B^{1} = {B_{0}^{1}}\) and B(1,1)T = B1(1,1)T. (From B1, \(F^{\prime }(\bar {u})\) and \(\bar s,\) we can actually determine the limit B.) Because of \(F_{2}^{\prime }(\bar {u})\bar s\neq 0\neq F_{2}^{\prime \prime }(\bar {u})(\bar s,\bar s)\), Theorem 6, part 1 yields q-order φ for (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) as well as the validity of (11).
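This example can be replayed in a few lines of double-precision Python (our own sketch; roundoff in the update directions limits how closely the computed limit matches B, so the comparison below is only to three decimal places):

```python
# Double-precision replay of the Dennis-Schnabel example; B_1 and the limit
# matrix B are checked against the values reported in the text.

def F(u):
    return [u[0] + u[1] - 3.0, u[0] ** 2 + u[1] ** 2 - 9.0]

def solve2(M, r):                        # solve M s = r by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

u = [1.0, 5.0]
B = [[1.0, 1.0], [2.0, 10.0]]            # B_0 = F'(u^0)
B1 = None
for k in range(20):                      # Broyden's method, sigma_k = 1
    Fu = F(u)
    if max(abs(Fu[0]), abs(Fu[1])) < 1e-11:
        break
    s = solve2(B, [-Fu[0], -Fu[1]])
    u = [u[0] + s[0], u[1] + s[1]]
    Fn = F(u)                            # equals y_k - B_k s_k
    n2 = s[0] ** 2 + s[1] ** 2
    for i in range(2):                   # rank-one Broyden update
        for j in range(2):
            B[i][j] += Fn[i] * s[j] / n2
    if k == 0:
        B1 = [row[:] for row in B]
```

The run reproduces B1, converges to the root (0,3)T, and ends with B close to the limit matrix stated above rather than the Jacobian.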

4.3.2 An example by Dennis and Moré

In [8, Example 5.3], Dennis and Moré consider Broyden’s method for:

$$ F:\mathbb{R}^{2}\rightarrow\mathbb{R}^{2}, \qquad F(u)=\begin{pmatrix} u_{1} \\ u_{2}+{u_{2}^{3}} \end{pmatrix} $$

with root \(\bar {u}=(0,0)^{T}\) and note that for any \(\delta ,\epsilon \in \mathbb {R}\) the initial data:

$$ u^{0} = \begin{pmatrix} 0 \\ \epsilon \end{pmatrix} \qquad\text{ and }\qquad B_{0} = \begin{pmatrix} 1+\delta & 0 \\ 0 & 1 \end{pmatrix} $$
(12)

yields a sequence (Bk) with \(B_{k}^{1,1}=1+\delta \) for all k ≥ 0. Hence, the incorrect entry 1 + δ is never corrected (assuming δ≠ 0), preventing convergence of (Bk) to \(F^{\prime }(\bar {u})\). According to [8], “The above example points out that one of the disadvantages of Broyden’s method is that it is not self-correcting. In particular, Bk depends upon each Bj with j < k and thus it may retain information which is irrelevant or even harmful.” It is well known that the BFGS method is self-correcting, cf., e.g., [1, 27].

We show that the iterates (uk) converge rapidly despite the incorrect entry 1 + δ in all Bk. The affine component F1 has coefficient vector a1 = (1,0)T, thus \({\mathcal {S}}={\langle \{a_{1}\}\rangle }^{\perp }=\{(0,t)^{T}: t\in \mathbb {R}\}\). We set \(\bar s:=(0,1)^{T}\) and observe \((s^{k})_{k\geq 0}\subset {\mathcal {S}}\) as well as (F1(uk))k≥ 0 ≡ 0. It is not difficult to see that Theorem 3 and, in turn, Theorem 6, part 1 apply, even though Assumption 1 is not satisfied in this example. Theorem 6, part 1 implies that if (uk) converges to \(\bar {u}\), then it has a q-order no smaller than φ and \(({\lVert B_{k+1}-B_{k}\rVert })\) goes to zero with r-order no smaller than φ. The fast convergence is enabled by the fact that Broyden’s method effectively reduces to the one-dimensional secant method. It should also be noted that (Bk) converges to \(F^{\prime }(\bar {u})\) in \({\mathcal {S}}\), i.e., \((B_{k}-F^{\prime }(\bar {u}))S\to 0\), cf. Corollary 2. Furthermore, since B0S = 1 correctly approximates the affine part of F2 and since F2 does not contain a quadratic part, it can be shown that \(({\lVert B_{k+1}-B_{k}\rVert })\) has q-order 2, which implies that (uk) has q-order 2, too. The numerical experiments confirm the q-order 2, cf. Section 5.2.2.
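Both effects can be seen in a short double-precision run of (12) with the hypothetical choices δ = 0.5 and ε = 0.3 (our own values, not from [8]): the entry 1 + δ is reproduced exactly by every update, while the second component converges rapidly along the subspace u1 = 0.

```python
# Our own double-precision run of the Dennis-More example with (12),
# delta = 0.5, epsilon = 0.3: B^{1,1} = 1 + delta survives every update.

def F(u):
    return [u[0], u[1] + u[1] ** 3]

def solve2(M, r):                        # solve M s = r by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

delta, eps = 0.5, 0.3
u = [0.0, eps]
B = [[1.0 + delta, 0.0], [0.0, 1.0]]
top_left = []                            # history of B_k^{1,1}
for k in range(15):                      # Broyden's method, sigma_k = 1
    Fu = F(u)
    if max(abs(Fu[0]), abs(Fu[1])) < 1e-13:
        break
    s = solve2(B, [-Fu[0], -Fu[1]])
    u = [u[0] + s[0], u[1] + s[1]]
    Fn = F(u)
    n2 = s[0] ** 2 + s[1] ** 2
    for i in range(2):                   # rank-one Broyden update
        for j in range(2):
            B[i][j] += Fn[i] * s[j] / n2
    top_left.append(B[0][0])
```

Since F1(uk) = 0 exactly, the rank-one updates never touch the first row, so B_k^{1,1} = 1 + δ in every iteration, yet only a handful of steps are needed to drive the residual below the tolerance.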

5 Numerical experiments

We use numerical examples to verify Corollary 2 and Theorems 6 and 7. We first present the design of the experiments and then provide the examples and results.

5.1 Design of the experiments

5.1.1 Implementation and accuracy

We use the variable precision arithmetic (vpa) of Matlab 2020b. Unless stated otherwise, we work with a precision of 10000 digits and replace the termination criterion F(uk) = 0 in Algorithm 1 by \({\lVert F(u^{k})\rVert }\leq 10^{-5000}\). By \(\bar k,\) we denote the final value of k.

5.1.2 Known solution and random initialization

All examples have root \(\bar {u}=0\) and the experiments are set up in such a way that convergence to \(\bar {u}\) takes place in all runs except possibly a handful that are discarded. Except in the second example, the initial guess (u0,B0) is randomly generated using Matlab’s function rand to satisfy \(u^{0}\in [-\alpha ,\alpha ]^{n}\) and \(B_{0}=F^{\prime }(u^{0})+\hat \alpha {\lVert F^{\prime }(u^{0})\rVert }R\). Here, \(R\in \mathbb {R}^{n\times n}\) is a matrix with Rj = 0 for all j > 1 and the entries in R1 randomly drawn from [− 1,1]. The values of \(\alpha \in [10^{-3},1000]\) and \(\hat \alpha \in [0,1000]\) will be specified within each example.
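The structure of this initialization can be sketched as follows (a Python sketch of the Matlab setup; the dimension, the seed, and the identity stand-in for \(F^{\prime }(u^{0})\) are our own choices, only the row structure of R matters):

```python
# Sketch of the randomized initialization B_0 = F'(u^0) + alpha_hat *
# ||F'(u^0)|| * R, where R is nonzero only in its first row.
import random

n, alpha, alpha_hat = 4, 0.5, 0.1
random.seed(0)

u0 = [random.uniform(-alpha, alpha) for _ in range(n)]
J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # stand-in Jacobian
normJ = max(sum(abs(x) for x in row) for row in J)                  # infinity norm

R = [[0.0] * n for _ in range(n)]        # R^j = 0 for all j > 1
R[0] = [random.uniform(-1.0, 1.0) for _ in range(n)]

B0 = [[J[i][j] + alpha_hat * normJ * R[i][j] for j in range(n)] for i in range(n)]
```

By construction, B0 agrees with the Jacobian in all rows except the first, so the exact-initialization assumption holds regardless of \(\hat \alpha \).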

5.1.3 Quantities of interest

To display the course of Algorithm 1, we use the norm of Fk := F(uk), the error \({\lVert E_{k}\rVert }\), the quotients qk and Qk introduced in Theorem 6, and furthermore:

$$ \beta_{k}:={\lVert B_{k}-B_{k-1}\rVert}, \qquad {C_{k}^{u}} := \frac{{\lVert u^{k}-\bar{u}\rVert}}{{\left\|u^{k-1}-\bar{u}\right\|}^{\varphi}},\qquad {C_{k}^{B}}:=\frac{{\lVert B_{k}-B_{k-1}\rVert}}{{\left\|B_{k-1}-B_{k-2}\right\|}^{\varphi}}, $$

as well as:

$$ \mathcal{R}_{k}^{B}:=\Bigl(\log\bigl({\left\|B_{k}-B_{k-1}\right\|}^{-1}\bigr)\Bigr)^{\frac{1}{k}} $$

and:

$$ {{\mathcal{Q}}_{k}^{u}} := \frac{\log({\lVert u^{k}-\bar{u}\rVert})}{\log({\lVert u^{k-1}-\bar{u}\rVert})},\qquad {{\mathcal{Q}}_{k}^{B}}:=\frac{\log({\lVert B_{k}-B_{k-1}\rVert})}{\log({\lVert B_{k-1}-B_{k-2}\rVert})}. $$

We note that \({{\mathcal {Q}}_{k}^{u}}\) and \({{\mathcal {Q}}_{k}^{B}}\) approximate the q-order of convergence while \(\mathcal {R}_{k}^{B}\) approximates the r-order. Whenever any of these quantities is undefined, we set it to − 1, e.g., β0 := − 1. We will use these quantities to confirm that (Bk) converges, cf. Corollary 2, and to assess the convergence order of (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\), cf. Theorem 6. We are also interested in whether \({\lVert E_{k}\rVert }\to 0\), i.e., whether (Bk) converges to the true Jacobian \(F^{\prime }(\bar {u})\), cf. for instance Remark 6.
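A quick synthetic sanity check of these estimators (our own construction, independent of the experiments): for an artificial error sequence with exact q-order φ, the log-quotient estimator returns φ, while the r-order estimator \((\log (1/e_{k}))^{1/k}\) increases towards φ from below.

```python
# Synthetic check: e_{k+1} = e_k^phi has exact q-order phi by construction.
import math

phi = (1 + 5 ** 0.5) / 2
e = [0.5]
for _ in range(10):
    e.append(e[-1] ** phi)               # q-order phi by construction

# log-quotient (q-order) and r-order estimators, as defined above
Qu = [math.log(e[k]) / math.log(e[k - 1]) for k in range(1, len(e))]
R = [math.log(1.0 / e[k]) ** (1.0 / k) for k in range(1, len(e))]
```

This behavior explains why the q-order estimators in the tables stabilize quickly, whereas the r-order estimator approaches its limit only slowly.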

5.1.4 Single runs and cumulative runs

We use single runs and cumulative runs. For single runs, we display the quantities of interest during the course of the algorithm. A cumulative run consists of 1000 single runs with initial data varying according to Section 5.1.2, unless stated otherwise. Let us briefly describe the aggregated quantities that we use to assess cumulative runs. For instance, to gauge the q-order of \(({\lVert B_{k+1}-B_{k}\rVert })\), we compute for each single run of a cumulative run the number:

$$ {\mathcal{Q}}_{j} :=\min_{k_{0}(j)\leq k\leq\bar k(j)} {{\mathcal{Q}}_{k}^{B}}, $$

where j ∈ [1000] indicates the respective single run and we consistently use \(k_{0}(j):=\min \limits \{100,\lfloor 0.75\bar k(j)\rfloor \}\). As outcome of the cumulative run, we display:

$$ {\mathcal{Q}}_{B}^{-}:=\min_{j\in[1000]}{\mathcal{Q}}_{j}\qquad\text{ and }\qquad {\mathcal{Q}}_{B}^{+}:=\max_{j\in[1000]}{\mathcal{Q}}_{j}. $$

If the stronger conditions in Theorem 6 1 hold, then \({\mathcal {Q}}_{B}^{-}\) and \({\mathcal {Q}}_{B}^{+}\) should both be close to the golden mean φ. If the convergence is of lower order in any of the 1000 single runs, then we expect \({\mathcal {Q}}_{B}^{-}\) to be smaller than φ.

In the same way as just presented for \({\mathcal {Q}}_{B}^{-}\) and \({\mathcal {Q}}_{B}^{+}\), we derive \({\lVert E\rVert }^{-}\), \({\lVert E\rVert }^{+}\), \(q^{-}\), \(q^{+}\), \({\mathcal {Q}}_{u}^{-}\), \({\mathcal {Q}}_{u}^{+}\), \(\beta ^{-}\), \(\beta ^{+}\), \(Q^{-}\), \(Q^{+}\), \(\mathcal {R}_{u}^{-}\), and \(\mathcal {R}_{u}^{+}\) from the respective quantities used in single runs. In addition, we use:

$$ \lVert F\rVert^{-}:=\min_{j\in\left[1000\right]} \lVert F\left( u^{\bar{k}(j)}\right)\rVert \qquad \text{ and }\qquad \lVert F\rVert^{+}:=\max_{j\in\left[1000\right]} \lVert F\left( u^{\bar{k}(j)}\right)\rVert. $$

To keep the tables for cumulative runs of a reasonable size, we will omit some of these quantities, but what is omitted varies from example to example.

5.2 Numerical examples

5.2.1 Example 1

To verify the results of Theorem 6, part 1, we consider \(F:\mathbb {R}^{10}\rightarrow \mathbb {R}^{10}\) given by:

$$ F(u)=\begin{pmatrix} u_{1} \cdot \left[ {\prod}_{j=2}^{10}\left( u_{j}+(-1)^{j}\right)\right]\\ A u \end{pmatrix}, $$

where \(A\in \mathbb {R}^{9\times 10}\) is a random matrix with entries in [− 1,1] that is changed after each of the 1000 single runs of the cumulative run. The randomly generated A is only accepted if the resulting \(F^{\prime }(\bar {u})\) is invertible. We use α = 0.001 in this example. A single and a cumulative run with (σk) ≡ 1 and \(\hat \alpha =0\) are displayed in Tables 1 and 2. The results agree with Theorem 6, part 1. For instance, it is apparent that (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) converge with q-order φ ≈ 1.618 and that \(\lim _{k\to \infty }\frac {Q_{k}}{q_{k-2}}=1\) (since A is random, we expect \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\)). Table 2 also shows results for a cumulative run with (σk) ≡ 1 and \(\hat \alpha =0.1\). In accordance with Theorem 6, part 1, deviating from the choice \(B_{0}=F^{\prime }(u^{0})\) does not affect the q-order of convergence. Next we keep \(\hat \alpha =0.1\) and let σk = 0.5 for k ≤ 3 and (σk)k≥ 4 ≡ 1. Theorem 6, part 1 predicts that this choice of (σk) maintains q-order φ for (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\), and Table 2 confirms this.

Table 1 Example 1: Single run with \(\hat \alpha =0\), i.e., \(B_{0}=F^{\prime }(\bar {u})\)
Table 2 Example 1: Cumulative runs with \(\hat \alpha =0\) (1st row), \(\hat \alpha =0.1\) (2nd row), \(\hat \alpha =0.1\) and σ0,1,2,3 = 0.5 (3rd), \(\hat \alpha =0\) and (σk) ≡ 0.99 (4th), \(\hat \alpha =0\) and \(\sigma _{k} = 1-(k+2)^{-4}\) (5th), and \(\hat \alpha =0\) and \(\sigma _{k} = 1-(k+2)^{-4}\) with higher precision (6th)

In contrast, if we choose \(\hat \alpha =0\) and (σk) ≡ 0.99, then the order of convergence drops significantly and the same holds for \((\sigma _{k})\equiv 1-(k+2)^{-4}\), cf. Table 2. In fact, except for some special cases it can be shown that (uk) can only converge with q-order greater than one if σk → 1 fast enough. In particular, for (σk) ≡ 0.99 and \((\sigma _{k})\equiv 1-(k+2)^{-4}\), both (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order 1. To confirm this for \((\sigma _{k})\equiv 1-(k+2)^{-4}\), we repeat the cumulative run with a higher precision of 100000 digits, using \({\lVert F(u^{k})\rVert }\leq 10^{-50000}\) as termination criterion and only 100 single runs instead of 1000. We view the results in Table 2 as being in line with q-order 1. In any case, it is apparent that for (σk) ≡ 0.99 and \((\sigma _{k})\equiv 1-(k+2)^{-4}\) the q-order of convergence is not φ anymore and that \(({\lVert B_{k+1}-B_{k}\rVert })\) converges to zero at least q-linearly for all choices of (σk); hence, (Bk) converges, which validates Corollary 2. The values of \({\lVert E\rVert }^{-}\) show that (Bk) never converges to \(F^{\prime }(\bar {u})\).

5.2.2 Example 2

We provide results for the example by Dennis and Moré discussed in Section 4.3.2, which concerns Broyden’s method, so (σk) ≡ 1. A single run is displayed in Table 3 and four cumulative runs in Table 4. For the single run and the first cumulative run, we use (u0,B0) that satisfy (12) with randomly generated δ,𝜖 ∈ [− 0.5,0.5]. The results confirm that, as argued in Section 4.3.2, both (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order 2. Because of \(F_{2}^{\prime \prime }(\bar {u})=0\), this does not contradict Theorem 6, part 1.

Table 3 Example 2: Single run with initial data of the form (12)
Table 4 Example 2: Cumulative runs with initial data of the form (12) (first row), random u0 without exact initialization (2nd), random u0 without exact initialization with higher precision (3rd), and random u0 with exact initialization (4th)

In the second cumulative run, we let u0 = (𝜖1,𝜖2)T for random numbers 𝜖1,𝜖2 ∈ [− 0.5,0.5], while keeping B0 as in (12) with δ ∈ [− 0.5,0.5]. Due to 𝜖1≠ 0, we cannot expect (sk) to belong to a one-dimensional subspace; hence, Theorem 6 does not apply anymore. Correspondingly, the second row in Table 4 shows that (uk) does not attain the q-order φ but suggests that the q-order may still have a lower bound larger than 1. This view is further encouraged by the fact that the r-order of \(({\lVert B_{k+1}-B_{k}\rVert })\) seems to admit such a lower bound, too, which is a necessary condition for (uk) to have a q-order, cf. Lemma 3. To investigate the potential q-order of (uk) further, we repeat the cumulative run at a higher precision using \({\lVert F(u^{k})\rVert }\leq 10^{-100000}\) as termination criterion and 400 single runs. The results are contained in Table 4 and support the existence of a q-order larger than one for (uk).

In the third cumulative run, whose results are depicted in the last row of Table 4, we keep the choice u0 = (𝜖1,𝜖2)T from the second cumulative run, but use \(B_{0}=F^{\prime }(u^{0})\) as initial matrix, so that \({B_{0}^{1}} = F_{1}^{\prime }(u^{0})\) and hence Assumption 1 holds. In turn, Theorem 6, part 1 applies, which ensures q-order no smaller than φ for (uk) and r-order no smaller than φ for \(({\lVert B_{k+1}-B_{k}\rVert })\). It can be argued in the same way as in Section 4.3.2 that both sequences actually converge with q-order 2. Table 4 confirms this q-order.

The values of \({\lVert E\rVert }^{-}\) in Table 4 show that (Bk) never converges to \(F^{\prime }(\bar {u})\). Yet, since \(({\lVert B_{k+1}-B_{k}\rVert })\) declines quickly, the convergence of (uk) is still rapid.

5.2.3 Example 3 a

We turn to Theorem 6, part 2, where \(F^{\prime }(\bar {u})\) is singular. Let:

$$ F:\mathbb{R}^{3}\rightarrow\mathbb{R}^{3},\qquad F(u)=\begin{pmatrix} {u_{2}^{2}}-2 {u_{3}^{3}} \\ u_{1} + u_{2} + u_{3} \\ 5 u_{1} \end{pmatrix}. $$

Because of \({\mathcal {A}}^{\perp } = {\langle \{(0,1,-1)^{T}\}\rangle }\), we have \(\bar s=\frac {1}{\sqrt {2}}(0,1,-1)^{T}\), hence \(F_{1}^{\prime }(0)=0\) and \(F_{1}^{\prime \prime }(0)(\bar s,\bar s)=2\neq 0\), which implies \(\lim _{k\to \infty }q_{k}=\lim _{k\to \infty } Q_{k}=\frac {\sqrt {5}-1}{2}\approx 0.618\) for the choice (σk) ≡ 1 that we consider first. We use \(\alpha =\hat \alpha =0.01\) in this example. The results of a cumulative run with (σk) ≡ 1 are displayed in Table 5 and are in perfect agreement with Theorem 6, part 2. Table 5 also provides results for (σk) ≡ 0.99, which are similar to those for (σk) ≡ 1. Moreover, it features \(\iota ^{-}\) and \(\iota ^{+}\), which denote the minimal, respectively, maximal number of iterations over all single runs within a cumulative run. As in the previous examples, we consistently find \(B_{k}\not \to F^{\prime }(\bar {u})\).

Table 5 Example 3 a and b: Two cumulative runs in a with (σk) ≡ 1 (top) and (σk) ≡ 0.99 (below top) and in b with (σk) ≡ 1 (above bottom) and (σk) ≡ 0.99 (bottom)

5.2.4 Example 3 b

We change F1 in example 3 a, using \(F_{1}(u)={u_{2}^{3}}-2 {u_{3}^{3}}\) instead. This results in \(F_{1}^{\prime }(0)=0\), \(F_{1}^{\prime \prime }(0)(\bar s,\bar s)=0\) and \(F_{1}^{\prime \prime \prime }(0)(\bar s,\bar s,\bar s)\neq 0\), so Theorem 6, part 2 implies \(\lim _{k\to \infty } q_{k} \approx 0.755\) and \(\lim _{k\to \infty } Q_{k} \approx 0.570\). Table 5 confirms this for (σk) ≡ 1 and shows that the choice (σk) ≡ 0.99 induces only marginal changes. Overall, example 3 exhibits a remarkably uniform convergence behavior of iterates and matrix updates, as evidenced, for instance, by the fact that \(q^{-} = q^{+}\) and \(Q^{-} = Q^{+}\). Table 6 exemplifies this for example 3 b in a single run with (σk) ≡ 1. Since this uniformity is characteristic of singular \(F^{\prime }(\bar {u})\) of rank n − 1, cf. also [19], we used \({\lVert F(u^{k})\rVert }\leq 10^{-500}\) as termination criterion in example 3 and the cumulative runs consisted of 100 single runs.

Table 6 Example 3 b: Single run with (σk) ≡ 1

5.2.5 Example 4

To verify Theorem 7, part 1 we consider F(u) = Au, where \(A\in \mathbb {R}^{10\times 10}\) is an invertible random matrix with entries in [− 1000,1000] that is changed after each single run of the cumulative run. We choose \(\alpha =\hat \alpha =1000\). In the first cumulative run, we use σ4 = 1 and σk = 0.1 otherwise. Theorem 7, part 1 guarantees F(u6) = 0 if F(uk)≠ 0 for 0 ≤ k ≤ 5. Table 7 shows that \(\iota ^{-} = \iota ^{+} = 6\), so all runs take exactly 6 steps. On a side note, we remark that \(Q^{-} = Q^{+} = 9\) can easily be proven. The second experiment displayed in Table 7 uses \((\sigma _{k})\equiv 1-(k+2)^{-4}\). The outcome is in line with Theorem 7, part 1, which asserts global q-superlinear, but not finite, convergence for this choice of (σk), as well as convergence of (Bk). As in example 1, it can be shown that the q-order of (uk) and \(({\lVert B_{k+1}-B_{k}\rVert })\) is 1. To verify this, we repeat the cumulative run with \((\sigma _{k})\equiv 1-(k+2)^{-4}\), using a precision of 100000 digits and \({\lVert F(u^{k})\rVert }\leq 10^{-50000}\) as termination criterion, but only 100 single runs. The result in Table 7 is in line with q-order 1. Despite the fact that all Bk agree with A in n − 1 of n rows, the difference between Bk and A in the last 25% of iterations is large in norm, which, however, does not prevent finite convergence if σk = 1 for at least one k ≥ 1; cf. Theorem 7 and Remark 6.

Table 7 Example 4: Cumulative runs with σ0,1,2,3,5 = 0.1 and σ4 = 1 (top), with \((\sigma _{k})\equiv 1-(k+2)^{-4}\) (middle), with \((\sigma _{k})\equiv 1-(k+2)^{-4}\) and higher precision (bottom)

6 Summary

We have shown that, up to a translation, the iterates of the Broyden-like method for mixed linear–nonlinear systems of equations can be obtained by applying the Broyden-like method to a lower-dimensional mapping, provided that the rows of the initial matrix agree with the rows of the Jacobian for (some of) the linear equations. We have used this subspace property to extend a sufficient condition for convergence of the Broyden-like matrices. For the special case that at most one equation is nonlinear, we have concluded that the Broyden-like matrices converge whenever the iterates converge. For Broyden’s method, we could, in addition, quantify how fast the iterates and the matrix updates converge, and prove finite convergence if the system is linear. We verified these results in high-precision numerical experiments.