1 Introduction

Nonlinear eigenvalue problems play an important role in mathematics and its applications, see e.g., the surveys [21, 26, 30]. In particular, polynomial eigenvalue problems have been receiving much attention [3, 14, 15, 22, 24, 25]. Recall that

$$\begin{aligned} P(\lambda ) = \lambda ^{d}A_{d} + \dots + \lambda A_1 + A_0, \quad A_i \in {\mathbb {C}}^{m \times n}, \quad i=0, \dots , d \end{aligned}$$
(1.1)

is a matrix polynomial and that the number d is called the grade of \(P(\lambda )\). If \(A_d \ne 0\) then the grade coincides with the degree of the polynomial. Frequently, the complete eigenstructure, i.e. the elementary divisors and minimal indices of a matrix polynomial (for the definitions, see e.g., [6, 14]), provides an understanding of the properties and behaviour of the underlying physical system and is thus the actual object of interest. The complete eigenstructure is usually computed by passing to a (strong) linearization, which replaces a matrix polynomial by a matrix pencil, i.e. a matrix polynomial of degree \(d = 1\), with the same finite (and infinite) elementary divisors and with known changes in the minimal indices, see [26] for more details. For example, a classical linearization of (1.1), used in this paper, is the first companion form

$$\begin{aligned} {{\mathscr {C}}}^1_{P(\lambda )}=\lambda \begin{bmatrix} A_d&{}\quad &{}\quad &{}\\ &{}\quad I_n&{}\quad &{}\\ &{}\quad &{}\ddots \quad &{}\\ &{}\quad &{}\quad &{}\quad I_n\\ \end{bmatrix} + \begin{bmatrix} A_{d-1}&{}\quad A_{d-2}&{}\quad \dots &{}\quad A_{0}\\ -I_n&{}\quad &{}\quad &{}\\ &{}\quad \ddots &{}\quad &{}\\ &{}\quad &{}\quad -I_n&{}\\ \end{bmatrix}, \end{aligned}$$
(1.2)

where \(I_n\) is the \(n\times n\) identity matrix and all nonspecified blocks are zeros.
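
For concreteness, the following minimal sketch (in Python/NumPy; our illustration, not code from the paper) assembles the pencil (1.2) from a list of coefficient matrices, assuming the coefficients are stored as `A = [A_0, A_1, ..., A_d]`.

```python
import numpy as np

def first_companion_form(A):
    """Assemble the first companion form (1.2) of P(lambda) = sum_i lambda^i * A[i].

    A is a list [A_0, A_1, ..., A_d] of m-by-n coefficient matrices (grade d = len(A) - 1).
    Returns (W, Wt) with C^1_P(lambda) = lambda*W + Wt; the pencil has size
    (m + n*(d-1)) x (n*d).
    """
    d = len(A) - 1
    m, n = A[0].shape
    dtype = np.result_type(*(Ak.dtype for Ak in A))
    W = np.zeros((m + n * (d - 1), n * d), dtype=dtype)
    Wt = np.zeros_like(W)
    W[:m, :n] = A[d]                                   # leading coefficient A_d
    for k in range(1, d):                              # identity blocks of the lambda-part
        W[m + (k - 1) * n: m + k * n, k * n:(k + 1) * n] = np.eye(n)
    for k in range(d):                                 # first block row A_{d-1}, ..., A_0
        Wt[:m, k * n:(k + 1) * n] = A[d - 1 - k]
    for k in range(d - 1):                             # subdiagonal -I_n blocks
        Wt[m + k * n: m + (k + 1) * n, k * n:(k + 1) * n] = -np.eye(n)
    return W, Wt
```

For \(d = 1\) the loops are empty and the function simply returns the pencil \(\lambda A_1 + A_0\).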

In this paper, we find which perturbation of the matrix coefficients of a given matrix polynomial corresponds to a given perturbation of the entire linearization pencil. To be exact, we find a perturbation of the matrix polynomial coefficients such that the linearization of this perturbed polynomial (2.2) has the same complete eigenstructure as a given perturbed linearization (2.1). We also note that the existence of such a perturbation (2.2) was proven before for Fiedler-type linearizations [8, 14, 32], and even for the larger class of block-Kronecker linearizations [15]. Nevertheless, the problem of finding this perturbation explicitly has been open until now. The main contribution of this paper is an algorithm that finds this perturbation explicitly. Moreover, the existence of such a perturbation now also follows from the convergence of the algorithm developed in this paper.

The results of this paper can be applied to a number of problems in numerical linear algebra. One example is solving various distance problems for matrix polynomials once the corresponding problems are solved for matrix pencils, e.g., finding a singular matrix polynomial near a given matrix polynomial [4, 18, 20]. Another application lies in the stratification theory [8, 14]: constructing an explicit perturbation of a matrix polynomial when a perturbation of its linearization is known. This makes it possible to say which perturbation produces a certain change in the complete eigenstructure of a given polynomial. (In [11, 16, 17] explicit perturbations for investigating such changes for matrix pencils and for bi- and sesquilinear forms are derived.) Moreover, our result may also be useful for investigating the backward stability of polynomial eigenvalue problems solved by applying backward stable methods to the linearizations, see e.g., [29].

2 Perturbations of matrix polynomials and their linearizations

Recall that for every matrix \(X = [x_{ij}]\) its Frobenius norm is given by \(\Vert X \Vert := \Vert X \Vert _F= \left( \sum _{i,j} |x_{ij}|^2 \right) ^{\frac{1}{2}}\). Hereafter, unless otherwise stated, we use the Frobenius norm for matrices. Let \(P(\lambda )\) be an \(m \times n\) matrix polynomial of grade d and define a norm of \(P(\lambda )\) as follows

$$\begin{aligned} \Vert P(\lambda ) \Vert := \left( \sum _{k=0}^d \Vert A_k \Vert ^2 \right) ^{\frac{1}{2}}. \end{aligned}$$

Definition 2.1

Let \(P(\lambda )\) and \(E(\lambda )\) be two \(m \times n\) matrix polynomials, with \({{\,\mathrm{grade}\,}}P(\lambda ) \ge {{\,\mathrm{grade}\,}}E(\lambda )\). The matrix polynomial \(P(\lambda ) + E(\lambda )\) is a perturbation of the \(m \times n\) matrix polynomial \(P(\lambda )\).

In this paper \(\Vert E(\lambda ) \Vert \) is typically small. Definition 2.1 is also applicable to matrix pencils as a particular case of matrix polynomials.

The first companion form \({{\mathscr {C}}}^1_{P(\lambda )}\) of \(P(\lambda )\) is defined in (1.2) and is a well-known way to linearize matrix polynomials, i.e. to replace the investigation of a matrix polynomial by the investigation of a certain matrix pencil with the same characteristics of interest. Namely, \(P(\lambda )\) and \({{\mathscr {C}}}_{P(\lambda )}^1\) have the same finite and infinite elementary divisors (the same finite and infinite eigenvalues and their multiplicities), the same left minimal indices, and there is a simple relation between their right minimal indices (those of \({{\mathscr {C}}}_{P(\lambda )}^1\) are greater by \(d-1\) than those of \(P(\lambda )\)), see [6] for the definitions and more details. Define a (full) perturbation of the linearization of an \(m \times n\) matrix polynomial of grade d as follows

$$\begin{aligned} \begin{aligned} {{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}&:= {{{\mathscr {C}}}^1_{P(\lambda )} + \lambda E + {\widetilde{E}} = } \lambda \begin{bmatrix} A_d&{}\quad &{}\quad &{}\\ &{}\quad I_n&{}\quad &{}\\ &{}\quad &{}\ddots \quad &{}\\ &{}&{}&{}I_n\\ \end{bmatrix}\quad + \begin{bmatrix} A_{d-1}&{}\quad A_{d-2}&{}\quad \dots &{}\quad A_{0}\\ -I_n&{}\quad 0&{}\quad \dots &{}\quad 0\\ &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0&{}&{}-I_n&{}0\\ \end{bmatrix}\\&\qquad + \lambda \begin{bmatrix} E_{11}&{}\quad E_{12}&{}\quad E_{13}&{}\quad \dots &{}\quad E_{1d}\\ E_{21}&{}\quad E_{22}&{}\quad E_{23}&{}\quad \dots &{}\quad E_{2d}\\ E_{31}&{}\quad E_{32}&{}\quad E_{33}&{}\quad \dots &{}\quad E_{3d}\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ E_{d1}&{}\quad E_{d2}&{}\quad E_{d3}&{}\quad \dots &{}\quad E_{dd}\\ \end{bmatrix} + { \begin{bmatrix} \widetilde{E}_{11}&{}\quad \widetilde{E}_{12}&{}\quad \widetilde{E}_{13}&{}\quad \dots &{}\quad \widetilde{E}_{1d}\\ \widetilde{E}_{21}&{}\quad \widetilde{E}_{22}&{}\quad \widetilde{E}_{23}&{}\quad \dots &{}\quad \widetilde{E}_{2d}\\ \widetilde{E}_{31}&{}\quad \widetilde{E}_{32}&{}\quad \widetilde{E}_{33}&{}\quad \dots &{}\quad \widetilde{E}_{3d}\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ \widetilde{E}_{d1}&{}\quad \widetilde{E}_{d2}&{}\quad \widetilde{E}_{d3}&{}\quad \dots &{}\quad \widetilde{E}_{dd}\\ \end{bmatrix}},\nonumber \end{aligned}\!\!\!\!\!\\ \end{aligned}$$
(2.1)

and define a structured perturbation of the linearization, i.e. a perturbation in which only the blocks \(A_i\), \(i=0,1, \dots , d\), are perturbed:

$$\begin{aligned} {{\mathscr {C}}}^1_{P(\lambda ) + E(\lambda )}&= \lambda \begin{bmatrix} A_d&{}\quad &{}\quad &{}\\ &{}\quad I_n&{}\quad &{}\\ &{}\quad &{}\quad \ddots &{}\\ &{}\quad &{}\quad &{}\quad I_n\\ \end{bmatrix} + \begin{bmatrix} A_{d-1}&{}\quad A_{d-2}&{}\quad \dots &{}\quad A_{0}\\ -I_n&{}\quad 0&{}\quad \dots &{}\quad 0\\ &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad &{}\quad -I_n&{}\quad 0\\ \end{bmatrix}\nonumber \\&\qquad + \lambda \begin{bmatrix} F_d&{}\quad 0&{}\quad \dots &{}\quad 0\\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \end{bmatrix} + \begin{bmatrix} F_{d-1}&{}\quad F_{d-2}&{}\quad \dots &{}\quad F_{0}\\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \end{bmatrix}. \end{aligned}$$
(2.2)

We also refer to (2.2) as the linearization of a perturbed matrix polynomial.
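
Once a pencil of the form (2.2) is available, the perturbed coefficients \(A_i + F_i\) can be read off its blocks directly: the top-left \(m \times n\) block of the \(\lambda \)-coefficient and the first block row of the constant coefficient. The sketch below (ours; the function name and calling convention are assumptions of this illustration) performs this extraction, which is the final step in recovering the perturbation of the matrix polynomial.

```python
import numpy as np

def coefficients_from_structured_pencil(W, Wt, m, n, d):
    """Read the coefficients of the (perturbed) polynomial off a pencil of the
    form (2.2): the lambda-part holds A_d + F_d in its top-left m-by-n block and
    the constant part holds A_{d-1}+F_{d-1}, ..., A_0+F_0 in its first block row.
    Returns the list [A_0 + F_0, ..., A_d + F_d]."""
    coeffs = [None] * (d + 1)
    coeffs[d] = W[:m, :n].copy()
    for k in range(d):                 # block column k holds A_{d-1-k} + F_{d-1-k}
        coeffs[d - 1 - k] = Wt[:m, k * n:(k + 1) * n].copy()
    return coeffs
```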

Recall that an \(m \times n\) matrix pencil \(\lambda A_1 + A_0\) is called strictly equivalent to \(\lambda B_1 + B_0\) if there are non-singular matrices Q and R such that \(Q^{-1}A_1R =B_1\) and \(Q^{-1}A_0R=B_0\). Note that two \(m \times n\) matrix pencils have the same complete eigenstructure if and only if they are strictly equivalent. Moreover, two \(m \times n\) matrix polynomials of degree d, \(P(\lambda )\) and \(Q(\lambda )\), have the same complete eigenstructure if and only if \({{\mathscr {C}}}^1_{P(\lambda )}\) and \({{\mathscr {C}}}^1_{Q(\lambda )}\) are strictly equivalent. Now we can state one of our goals as finding a perturbation \(E(\lambda )\) such that \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}\) and \({{\mathscr {C}}}^1_{P(\lambda )+ E(\lambda )}\) are strictly equivalent. The existence of such a perturbation \(E(\lambda )\) is known and stated in Theorem 2.1, which is a slightly adapted formulation of Theorem 2.5 in [10], see also Theorem 5.21 in [15] and [14, 23, 32].

Theorem 2.1

Let \(P(\lambda )\) be an \(m\times n\) matrix polynomial of degree d and let \({{\mathscr {C}}}^1_{P(\lambda )}\) be its first companion form. If \(\mathcal{C}^1_{P(\lambda )} + {{\mathscr {E}}}\) is a perturbation of \({{\mathscr {C}}}^1_{P(\lambda )}\) such that

$$\begin{aligned} \Vert {{\mathscr {E}}}\Vert = \Vert ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}) - {{\mathscr {C}}}^1_{P(\lambda )} \Vert < \frac{\pi }{12 \, d^{3/2}} \, , \end{aligned}$$

then there is some perturbation polynomial \(E(\lambda )\) such that \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}\) is strictly equivalent to the linearization of the perturbed polynomial \({{\mathscr {C}}}^1_{P(\lambda )+ E(\lambda )}\), i.e. there exist two nonsingular matrices U and V (they are small perturbations of the identity matrices) such that

$$\begin{aligned} U \cdot ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}) \cdot V = {{\mathscr {C}}}^1_{P(\lambda )+ E(\lambda )}, \end{aligned}$$

moreover,

$$\begin{aligned} \Vert {{\mathscr {C}}}^1_{P(\lambda ) + E(\lambda )} - {{\mathscr {C}}}^1_{P(\lambda )} \Vert \le 4 \, d \, (1+\Vert P(\lambda )\Vert _F) \; \Vert {{\mathscr {E}}} \Vert \, . \end{aligned}$$

Theorem 2.1 guarantees the existence of the structured perturbation (2.2) and the transformation matrices U and V but says nothing about how to find them. In the following section we present an algorithm that, for a given perturbation (2.1), finds such a structured perturbation (2.2) and transformation matrices explicitly.

3 Reduction algorithm

In this section we describe our iterative algorithm (Algorithm 3.1) that, by a strict equivalence transformation, reduces a full perturbation of a linearization pencil (2.1) to a structured perturbation of this pencil (2.2), i.e. a perturbation where only the blocks that correspond to the matrix coefficients of a matrix polynomial are perturbed. The corresponding transformation matrices are derived too. We also analyze important characteristics of the proposed algorithm and its outputs.

Define an unstructured perturbation \({{\mathscr {E}}}^u=\lambda E^u + \widetilde{E}^u\) of the linearization \({{\mathscr {C}}}^1_{P(\lambda )}\) as a perturbation (2.1) where the blocks \(E_{11}, \widetilde{E}_{11}, \widetilde{E}_{12}, \dots , \widetilde{E}_{1d}\) are replaced by the zero blocks of the corresponding sizes, namely:

$$\begin{aligned} \begin{aligned} {{\mathscr {E}}}^u&= \lambda E^u + \widetilde{E}^u\\&= \lambda \begin{bmatrix} 0&{}\quad E_{12}&{}\quad E_{13}&{}\quad \dots &{}\quad E_{1d}\\ E_{21}&{}\quad E_{22}&{}\quad E_{23}&{}\quad \dots &{}\quad E_{2d}\\ E_{31}&{}\quad E_{32}&{}\quad E_{33}&{}\quad \dots &{}\quad E_{3d}\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ E_{d1}&{}\quad E_{d2}&{}\quad E_{d3}&{}\quad \dots &{}\quad E_{dd}\\ \end{bmatrix} + \begin{bmatrix} 0&{}\quad 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \widetilde{E}_{21}&{}\quad \widetilde{E}_{22}&{}\quad \widetilde{E}_{23}&{}\quad \dots &{}\quad \widetilde{E}_{2d}\\ \widetilde{E}_{31}&{}\quad \widetilde{E}_{32}&{}\quad \widetilde{E}_{33}&{}\quad \dots &{}\quad \widetilde{E}_{3d}\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ \widetilde{E}_{d1}&{}\quad \widetilde{E}_{d2}&{}\quad \widetilde{E}_{d3}&{}\quad \dots &{}\quad \widetilde{E}_{dd}\\ \end{bmatrix} \end{aligned}. \end{aligned}$$

\({{\mathscr {E}}}^u\) consists of all the perturbation blocks that are not included in the structured perturbation (2.2), i.e. \({{\mathscr {E}}}^u\) consists of all the perturbations of the identity and zero blocks of the linearization \({{\mathscr {C}}}^1_{P(\lambda )}\). In turn, the structured perturbation \({{\mathscr {E}}}^s\) is as follows

$$\begin{aligned} \begin{aligned} {{\mathscr {E}}}^s=\lambda E^s + \widetilde{E}^s = \lambda \begin{bmatrix} E_{11}&{}\quad 0&{}\quad \dots &{}\quad 0\\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \end{bmatrix} + \begin{bmatrix} \widetilde{E}_{11}&{}\quad \widetilde{E}_{12}&{}\quad \dots &{}\quad \widetilde{E}_{1d}\\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad 0&{}\quad \dots &{}\quad 0\\ \end{bmatrix} \end{aligned}. \end{aligned}$$
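
In implementation terms, the superscripts u and s simply denote complementary block masks: the structured part keeps the top-left \(m \times n\) block of the \(\lambda \)-coefficient and the first m rows of the constant coefficient, and the unstructured part keeps everything else. A minimal sketch of this split (ours, continuing the Python/NumPy illustrations):

```python
import numpy as np

def split_perturbation(E, Et, m, n):
    """Split a pencil perturbation lambda*E + Et of C^1_P into its structured
    part (the blocks overlapping the coefficients of P) and its unstructured
    part (everything else), following the definitions of E^s and E^u above."""
    Es, Ets = np.zeros_like(E), np.zeros_like(Et)
    Eu, Etu = E.copy(), Et.copy()
    Es[:m, :n] = E[:m, :n]     # the block perturbing lambda*A_d
    Eu[:m, :n] = 0.0
    Ets[:m, :] = Et[:m, :]     # the block row perturbing A_{d-1}, ..., A_0
    Etu[:m, :] = 0.0
    return (Es, Ets), (Eu, Etu)
```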

In Sect. 3.1 we show that the unstructured part of the perturbation tends to zero (entrywise) as the number of iterations of our algorithm grows; in Sect. 3.2 we derive a bound on the norm of the resulting structured perturbation; in Sect. 3.3 we explain how to construct the corresponding transformation matrices, i.e. matrices that reduce a full perturbation to a structured one.

We note that the construction of the corresponding transformation matrices in this paper is similar to the construction of the transformation matrices for the reduction to miniversal deformations of matrices in [12, 13], and that the evaluation of the norm of the structured part has some similarities with the evaluation of the norm of the miniversal deformation of (skew-)symmetric matrix pencils in [7, 9], see also [12, 13]. These similarities are due to the fact that our structured perturbation is a versal deformation (but not miniversal), see the mentioned papers for the definitions and details.

Algorithm 3.1

Let \({{\mathscr {C}}}^1_{P(\lambda )}\) be a first companion linearization of a matrix polynomial \(P(\lambda )\) and \({{\mathscr {E}}}_1\) be a full perturbation of \({{\mathscr {C}}}^1_{P(\lambda )}\).

  • Input: Matrix polynomial \(P(\lambda )\), perturbed matrix pencil \({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1\), and the tolerance parameter \(\mathrm{tol}\);

  • Initialization: \(i:=1\), \(U_1:=I\), and \(V_1:=I\)

  • Computation: While \(\Vert {{\mathscr {E}}}^u_i \Vert > \mathrm{tol}\)

    • find the minimum norm least-squares solution to the coupled Sylvester equations: \(\left( ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i}^{{s}})X_i + Y_i({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i}^{{s}})\right) ^u = - {{\mathscr {E}}}^u_i;\)

    • update the perturbation of the linearization (the updated perturbed linearization remains strictly equivalent to the original one): \(({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i+1}):=(I+Y_i) ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i}) (I+X_i);\)

    • update the transformation matrices: \(U_{i+1} := (I+Y_i)U_{i}\) and \(V_{i+1} := V_{i}(I+X_i);\)

    • extract the new unstructured perturbation \({{\mathscr {E}}}^u_{i+1} \) to be eliminated;

    • increase the counter: \(i:=i+1\);

  • Output: A structurally perturbed linearization pencil \({{\mathscr {C}}}^1_{P(\lambda )+E(\lambda )}:={{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{k}\), where \({{\mathscr {E}}}_{k}\) is a structured perturbation (since \(\Vert {{\mathscr {E}}}^u_{k}\Vert < \mathrm{tol}\)), and the transformation matrices \(U:=U_k\) and \(V:=V_k\).
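
The following sketch (ours, in Python/NumPy rather than the Matlab used in Sect. 4) shows one way to realize the computation step of Algorithm 3.1: the restricted coupled Sylvester equations are solved in their Kronecker form (see Lemma 3.1 below), keeping only the rows that correspond to unstructured positions, and the transformation matrices are accumulated along the way. The function name and calling convention are assumptions of this sketch.

```python
import numpy as np

def reduce_to_structured(W, Wt, E, Et, m, n, tol=1e-14, maxit=50):
    """Sketch of Algorithm 3.1: given the companion pencil lambda*W + Wt of an
    m-by-n polynomial and a full perturbation lambda*E + Et, iteratively
    eliminate the unstructured part of the perturbation by strict equivalence.
    Returns the remaining (structured) perturbation and the accumulated
    transformations U, V."""
    p, q = W.shape                      # p = m + n*(d-1), q = n*d
    U, V = np.eye(p, dtype=W.dtype), np.eye(q, dtype=W.dtype)
    # boolean masks of the unstructured positions of the two pencil coefficients
    mask_l = np.ones((p, q), dtype=bool); mask_l[:m, :n] = False   # lambda-part
    mask_c = np.ones((p, q), dtype=bool); mask_c[:m, :] = False    # constant part
    umask = np.concatenate([mask_l.flatten(order='F'), mask_c.flatten(order='F')])

    for _ in range(maxit):
        Eu, Etu = np.where(mask_l, E, 0), np.where(mask_c, Et, 0)
        if np.hypot(np.linalg.norm(Eu), np.linalg.norm(Etu)) <= tol:
            break
        A = W + np.where(~mask_l, E, 0)        # structurally perturbed lambda-coefficient
        C = Wt + np.where(~mask_c, Et, 0)      # structurally perturbed constant coefficient
        # Kronecker form of the coupled Sylvester equations, unstructured rows only
        T = np.block([[np.kron(np.eye(q), A), np.kron(A.T, np.eye(p))],
                      [np.kron(np.eye(q), C), np.kron(C.T, np.eye(p))]])
        rhs = -np.concatenate([E.flatten(order='F'), Et.flatten(order='F')])
        sol = np.linalg.lstsq(T[umask], rhs[umask], rcond=None)[0]  # min-norm LS solution
        X = sol[:q * q].reshape(q, q, order='F')
        Y = sol[q * q:].reshape(p, p, order='F')
        IY, IX = np.eye(p) + Y, np.eye(q) + X
        E, Et = IY @ (W + E) @ IX - W, IY @ (Wt + Et) @ IX - Wt     # update the perturbation
        U, V = IY @ U, V @ IX                                       # accumulate U and V
    return E, Et, U, V
```

On exit, \(U \cdot ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_1) \cdot V = {{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_k\), where the returned perturbation \({{\mathscr {E}}}_k\) has an unstructured part of norm below the tolerance.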

The system \(\left( ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}^s_{i})X_i + Y_i({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}^s_{i})\right) ^u = - {{\mathscr {E}}}^u_i\), appearing at the computation step of Algorithm 3.1, has a solution since the space

$$\begin{aligned} \left\{ ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}^s_{i}) X + Y({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}^s_{i}) \ | \ X \in {\mathbb {C}}^{n \times n}, Y \in {\mathbb {C}}^{m \times m} \right\} \end{aligned}$$

is transversal to the space of the first companion linearizations

$$\begin{aligned} \left\{ {{\mathscr {C}}}^1_{Q(\lambda )} \ | \ Q(\lambda ) \text { is an } m \times n \text { matrix polynomial} \right\} , \end{aligned}$$

i.e., these spaces add up to the whole space of matrix pencils of the corresponding size, see e.g., [14, 23, 32].

Following Algorithm 3.1 we can explicitly write how the resulting perturbation of the linearization \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{k}\) is constructed:

$$\begin{aligned} {{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{k}= & {} {{\mathscr {C}}}^1_{P(\lambda )} +{{\mathscr {E}}}_{1}^s + \underbrace{ {{\mathscr {E}}}_{1}^u + \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1^{{s}})X_1 + Y_1({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1^{{s}}) \right) ^u}_{=0} \nonumber \\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1)X_1 + Y_1({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1) \right) ^s + \left( {Y_1}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1){X_1}\right) ^s \nonumber \\&+ \underbrace{ \left( {{{\mathscr {E}}}_1^u X_1 + Y_1{{\mathscr {E}}}_1^u+} {Y_1}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1){X_1}\right) ^u + \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2^{{s}})X_2 + Y_2({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2^{{s}}) \right) ^u}_{=0} \nonumber \\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2)X_2 + Y_2({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2) \right) ^s + \left( {Y_2}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2){X_2}\right) ^s \nonumber \\&+ \underbrace{\left( {{{\mathscr {E}}}_1^u X_2 + Y_2{{\mathscr {E}}}_1^u +} {Y_2}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2){X_2}\right) ^u + \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_3^{{s}})X_3 + Y_3({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_3^{{s}}) \right) ^u}_{=0} \nonumber \\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_3)X_3 + Y_3({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_3) \right) ^s + {\left( {{\mathscr {E}}}_1^u X_3 + Y_3{{\mathscr {E}}}_1^u\right) ^u +} {Y_3}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_3){X_3} + \dots , \nonumber \!\!\!\!\!\\ \end{aligned}$$
(3.1)

where

$$\begin{aligned} \begin{aligned} {{\mathscr {E}}}_2 = {{\mathscr {E}}}_{1}^s&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1)X_1 + Y_1({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1) \right) ^s + {Y_1}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1){X_1},\\ {{\mathscr {E}}}_3 = {{\mathscr {E}}}_{2}^s&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2)X_2 + Y_2({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2) \right) ^s + {Y_2}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2){X_2}, \ \dots \\ \end{aligned} \end{aligned}$$

In the rest of the paper we investigate properties of this algorithm and perform numerical experiments.

3.1 Elimination of the unstructured perturbation

We start by deriving an auxiliary lemma that will be used to prove that in Algorithm 3.1 the unstructured perturbation converges to zero.

For a given matrix T, define \(\kappa (T):= \kappa _F(T) = \Vert T\Vert \cdot \Vert T^{\dagger }\Vert \) to be the Frobenius condition number of T, see e.g., [2, 5, 28]. We recall that \(T^{\dagger }\) denotes the pseudoinverse (the Moore–Penrose inverse) of T, see e.g., [19].

Lemma 3.1

Let A, B, C, D, M, and N be \(m\times n\) matrices and let the \(n \times n\) matrix X and the \(m \times m\) matrix Y be the minimum norm least-squares solution to the system of coupled Sylvester equations

$$\begin{aligned} \begin{aligned} AX + YB&= M,\\ CX + YD&= N. \end{aligned} \end{aligned}$$
(3.2)

Then

$$\begin{aligned} \Vert X\Vert \cdot \Vert Y\Vert \le \frac{\kappa (T)^2}{2(n\Vert A\Vert ^2 + m \Vert B\Vert ^2 + n \Vert C\Vert ^2 + m \Vert D\Vert ^2)} \left( \Vert M\Vert ^2 + \Vert N\Vert ^2\right) , \end{aligned}$$
(3.3)

where \(T=\begin{bmatrix} I_n \otimes A &{} B^T \otimes I_m \\ I _n\otimes C &{} D^T \otimes I_m \end{bmatrix}\) is the Kronecker product matrix associated with the system (3.2).

Proof

Using the Kronecker product we can rewrite the system of coupled Sylvester equations as a system of linear equations \(Tx=b\), or explicitly

$$\begin{aligned} \begin{bmatrix} I_n \otimes A &{} B^T \otimes I_m \\ I _n\otimes C &{} D^T \otimes I_m \end{bmatrix} \begin{bmatrix} {{\,\mathrm{vec}\,}}(X) \\ {{\,\mathrm{vec}\,}}(Y) \end{bmatrix} = \begin{bmatrix} {{\,\mathrm{vec}\,}}(M) \\ {{\,\mathrm{vec}\,}}(N) \end{bmatrix}. \end{aligned}$$
(3.4)

The minimum norm least-squares solution to such a system can be written as \(x = T^{\dagger } b\), implying \(\Vert x\Vert \le \Vert T^{\dagger }\Vert \cdot \Vert b\Vert \), or, more explicitly, taking into account \(\Vert x\Vert ^2 = \Vert X\Vert ^2 + \Vert Y\Vert ^2\):

$$\begin{aligned} \begin{aligned} \Vert X\Vert ^2 + \Vert Y\Vert ^2&\le \Vert T^{\dagger }\Vert ^2 \left( \Vert M\Vert ^2 + \Vert N\Vert ^2\right) = \frac{\kappa (T)^2}{\Vert T\Vert ^2} \left( \Vert M\Vert ^2 + \Vert N\Vert ^2\right) \\&= \frac{\kappa (T)^2}{\left( n\Vert A\Vert ^2 + m \Vert B\Vert ^2 + n \Vert C\Vert ^2 + m \Vert D\Vert ^2\right) } \left( \Vert M\Vert ^2 + \Vert N\Vert ^2\right) , \end{aligned} \end{aligned}$$
(3.5)

where \(\kappa (T)\) is the Frobenius condition number of T. Taking into account that

$$\begin{aligned} \Vert X\Vert \cdot \Vert Y\Vert \le \frac{1}{2}\left( \Vert X\Vert ^2 + \Vert Y\Vert ^2\right) , \end{aligned}$$

we obtain

$$\begin{aligned} \Vert X\Vert \cdot \Vert Y\Vert \le \frac{\kappa (T)^2}{2(n\Vert A\Vert ^2 + m \Vert B\Vert ^2 + n \Vert C\Vert ^2 + m \Vert D\Vert ^2)} \left( \Vert M\Vert ^2 + \Vert N\Vert ^2\right) . \end{aligned}$$

\(\square \)

The bounding expression in (3.3) depends on the conditioning of the problem (3.4) as well as on how small (normwise) the right-hand side of (3.4) (or, equivalently, (3.2)) is, compared to the matrix coefficients in the left-hand side. The conditioning of (3.2) may actually be better than the conditioning of (3.4). Thus for very ill-conditioned problems and large perturbations, it may be reasonable to use a solver for (3.2) instead of passing to the Kronecker product matrices.
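
To make the Kronecker route concrete, the following sketch (ours) computes the minimum norm least-squares solution of (3.2) via (3.4) and evaluates the right-hand side of the bound (3.3); for large or very ill-conditioned problems, a dedicated solver for (3.2) would be preferable, as noted above.

```python
import numpy as np

def coupled_sylvester_min_norm(A, B, C, D, M, N):
    """Minimum norm least-squares solution (X, Y) of the coupled Sylvester
    equations AX + YB = M, CX + YD = N via the Kronecker form (3.4);
    a sketch intended for modest matrix sizes only."""
    m, n = A.shape
    T = np.block([[np.kron(np.eye(n), A), np.kron(B.T, np.eye(m))],
                  [np.kron(np.eye(n), C), np.kron(D.T, np.eye(m))]])
    b = np.concatenate([M.flatten(order='F'), N.flatten(order='F')])
    x = np.linalg.lstsq(T, b, rcond=None)[0]    # x = T^+ b, the minimum norm LS solution
    X = x[:n * n].reshape(n, n, order='F')
    Y = x[n * n:].reshape(m, m, order='F')
    # right-hand side of the bound (3.3) on ||X||_F * ||Y||_F
    kappa = np.linalg.norm(T) * np.linalg.norm(np.linalg.pinv(T))
    denom = 2 * (n * np.linalg.norm(A)**2 + m * np.linalg.norm(B)**2
                 + n * np.linalg.norm(C)**2 + m * np.linalg.norm(D)**2)
    bound = kappa**2 / denom * (np.linalg.norm(M)**2 + np.linalg.norm(N)**2)
    return X, Y, bound
```

For any input, \(\Vert X\Vert \cdot \Vert Y\Vert \) computed from the returned matrices should not exceed the returned bound.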

In the following theorem we prove that Algorithm 3.1 eliminates the unstructured perturbation, i.e. we show that the norm of the unstructured part of the perturbation tends to zero as the number of iterations grows.

Theorem 3.1

Let \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{1}\) be a perturbation of the linearization and let \(\alpha \Vert {{\mathscr {E}}}_{1}\Vert <~1\), where

$$\begin{aligned} \begin{aligned}&\alpha := \alpha \left( {{\mathscr {C}}}^1_{P(\lambda )}, {{\mathscr {E}}}_{1}\right) \\&= \sup _{i} \left\{ \frac{{ 2^{\frac{3}{2}}\kappa (T_i) \left( n+m\right) ^{\frac{1}{2}} \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) ^{\frac{1}{2}} +}\kappa (T_i)^2 \Vert W+E_i\Vert }{\sqrt{2}\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) },\right. \\&\qquad \qquad \left. \frac{{ 2^{\frac{3}{2}}\kappa (T_i) \left( n+m\right) ^{\frac{1}{2}} \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) ^{\frac{1}{2}} +}\kappa (T_i)^2 \Vert \widetilde{W}+\widetilde{E}_i\Vert }{\sqrt{2}\left( n+m\right) \left( \Vert W+ E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{ E}_i\Vert ^2\right) } \right\} , \end{aligned}\qquad \end{aligned}$$
(3.6)

with the pencils \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i} = \lambda (W+E_i) + (\widetilde{W} + \widetilde{E_i})\) from Algorithm 3.1, and the Kronecker product matrices

$$\begin{aligned} T_i=\begin{bmatrix} I_{nd} \otimes (W+E_i) &{} (W+E_i)^T \otimes I_{m+n(d-1)} \\ I _{nd}\otimes (\widetilde{W}+\widetilde{E}_i) &{} (\widetilde{W}+\widetilde{E}_i)^T \otimes I_{m+n(d-1)} \end{bmatrix}. \end{aligned}$$
(3.7)

Then \({{\mathscr {E}}}_{i}^u = \lambda E^u_i + \widetilde{E}^u_i\) in Algorithm 3.1 satisfies \(\Vert {{\mathscr {E}}}_{i}^u\Vert \rightarrow 0\) when \(i \rightarrow \infty \).

Proof

We start by proving a bound for the norm of the unstructured part of a perturbation at the \((i+1)\)-st step of the algorithm, using the norm of the unstructured part of a perturbation at the i-th step of the algorithm. Define \({{\mathscr {C}}}^1_{P(\lambda )} = \lambda W + \widetilde{W}\).

Following Algorithm 3.1 we obtain matrices \(X_i\) and \(Y_i\) by solving the system of coupled Sylvester matrix equations:

$$\begin{aligned} \begin{aligned} \left( (W+E_i)X_i + Y_i(W+E_i) \right) ^u&={-E_{i}^u},\\ \left( (\widetilde{W}+\widetilde{E}_i)X_i + Y_i(\widetilde{W}+\widetilde{E}_i) \right) ^u&={ -\widetilde{E}_{i}^u}. \end{aligned} \end{aligned}$$
(3.8)

Using the solution \(X_i\) and \(Y_i\) to the system (3.8) we compute

$$\begin{aligned} W+E_{i+1}&:= (I+ Y_i)(W+E_i)(I+ X_i),\\ \widetilde{W}+\widetilde{E}_{i+1}&:= (I+ Y_i)(\widetilde{W}+\widetilde{E}_i)(I+ X_i), \end{aligned}$$

or equivalently,

$$\begin{aligned} E_{i+1}&:=E_{i}^s + \left( E_{i}^u + (W+E_i)X_i + Y_i(W+E_i) \right) + Y_i(W+E_i)X_i,\\ \widetilde{E}_{i+1}&:= \widetilde{E}_{i}^s + \left( \widetilde{E}_{i}^u + (\widetilde{W}+\widetilde{E}_i)X_i + Y_i(\widetilde{W}+\widetilde{E}_i) \right) + Y_i(\widetilde{W}+\widetilde{E}_i)X_i. \end{aligned}$$

Since \(X_i\) and \(Y_i\) are a solution to (3.8) we have

$$\begin{aligned} E_{i+1}&= E_{i}^s + \left( (W+E_i)X_i + Y_i(W+E_i) \right) ^s + {\left( E_i^uX_i + Y_iE_i^u\right) ^u +}Y_i(W+E_i)X_i,\\ \widetilde{E}_{i+1}&= \widetilde{E}_{i}^s + \left( (\widetilde{W}+\widetilde{E}_i)X_i + Y_i(\widetilde{W}+\widetilde{E}_i) \right) ^s +{\left( \widetilde{E}_i^uX_i + Y_i\widetilde{E}_i^u\right) ^u +}Y_i(\widetilde{W}+\widetilde{E}_i)X_i. \end{aligned}$$

Splitting the perturbation into the structured and unstructured parts we obtain

$$\begin{aligned} E_{i+1}^s&= E_{i}^s + \left( (W+E_i)X_i + Y_i(W+E_i) \right) ^s + \left( Y_i(W+E_i){X_i}\right) ^s,\\ \widetilde{E}_{i+1}^s&= \widetilde{E}_{i}^s + \left( (\widetilde{W}+\widetilde{E}_i)X_i + Y_i(\widetilde{W}+\widetilde{E}_i) \right) ^s + \left( Y_i(\widetilde{W}+\widetilde{E}_i)X_i\right) ^s, \\ E_{i+1}^u&= {\left( E_i^uX_i + Y_iE_i^u\right) ^u +} \left( Y_i(W+E_i)X_i\right) ^u,\\ \widetilde{E}_{i+1}^u&= {\left( \widetilde{E}_i^uX_i + Y_i\widetilde{E}_i^u\right) ^u +} \left( Y_i(\widetilde{W}+\widetilde{E}_i)X_i\right) ^u. \end{aligned}$$

In general, \(E_{i+1}^u\) and \(\widetilde{E}_{i+1}^u\) are not zero matrices but we show that they tend to zero (entrywise) when \(i \rightarrow \infty \). In (3.9)–(3.12) below \(T_i\) is the Kronecker product matrix defined in (3.7) and associated with the system of coupled Sylvester equations (3.8). Using the bound (3.5) we have:

$$\begin{aligned}&\Vert \left( E_i^uX_i + Y_iE_i^u\right) ^u \Vert \le (\Vert X_i\Vert + \Vert Y_i\Vert )\cdot \Vert E_i^u\Vert \le \left( 2\left( \Vert X_i\Vert ^2 + \Vert Y_i\Vert ^2 \right) \right) ^{\frac{1}{2}}\cdot \Vert E_i^u\Vert \nonumber \\&\quad \le \frac{\sqrt{2}\kappa (T_i)}{\left( n+m\right) ^{\frac{1}{2}} \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) ^{\frac{1}{2}}} \ \Vert {{\mathscr {E}}}_i^u\Vert ^2, \end{aligned}$$
(3.9)

respectively, using the bound (3.3) we have:

$$\begin{aligned}&\Vert \left( Y_i(W+E_i)X_i\right) ^u\Vert \le \Vert Y_i(W+E_i)X_i\Vert \le \Vert X_i\Vert \cdot \Vert Y_i\Vert \cdot \Vert W+E_i\Vert \nonumber \\&\quad \le \frac{\kappa (T_i)^2 \Vert W+E_i\Vert }{2\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) } \ \Vert {{\mathscr {E}}}_i^u\Vert ^2. \end{aligned}$$
(3.10)

Combining (3.9) and (3.10) we obtain the following bound on the Frobenius norm \(\Vert E_{i+1}^u\Vert \):

$$\begin{aligned} \Vert E_{i+1}^u\Vert\le & {} \Vert \left( E_i^uX_i + Y_iE_i^u\right) ^u + \left( Y_i(W+E_i)X_i\right) ^u\Vert \nonumber \\\le & {} \Vert \left( E_i^uX_i + Y_iE_i^u\right) ^u \Vert + \Vert \left( Y_i(W+E_i)X_i\right) ^u\Vert \nonumber \\\le & {} \frac{2^{\frac{3}{2}} \kappa (T_i) \left( n+m\right) ^{\frac{1}{2}} \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) ^{\frac{1}{2}} + \kappa (T_i)^2 \Vert W+E_i\Vert }{2\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) } \ \Vert {{\mathscr {E}}}_i^u\Vert ^2,\nonumber \\ \end{aligned}$$
(3.11)

similarly, for the matrix \(\Vert \widetilde{E}_{i+1}^u\Vert \),

$$\begin{aligned} \Vert \widetilde{E}_{i+1}^u\Vert\le & {} \left\| {\left( \widetilde{E}_i^uX_i + Y_i\widetilde{E}_i^u\right) ^u +} \left( Y_i(\widetilde{W}+\widetilde{E}_i)X_i\right) ^u \right\| \le \Vert { \widetilde{E}_i^uX_i + Y_i\widetilde{E}_i^u +} Y_i(\widetilde{W}+\widetilde{E}_i)X_i\Vert \nonumber \\\le & {} {(\Vert X_i\Vert + \Vert Y_i\Vert )\cdot \Vert \widetilde{E}_i^u\Vert +} \Vert X_i \Vert \cdot \Vert Y_i\Vert \cdot \Vert \widetilde{W}+\widetilde{E}_i\Vert \nonumber \\\le & {} \frac{ { 2^{\frac{3}{2}}\kappa (T_i) \left( n+m\right) ^{\frac{1}{2}} \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) ^{\frac{1}{2}} +}\kappa (T_i)^2 \Vert \widetilde{W}+\widetilde{E}_i\Vert }{2\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) } \ \Vert {{\mathscr {E}}}_i^u\Vert ^2.\nonumber \\ \end{aligned}$$
(3.12)

Using \(\alpha \), defined in (3.6), we can write the bounds on the unstructured part of the perturbation for both coefficients of the matrix pencil at the step \(i+1\) as follows

$$\begin{aligned} \Vert E_{i+1}^u\Vert \le \frac{\alpha }{\sqrt{2}} \Vert {{\mathscr {E}}}^u_i\Vert ^2 \quad \text { and } \quad \Vert \widetilde{E}_{i+1}^u\Vert \le \frac{\alpha }{\sqrt{2}} \Vert {{\mathscr {E}}}^u_i\Vert ^2. \end{aligned}$$
(3.13)

Note that the supremum in the definition of \(\alpha \) is finite since \(\kappa (T_i)\) and the norms of the corresponding matrices are finite. This results in the following bound for the whole pencil:

$$\begin{aligned} \Vert {{\mathscr {E}}}_{i+1}^u\Vert = \left( \Vert E_{i+1}^u\Vert ^2 + \Vert \widetilde{E}_{i+1}^u\Vert ^2\right) ^{\frac{1}{2}}&\le \alpha \Vert {{\mathscr {E}}}_i^u\Vert ^2. \end{aligned}$$
(3.14)

Using the bounds (3.13) and (3.14) at each step we get

$$\begin{aligned} \max \left\{ \Vert E_{k}^u\Vert , \Vert \widetilde{E}_{k}^u\Vert \right\} \le \left( \frac{\alpha }{\sqrt{2}}\right) ^{2^{k-1}-1} \Vert {{\mathscr {E}}}_{1}^u\Vert ^{2^{k-1}} \text { and } \Vert {{\mathscr {E}}}_{k}^u\Vert \le \alpha ^{2^{k-1}-1} \Vert {{\mathscr {E}}}_{1}^u\Vert ^{2^{k-1}}.\nonumber \\ \end{aligned}$$
(3.15)

If \(\alpha \Vert {{\mathscr {E}}}_{1}\Vert < 1\) then the norm of the unstructured part of the perturbation tends to zero as the number of iterations grows. \(\square \)

Remark 3.1

In our case we should exclude some rows from (3.7), since we want to eliminate only the unstructured part of the perturbation \({{\mathscr {E}}}_i\). Therefore, the norm of the solution of such a restricted least-squares problem is less than or equal to \(\Vert x\Vert \), where \(x= T^{\dagger } b\). Clearly, the bounds from Lemma 3.1 remain valid.

The sharpness of the bounds (3.15) depends on the value of \(\alpha \) and on the size of the initial perturbation: the better conditioned the problem is and the smaller the initial perturbation is, the better the bounds (3.15) are. Even if the problem is ill-conditioned, we can still guarantee convergence for small enough perturbations. Note that a proper scaling of a matrix polynomial improves the conditioning of the problem, see e.g., [15]. Moreover, in practice, Algorithm 3.1 converges to a structured perturbation quickly and requires only a small number of iterations, see the numerical experiments in Sect. 4. The numerical experiments also suggest quadratic convergence, which can indeed be verified using (3.15).
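
If the norms \(\Vert {{\mathscr {E}}}^u_i\Vert \) are recorded during the iteration, the observed order of convergence can be estimated from consecutive ratios; a small utility of ours (not part of Algorithm 3.1):

```python
import numpy as np

def observed_order(norms):
    """Estimate the order of convergence from successive norms e_1, e_2, ... via
    p_k = log(e_{k+1}/e_k) / log(e_k/e_{k-1}); values close to 2 indicate the
    quadratic convergence suggested by (3.15)."""
    e = np.asarray(norms, dtype=float)
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])
```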

3.2 Bound on the norm of the structured perturbation

In this section we find a bound on the resulting structured perturbation. Similarly to the analysis in Sect. 3.1, the bound depends on the conditioning of the problem as well as on the norm of the original perturbation. Therefore, we need to assume that these quantities are small enough.

Theorem 3.2

Let \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{1}\) be a perturbation of the linearization \({{\mathscr {C}}}^1_{P(\lambda )}\), \(\Vert {{\mathscr {E}}}_1\Vert = \varepsilon \), and \(\alpha \varepsilon < 1\), where \(\alpha = \alpha ({{\mathscr {C}}}^1_{P(\lambda )},{{\mathscr {E}}}_{1})\) is defined in (3.6). Define also \( \beta := \ \sup _{i} \ \sqrt{\frac{2}{\left( n+m\right) }} \kappa (T_i)\) and

$$\begin{aligned}&\gamma := \gamma \left( {{\mathscr {C}}}^1_{P(\lambda )}, {{\mathscr {E}}}_{1}\right) = \sup _{i} \left\{ \frac{\kappa (T_i)^2 \Vert W+E_i\Vert }{\sqrt{2}\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) },\right. \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \left. \frac{\kappa (T_i)^2 \Vert \widetilde{W}+\widetilde{E}_i\Vert }{\sqrt{2}\left( n+m\right) \left( \Vert W+ E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{ E}_i\Vert ^2\right) } \right\} ,\nonumber \\ \end{aligned}$$
(3.16)

with the pencils \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{i} = \lambda (W+E_i) + (\widetilde{W} + \widetilde{E_i})\) from Algorithm 3.1, and the Kronecker product matrix \(T_i\) in (3.7). Then \(\Vert {{\mathscr {E}}}^s\Vert \le \varepsilon (1+ \beta ) / (1 - \gamma \varepsilon )\).

Proof

For the input \({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1\), following Algorithm 3.1 step by step, we can build the resulting perturbation as explained in (3.1). Skipping writing the eliminated unstructured parts of (3.1) we have:

$$\begin{aligned} {{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}^s= & {} {{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1^s \\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1)X_1 + Y_1({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1) \right) ^s + \left( {Y_1}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1){X_1}\right) ^s\\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2)X_2 + Y_2({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2) \right) ^s + \left( {Y_2}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2){X_2}\right) ^s + \ldots \\&+ \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i)X_i + Y_i({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i) \right) ^s + \left( {Y_i}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_{i}){X_i}\right) ^s + \dots \end{aligned}$$

We start by evaluating the structured part of the perturbation coming from the coupled Sylvester equations using the bounds (3.5):

$$\begin{aligned}&\left\| \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i)X_i + Y_i({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i) \right) ^s\right\| \le { \Vert {{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i\Vert \left( \Vert X_i\Vert + \Vert Y_i\Vert \right) }\nonumber \\&\quad {\le \Vert {{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i\Vert \sqrt{2(\Vert X_i\Vert ^2 + \Vert Y_i\Vert ^2)} \le \sqrt{\frac{2\kappa (T_i)^2 \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) \Vert {{\mathscr {E}}}_i^u \Vert ^2}{\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) }} }\nonumber \\&\quad { = \sqrt{\frac{2\kappa (T_i)^2 \Vert {{\mathscr {E}}}_i^u \Vert ^2}{\left( n+m\right) }} } = \frac{\sqrt{2}\kappa (T_i) }{\sqrt{\left( n+m\right) }} \ \Vert {{\mathscr {E}}}_i^u \Vert \le \ \beta \ \Vert {{\mathscr {E}}}_i^u \Vert . \end{aligned}$$
(3.17)

Recall that \(\kappa (T_i)\) and the norms of the corresponding perturbations are finite and thus the supremum in the definition of \(\beta \) is finite. We use the bound (3.10), see also (3.12), for \(\left\| \left( {Y_i}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_{i}){X_i}\right) ^s\right\| \) since \(\left\| \left( {Y_i}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_{i}){X_i}\right) ^s\right\| \le \Vert {Y_i}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_{i}){X_i}\Vert \). Thus we can bound \(\Vert {{\mathscr {E}}}^s\Vert \) using (3.15) and (3.17), noting also that \(\Vert {{\mathscr {E}}}^s_i\Vert \) and \(\Vert {{\mathscr {E}}}^u_i\Vert \) are less than or equal to \(\Vert {{\mathscr {E}}}_i\Vert \):

$$\begin{aligned} \Vert {{\mathscr {E}}}^s\Vert&{\le }&\Vert {{\mathscr {E}}}_1^s\Vert + \left\| \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1)X_1 + Y_1({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1) \right) ^s\right\| + \left\| \left( {Y_1}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_1){X_1}\right) ^s\right\| \nonumber \\&+ \left\| \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2)X_2 + Y_2({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2) \right) ^s\right\| + \left\| \left( {Y_2}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_2){X_2}\right) ^s\right\| +\ldots \nonumber \\&+ \left\| \left( ({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i)X_i + Y_i({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_i) \right) ^s\right\| + \left\| \left( {Y_i}({{\mathscr {C}}}^1_{P(\lambda )}+{{\mathscr {E}}}_{i}){X_i}\right) ^s\right\| + \dots \nonumber \\&{ \le }&\, \varepsilon + {\beta } \Vert {{\mathscr {E}}}_1^u\Vert + \gamma \Vert {{\mathscr {E}}}_1^u\Vert ^2 + {\beta } \Vert {{\mathscr {E}}}_2^u\Vert + \gamma \Vert {{\mathscr {E}}}_2^u\Vert ^2 + {\beta } \Vert {{\mathscr {E}}}_3^u\Vert + \gamma \Vert {{\mathscr {E}}}_3^u\Vert ^2 + \ldots \nonumber \\&+ {\beta } \Vert {{\mathscr {E}}}_i^u\Vert + \gamma \Vert {{\mathscr {E}}}_i^u\Vert ^2 + \dots \nonumber \\= & {} \varepsilon + {\beta } \varepsilon + \gamma \varepsilon ^2 + {\beta } \gamma \varepsilon ^2 + \gamma ^3 \varepsilon ^4 + {\beta } \gamma ^3 \varepsilon ^4 + \gamma ^7 \varepsilon ^8 + \ldots \nonumber \\&+ {\beta } \gamma ^{2^{i-1}-1} \varepsilon ^{2^{i-1}} + \gamma ^{2^{i}-1}\varepsilon ^{2^{i}} + \dots \nonumber \\= & {} \varepsilon (1 + {\beta }) \left( 1 + \gamma \varepsilon + (\gamma \varepsilon )^3 + \ldots + (\gamma \varepsilon )^{2^{i-1}-1} + \dots \right) \nonumber \\= & {} \varepsilon (1 + {\beta }) \left( \sum _{i=0}^{\infty }(\gamma \varepsilon )^{2^i-1} \right) \le \frac{ \varepsilon (1 + {\beta }) }{1 - \gamma \varepsilon }. \end{aligned}$$
(3.18)

In (3.18) we used that \(\gamma \varepsilon <1\), which holds since \(\gamma < \alpha \) and, by the assumption of the theorem, \(\alpha \varepsilon < 1\). \(\square \)

The bound in Theorem 3.2 is not very tight if \(\gamma \varepsilon \) is close to 1 but it is better for small \(\gamma \varepsilon \). For example, Theorem 3.2 says that for \(\gamma \varepsilon < 1/n\) we get \(\Vert {{\mathscr {E}}}^s\Vert \le n \varepsilon (1 + {\beta }) / (n-1)\), and in particular, for \(\gamma \varepsilon <~1/2\) we get \(\Vert {{\mathscr {E}}}^s\Vert \le 2 (1 + {\beta }) \varepsilon \).

3.3 Construction of the transformation matrices

In this section we investigate the transformation matrices that reduce a full perturbation of the linearization to a structured one. Following Algorithm 3.1, we observe that the transformation matrices are constructed as the following infinite products:

$$\begin{aligned} U= & {} \lim _{i \rightarrow \infty } {(I_{m}+Y_i) \cdots (I_{m}+Y_2)(I_{m}+Y_1)} \text { and }\\ V= & {} \lim _{i \rightarrow \infty } {(I_{n}+X_1)(I_{n}+X_2) \cdots (I_{n}+X_i)}. \end{aligned}$$

The convergence of these infinite products to nonsingular matrices is proven in Theorem 3.3. Note that, for small initial perturbations, such transformation matrices are small perturbations of the identity matrices.

Theorem 3.3

Let \({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_{1}\) be a perturbation of the linearization \({{\mathscr {C}}}^1_{P(\lambda )}\), and \(\alpha \Vert {{\mathscr {E}}}_1\Vert <~1\), where \(\alpha = \alpha ({{\mathscr {C}}}^1_{P(\lambda )},{{\mathscr {E}}}_{1})\) is defined in (3.6). Let also \(X_i\) and \(Y_i\) be a solution to (3.8) for the corresponding index i, and \(I_m\) and \(I_n\) be the \(m \times m\) and \(n \times n\) identity matrices. Then

$$\begin{aligned} \lim _{i \rightarrow \infty } {(I_{m}+Y_i) \cdots (I_{m}+Y_2)(I_{m}+Y_1)} \text { and } \lim _{i \rightarrow \infty } {(I_{n}+X_1)(I_{n}+X_2) \cdots (I_{n}+X_i)}\nonumber \\ \end{aligned}$$
(3.19)

exist and are nonsingular matrices.

Proof

By [31, Theorem 4] the limits in (3.19) exist and are nonsingular matrices if the sums

$$\begin{aligned} \Vert X_1\Vert +\Vert X_2\Vert +\Vert X_3\Vert +\cdots= & {} \sum _{i=1}^{\infty } \Vert X_i\Vert \quad \text { and } \quad \Vert Y_1\Vert +\Vert Y_2\Vert +\Vert Y_3\Vert +\cdots \nonumber \\= & {} \sum _{i=1}^{\infty } \Vert Y_i\Vert , \end{aligned}$$
(3.20)

respectively, absolutely converge.

Using the bound (3.5) for a solution of coupled Sylvester equations and noting that \(\Vert X\Vert ^2 \le \Vert X\Vert ^2 + \Vert Y\Vert ^2\), we have the following bound for both \(\Vert X_{i}\Vert ^2\) and \(\Vert Y_{i}\Vert ^2\):

$$\begin{aligned} \begin{aligned} {\Vert X_{i}\Vert ^2}&\le \frac{\kappa (T_i)^2 }{\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) } \Vert {{\mathscr {E}}}^u_i\Vert ^2 \le \alpha \Vert {{\mathscr {E}}}^u_i\Vert ^2 \le \alpha ^{2^{i-1}-1} \Vert {{\mathscr {E}}}_1^u\Vert ^{2^{i-1}}, \\ {\Vert Y_{i}\Vert ^2}&\le \frac{\kappa (T_i)^2 }{\left( n+m\right) \left( \Vert W+E_i\Vert ^2 + \Vert \widetilde{W}+\widetilde{E}_i\Vert ^2\right) } \Vert {{\mathscr {E}}}^u_i\Vert ^2 \le \alpha \Vert {{\mathscr {E}}}^u_i\Vert ^2 \le \alpha ^{2^{i-1}-1} \Vert {{\mathscr {E}}}_1^u\Vert ^{2^{i-1}}.\nonumber \end{aligned}\\ \end{aligned}$$
(3.21)

Bounds (3.21) together with our assumption \(\alpha \Vert {{\mathscr {E}}}_1\Vert < 1 \ (\Vert {{\mathscr {E}}}_1^u\Vert \le \Vert {{\mathscr {E}}}_1\Vert )\) allow us to conclude that the sums in (3.20) absolutely converge. \(\square \)

4 Numerical experiments

All the numerical experiments are performed on a MacBook Pro (processor: 2.6 GHz Intel Core i7, memory: 32 GB 2400 MHz DDR4), using Matlab R2019a (64-bit). We consider a large number of randomly generated matrix polynomials, matrix polynomials coming from real-world applications, and matrix polynomials specially crafted for testing the limits of the proposed algorithm.

Example 4.1

Consider 1000 random polynomials of size \(8 \times 8\) and degree 4. The entries of the matrix coefficients of these polynomials are generated from the normal distribution with mean \(\mu = 0\) and standard deviation \(\sigma =10\) (variance \(\sigma ^2=100\)). The polynomials are normalized to have Frobenius norm equal to 1. Each polynomial is perturbed by adding a matrix polynomial whose coefficient entries are uniformly distributed on the interval (0, 0.01). At most 6 iterations are needed for the norm of the unstructured part of the perturbation to reach order \(10^{-16}\) (the tolerance we require). In Fig. 1 we present the results in whisker plots (box plots).

Fig. 1 The whisker plots illustrate the elimination of the unstructured part of the perturbation \(\Vert {{\mathscr {E}}}^u\Vert \) for 1000 perturbed random polynomials of size \(8 \times 8\) and degree 4. In a the Frobenius norms of the unstructured parts of the perturbations are plotted on the vertical axis and the iterations are on the horizontal axis. In b the same data are presented with a logarithmic scale on the vertical axis
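
A sketch of this experimental setup (ours, in Python/NumPy rather than the Matlab used for the reported experiments; it presumes companion-form and reduction helpers such as the `first_companion_form` and `reduce_to_structured` sketches in Sects. 1 and 3):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 8; d = 4

# random polynomial with N(0, 10) entries, normalized to Frobenius norm 1
A = [rng.normal(0.0, 10.0, size=(m, n)) for _ in range(d + 1)]
scale = np.sqrt(sum(np.linalg.norm(Ak)**2 for Ak in A))
A = [Ak / scale for Ak in A]

W, Wt = first_companion_form(A)            # helper sketched in Sect. 1

# full perturbation of the linearization, entries uniform on (0, 0.01)
E0 = rng.uniform(0.0, 0.01, size=W.shape)
Et0 = rng.uniform(0.0, 0.01, size=Wt.shape)

E, Et, U, V = reduce_to_structured(W, Wt, E0, Et0, m, n, tol=1e-16)

Eu, Etu = E.copy(), Et.copy()              # remaining unstructured part
Eu[:m, :n] = 0.0
Etu[:m, :] = 0.0
print("final unstructured norm:", np.hypot(np.linalg.norm(Eu), np.linalg.norm(Etu)))
```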

In the following two examples we consider two quadratic matrix polynomials coming from applications. Both matrix polynomials belong to the NLEVP collection [3].

Example 4.2

Consider the \(5 \times 5\) quadratic matrix polynomial \(Q(\lambda ) = \lambda ^2 M + \lambda D + K\) arising from modelling a two-dimensional three-link mobile manipulator [3]. The \(5 \times 5\) coefficient matrices are

$$\begin{aligned} M=\begin{bmatrix}M_0&{}\quad 0\\ 0&{}\quad 0\\ \end{bmatrix}, \ D=\begin{bmatrix}D_0&{}\quad 0\\ 0&{}\quad 0\\ \end{bmatrix}, \text { and } K=\begin{bmatrix}K_0&{}\quad {-F^T_0}\\ {F_0}&{}\quad 0 \end{bmatrix}, \end{aligned}$$

with

$$\begin{aligned} M_0= & {} \begin{bmatrix}18.7532&{}\quad -7.94493 &{}\quad 7.94494\\ -7.94493 &{}\quad 31.8182 &{}\quad -26.8182\\ 7.94494 &{}\quad -26.8182 &{}\quad 26.8182\\ \end{bmatrix},\quad D_0=\begin{bmatrix}-1.52143&{}\quad -1.55168&{}\quad 1.55168\\ 3.22064 &{}\quad 3.28467 &{}\quad -3.28467\\ -3.22064 &{}\quad -3.28467 &{}\quad 3.28467\\ \end{bmatrix}, \\ K_0= & {} \begin{bmatrix}67.4894 &{}\quad 69.2393 &{}\quad -69.2393\\ 69.8124 &{}\quad 1.68624&{}\quad -1.68617\\ -69.8123&{}\quad -1.68617&{}\quad -68.2707 \\ \end{bmatrix},\quad F_0=\begin{bmatrix}1&{}\quad 0&{}\quad 0\\ 0&{}\quad 0&{}\quad 1 \end{bmatrix}. \end{aligned}$$
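
For reproducibility, these blocks can be entered directly; the sketch below (ours) assembles the coefficients \(M\), \(D\), and \(K\) from the displayed data:

```python
import numpy as np

M0 = np.array([[ 18.7532,  -7.94493,   7.94494],
               [ -7.94493,  31.8182,  -26.8182],
               [  7.94494, -26.8182,   26.8182]])
D0 = np.array([[ -1.52143,  -1.55168,   1.55168],
               [  3.22064,   3.28467,  -3.28467],
               [ -3.22064,  -3.28467,   3.28467]])
K0 = np.array([[ 67.4894,   69.2393,  -69.2393],
               [ 69.8124,    1.68624,  -1.68617],
               [-69.8123,   -1.68617, -68.2707]])
F0 = np.array([[1., 0., 0.],
               [0., 0., 1.]])

Z32, Z22 = np.zeros((3, 2)), np.zeros((2, 2))
M = np.block([[M0, Z32], [Z32.T, Z22]])
D = np.block([[D0, Z32], [Z32.T, Z22]])
K = np.block([[K0, -F0.T], [F0, Z22]])
A = [K, D, M]    # coefficients [A_0, A_1, A_2] of Q(lambda) = lambda^2*M + lambda*D + K
```

The list `A` can then be fed to the companion-form and reduction sketches of Sects. 1 and 3.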

In Fig. 2 we present the decay of the norm of the unstructured part of the perturbation in Algorithm 3.1. The changes in the norm of the structured part of the perturbation and in the norms of the transformation matrices are presented in Figs. 3 and 4, respectively.

Fig. 2 The norm of the unstructured part of the perturbation \(\Vert {{\mathscr {E}}}^u\Vert \) of \(5 \times 5\) quadratic matrix polynomial \(Q(\lambda ) = \lambda ^2 M + \lambda D + K\) arising from modelling a two-dimensional three-link mobile manipulator (vertical axis) at each iteration (horizontal axis) is plotted in (a). The same data but with a logarithmic scale on the vertical axis is plotted in (b)

Example 4.3

Consider a \(21 \times 16\) quadratic matrix polynomial arising from calibration of a surveillance camera using a human body as a calibration target [3, 27]. Note that the polynomial is rectangular. In Fig. 5 we present the decay of the norm of the unstructured part of the perturbation. The changes in the norm of the structured part of the perturbation and in the norms of the transformation matrices are presented in Figs. 6 and 7, respectively.

Fig. 3 The norm of the structured part of the perturbation \(\Vert {{\mathscr {E}}}^s\Vert \) (vertical axis) at each iteration (horizontal axis): a when we do not normalize the original matrix polynomial; b when we normalize the original matrix polynomial (notably, in b we have that \(\alpha \Vert {{\mathscr {E}}}_1\Vert = 0.29 < 1\) thus Theorem 3.2 is applicable and gives us \(\Vert {{\mathscr {E}}}^s\Vert \le 2.27\))

Fig. 4 The 2-norms of the transformation matrices U and V (vertical axis) at each iteration (horizontal axis) are plotted in (a) and (b), respectively. Recall that \(U \cdot ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_1) \cdot V = {{\mathscr {C}}}^1_{P(\lambda )+ E(\lambda )}\). Note that \(\Vert U\Vert _2\) and \(\Vert V\Vert _2\) are close to 1 (\(\Vert I\Vert _2=1\))

Fig. 5 The norm of the unstructured part of the perturbation \(\Vert {{\mathscr {E}}}^u\Vert \) of the \(21 \times 16\) quadratic matrix polynomial arising from calibration of a surveillance camera (vertical axis) at each iteration (horizontal axis) is plotted in (a). The same convergence data but with a logarithmic scale on the y-axis is plotted in (b)

Fig. 6 The norm of the structured part of a perturbation (vertical axis) at each iteration (horizontal axis): a when we do not normalize the original matrix polynomial; b when we normalize the original matrix polynomial (notably, in b we have that \(\alpha \Vert {{\mathscr {E}}}_1\Vert = 0.37 < 1\) thus Theorem 3.2 is applicable and gives us \(\Vert {{\mathscr {E}}}^s\Vert \le 7.7\))

Fig. 7 The 2-norms of the transformation matrices U and V (vertical axis) at each iteration (horizontal axis) are plotted in (a) and (b), respectively. Recall that \( U \cdot ({{\mathscr {C}}}^1_{P(\lambda )} + {{\mathscr {E}}}_1) \cdot V = {{\mathscr {C}}}^1_{P(\lambda )+ E(\lambda )}\)

In the following example we tune the conditioning of the problem and the size of the initial perturbation to test the limits of Algorithm 3.1.

Example 4.4

Consider the \(5 \times 5\) quadratic matrix polynomial from Example 4.2. We scale the matrix coefficients of this polynomial and increase the initial perturbation to achieve the following goals: (a) making the structured perturbation much larger than the initial perturbation and (b) forcing Algorithm 3.1 to diverge. Notably, even if (a) is achieved, i.e. the limit perturbation is much larger than the original one, we may still have convergence. We summarize the results of our experiment in Table 1. In the last column we underline the word “yes” if the convergence is also provable by Theorem 3.1; e.g., in the first and the second experiments the values of \(\alpha \Vert {{\mathscr {E}}}_1\Vert \) are 7.75e-05 and 0.98, respectively. In experiments 3, 5, 6, and 8 the value of \(\alpha \Vert {{\mathscr {E}}}_1\Vert \) is larger than 1, thus Theorem 3.1 is not applicable; nevertheless, we still observe convergence numerically. In experiments 9 and 10 we do not observe even numerical convergence, and the value of \(\alpha \Vert {{\mathscr {E}}}_1\Vert \) is of course larger than 1.

Table 1 We show how the choice of the scalars \(\alpha _i, i=0,1,2\) in the matrix polynomial \(Q(\lambda ) = \alpha _2 A_2 \lambda ^2 + \alpha _1 A_1 \lambda + \alpha _0 A_0\) and the initial perturbation \({{\mathscr {E}}}_1\) change the norm of the resulting structured perturbation \({{\mathscr {E}}}^s\) and the convergence of the algorithm

5 Future work

The method developed in this paper can be directly generalized to other linearizations, e.g., Fiedler linearizations [1, 6, 14] or even block-Kronecker linearizations [15]. Such a generalization may also cover structure-preserving linearizations, see e.g., [8]. The existence of structured perturbations for these broader classes of linearizations follows, e.g., from [8, 14, 15]. Such a generalization will require solving the corresponding structured coupled Sylvester equations, or at least the corresponding structured least-squares problem.

Relaxing the conditions in Theorems 3.1 and 3.2, to have provable convergence and bounds for a broader class of examples, is another possible line of future research.