Abstract
We analyze the global convergence of the power iterates for the computation of a general mixed-subordinate matrix norm. We prove a new global convergence theorem for a class of entrywise nonnegative matrices that generalizes and improves well-known results for mixed-subordinate \(\ell ^p\) matrix norms. In particular, exploiting the Birkhoff–Hopf contraction ratio of nonnegative matrices, we obtain novel and explicit global convergence guarantees for a range of matrix norms whose computation has been recently proven to be NP-hard in the general case, including the case of mixed-subordinate norms induced by vector norms formed by sums of \(\ell ^p\)-norms of subsets of entries.
1 Introduction
Let A be an \(m\times n\) matrix and consider the matrix norm
$$\begin{aligned} \Vert A\Vert _{\beta \rightarrow \alpha } = \max _{x\ne 0}\frac{\Vert Ax\Vert _\alpha }{\Vert x\Vert _\beta }, \end{aligned}$$
where \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are vector norms.
Computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is a classical problem in computational mathematics, as norms of this kind arise naturally in many situations, such as approximation theory, estimation of matrix condition numbers and approximation of relative residuals [26]. However, attention around the problem of computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) has been growing in recent years. In fact, for example, matrix norms of this type can be used in combinatorial optimization and sparse data recovery, to approximate generalized Grothendieck and restricted isometry constants [1, 6, 16, 30], in scientific computing, to estimate the largest entries of large matrices [27], in data mining and learning theory, to minimize empirical risks or obtain robust nonnegative graph embeddings [9, 41], or in quantum information theory and the study of Khot’s unique game conjecture where the computational complexity of evaluating \(\Vert A \Vert _{\beta \rightarrow \alpha }\) plays an important role [2]. Moreover, it was observed by Lim [33] that the notion of tensor norm and tensor spectrum relates to \(\Vert A\Vert _{\beta \rightarrow \alpha }\) in a very natural way and thus relevant advances on the problem of computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) when A is entrywise nonnegative and \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _{\beta }\) are \(\ell ^p\) norms have been recently obtained as a consequence of a number of new nonlinear Perron–Frobenius-type theorems for higher-order maps [15, 19, 21, 22].
Closed-form solutions and efficient algorithms are known for some special \(\ell ^p\) norms, for instance the case where \(\Vert \cdot \Vert _\alpha =\Vert \cdot \Vert _\beta \) and they coincide with either the \(\ell ^1\), the \(\ell ^2\), or the \(\ell ^\infty \) norm, or the case where \(p\le 1\le q\) and \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are \(\ell ^p\) and \(\ell ^q\) (semi) norms, respectively (cf. [10, 32, 36]). However, the computation of \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is generally NP-hard [23, 38].
The best known method for the computation of \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is the (nonlinear) power method, essentially introduced by Boyd [4] and further analyzed and extended, for instance, in [3, 15, 25, 39]. When the considered vector norms are \(\ell ^p\) norms, the power method relies on a fundamental global convergence result which ensures convergence to the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\) for a class of entrywise nonnegative matrices A and for a range of \(\ell ^p\) norms. We discuss the method and its convergence in detail in Sect. 2.
The convergence of the method is a consequence of an elegant fixed point argument that involves a nonlinear operator \({\mathcal {S}}_A\) and its Lipschitz contraction constant. However, the convergence analysis of this method leaves two main gaps: on the one hand, all the work done so far addresses only the case of \(\ell ^p\) norms, whereas almost nothing is known about the global convergence behavior of the power iterates for more general norms. On the other hand, even for the case of \(\ell ^p\) norms, known upper bounds on the contraction constant of \({\mathcal {S}}_A\) are not sharp, especially for positive matrices. In this work we provide novel results that address and improve both of these directions.
Consider for example the case where \(\Vert \cdot \Vert _\alpha \) is defined as
where k is a positive integer not larger than the dimension of x and \(\Vert \cdot \Vert _{p_i}\) are \(\ell ^p\) norms. One can of course extend this idea by considering any family of subsets of entries of x and any set of \(\ell ^p\) norms, thereby generating arbitrarily many new norms. Norms of this form are natural modifications of \(\ell ^p\) norms and are used, for instance, to define the generalized Grothendieck constants as in [30], or in graph matching problems to build continuous relaxations of the set of permutation matrices [11, 34]. However, even for this case, extending the result of Boyd is not straightforward.
In this work we consider general pairs of monotonic and differentiable vector norms and provide a thorough convergence analysis of the power method for the computation of the corresponding induced matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). Our result is based on a novel nonlinear Perron–Frobenius theorem for this kind of norms and ensures global convergence of the power method provided that the Birkhoff contraction ratio of the power iterator is smaller than one.
When applied to the case \(\Vert A\Vert _{q\rightarrow p}\) of \(\ell ^p\) norms, our result not only implies the current convergence result, but actually significantly improves the range of values of p and q for which global convergence can be ensured. This is particularly interesting from a complexity viewpoint. For example, although the computation of \(\Vert A\Vert _{q\rightarrow p}\) is well known to be NP-hard for \(p>q\), we show that for a non-trivial class of nonnegative matrices the power method converges to \(\Vert A\Vert _{q\rightarrow p}\) in polynomial time even for p significantly larger than q. To our knowledge this is the first global optimality result for this problem that does not require the condition \(p\le q\).
In the general case \(\Vert A\Vert _{\beta \rightarrow \alpha }\), a main computational drawback of the power method is related to the computation of the dual norm \(\Vert \cdot \Vert _{\beta ^*}\). In fact, if \(\Vert \cdot \Vert _\beta \) is not an \(\ell ^p\) norm, the corresponding dual norm may be challenging to compute [14]. In practice, evaluating \(\Vert \cdot \Vert _{\alpha ^*}\) from \(\Vert \cdot \Vert _{\alpha }\) can be done via convex optimization, and Corollary 7 of [14] proves that \(\Vert \cdot \Vert _{\alpha ^*}\) can be evaluated in polynomial time (resp. is NP-hard) if and only if \(\Vert \cdot \Vert _{\alpha }\) can be evaluated in polynomial time (resp. is NP-hard). There are norms for which an explicit expression in terms of arithmetic operations is available for \(\Vert \cdot \Vert _{\alpha }\) by construction (or by modeling), but no such expression is available for the dual \(\Vert \cdot \Vert _{\alpha ^*}\). As we discuss in Sect. 5.1, examples of this type include \(\Vert x \Vert _{\alpha }=(\Vert x \Vert ^2_{p}+\Vert x \Vert ^2_{q})^{1/2}\). A further main result of this work addresses this issue for the particular case of norms of the type (1). For this family of norms we provide an explicit convergence bound and an explicit formula for the power iterator for the computation of the corresponding matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). To illustrate possible applications of the result, we list in Corollaries 3–8 relatively sophisticated and non-standard matrix norms together with explicit conditions for their computability.
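To make this example concrete, the following sketch (our own illustration; the function name and the choice \(p=2\), \(q=4\) are assumptions, not from the paper) evaluates \(\Vert x \Vert _{\alpha }=(\Vert x \Vert ^2_{p}+\Vert x \Vert ^2_{q})^{1/2}\) and checks numerically that it satisfies the norm axioms; by contrast, no comparably explicit expression is available for its dual.

```python
import numpy as np

def mixed_norm(x, p=2, q=4):
    # ||x||_alpha = (||x||_p^2 + ||x||_q^2)^(1/2); p and q are illustrative choices
    return np.sqrt(np.linalg.norm(x, p) ** 2 + np.linalg.norm(x, q) ** 2)
```

Homogeneity and the triangle inequality follow from those of \(\Vert \cdot \Vert _p\) and \(\Vert \cdot \Vert _q\), and can be spot-checked on random vectors.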
We organize the discussion as follows: In Sect. 2 we review the nonlinear power method and its main convergence properties. In Sect. 3 we review relevant preliminary cone-theoretic results and notation. Then, in Sect. 4, we propose a novel and detailed global convergence analysis of the method based on a Perron–Frobenius type result for the map \(x\mapsto \Vert Ax\Vert _\alpha /\Vert x\Vert _\beta \), in the case of entrywise nonnegative matrices and monotonic norms \(\Vert \cdot \Vert _\alpha , \Vert \cdot \Vert _\beta \). We derive new conditions for global convergence to \(\Vert A\Vert _{\beta \rightarrow \alpha }\) that, in particular, help shed new light on the NP-hardness of the problem, and we propose a new explicit bound on the linear convergence rate of the power iterates. In Sect. 5 we focus on the particular case of norms of the same form as (1). We show how to practically implement the power method for this type of norms, we prove a specific convergence criterion that gives a priori global convergence guarantees, and we discuss the complexity of the method. Finally, in Sect. 6 we illustrate the behaviour of the nonlinear power method on some example matrix norms.
2 Boyd’s Nonlinear Power Method
Let \(\Vert \cdot \Vert _p\), \(\Vert \cdot \Vert _q\) be the usual \(\ell ^p\) and \(\ell ^q\) vector norms and consider the induced matrix norm \(\Vert A\Vert _{q\rightarrow p} = \max _{x\ne 0}\Vert Ax\Vert _p / \Vert x\Vert _q\). Well-known explicit formulas hold for the \(\ell ^1\) and \(\ell ^\infty \) matrix norms \(\Vert A\Vert _{1\rightarrow 1}\), \(\Vert A\Vert _{\infty \rightarrow \infty }\). However, while the mixed norm \(\Vert A\Vert _{1\rightarrow \infty }\) equals \(\max _{ij}|a_{ij}|\), the computation of \(\Vert A\Vert _{\infty \rightarrow 1}\) is NP-hard [36]. More generally, when p is any rational number \(p\ne 1,2\), computing the norm \(\Vert A\Vert _{p\rightarrow p}\) is NP-hard for a general matrix A [23], and the same holds for any norm \(\Vert A\Vert _{q\rightarrow p}\) with \(1\le p< q \le \infty \) [38]. The best known technique to compute \(\Vert A\Vert _{q\rightarrow p}\) is a form of nonlinear power method that we review in what follows.
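The explicit formulas mentioned above are easy to state in code; a minimal NumPy sketch (our own illustration) computing \(\Vert A\Vert _{1\rightarrow 1}\), \(\Vert A\Vert _{\infty \rightarrow \infty }\) and \(\Vert A\Vert _{1\rightarrow \infty }\):

```python
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])

norm_1_1     = np.abs(A).sum(axis=0).max()  # ||A||_{1->1}: maximum absolute column sum
norm_inf_inf = np.abs(A).sum(axis=1).max()  # ||A||_{inf->inf}: maximum absolute row sum
norm_1_inf   = np.abs(A).max()              # ||A||_{1->inf}: largest entry in absolute value
```

No such entrywise formula exists for \(\Vert A\Vert _{\infty \rightarrow 1}\), consistent with its NP-hardness.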
Consider the nonnegative function \(f_{A}(x) = \Vert Ax\Vert _p/\Vert x\Vert _q\). The norm \(\Vert A\Vert _{q\rightarrow p}\) is the global maximum of \(f_A\). By analyzing the optimality conditions of \(f_A\) for differentiable \(\ell ^p\)-norms \(\Vert \cdot \Vert _p\) and \(\Vert \cdot \Vert _q\), we note that
$$\begin{aligned} \nabla f_A(x) = \frac{1}{\Vert x\Vert _q}\Big (A^TJ_p(Ax) - f_A(x)\, J_q(x)\Big ), \end{aligned}$$
where, for \(1<p<\infty \), we denote by \(J_p(x)\) the gradient of the norm \(\nabla \Vert x\Vert _p = J_p(x)=\Vert x\Vert _p^{1-p}\, \Phi _p(x)\), with \(\Phi _p(x)\) entrywise defined as \(\Phi _p(x)_i = |x_i|^{p-2}x_i\). Let \(p^*\) be the dual exponent such that \(1/p+1/p^* =1\). As \(J_{p^*}(J_p(x))=x/\Vert x\Vert _p\) for all \(x\ne 0\) and \(J_{p}(\lambda \, x) = J_p(x)\) for any coefficient \(\lambda >0\), we have that \(\nabla f_A(x)=0\) if and only if \(J_{q^*}(A^TJ_p(Ax))= x/\Vert x\Vert _q\). Thus, x with \(\Vert x\Vert _{q}=1\) is a critical point of \(f_A(x)\) if and only if it is a fixed point of the map \(J_{q^*}(A^T J_{p}(Ax))\). The associated fixed point iteration
$$\begin{aligned} x_{k+1} = J_{q^*}\big (A^TJ_p(Ax_k)\big ) \qquad \mathrm {(2)} \end{aligned}$$
defines what we call the (nonlinear) power method for \(\Vert A\Vert _{q\rightarrow p}\).
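A minimal implementation sketch of this fixed point iteration for \(1<p,q<\infty \) (our own illustration; the stopping rule, iteration cap and starting point are assumptions, not from the paper):

```python
import numpy as np

def J(x, p):
    """Gradient of the l^p norm (1 < p < inf):
    J_p(x) = ||x||_p^(1-p) * Phi_p(x), with Phi_p(x)_i = |x_i|^(p-2) * x_i."""
    return np.linalg.norm(x, p) ** (1 - p) * np.sign(x) * np.abs(x) ** (p - 1)

def power_method(A, p, q, tol=1e-12, max_iter=10_000):
    """Iterate x_{k+1} = J_{q*}(A^T J_p(A x_k)) from a positive starting point."""
    q_dual = q / (q - 1)                 # dual exponent q*, 1/q + 1/q* = 1
    x = np.full(A.shape[1], 1.0)
    x /= np.linalg.norm(x, q)
    for _ in range(max_iter):
        x_new = J(A.T @ J(A @ x, p), q_dual)   # automatically has unit q-norm
        if np.linalg.norm(x_new - x, np.inf) < tol:
            x = x_new
            break
        x = x_new
    return np.linalg.norm(A @ x, p) / np.linalg.norm(x, q), x
```

For \(p=q=2\) the iteration reduces to the classical power method applied to \(A^TA\) and returns the spectral norm of A.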
Although, in practice, the method applied to \(\Vert A\Vert _{p\rightarrow p}\) for \(p=1,\infty \) often seems to converge to the global maximum (see e.g. [24]), no guarantees exist for the general case. For differentiable \(\ell ^p\) norms and nonnegative matrices, instead, conditions can be established that guarantee that the power iterates always converge to a global maximizer of \(f_A\). The idea is that when the power method is started in the positive orthant then, provided A has an appropriate non-zero pattern, each iterate of the method stays in this orthant until convergence. A nonlinear Perron–Frobenius type result then guarantees that there exists only one critical point of \(f_A\) in this region and that this point is a global maximizer of \(f_A\). While this idea was already known to Perron himself in the Euclidean \(\ell ^2\) case, to our knowledge the first version of this result for norms other than the Euclidean norm was proved by Boyd [4]. However, Boyd did not prove the uniqueness of positive critical points, but only that they are global maximizers of \(f_A\), under the assumption that \(A^TA\) is irreducible and \(1<p\le q<\infty \). This work was later revisited by Bhaskara and Vijayaraghavan [3], who proved uniqueness for positive matrices A and \(1<p\le q<\infty \). Independently, Friedland, Gaubert and Han proved in [15] similar results for \(1<p\le 2 \le q<\infty \) and any nonnegative A such that the matrix \(\left[ \begin{array}{cc} 0 &{} A \\ A^T &{} 0 \end{array}\right] \) is irreducible. Their result was then extended to \(1<p\le q<\infty \) in [18] under the assumption that \(A^T A\) is irreducible. Finally, all these results have been improved in [22], leading to the following
Theorem 1
(Theorems 3.2 and 3.3, [22]) Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) has at least one positive entry per row. If \(1<p\le q <\infty \), then, every positive critical point of \(f_A\) is a global maximizer. Moreover, if either \(p<q\) or \(A^TA\) is irreducible, then \(f_A\) has a unique positive critical point \(x^+\) and the power sequence (2) converges to \(x^+\) for every positive starting point.
In this work we consider the case of a matrix norm defined in terms of arbitrary vector norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) and we prove Theorem 4 below, which is a new version of Theorem 1, holding for general vector norms, provided that suitable and mild differentiability and monotonicity conditions are satisfied. We stress that Theorems 1 and 4 are not corollaries of each other in the sense that there are cases where exactly one, both or none apply. However, when both apply, then Theorem 4 is more informative. We discuss in detail these discrepancies in Sect. 4.1 and give there examples to illustrate them. In particular, a noticeable difference is that, for positive matrices A, the newly proposed Theorem 4 ensures uniqueness and maximality for choices of \(1<p,q<\infty \) that include the range \(p>q\). This is, to our knowledge, the first global optimality result for this problem that includes such range of values.
The key to our approach is the use of cone geometry techniques and the Birkhoff–Hopf theorem, which we recall below.
3 Cone–theoretic Background
We start by recalling concepts from conic geometry. Let \(\mathbb {R}^n_+\) be the nonnegative orthant in \(\mathbb {R}^n\), that is \(x\in \mathbb {R}^n_+\) if \(x_i\ge 0\) for every \(i=1,\ldots ,n\). The cone \(\mathbb {R}^n_+\) induces a partial ordering on \(\mathbb {R}^n\) as follows: for every \(x,y\in \mathbb {R}^n\) we write \(x\le y\) if \(y-x\in \mathbb {R}^n_+\), i.e. \(x_i\le y_i\) for every i. Furthermore, \(x,y\in \mathbb {R}^n_+\) are comparable, and we write \(x\sim y\), if there exist \(c,C>0\) such that \(cy \le x \le Cy\). Clearly, \(\sim \) is an equivalence relation and the equivalence classes in \(\mathbb {R}^n_+\) are called the parts of \(\mathbb {R}^n_+\). For example, if \(n=2\) and \(x = (1,0)\), then the equivalence class of x in \(\mathbb {R}^2_+\) is given by \(\{(y_1,0): y_1>0\}\).
For simplicity, from now on we will say that a vector is nonnegative (resp. positive) if its entries are nonnegative (resp. positive). The same nomenclature will be used for matrices.
We recall that a norm \(\Vert \cdot \Vert \) on \(\mathbb {R}^n\) is monotonic if for every \(x,y\in \mathbb {R}^n\) such that \(|x|\le |y|\), where the absolute value is taken componentwise, it holds \(\Vert x \Vert \le \Vert y \Vert \) and it is strongly monotonic if for every \(x,y\in \mathbb {R}^n\) with \(|x|\ne |y|\) and \(|x|\le |y|\) it holds \(\Vert x \Vert <\Vert y \Vert \).
One of the key tools for our main result is Hilbert's projective metric \(d_H:\mathbb {R}^n_+\times \mathbb {R}^n_+\rightarrow [0,\infty ]\), defined as follows:
$$\begin{aligned} d_H(x,y) = {\left\{ \begin{array}{ll} \ln \big (M(x/y)\, M(y/x)\big ) &{} \text {if } x\sim y,\\ 0 &{} \text {if } x=y=0,\\ \infty &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
where \(M(x/y) = \inf \{C>0 : x \le Cy\}\). We collect in the following lemma some useful properties of \(d_H\). Most of these results are known and can be found in [31]. Moreover, similarly to what is observed in Theorem 3 of [20], we prove a direct relation between the infinity norm and the Hilbert metric, which is useful for deriving explicitly computable convergence rates for the power method.
Lemma 1
For every \(x,y\in \mathbb {R}^n_+\), it holds \(d_H(x,y)=0\) if and only if \(x=\lambda y\) for some \(\lambda >0\) and \(d_H(cx,\widetilde{c}y)=d_H(x,y)\) for every \(c,\widetilde{c}>0\). Moreover, let \(\Vert \cdot \Vert \) be a monotonic norm on \(\mathbb {R}^n\), P a part of \(\mathbb {R}^n_+\) and define \(\mathbb {M}=P\cap \{x\in \mathbb {R}^n_+: \Vert x \Vert =1\}\). Then, \((\mathbb {M},d_H)\) is a complete metric space and
$$\begin{aligned} \Vert x-y \Vert _\infty \le r\, d_H(x,y) \quad \text {for all } x,y\in \mathbb {M}, \qquad \mathrm {(3)} \end{aligned}$$
where \(r=\inf \{t>0: x_i\le t\ \forall x\in \mathbb {M}, i=1,\ldots ,n\}\).
Proof
Proposition 2.1.1 in [31] implies that \(d_H(x,y)=0\) if and only if \(x=\lambda y\) and that \((\mathbb {M},d_H)\) is a metric space. The property \(d_H(cx,\widetilde{c}y)=d_H(x,y)\) for every \(c,\widetilde{c}>0\) follows directly from the definition of \(d_H\). The completeness of \((\mathbb {M},d_H)\) is a consequence of Proposition 2.5.4 in [31]. We prove (3). If \(P=\{0\}\), the result is trivial so we assume \(P\ne \{0\}\) and let \(i_1,\ldots ,i_m\) be such that for any \(z\in \mathbb {R}^n_+\), \(z\in P\) if and only if \(z_{i_1},\ldots ,z_{i_m}>0\). Let \(x,y\in \mathbb {M}\), then \(x\le M(x/y)y\) and, by monotonicity of \(\Vert \cdot \Vert \), it follows \(1 = \Vert x \Vert \le M(x/y)\Vert y \Vert = M(x/y).\) Similarly \(M(y/x)\ge 1\), so that \(M(x/y)M(y/x) \ge \max \big \{M(x/y),M(y/x)\big \}.\) It follows that
$$\begin{aligned} d_H(x,y) \ge \ln \max \big \{M(x/y),M(y/x)\big \} = \Vert \overline{x}-\overline{y} \Vert _\infty , \end{aligned}$$
where \(\overline{x}=\big (\ln (x_{i_1}),\ldots ,\ln (x_{i_m})\big )\) and \(\overline{y}=\big (\ln (y_{i_1}),\ldots ,\ln (y_{i_m})\big )\). By definition of \(r>0\), we have \(\ln (x_{i_j}),\ln (y_{i_j})\in (-\infty ,\ln (r)]\) for every \(j=1,\ldots ,m\). Furthermore, by the mean value theorem, we have
$$\begin{aligned} |\ln (s)-\ln (t)| \ge \frac{|s-t|}{r} \quad \text {for all } s,t\in (0,r]. \end{aligned}$$
Finally, with \(\widetilde{x}=(x_{i_1},\ldots ,x_{i_m})\) and \(\widetilde{y}=(y_{i_1},\ldots ,y_{i_m})\), we obtain
$$\begin{aligned} d_H(x,y) \ge \Vert \overline{x}-\overline{y} \Vert _\infty \ge \frac{1}{r}\Vert \widetilde{x}-\widetilde{y} \Vert _\infty = \frac{1}{r}\Vert x-y \Vert _\infty , \end{aligned}$$
which concludes the proof. \(\square \)
Observe that if r is defined as in Lemma 1 and \(\Vert \cdot \Vert \) is strongly monotonic, then
$$\begin{aligned} r \le {\widetilde{r}} := \max _{j=1,\ldots ,n}\Vert e_j \Vert ^{-1}. \end{aligned}$$
Indeed, if \(y\in \mathbb {M}\) is such that there exists \(j\in \{1,\ldots ,n\}\) with \(y_j>{\widetilde{r}}\), then \(1 = \Vert y \Vert >\Vert {\widetilde{r}} e_j \Vert = {\widetilde{r}} \Vert e_j \Vert \), which is not possible.
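For concreteness, the Hilbert metric on comparable positive vectors can be computed directly from its definition; the sketch below (our own illustration) also checks the projective properties stated in Lemma 1.

```python
import numpy as np

def hilbert_metric(x, y):
    """d_H(x, y) = ln( M(x/y) * M(y/x) ) for comparable positive vectors,
    where M(x/y) = max_i x_i / y_i."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.log((x / y).max() * (y / x).max())
```

Note that \(d_H\) vanishes exactly on rays: it measures the "angle" between rays of the cone, not distances between points.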
The proof of our main theorem is based on the Banach contraction principle. Thus, for a map \(F:\mathbb {R}_+^n\rightarrow \mathbb {R}_+^m\) we consider the Birkhoff contraction ratio \(\kappa _H(F)\in [0,\infty ]\) of F, defined as the smallest Lipschitz constant of F with respect to \(d_H\):
Clearly, if there exist \(x,y\in \mathbb {R}^n_+\) such that \(x\sim y\) and \(F(x)\not \sim F(y)\), then \(\kappa _H(F)=\infty \). However, such a situation never happens when F is a linear map in which case \(\kappa _H(F)\le 1\) always holds. Indeed, if \(A\in \mathbb {R}^{m\times n}\) is a nonnegative matrix, \(x,y\in \mathbb {R}^n_+\) and \(x\sim y\), then \(x\le M(x/y)y\) implies \(Ax \le M(x/y)Ay\). Similarly, we have \(Ay \le M(y/x)Ax\) and thus \(Ax\sim Ay\). These inequalities also imply that \(\kappa _H(A)\le 1\). This upper bound is not tight in many cases. However, thanks to the Birkhoff–Hopf theorem, a better estimate of \(\kappa _H(A)\) can be obtained by computing the projective diameter \(\triangle (A)\in [0,\infty ]\) of A, defined as
$$\begin{aligned} \triangle (A) = \sup \big \{d_H(Ax,Ay) : x,y\in \mathbb {R}^n_+,\ Ax\sim Ay\big \}. \end{aligned}$$
This is formalized in the following theorem whose proof can be found in Theorems 3.5 and 3.6 of [12].
Theorem 2
(Birkhoff–Hopf) Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries, then
$$\begin{aligned} \kappa _H(A) = \tanh \Big (\frac{1}{4}\triangle (A)\Big ), \end{aligned}$$
where \(\tanh (t) = (e^{2t}-1)/(e^{2t}+1)\) and with the convention \(\tanh (\infty )=1\).
The above theorem is particularly useful when combined with the following result, which collects Theorem 6.2 in [12] and Theorem 3.12 in [37]:
Theorem 3
Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and \(e_1,\ldots ,e_n\) the canonical basis of \(\mathbb {R}^n\). If there exists \({\mathcal {I}}\subset \{1,\ldots ,n\}\) such that \(Ae_i \sim Ae_j\) for all \(i,j\in {\mathcal {I}}\) and \(Ae_i=0\) for all \(i\notin {\mathcal {I}}\), then
$$\begin{aligned} \triangle (A) = \max _{i,j\in {\mathcal {I}}} d_H(Ae_i,Ae_j). \end{aligned}$$
In particular, if all the entries of A are positive, then \(\triangle (A) = \ln \big (\max _{i,j,k,l}\frac{a_{ki}\, a_{lj}}{a_{kj}\, a_{li}}\big )\) and \(\triangle (A)=\triangle (A^T)\). Moreover, if A has at least one positive entry per row and per column but A is not positive, then \(\triangle (A)=\infty \).
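For positive matrices, the formula of Theorem 3 and the resulting Birkhoff–Hopf bound of Theorem 2 can be evaluated directly; a small NumPy sketch (our own illustration):

```python
import numpy as np

def projective_diameter(A):
    """Delta(A) = ln max_{i,j,k,l} (a_ki * a_lj) / (a_kj * a_li),
    valid for entrywise positive A."""
    A = np.asarray(A, dtype=float)
    num = np.einsum('ki,lj->klij', A, A)   # a_ki * a_lj
    den = np.einsum('kj,li->klij', A, A)   # a_kj * a_li
    return np.log((num / den).max())

def birkhoff_bound(A):
    """Birkhoff-Hopf contraction ratio kappa_H(A) = tanh(Delta(A)/4)."""
    return np.tanh(projective_diameter(A) / 4.0)
```

A rank-one positive matrix maps the whole cone onto a single ray, so its projective diameter is zero; any positive matrix has \(\triangle (A)<\infty \) and hence \(\kappa _H(A)<1\).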
Unfortunately, such simple formulas for the Birkhoff contraction ratio are, to our knowledge, not known for general nonlinear mappings. We refer however to Corollary 2.1 in [35] and Corollary 3.9 in [17] for general characterizations of this ratio.
4 Nonlinear Perron–Frobenius Theorem for \(\Vert A\Vert _{\beta \rightarrow \alpha }\)
Given \(A\in \mathbb {R}^{m\times n}\), consider the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }=\max _{x \ne 0}{\Vert Ax\Vert _\alpha }/{\Vert x\Vert _\beta }\), where \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are arbitrary vector norms on \(\mathbb {C}^m\) and \(\mathbb {C}^n\), respectively. Then, as for the case of \(\ell ^p\) norms, consider the function
For an arbitrary, possibly non-differentiable vector norm \(\Vert \cdot \Vert \) it holds (see e.g. [13])
$$\begin{aligned} \partial \Vert x \Vert = \big \{y : \langle y,x\rangle = \Vert x \Vert ,\ \Vert y \Vert _*\le 1\big \}, \end{aligned}$$
where \(\partial \) denotes the subdifferential and \(\Vert \cdot \Vert _*\) is the dual norm of \(\Vert \cdot \Vert \), defined as \(\Vert y\Vert _* = \max _{x\ne 0}\langle x,y\rangle /\Vert x\Vert \). Again, for notational convenience, given the vector norm \(\Vert x\Vert _\alpha \), we introduce the set-valued operator \(J_\alpha \) such that
$$\begin{aligned} J_\alpha (x) = \partial \Vert x \Vert _\alpha . \end{aligned}$$
The definition of dual norm implies the generalized Hölder inequality \(\langle x,y\rangle \le \Vert x\Vert \Vert y\Vert _*\). Thus, for a vector x and a norm \(\Vert \cdot \Vert _\alpha \), the set \(J_\alpha (x)\) coincides with the set of vectors on the unit sphere of the dual norm of \(\Vert \cdot \Vert _\alpha \) for which equality holds in the Hölder inequality. In fact, the subdifferential \(J_\alpha \) is closely related to the duality mapping \({\mathcal {J}}_\alpha \) induced by that norm. Precisely, by Asplund's theorem (see e.g. [7]), we have that
$$\begin{aligned} {\mathcal {J}}_\alpha (x) = \partial \tfrac{1}{2}\Vert x \Vert _\alpha ^2 = \Vert x \Vert _\alpha \, J_\alpha (x). \end{aligned}$$
It is well known that the subgradient of a convex function f is single valued if and only if f is Fréchet differentiable. Therefore \(J_\alpha \) is single valued if and only if \(\Vert \cdot \Vert _\alpha \) is a Fréchet differentiable norm. The assumption that the duality maps involved are single valued will be crucial for our main result. For this reason, throughout we make the following assumptions on the norms we are considering:
Assumption 1
The norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) we consider are such that
1. \(\Vert \cdot \Vert _\alpha \) is Fréchet differentiable.
2. The dual norm \(\Vert \cdot \Vert _{\beta ^*}\) is Fréchet differentiable.
3. Both \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _{\beta ^*}\) are strongly monotonic.
Remark 1
Recall that every monotonic norm \(\Vert \cdot \Vert \) is also absolute (see e.g. [28, Thm. 1]), that is \(\Vert \,|x|\,\Vert =\Vert x \Vert \) for every x, where |x| denotes the entrywise absolute value. This implies, in particular, that a monotonic norm is Fréchet differentiable at every \(x\in \mathbb {R}^n{\setminus }\{0\}\) if and only if it is Fréchet differentiable at every \(x\in \mathbb {R}^n_+{\setminus }\{0\}\).
Points (1) and (2) of Assumption 1 ensure that the following nonlinear mapping
$$\begin{aligned} {\mathcal {S}}_A(x) = J_{\beta ^*}\big (A^TJ_\alpha (Ax)\big ) \end{aligned}$$
is single valued. Point (3) ensures that for nonnegative matrices the maximum of \(f_A\) is attained at a nonnegative vector and that, if \(A^TA\) is irreducible, this maximizer has positive entries. Overall, these assumptions allow us to prove the following fundamental preliminary Lemmas 2–6.
First, we discuss the critical points of \(f_A\). If \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1, then \(f_A\) may not be differentiable. Indeed, the differentiability of \(\Vert \cdot \Vert _{\beta ^*}\) does not imply that of \(\Vert \cdot \Vert _{\beta }\) (see for instance [7, Chapter II]). Hence, in the following, we use Clarke’s generalized gradient [8] to discuss the critical points of \(f_A\). In particular, let us recall that, by [8, Prop. 2.2.7], the generalized gradient of a convex function coincides with its subgradient. Moreover, it can be verified that \(f_A\) is locally Lipschitz near every \(x\in \mathbb {R}^n{\setminus }\{0\}\) so that its generalized gradient \(\partial f_A(x)\subset \mathbb {R}^n\) is well defined and x is a critical point of \(f_A\) if \(0\in \partial f_A(x)\). Moreover, if \(f_A\) attains a local minimum or maximum at \(x\ne 0\), then \(0\in \partial f_A(x)\) by [8, Prop. 2.3.2].
Lemma 2
Let \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1 and let \(x\in \mathbb {R}^n_+\) with \(\Vert x\Vert _{\beta }=1\) and \(f_A(x)\ne 0\). If x is a critical point of \(f_A\), then it is a fixed point of \({\mathcal {S}}_A\). Conversely, if x is a fixed point of \({\mathcal {S}}_A\) and \(\Vert \cdot \Vert _{\beta }\) is differentiable, then x is a critical point of \(f_A\).
Proof
First, assume that \(0\in \partial f_A(x)\). As \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) are Lipschitz functions and \(\Vert x \Vert _{\beta }=1\), Proposition 2.3.14 of [8] implies that
$$\begin{aligned} \partial f_A(x) \subseteq A^TJ_{\alpha }(Ax)-f_A(x)\, J_{\beta }(x). \qquad \mathrm {(10)} \end{aligned}$$
\(J_\alpha \) is single valued since \(\Vert \cdot \Vert _{\alpha }\) is differentiable. Hence, \(0\in \partial f_A(x)\) implies that \(f_A(x)^{-1}A^TJ_{\alpha }(Ax)\in J_{\beta }(x)\). Now, as \(\Vert \cdot \Vert _{\beta ^*}\) is differentiable, we have that, for the duality mapping \({\mathcal {J}}_\beta \), it holds \(y \in \mathcal J_\beta (x)\) if and only if \(x = {\mathcal {J}}_{\beta ^*}(y)\) (c.f. [7, Prop. 4.7]). It follows, with (8), that \( \lambda J_{\beta ^*}(A^TJ_\alpha (Ax))= x\) with \(\lambda >0\). Finally, as \(\Vert J_{\beta ^*}(A^TJ_\alpha (Ax)) \Vert _{\beta }=1=\Vert x \Vert _{\beta }\), we have \(\lambda = 1\) which implies that \({\mathcal {S}}_A(x)=x\).
Now, suppose that x is a fixed point of \({\mathcal {S}}_A\). Then, we have \(J_{\beta }({\mathcal {S}}_A(x))=J_{\beta }(x)\). Again, by [7, Prop. 4.7] and (8), we deduce the existence of \(\lambda >0\) such that \( \lambda \,A^TJ_{\alpha }(Ax)\in J_{\beta }(x)\). The definition of \(J_{\beta }\) implies that \(\left\langle x , \lambda \,A^TJ_{\alpha }(Ax) \right\rangle = \Vert x \Vert _{\beta }=1\) and thus \(\lambda ^{-1}=\left\langle Ax , AJ_{\alpha }(Ax) \right\rangle =f_A(x)\). It follows that \(0\in A^TJ_{\alpha }(Ax)-f_A(x)J_{\beta }(x)\). If \(\Vert \cdot \Vert _{\beta }\) is differentiable, then \(f_A\) is differentiable at x and the sets in (10) are equal (and singletons). It follows that \(0\in \partial f_A(x)\). \(\square \)
Lemma 3
Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and P, a part of \(\mathbb {R}^n_+\) such that \(A^TAx\in P\) for every \(x\in P\). Furthermore, let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1. If \(\kappa _H({\mathcal {S}}_A)\le \tau <1\), then \({\mathcal {S}}_A\) has a unique fixed point z in P and for every positive integer k and every \(x\in P\), it holds
$$\begin{aligned} \Vert {\mathcal {S}}^k_A(x)-z \Vert _\infty \le r\, \frac{\tau ^k}{1-\tau }\, d_H\big (x,{\mathcal {S}}_A(x)\big ), \end{aligned}$$
where \(r=\inf \{t>0: x_i\le t\ \forall i=1,\ldots ,n, \ x\in P, \Vert x \Vert _{\beta }=1\}\).
Proof
By assumption \({\mathcal {S}}_A\) is a strict contraction on the metric space \((\mathbb {M},d_H)\) where \(\mathbb {M}=P\cap \{x \in \mathbb {R}^n_+: \Vert x \Vert _{\beta }=1\}\). As \((\mathbb {M},d_H)\) is complete by Lemma 1, it follows from the Banach fixed point theorem (see for instance Theorem 3.1 in [29]) that \({\mathcal {S}}_A\) has a unique fixed point z in \(\mathbb {M}\) and for every \(y\in \mathbb {M}\) it holds
$$\begin{aligned} d_H\big ({\mathcal {S}}^k_A(y),z\big ) \le \frac{\tau ^k}{1-\tau }\, d_H\big (y,{\mathcal {S}}_A(y)\big ). \end{aligned}$$
As \({\mathcal {S}}_A(\lambda y)={\mathcal {S}}_A(y)\) and \(d_H(\lambda y, {\mathcal {S}}_A(y))=d_H(y, {\mathcal {S}}_A(y))\) for every \(\lambda >0\), the convergence rate is a direct consequence of the above inequality and Lemma 1. \(\square \)
We remark that this result does not guarantee that the unique fixed point z of \({\mathcal {S}}_A\) in P is a global maximizer of \(f_A\) and in fact this is not always true. Indeed, if A is a \(2\times 2\) diagonal matrix which is not a multiple of the identity and \(\Vert \cdot \Vert _\alpha =\Vert \cdot \Vert _2\), \(\Vert \cdot \Vert _{\beta }=\Vert \cdot \Vert _3\), then \(\kappa _H({\mathcal {S}}_A)\le 1/2\) and \({\mathcal {S}}_A\) leaves all the parts of \(\mathbb {R}^2_+\) invariant but some of them do not contain a global maximizer of \(f_A\). Moreover, as \(\mathbb {R}^n_+\) has \(2^n\) parts, testing each part of the cone is computationally too expensive for large n. Therefore, in the remaining part of the section, we derive conditions in order to ensure that the power iterates converge to a global maximizer of \(f_A\).
Lemma 4
Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and let \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1. Then it holds \(f_A(x)\le f_A(|x|)\) for any \(x\in \mathbb {C}^n{\setminus }\{0\}\) and the maximum of \(f_A\) is attained in \(\mathbb {R}^n_+\).
Proof
Let \(x\ne 0\). Since A has nonnegative entries, it holds \(|Ax|\le A|x|\). Thus, as monotonic norms are also absolute, we have
$$\begin{aligned} f_A(x) = \frac{\Vert Ax\Vert _\alpha }{\Vert x\Vert _\beta } = \frac{\Vert \,|Ax|\,\Vert _\alpha }{\Vert \,|x|\,\Vert _\beta } \le \frac{\Vert A|x|\,\Vert _\alpha }{\Vert \,|x|\,\Vert _\beta } = f_A(|x|). \end{aligned}$$
Now, if y is a global maximizer of \(f_A\), then \(f_A(y)\le f_A(|y|)\le f_A(y)\) which concludes the proof. \(\square \)
In the forthcoming Lemma 6, we use the strong monotonicity required in Point (3) of Assumption 1 to prove that if \(A^TA\) is irreducible, then the nonnegative maximizer of Lemma 4 has positive entries. To this end, however, we need one additional preliminary result, which characterizes strongly monotonic norms in terms of the zero pattern of \(J_\gamma \) and which we prove in the following:
Lemma 5
Let \(\Vert \cdot \Vert _{\gamma }\) be a differentiable monotonic norm on \(\mathbb {R}^n\), then \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic if and only if \(x\sim J_{\gamma }(x)\) for every \(x\in \mathbb {R}^n_+\).
Proof
Suppose that \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic. Let \(x\in \mathbb {R}^n_{+}\). If \(x=0\), \(J_{\gamma }(0)=0\) by construction. Suppose that \(x\ne 0\). We use the strong monotonicity to prove the existence of \(c>0\) such that \(c\,x\le J_{\gamma }(x)\). Let i be such that \(x_i>0\) and define \(f(t)=\Vert x+(t-x_i)e_i \Vert _{\gamma }\) for all \(t>0\). Then, f is differentiable and \(f'(t)=J_{\gamma }(x+(t-x_i)e_i)_i\) for all \(t>0\). Furthermore, f is strictly increasing on \((0,\infty )\) since \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic. It follows that \(J_{\gamma }(x)_i = f'(x_i) >0\). As this is true for all i such that \(x_i>0\), we conclude that there exists \(c >0\) such that \(c\,x\le J_{\gamma }(x)\). The existence of \(C >0\) such that \(J_{\gamma }(x)\le C\,x\) follows from Proposition 5.2 of [7, Chapter 1]. Hence, we have \(J_{\gamma }(x)\sim x\).
For the reverse implication, suppose that \(J_{\gamma }(x)\sim x\) for all \(x\in \mathbb {R}^n_+\). Let \(x,y\in \mathbb {R}^n_+\) be such that \(x\le y\) and \(x\ne y\). If \(x=0\), then \(\Vert x \Vert _{\gamma }=0<\Vert y \Vert _{\gamma }\). Suppose that \(x\ne 0\). As \(x\le y\) and \(x\ne 0\), there exists i and \(t_0>0\) such that \(x+te_i \le y\) for all \(t\in (0,t_0)\). For \(t\in (0,t_0)\), we have
where the second inequality follows from the convexity of \(\Vert \cdot \Vert _{\gamma }\). By assumption, we have \(J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) \sim x+\tfrac{t_0}{2}e_i\) and thus \(J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) _i>0\). It follows that \(\Vert y \Vert _{\gamma } > \Vert x \Vert _{\gamma }\), i.e. \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic. \(\square \)
Lemma 6
Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1. Let A be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Then, \({\mathcal {S}}_A(x)\) is positive for every positive x and every nonnegative critical point of \(f_A\) is positive.
Proof
Lemma 5 implies that \({\mathcal {S}}_A(x) \sim A^TA x\). It follows that \({\mathcal {S}}_A\) maps positive vectors to positive vectors since the irreducibility of \(A^TA\) implies that \(A^TAx\) is positive for every positive x. Next, note that \(A^TA\) is symmetric positive semi-definite and therefore all its eigenvalues are nonnegative. It follows that \(A^TA\) is primitive (see e.g. Theorem 1 in [40]). By the same theorem, there exists a positive integer k such that \((A^TA)^k\) is a matrix with positive entries. Since \({\mathcal {S}}^{k}_A(x)\sim (A^TA)^{k}x\) for every \(x\in \mathbb {R}^n_+{\setminus }\{0\}\), we deduce that \({\mathcal {S}}^{k}_A(x)\) is strictly positive for every nonzero, nonnegative x. Finally, suppose that \(y\in \mathbb {R}^n_+\) is a critical point of \(f_A\), then y is a fixed point of \({\mathcal {S}}_A\) by Lemma 2 and thus \(y={\mathcal {S}}_A^k(y)\) is strictly positive. \(\square \)
We are now ready to state our main theorem of this section. This theorem provides conditions on A, \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) that ensure the existence of a unique positive maximizer \(x^+\) such that \(\Vert Ax^+\Vert _\beta /\Vert x^+\Vert _\alpha =\Vert A\Vert _{\beta \rightarrow \alpha }\) and that govern the convergence of the power sequence
to such \(x^+\). As announced, this result is essentially a fixed point theorem for \({\mathcal {S}}_A\) and thus the Birkhoff contraction ratio \(\kappa _H({\mathcal {S}}_A)\) and any \(\tau \) that well-approximates \(\kappa _H({\mathcal {S}}_A)\) from above play a central role.
Theorem 4
Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1.
If \(\kappa _H({\mathcal {S}}_A)\le \tau <1\), then:
-
1.
\(f_A\) has a unique critical point \(x^+\) in \(\mathbb {R}^n_+\). Moreover, \(f_A(x^+)=\Vert A \Vert _{\beta \rightarrow \alpha }\) and \(x^+\) is positive.
-
2.
If \(x_0\) is positive and \(x_{k+1}={\mathcal {S}}_A(x_k)\) is the power sequence, then
$$\begin{aligned} \Vert x_k-x^+ \Vert _{\infty }\le \tau ^k\,C\, \quad \text {with}\quad C=\max _{i=1,\ldots ,n}\frac{d_H(x_0,x_1)}{(1-\tau )\Vert e_i \Vert _{\beta }} \end{aligned}$$where \(e_1,\ldots ,e_n\) is the canonical basis of \(\mathbb {R}^n\). Furthermore, it holds
$$\begin{aligned} (1-\tau ^k\,{\widetilde{C}})\Vert A \Vert _{\beta \rightarrow \alpha } \le \Vert Ax_k \Vert _\alpha \le \Vert A \Vert _{\beta \rightarrow \alpha } \end{aligned}$$with \({\widetilde{C}}= C\,\max _{x\ne 0}\tfrac{\Vert x \Vert _{\alpha }}{\Vert x \Vert _{\infty }}\). In particular, \(x_k\rightarrow x^+\) as \(k\rightarrow \infty \).
Proof
Lemma 4 implies that \(f_A\) has a maximizer \(x^+\in \mathbb {R}^n_+\). Lemma 6 implies that \(x^+\) is positive and that the interior of \(\mathbb {R}^n_+\) is left invariant by \({\mathcal {S}}_A\). Hence, all statements except the bounds on \(\Vert Ax_k \Vert _{\alpha }\) follow by a direct application of Lemma 3 and Eq. (4). We conclude with a proof of the estimates for \(\Vert Ax_k \Vert _{\alpha }\). Clearly, \(\Vert Ax_k \Vert _{\alpha }\le \Vert A \Vert _{\beta \rightarrow \alpha }\) always holds. For the lower bound, let \(\gamma = \max _{x\ne 0}\tfrac{\Vert x \Vert _{\beta }}{\Vert x \Vert _{\infty }}\). The estimate on \(\Vert x_k-x^+ \Vert _{\infty }\) implies that
which concludes the proof. \(\square \)
Note that the condition that requires \(A^TA\) to be irreducible is in general weaker than requiring the initial matrix A to be irreducible itself, as \(A^TA\) may be irreducible even if A is reducible. This is also observed in the numerical examples in Sect. 6.
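This remark can be illustrated with a small numerical check (the matrix below is our own example, not one from the paper), using the standard criterion that a nonnegative \(n\times n\) matrix M is irreducible if and only if \((I+M)^{n-1}\) is entrywise positive:

```python
import numpy as np

def is_irreducible(M):
    """Standard test: a nonnegative n x n matrix M is irreducible
    iff (I + M)^(n-1) is entrywise positive."""
    n = M.shape[0]
    P = np.linalg.matrix_power(np.eye(n) + M, n - 1)
    return bool(np.all(P > 0))

# A is reducible (upper triangular, so its directed graph is not strongly
# connected), yet A^T A is entrywise positive and hence irreducible.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(is_irreducible(A))        # A itself: reducible
print(is_irreducible(A.T @ A))  # A^T A: irreducible
```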
Theorem 4 holds for any upper bound \(\tau \) of \(\kappa _H({\mathcal {S}}_A)\) and a somewhat natural choice for such a \(\tau \) is the following
This coefficient is particularly useful in practice as, thanks to the Birkhoff–Hopf theorem, in many circumstances one can provide explicit bounds for \(\tau ({\mathcal {S}}_A)\). Although in principle \(\tau ({\mathcal {S}}_A)\) can be larger than \(\kappa _H({\mathcal {S}}_A)\), in the forthcoming Sect. 4.2 we show that there are cases where the equality \(\tau ({\mathcal {S}}_A)=\kappa _H({\mathcal {S}}_A)\) holds. Moreover, we discuss the sharpness of the condition \(\kappa _H({\mathcal {S}}_A)<1\) required by our main result. In the following Sect. 4.1, instead, we discuss the particular case where \(\Vert \cdot \Vert _{\alpha },\Vert \cdot \Vert _\beta \) are \(\ell ^p\) norms and we give examples showing how Theorem 4 improves the existing theory for this problem.
4.1 Examples and Comparison with Previous Work
When \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) are \(\ell ^p\) norms, Theorem 4 implies the following:
Corollary 1
Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Let \(1<p,q<\infty \) and consider
If \(\tau <1\), then \(\Vert A \Vert _{q\rightarrow p}\) can be approximated to an arbitrary precision with the fixed point iteration (2).
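A minimal sketch of such a fixed point iteration, assuming the one-step map \(x\mapsto \Phi _{q^*}(A^T\Phi _p(Ax))\) with an \(\ell ^q\) renormalization (the normalization is our choice, since the underlying map is scale invariant; the equation numbering of the paper is not reproduced here):

```python
import numpy as np

def phi(x, p):
    # Entrywise map Phi_p(x)_i = |x_i|^(p-2) x_i used throughout the paper
    return np.abs(x) ** (p - 2) * x

def pq_power_method(A, p, q, iters=200):
    """Sketch of the power iteration for ||A||_{q->p} with A >= 0.
    One step applies x -> Phi_{q*}(A^T Phi_p(A x)); the iterate is
    renormalized in the l^q norm (an assumption of this sketch)."""
    qs = q / (q - 1.0)  # Hoelder conjugate q*
    x = np.ones(A.shape[1])
    x /= np.linalg.norm(x, q)
    for _ in range(iters):
        x = phi(A.T @ phi(A @ x, p), qs)
        x /= np.linalg.norm(x, q)
    return np.linalg.norm(A @ x, p)

# For p = q = 2 the iteration reduces to the classical power method,
# so the result should match the spectral norm (largest singular value).
A = np.array([[3.0, 1.0], [1.0, 2.0], [0.5, 1.5]])
est = pq_power_method(A, p=2, q=2)
print(abs(est - np.linalg.norm(A, 2)) < 1e-8)  # True
```

Note that for \(p=q=2\) the maps \(\Phi _2\) are the identity, so each step is exactly one classical power-method step on \(A^TA\).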
In the case of \(\ell ^p\) norms, both Theorem 1 and Corollary 1 apply. In order to compare them, let us compute the Birkhoff contraction ratio for some simple but explanatory cases. Let \(\varepsilon \ge 0\) and \(A\in \mathbb {R}^{3\times 2}\), \(B\in \mathbb {R}^{2\times 2}\), \(C\in \mathbb {R}^{3\times 3}\) be defined as
Due to Theorem 3, it is easy to see that
Note that \(A^TA\) and \(C^TC\) are positive matrices and \(B^TB\) is positive if and only if \(\varepsilon >0\). If \(\varepsilon =0\), then \(B^T B\) is the identity matrix. We first discuss the implications of Theorem 1 for the computation of \(\Vert X \Vert _{q\rightarrow p}\) where \(X\in \{A,B,C\}\).
If \(p\le q\) and \(\varepsilon >0\), then Theorem 1 implies that \(f_X\) has a unique positive maximizer \(x^+\), which is global, and the power sequence (11) will converge to \(x^+\). However, if \(\varepsilon =0\) then Theorem 1 ensures that every positive critical point of \(f_B\) is a global maximizer but uniqueness and convergence are only guaranteed under the assumption \(p<q\). Now, we look at the implications of Theorem 4. By noting that \(\kappa _H(J_p) = p-1\) and \(\kappa _H(J_{q^*}) = 1/(q-1)\), we have
Hence, for instance, uniqueness and global maximality of a positive maximizer of \(f_A\) are guaranteed by Theorem 4 under the assumption \(9(p-1)<400(q-1)\), which includes the known global convergence range of values \(p<q\) but is of course a much weaker assumption.
Now, note that for \(\varepsilon \ge 1\) we have \(\tau ({\mathcal {S}}_B)<1\) if and only if \((\varepsilon -1)^2(p-1)<(\varepsilon +1)^2(q-1)\). This assumption is less restrictive than \(p\le q\) for every \(\varepsilon \ge 1\), as \(p\le q\) corresponds to the asymptotic case \(\varepsilon \rightarrow \infty \). If \(\varepsilon =1\), Theorem 4 applies for every \(1<p,q<\infty \). The analysis for \(0<\varepsilon < 1\) is similar. However, we note that if \(\varepsilon =0\), then Theorem 4 does not provide any information about \(f_B\) for the case \(p=q\), in contrast with Theorem 1. When \(\varepsilon =0\) and \(p<q\), both theorems imply the same result. Finally, note that \(\tau ({\mathcal {S}}_C)<1\) if and only if \(p<q\) and so Theorem 1 is more useful as it also covers the case \(p=q\).
More generally, when the considered matrix A has finite projective diameter \(\triangle (A)\), Theorem 2 implies that \(\kappa _H(A)<1\) and thus Theorem 4 ensures that for any \(p>1\), the matrix norm \(\Vert A\Vert _{q\rightarrow p}\) can be approximated in polynomial time to an arbitrary precision for any choice of \(q>\kappa _H(A)^2(p-1)+1\), without the requirement \(q>p\).
Figure 1 shows that the value of \(\kappa _H(A)\) for matrices with positive entries is often substantially smaller than one, enhancing the relevance of Theorem 4.
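This observation is easy to reproduce via the Birkhoff–Hopf formula \(\kappa _H(A)=\tanh (\triangle (A)/4)\): for an entrywise positive matrix, the projective diameter is attained on pairs of columns. A short sketch (our own helper, assuming this standard formula):

```python
import numpy as np
from itertools import combinations

def hilbert_distance(u, v):
    # d_H(u, v) = ln( max_i u_i/v_i / min_i u_i/v_i ) for positive vectors
    r = u / v
    return float(np.log(r.max() / r.min()))

def birkhoff_ratio(A):
    """Birkhoff contraction ratio kappa_H(A) = tanh(Delta(A)/4) for an
    entrywise positive matrix A, where the projective diameter Delta(A)
    is attained on pairs of columns of A."""
    cols = A.T
    diam = max(hilbert_distance(cols[j], cols[l])
               for j, l in combinations(range(A.shape[1]), 2))
    return float(np.tanh(diam / 4.0))

# For a positive matrix the ratio is strictly below one.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(birkhoff_ratio(A))
```

For a positive \(2\times 2\) matrix with unit diagonal and off-diagonal \(\varepsilon \) this returns \((1-\varepsilon )/(1+\varepsilon )\), in agreement with the value used in Sect. 4.2 below.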
4.2 On the Sharpness of the New Convergence Condition
As we observed earlier, the key property behind the global convergence of the power iterates is that, when \(\kappa _H({\mathcal {S}}_A)<1\), the mapping \({\mathcal {S}}_A\) has a unique positive fixed point \(x^+\). Due to Lemma 2, this is equivalent to observing that, in this case, \(x^+\) is the unique positive critical point of \(f_A\), up to scalar multiples. In what follows we show that this is no longer the case if \(\kappa _H({\mathcal {S}}_A)>1\). In particular, we limit our attention to the case of \(\ell ^p\) norms and we exhibit a one-parameter family of \(2\times 2\) positive and symmetric matrices \(A_\varepsilon \) for which a unique positive critical point of \(f_{A_\varepsilon }\) exists if and only if \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\le 1\). Moreover, we show that for such a family of matrices it holds \(\tau ({\mathcal {S}}_A)=\kappa _H({\mathcal {S}}_A)\), where \(\tau ({\mathcal {S}}_A)\) is the estimate of \(\kappa _H({\mathcal {S}}_A)\) discussed in Eq. (12). As \(f_A\) is scale invariant, here and in the rest of this section, uniqueness of the critical point is meant up to scalar multiples.
For \(\varepsilon >0\) and \(p,q\in (1,\infty )\), let \(A_{\varepsilon }\in \mathbb {R}^{2\times 2}\) and \(f_{A_\varepsilon }:\mathbb {R}^{2}\rightarrow \mathbb {R}_+\) be defined as
The main result of this section is the following theorem, whose proof is postponed to the end of the section.
Theorem 5
It holds \(\kappa _H({\mathcal {S}}_{A_{\varepsilon }})=\tau ({\mathcal {S}}_{A_{\varepsilon }})\). Furthermore, \(f_{A_\varepsilon }\) has a unique critical point in \(\mathbb {R}_+^2\) if and only if \(\tau ({\mathcal {S}}_{A_\varepsilon })\le ~1\).
This result shows that, unlike the previous Theorem 1, Theorem 4 is tight in the sense that when \(\kappa _H({\mathcal {S}}_A)>1\) there might be multiple distinct fixed points of \({\mathcal {S}}_A\) in \(\mathbb {R}^2_+\), and thus convergence of the power sequence to a prescribed fixed point cannot be ensured globally without restrictions on the starting point \(x_0\in \mathbb {R}^2_+\).
We subdivide the proof of Theorem 5 above into a number of preliminary results. Before proceeding, we recall that for \(p\in (1,\infty )\), \(\Phi _p:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is entrywise defined as \(\Phi _p(x)_i = |x_i|^{p-2}x_i\) for all i. We compute \(\tau ({\mathcal {S}}_{A_\varepsilon })\) and \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\).
Lemma 7
For every \(\varepsilon >0\), we have \(\kappa _H({\mathcal {S}}_{A_{\varepsilon }})=\tau ({\mathcal {S}}_{A_{\varepsilon }})= \big (\frac{1-\varepsilon }{1+\varepsilon }\big )^2\frac{p-1}{q-1}.\)
Proof
As \(\kappa _H(A_\varepsilon )=\big | \frac{\varepsilon -1}{1+\varepsilon }\big |\) by Theorem 3, we have \(\tau ({\mathcal {S}}_{A_\varepsilon })=\big (\frac{1-\varepsilon }{1+\varepsilon }\big )^2 \frac{p-1}{q-1}\). Now, we show that \(\kappa _H({\mathcal {S}}_{A_\varepsilon })=\tau ({\mathcal {S}}_{A_\varepsilon })\). Clearly, \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\le \tau ({\mathcal {S}}_{A_\varepsilon })\). For the reverse inequality, consider \(x=(1,1)^T\) and \(y(t)=(1,t)^T\). Furthermore, define \(h:(1,\infty )\rightarrow \mathbb {R}\) as
Then, we have \(h(t)\le \kappa _H({\mathcal {S}}_{A_\varepsilon })\) for every \(t>1\). To conclude the proof, we show that \(\lim _{t\rightarrow 1^+}h(t)=\tau ({\mathcal {S}}_{A_{\varepsilon }})\). A direct computation shows that \(d_H\big (x,y(t)\big )=\ln (t)\) and \(A_{\varepsilon }\Phi _p(A_{\varepsilon }x) =(1+\varepsilon )^p(1,1)^T\). Recalling that \({\mathcal {S}}_{A_{\varepsilon }}(z)= \Phi _{q^*}(A_{\varepsilon }\Phi _p(A_{\varepsilon }z))\), we have
So if we let \(f_1,f_2:(1,\infty )\rightarrow \mathbb {R}\) be such that \(A_{\varepsilon }\Phi _p(A_{\varepsilon }y(t))=\big (f_1(t),f_2(t)\big )^T\) for all \(t>1\), we get
With
the above computations imply
where the last equality follows by continuity. As \(\ln (1)=\ln (g(1))=0\), L'Hôpital's rule implies that
where
and
As \(\zeta _1(1)\zeta _2(1)=(1+\varepsilon )^{2p}(1+\varepsilon )^4\), after rearrangement, we finally obtain
which implies \(\tau ({\mathcal {S}}_{A_{\varepsilon }})\le \kappa _H({\mathcal {S}}_{A_{\varepsilon }})\) and thus concludes the proof. \(\square \)
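The limit computed in this proof can be checked numerically. The sketch below assumes that \(A_\varepsilon \) has unit diagonal and off-diagonal entries \(\varepsilon \), which is consistent with the identity \(A_{\varepsilon }\Phi _p(A_{\varepsilon }x)=(1+\varepsilon )^p(1,1)^T\) used above:

```python
import numpy as np

def phi(x, p):
    # Phi_p(x)_i = |x_i|^(p-2) x_i
    return np.abs(x) ** (p - 2) * x

def d_H(u, v):
    # Hilbert metric between positive vectors
    r = u / v
    return float(np.log(r.max() / r.min()))

eps, p, q = 0.5, 3.0, 2.0
A = np.array([[1.0, eps], [eps, 1.0]])  # assumed form of A_eps
S = lambda z: phi(A @ phi(A @ z, p), q / (q - 1.0))

# h(t) = d_H(S(x), S(y(t))) / d_H(x, y(t)) with x = (1,1), y(t) = (1,t)
x = np.array([1.0, 1.0])
t = 1.0 + 1e-6
y = np.array([1.0, t])
h = d_H(S(x), S(y)) / d_H(x, y)

tau = ((1 - eps) / (1 + eps)) ** 2 * (p - 1) / (q - 1)
print(h, tau)  # h approaches tau as t -> 1+
```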
Now, we prove that the nonnegative critical points of \(f_{A_\varepsilon }\) are positive and we then characterize them in terms of a real parameter t. As critical points are defined up to multiples, we restrict our attention to the line \(\{x\in \mathbb {R}^2:x_1+x_2=1\}\).
Lemma 8
Let \(x\in \mathbb {R}^2_+\) with \(x_1+x_2=1\). Then x is a critical point of \(f_{A_\varepsilon }\) if and only if there exists \(t\in (0,1)\) such that \(x=(t,1-t)^T\) and \(\psi (t)=\psi (1-t)\) where \(\psi :[0,1]\rightarrow \mathbb {R}_{+}\) is defined as
Proof
As we already observed, \(f_{A_\varepsilon }\) attains a global maximum in \(\mathbb {R}^2_{+}\). Furthermore, the critical points of \(f_{A_\varepsilon }\) satisfy
As \(A_{\varepsilon }\) is positive, (15) implies that every nonnegative critical point of \(f_{A_\varepsilon }\) is positive. It follows that, for positive vectors x, (15) is equivalent to
Thus, \(x_1+x_2=1\) and \(x_1,x_2>0\) imply the existence of \(t\in (0,1)\) such that \(x_1=t\) and \(x_2=1-t\). Substituting \(x=(t,1-t)^T\) in (16) we finally obtain the claimed result. \(\square \)
A direct consequence of Lemma 8 is that \((1,1)^{T}/2\) is a critical point of \(f_{A_\varepsilon }\). Moreover, by symmetry, we see that \((t,1-t)^T\) is a critical point of \(f_{A_\varepsilon }\) if and only if \((1-t,t)^T\) is also a critical point. This observation implies the following
Lemma 9
If \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\), then \(f_{A_\varepsilon }\) has at least three distinct positive critical points.
Proof
Note that if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\), then \( \big (\frac{1+\varepsilon }{1-\varepsilon }\big )^2<\frac{p-1}{q-1}\). Let \(h:[0,1]\rightarrow \mathbb {R}\) be defined as \(h(t)=\psi (1-t)-\psi (t)\), where \(\psi \) is defined as in (14). The critical points of \(f_{A_\varepsilon }\) correspond to zeros of h in \((0,1/2]\). Indeed, by Lemma 8, we know that these points are in bijection with the zeros of h on \((0,1)\) and \(h(t)=-h(1-t)\) for every \(t\in (0,1)\). We have already observed that \(h(t_0)=0\) with \(t_0=1/2\). We now show that there exists \(t_1\in (0,t_0)\) such that \(h(t_1)=0\). The existence of such \(t_1\) implies that \((t_1,1-t_1)^T,(1-t_1,t_1)^T,(t_0,t_0)^{T}\) are three distinct positive critical points of \(f_{A_\varepsilon }\), since \(h(1-t_1)=h(t_1)=0\). To construct \(t_1\), we first prove that our assumption \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\) is equivalent to the condition \(h'(t_0)>0\). We have
With \((\varepsilon +t_0-t_0\varepsilon )=(t_0\varepsilon +1-t_0)=(\varepsilon +1)/2\) we get
As \(h'(t_0)=-\psi '(t_0)-\psi '(1-t_0)=-2\psi '(t_0)\), we have \(h'(t_0)>0\) if and only if \((q-1)(1+\varepsilon )^2<(p-1)(1-\varepsilon )^2\) i.e. \(h'(t_0)>0\) if and only if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\).
Now, as \(h'(t_0)>0\), there exists a neighborhood U of \(t_0\) such that h is strictly increasing on U. Since \(h(t_0)=0\), this implies that there exists \(s\in (0,t_0)\cap U\) such that \(h(s)<0\). As \(\lim _{t\rightarrow 0}h(t)=\varepsilon ^{p-1}+\varepsilon >0\), the intermediate value theorem implies the existence of \(t_1\in (0,s)\) such that \(h(t_1)=0\). As observed above, this concludes the proof. \(\square \)
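The intermediate value argument above can be reproduced numerically. Assuming again that \(A_\varepsilon \) has unit diagonal and off-diagonal \(\varepsilon \), the sketch below scans for a sign change of a residual that vanishes exactly at the critical directions \((t,1-t)^T\):

```python
import numpy as np

def phi(x, p):
    # Phi_p(x)_i = |x_i|^(p-2) x_i
    return np.abs(x) ** (p - 2) * x

# Assumed form of the family: unit diagonal, off-diagonal eps.
eps, p, q = 0.1, 6.0, 1.5
A = np.array([[1.0, eps], [eps, 1.0]])
tau = ((1 - eps) / (1 + eps)) ** 2 * (p - 1) / (q - 1)
assert tau > 1  # regime with several positive critical points

def residual(t):
    # (t, 1-t) is a critical direction iff S((t, 1-t)) is parallel to it,
    # i.e. iff this residual vanishes.
    s = phi(A @ phi(A @ np.array([t, 1 - t]), p), q / (q - 1.0))
    return s[0] * (1 - t) - s[1] * t

ts = np.linspace(1e-3, 0.45, 2000)
vals = np.array([residual(t) for t in ts])
sign_changes = np.where(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]
print(len(sign_changes) > 0)  # a critical point away from t = 1/2 exists
```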
Finally, we address the case \(\tau ({\mathcal {S}}_{A_\varepsilon })=1\).
Lemma 10
If \(\tau ({\mathcal {S}}_{A_\varepsilon })= 1\), then \(f_{A_\varepsilon }\) has a unique nonnegative critical point.
Proof
Let \(F:\mathbb {R}^2_+\rightarrow \mathbb {R}_+^2\) be defined as \(F(x)=\Phi _{q^*}(A_{\varepsilon }\Phi _p(A_{\varepsilon }x))\), where \(q^*=q/(q-1)\) denotes the Hölder conjugate of q. Then, for \(\mathbf {1}=(1,1)^T\) and \(u=\mathbf {1}/2\), we have \(F(u)=\lambda u\) for some \(\lambda >0\). Hence, u is a fixed point of \({\mathcal {S}}_{A_{\varepsilon }}\) and, as \(\Vert \cdot \Vert _q\) is differentiable, it follows by Lemma 2 that u is a critical point of \(f_{A_\varepsilon }\). Moreover, it is a fixed point of \(G:D_+\rightarrow D_+\) defined by \(G(x)=\langle {F(x),\mathbf {1}\rangle }^{-1}F(x)\), where \(D_+=\{(t,1-t): t\in [0,1]\}\). Note that the fixed points of G coincide, up to scaling, with those of \({\mathcal {S}}_{A_{\varepsilon }}\). To conclude, we prove that u is the unique fixed point of G.
As \(\tau ({\mathcal {S}}_{A_\varepsilon })=1\), we have \(d_H(G(x),G(y))=d_H(F(x),F(y))\le d_H(x,y)\) and so G is non-expansive with respect to \(d_H\). Now, Theorem 6.4.1 in [31] implies that u is the unique fixed point of G, if
where \(G'(u)\) denotes the Jacobian matrix of G evaluated at u. Moreover, as \(F(u)=\lambda u\), Lemma 6.4.2 in [31] implies that \(F'(u)u=\lambda u\) and
Suppose by contradiction that there exists a \(z\in \mathbb {R}^2{\setminus } \{0\}\) with \(z_1+z_2=0\), such that \(z-G'(u)z=0\). A direct computation shows that \(\langle {z,F'(u)^T u\rangle }=0\). Then,
It follows that \(F'(u)z=\lambda z\) and, as \(F'(u)\) is entry-wise positive, the classical Perron–Frobenius theorem implies that \(z=\pm u\). However, \(u_1+u_2>0\) which contradicts the assumption \(z_1+z_2=0\). So \(0 \ne z-G'(u)z\) for every \(z\ne 0\) such that \(z_1+z_2=0\). Hence, u is the unique fixed point of G, which concludes the proof. \(\square \)
Combining the last two lemmas allows us to conclude:
Proof of Theorem 5
Due to Lemmas 9 and 10 we only need to address the case \(\tau ({\mathcal {S}}_{A_{\varepsilon }})< 1\). However, this is a direct consequence of Lemma 3. In fact, as \(A_{\varepsilon }\) is entry-wise positive, the nonnegative fixed points of \({\mathcal {S}}_{A_\varepsilon }\) are positive and, if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})<1\), then \({\mathcal {S}}_{A_{\varepsilon }}\) is a strict contraction with respect to \(d_H\) and so it has a unique fixed point, which also is the unique positive maximizer of \(f_{A_\varepsilon }\) on \(\mathbb {R}^2_+\). \(\square \)
5 Matrix Norms Induced by Sum of Weighted \(\ell ^p\) Norms
The Birkhoff contraction ratios \(\kappa _H(J_\alpha )\) and \(\kappa _H(J_{\alpha ^*})\) are easy to compute when \(\Vert \cdot \Vert _\alpha \) is a weighted \(\ell ^p\) norm. More precisely, we have the following
Proposition 1
Let \(\Vert x \Vert _{\alpha }=\Vert Dx \Vert _p\) for some \(p\in (1,\infty )\) and some diagonal matrix D with positive diagonal entries, then \(\Vert x \Vert _{\alpha ^*}=\Vert D^{-1}x \Vert _{p^*}\) where \(p^*=p/(p-1)\). Furthermore, it holds \(\kappa _H(J_{\alpha })=\kappa _H(J_{\alpha ^*})^{-1}=p-1\).
Proof
The equality \(\Vert x \Vert _{\alpha ^*}=\Vert D^{-1}x \Vert _{p^*}\) follows from Theorem 6 below. To conclude, note that \(J_{\alpha }(x)=\Vert Dx \Vert _{p}^{1-p}D^p\Phi _p(x)\) and therefore \(\kappa _H(J_{\alpha })=\kappa _H(\Phi _p)=p-1\). The same argument shows that \(\kappa _H(J_{\alpha ^*})=\kappa _H(\Phi _{p^*})=p^*-1=(p-1)^{-1}\). \(\square \)
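The duality stated in Proposition 1 can be verified numerically by constructing the Hölder-optimal point (a sketch with our own randomly chosen weights):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3.0
ps = p / (p - 1.0)  # Hoelder conjugate p*
d = rng.uniform(0.5, 2.0, n)   # diagonal of D (positive)
y = rng.uniform(0.1, 1.0, n)

# Claimed dual of ||x||_alpha = ||D x||_p is ||D^{-1} y||_{p*}.
dual_claim = np.linalg.norm(y / d, ps)

# Construct the Hoelder-optimal x: with w = D^{-1} y and u the unit l^p
# vector proportional to Phi_{p*}(w), the maximizer of <x, y> over
# ||D x||_p <= 1 is x = D^{-1} u, and it attains the claimed dual value.
w = y / d
u = np.abs(w) ** (ps - 1) * np.sign(w)
u /= np.linalg.norm(u, p)
x = u / d
print(abs(x @ y - dual_claim) < 1e-10)               # duality attained
print(abs(np.linalg.norm(d * x, p) - 1.0) < 1e-10)   # x is on the unit sphere
```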
While the above Proposition 1 makes the computation of the Birkhoff constant of weighted \(\ell ^p\)-norms particularly easy, computing \(\kappa _H(J_{\alpha })\) or \(\kappa _H(J_{\alpha ^*})\) for a general strongly monotonic norm \(\Vert \cdot \Vert _{\alpha }\) can be a difficult task. There are norms for which an explicit expression in terms of arithmetic operations is available for \(\Vert \cdot \Vert _{\alpha }\) by construction (or by modeling), but no such expression is available for the dual \(\Vert \cdot \Vert _{\alpha ^*}\). Examples include \(\Vert x \Vert _{\alpha }=(\Vert x \Vert ^3_{p}+\Vert x \Vert ^3_{q})^{1/3}\), as shown by Theorem 6 below. On the other hand, as discussed in the introduction, monotonic norms different from the standard \(\ell ^p\) norms arise quite naturally in several applications.
Motivated by the above observations, we devote the rest of the section to the study of a particular class of monotonic norms of the form \(\Vert x \Vert _{\alpha }=\Vert \big (\Vert x \Vert _{\alpha _1},\ldots , \Vert x \Vert _{\alpha _d}\big ) \Vert _{\gamma }\) where all the norms are monotonic and where we also allow \(\Vert x \Vert _{\alpha _i}\) to measure only a subset of the coordinates of x.
5.1 Composition of Monotonic Norms and its Dual
Let d be a positive integer. We consider norms of the following form
where \(\Vert \cdot \Vert _{\gamma }\) is a monotonic norm on \(\mathbb {R}^d\), \(\Vert \cdot \Vert _{\alpha _i}\) is a norm on \(\mathbb {R}^{n_i}\) and \(P_i\in \mathbb {R}^{n_i\times n}\) is a “weight matrix” for all \(i=1,\ldots ,d\). For \(\Vert \cdot \Vert _{\alpha }\) to be a norm, we assume that \(M= [P_1^T,\ldots ,P_d^{T}]^T\in \mathbb {R}^{ (n_1+\ldots +n_d)\times n}\) has rank n. Note that the monotonicity of \(\Vert \cdot \Vert _{\gamma }\) implies that \(\Vert \cdot \Vert _{\alpha }\) satisfies the triangle inequality.
Let us first discuss particular cases of (17). First, note that for two norms \(\Vert \cdot \Vert _{\alpha _1},\Vert \cdot \Vert _{\alpha _2}\) on \(\mathbb {R}^n\), the norm
can be obtained from (17) with \(d=2\), \(\Vert \cdot \Vert _{\gamma }=\Vert \cdot \Vert _p\), and \(P_1=P_2 = I\), with \(I\in \mathbb {R}^{n\times n}\) being the identity matrix. It is also possible to model norms acting on different coordinates of the vectors. For example, if \((x,y)\in \mathbb {R}^{2n}\), then
can be obtained from (17) with \(d=2\), \(\Vert \cdot \Vert _{\gamma }=\Vert \cdot \Vert _p\), \(P_1=\mathrm {diag}(1,\ldots ,1,0,\ldots ,0)\in \mathbb {R}^{2n\times 2n}\) and \(P_2=\mathrm {diag}(0,\ldots ,0,1,\ldots ,1)\in \mathbb {R}^{2n\times 2n}\). The dual of \(\Vert \cdot \Vert _{\alpha _{\times }}\) is discussed in Lemma 11 below and has a particularly elegant description. More complicated weight matrices \(P_i\) can also be used. For example if \({\widetilde{n}}\) is an integer not smaller than n and \(P\in \mathbb {R}^{{\widetilde{n}}\times n}\) has rank n, then the norm
can be obtained with \(d=1\), \(\Vert \cdot \Vert _{\gamma }=|\cdot |\), \(\Vert \cdot \Vert _{\alpha _1}=\Vert \cdot \Vert _p\) and \(P_1 = P\). Note that if \({\widetilde{n}} = n\), then P is square and invertible and this property can be used to simplify the evaluation of the dual norm of \(\Vert \cdot \Vert _{\alpha _P}\). Consequences of such additional structure are discussed in Corollary 2.
In the next Theorem 6 we provide a characterization of the dual norm of \(\Vert \cdot \Vert _{\alpha }\) in its general form as defined in (17). We first need the following lemma that addresses the particular case where \(P_1,\ldots ,P_d\) are projections.
Lemma 11
Let \(n_1,\ldots ,n_d\) be positive integers and for \(i=1,\ldots ,d\) let \(\Vert \cdot \Vert _i\) be a norm on \(\mathbb {R}^{n_i}\). Furthermore, let \(\Vert \cdot \Vert _{\gamma }\) be a monotonic norm on \(\mathbb {R}^d\). Let \(V= \mathbb {R}^{n_1}\times \ldots \times \mathbb {R}^{n_d}\) and for all \((u_1,\ldots ,u_d)\in V\) define
Then \(\Vert \cdot \Vert _{ V}\) is a norm on V and the induced dual norm \(\Vert \cdot \Vert _{ V^*}\) satisfies
Proof
The fact that \(\Vert \cdot \Vert _{ V}\) is a norm follows from a direct verification. Let \((u_1,\ldots ,u_d)\in V\). Then, for every \((y_1,\ldots ,y_d)\in V\), we have
which shows that
For the reverse inequality, let \(v = (\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\Vert u_d \Vert _{\alpha _d^*})\). As \(\Vert \cdot \Vert _\gamma \) is monotonic, by Proposition 5.2 in [7, Chapter 1], there exists \(w\in \mathbb {R}^d_+\) such that \(\Vert w \Vert _{\gamma }\le 1\) and \(\left\langle v , w \right\rangle =\Vert v \Vert _{\gamma ^*}\). Let us denote by \(w_1,\ldots ,w_d\in \mathbb {R}_+\) and \(v_1,\ldots ,v_d\in \mathbb {R}\) respectively the components of w and v in the canonical basis of \(\mathbb {R}^d\). Now, let \( {\overline{y}}_1\in \mathbb {R}^{n_1},\ldots ,{\overline{y}}_d\in \mathbb {R}^{n_d}\) be such that \(\Vert {\overline{y}}_i \Vert _{\alpha _i}\le 1\) and \(\left\langle {\overline{y}}_i , u_i \right\rangle =\Vert u_i \Vert _{\alpha _i^*}\) for all \(i=1,\ldots ,d\). Then, as \(\Vert \cdot \Vert _{\gamma }\) is monotonic with respect to \(\mathbb {R}^d_+\) and \(\Vert {\overline{y}}_i\Vert _{\alpha _i}\le 1\) for all i, we have
Note that
It follows that \(\Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\le \Vert (u_1,\ldots ,u_d) \Vert _{ V^*},\) which, together with (18), concludes the proof. \(\square \)
Theorem 6
Let d be a positive integer. For \(i=1,\ldots ,d\), let \(P_i\in \mathbb {R}^{n_i\times n}\) and let \(\Vert \cdot \Vert _{\alpha _i}\) be a norm on \(\mathbb {R}^{n_i}\). Suppose that \(M= [P_1^T,\ldots ,P_d^{T}]^T\in \mathbb {R}^{ (n_1+\ldots +n_d)\times n}\) has rank n. Furthermore, let \(\Vert \cdot \Vert _{\gamma }\) be a monotonic norm on \(\mathbb {R}^d\). For every \(x\in \mathbb {R}^n\), define
Then, \(\Vert \cdot \Vert _{\alpha }\) is a norm on \(\mathbb {R}^n\) and the induced dual norm is given by
where \(\Vert \cdot \Vert _{\alpha _i^*}\) is the dual norm induced by \(\Vert \cdot \Vert _{\alpha _i}\) and \(\Vert \cdot \Vert _{\gamma ^*}\) is the dual norm induced by \(\Vert \cdot \Vert _{\gamma }\).
Proof
Let \(u_1\in \mathbb {R}^{n_1},\ldots ,u_d\in \mathbb {R}^{n_d}\) be such that \(P_1^Tu_1+\cdots +P_d^Tu_d=x\). Such vectors always exist as M has full rank. Then, for every \(y\in \mathbb {R}^n\), it holds
It follows that
Now, we prove the reverse inequality. To this end, consider the vector space \( V=\mathbb {R}^{n_1}\times \ldots \times \mathbb {R}^{n_d}\) endowed with the norm \(\Vert \cdot \Vert _{ V}\) defined as
As V is a finite product of finite dimensional vector spaces, we can identify \( V^*\) with V and by Lemma 11, we know that the dual norm \(\Vert \cdot \Vert _{ V^*}\) induced by \(\Vert \cdot \Vert _{ V}\) satisfies
Consider now the vector subspace \( W=\{(P_1y,\ldots ,P_dy)\mid y\in \mathbb {R}^n\}\subset V\). Note that, we can identify W with the image of M, i.e. \( W = \{My\mid y\in \mathbb {R}^n\}\). Let \(M^\dagger \in \mathbb {R}^{n\times (n_1+\ldots +n_d)}\) be the Moore–Penrose inverse of M. Then, as M is full rank, we have \(M^\dagger My= y\) for all \(y\in \mathbb {R}^n\). Let \(\phi :W\rightarrow \mathbb {R}\) be defined as
For every \( (u_1,\ldots , u_d)\in W\), there exists \(y\in \mathbb {R}^n\) such that \((u_1,\ldots , u_d)=My\), i.e. \(u_i=P_iy\) for all \(i=1,\ldots ,d\), and thus
By the Hahn–Banach theorem (see e.g. Corollary 1.2 of [5]), there exists \((u_1',\ldots ,u_d')\in V\) such that
and
Next, let \(y\in \mathbb {R}^n\), then \(My=(P_1y,\ldots ,P_dy)\in W\) and with (19), we have
As the above is true for all \(y\in \mathbb {R}^n\), it follows that \(P_1^Tu_1'+\ldots +P_d^Tu_d'=x\). Hence, we have
which concludes the proof of the formula for \(\Vert \cdot \Vert _{\alpha ^*}\). \(\square \)
As a consequence of the above Theorem 6, we have that the dual of the norms \(\Vert \cdot \Vert _{\alpha _+},\Vert \cdot \Vert _{\alpha _{\times }}, \Vert \cdot \Vert _{\alpha _P}\) considered at the beginning of this section are respectively given by
with \(p^* = p/(p-1)\). Note that \(\Vert \cdot \Vert _{\alpha ^*_{\times }}\) does not involve an infimum. The infimum can also be removed in \(\Vert x \Vert _{\alpha _P^*}\) if P is square and invertible, in which case it holds \(\Vert x \Vert _{\alpha _P^*}=\Vert P^{-T}x \Vert _{p^*}\).
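The absence of the infimum in \(\Vert \cdot \Vert _{\alpha ^*_{\times }}\) can be verified constructively: blockwise Hölder maximizers, combined with \(\ell ^p\)-optimal weights, attain the claimed dual value. A sketch with \(\ell ^a\) and \(\ell ^b\) blocks (parameters and data are our own choices):

```python
import numpy as np

def conj(p):  # Hoelder conjugate
    return p / (p - 1.0)

def lp_maximizer(w, p):
    # Unit l^p vector u with <u, w> = ||w||_{p*}
    ps = conj(p)
    u = np.abs(w) ** (ps - 1) * np.sign(w)
    return u / np.linalg.norm(u, p)

rng = np.random.default_rng(1)
a, b, p = 3.0, 1.5, 2.5
u, v = rng.uniform(0.1, 1, 4), rng.uniform(0.1, 1, 3)

# Claimed dual norm of (u, v): l^{p*} combination of the blockwise duals.
claim = np.linalg.norm([np.linalg.norm(u, conj(a)),
                        np.linalg.norm(v, conj(b))], conj(p))

# Build x = (c1 * x_a, c2 * x_b): blockwise Hoelder maximizers combined
# with weights (c1, c2) that are l^p-optimal for the outer norm.
xa, xb = lp_maximizer(u, a), lp_maximizer(v, b)
c = lp_maximizer(np.array([np.linalg.norm(u, conj(a)),
                           np.linalg.norm(v, conj(b))]), p)
inner = c[0] * (xa @ u) + c[1] * (xb @ v)          # value attained by x
outer = np.linalg.norm([np.linalg.norm(c[0] * xa, a),
                        np.linalg.norm(c[1] * xb, b)], p)
print(abs(inner - claim) < 1e-10, abs(outer - 1.0) < 1e-10)
```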
We discuss more general examples in the next result.
Corollary 2
Under the same assumptions as Theorem 6, we have:
-
1.
If \(P_1,\ldots ,P_d\) are all square invertible matrices, then
$$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\min _{\begin{array}{c} x=u_1+\cdots + u_d\\ u_1,\ldots ,u_d\in \mathbb {R}^n \end{array}}\Vert (\Vert (P_1^T)^{-1} u_1 \Vert _{\alpha _1^*},\ldots ,\Vert (P_d^T)^{-1} u_d \Vert _{\alpha _d^*}) \Vert _{\gamma ^*} \end{aligned}$$ -
2.
If every \(x\in \mathbb {R}^n\) can be uniquely written as \(x=x_{P_1}+\ldots +x_{P_d}\) with \(x_{P_i}\in {\text {Im}}(P_i^T)\) for all \(i=1,\ldots ,d\) (i.e. \(\mathbb {R}^n\) is the direct sum of the ranges of \(P_1^T,\dots ,P_d^T\)), then
$$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\left\| \left( \inf _{\begin{array}{c} u_1\in \mathbb {R}^{n_1}\\ P_1^Tu_1= x_{P_1} \end{array}}\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\inf _{\begin{array}{c} u_d\in \mathbb {R}^{n_d}\\ P_d^Tu_d= x_{P_d} \end{array}}\Vert u_{d} \Vert _{\alpha _d^*}\right) \right\| _{\gamma ^*}. \end{aligned}$$If, additionally, \(n_i = \dim ({\text {Im}}(P_i^T))\) for all \(i=1,\ldots ,d\), then
$$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\Vert \big (\Vert (P_1^T)^{\dagger }x \Vert _{\alpha _1^*}, \ldots ,\Vert (P_d^T)^{\dagger } x \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}, \end{aligned}$$where \((P_i^T)^{\dagger }\) is the Moore–Penrose inverse of \(P_i^T\).
5.2 The Power Method for Compositions of \(\ell ^p\)-Norms
We discuss here consequences of Theorems 4 and 6 when applied to a special family of norms defined in terms of subsets of entries of the initial vector, i.e. the case where \(P_i\) is a nonnegative diagonal matrix.
For some nonnegative weight vector \(\omega \in \mathbb {R}^m\) and coefficient \(p\in (1,\infty )\), let \(\Vert \cdot \Vert _{\omega ,p}\) be the \(\omega \)-weighted \(\ell ^p\)-(semi)norm on \(\mathbb {R}^m\), defined as
To express the dual of \(\Vert \cdot \Vert _{\omega ,p}\) and of its compositions, let
If \(\omega \) is positive, then \(\Vert x \Vert _{\omega ,p}\) is a norm and it holds \((\Vert x \Vert _{\omega ,p})_*=\Vert x \Vert _{\omega ^*,p^*} \) by Proposition 1.
Let \(\omega _1,\ldots ,\omega _d\in \mathbb {R}^m\) be nonzero vectors of nonnegative weights such that \(\omega _1+\ldots +\omega _d\) is a positive vector. Further let \(s\in [1,\infty )\), \(p_1,\ldots ,p_d\in (1,\infty )\) and define
The fact that \(\omega _1+\cdots +\omega _d\) is positive ensures that \(\Vert \cdot \Vert _{\alpha }\) is a norm. Note that \(\Vert \cdot \Vert _{\alpha }\) is strongly monotonic.
The differentiability of \(\Vert \cdot \Vert _{\alpha }\) is discussed in the following lemma.
Lemma 12
Let \(\Vert \cdot \Vert _{\alpha }\) be as in (22), then \(\Vert \cdot \Vert _{\alpha }\) is differentiable if either \(s>1\) or \(s=1\) and \(\omega _i\) has at least two positive entries for every \(i=1,\ldots ,d\).
Proof
As \(p_k>1\), \(\Vert \cdot \Vert _{\omega _k,p_k}\) is differentiable if \(\omega _{k}\) has at least two positive entries. If it has only one positive entry, then \(\Vert \cdot \Vert _{\omega _k,p_k}\) is just a weighted absolute value. Hence, if \(s>1\), then the differentiability of \(\Vert \cdot \Vert _{\alpha }\) follows from that of the \(\ell ^{s}\)-norm. If instead \(s=1\) and \(\omega _i\) has at least two positive entries for every \(i=1,\ldots ,d\), then \(\Vert \cdot \Vert _{\alpha }\) is just a sum of differentiable norms. \(\square \)
If \(\Vert \cdot \Vert _\alpha \) is differentiable, we have
and the following lemma provides an upper bound for \(\kappa _H(J_{\alpha })\).
Lemma 13
Let \(\Vert \cdot \Vert _{\alpha }\) be as in (22). If \(\Vert \cdot \Vert _{\alpha }\) is differentiable then
Proof
Let \(\delta =\sum _{k=1}^d\max \{0,p_k-s\}\). We have \(J_{\alpha }(x)=\Vert x \Vert _{\alpha }^{1-s}(F(x)+G(x))\) where for all \(x\in \mathbb {R}^m_{+}{\setminus }\{0\}\) we let \(F(x) = \sum _{p_k\le s}\Vert x \Vert _{\omega _k,p_k}^{s-p_k}\mathrm {diag}(\omega _k)\Phi _{p_k}(x)\) and \( G(x)=\sum _{p_k>s}\Vert x \Vert _{\omega _k,p_k}^{s-p_k}\mathrm {diag}(\omega _k)\Phi _{p_k}(x).\) Note that if \(p_k > s\) for all k then \(F(x)=0\), whereas \(G(x)=0\) when \(p_k\le s\) for all k. Moreover, note that F is order-preserving and homogeneous of degree \(s-1\). Now let us set \(\tau (x)=1\) if \(p_j\le s\) for all j and \(\tau (x) =\prod _{p_j>s}\Vert x \Vert _{\omega _j,p_j}^{p_j-s}\) otherwise. Then \(\tau \) is order-preserving and homogeneous of degree \(\delta \) and \(x\mapsto \tau (x)F(x)\) is order-preserving and homogeneous of degree \(\delta +(s-1)\). Finally, note that
is order-preserving as well and homogeneous of degree \(\delta +(s-1)\). This implies that \(\delta +(s-1)\) is a Lipschitz constant of \(H(x)=\tau (x)(F(x)+G(x))\) with respect to the Hilbert metric \(\mu \). Hence, for any \(x,y\in \mathbb {R}^m_+{\setminus }\{0\}\) with \(x\sim y\), we finally obtain
which concludes the proof. \(\square \)
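The key facts used in the proof above are easy to verify numerically. The following minimal NumPy sketch (the entrywise power map, the vectors and the matrix are illustrative choices of ours, not objects from the paper) checks that an order-preserving map that is positively homogeneous of degree \(\theta \) scales the Hilbert metric by exactly \(\theta \), and that a positive matrix is nonexpansive in that metric:

```python
import numpy as np

def hilbert_metric(x, y):
    """Hilbert projective metric mu(x, y) = log(max_i(x_i/y_i) / min_j(x_j/y_j))
    for entrywise positive vectors x and y."""
    r = x / y
    return np.log(r.max() / r.min())

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, 5)
y = rng.uniform(0.5, 2.0, 5)

# The entrywise power x -> x**theta is order-preserving and positively
# homogeneous of degree theta; it scales the Hilbert metric by exactly theta.
theta = 1.7
assert abs(hilbert_metric(x**theta, y**theta) - theta * hilbert_metric(x, y)) < 1e-12

# A positive matrix is nonexpansive in the Hilbert metric
# (in fact a strict contraction, by the Birkhoff-Hopf theorem).
M = rng.uniform(0.1, 1.0, (5, 5))
assert hilbert_metric(M @ x, M @ y) <= hilbert_metric(x, y)
```

The exact scaling by \(\theta \) for the entrywise power follows because the componentwise ratios are raised to the power \(\theta \), so the logarithm of their spread is multiplied by \(\theta \).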
If \(s>1\), by Theorem 6, we have
It is not difficult to realize that the case \(s=1\) has a similar form, where the sum is replaced by a maximum. We henceforth omit that case, for the sake of brevity.
Now, consider a norm \(\Vert \cdot \Vert _\beta \) defined as the dual norm of a norm of the type (22)
where h is some positive integer, \(\varpi _i\) are nonnegative weight vectors whose sum \(\varpi _1+\cdots +\varpi _h\) is positive and \(q_1,\dots ,q_h, t\in (1,\infty )\). As \(\min _x f(x)=(\max _x f(x)^{-1})^{-1}\) for continuous positive f, we deduce that for this choice of norm we have
for any matrix \(A\in \mathbb {R}^{m\times n}\).
We emphasize that, while the norm \(\Vert \cdot \Vert _\beta \) is defined implicitly in the general case, when the weight vectors \(\varpi _i\) have disjoint support Corollary 2 yields the following explicit formula
which also simplifies the definition of \(\Vert A\Vert _{\beta \rightarrow \alpha }\).
The advantage of choosing \(\Vert \cdot \Vert _\beta \) as in (25) lies in the fact that both \(\Vert x\Vert _{\beta ^*}\) and \(J_{\beta ^*}\) admit explicit expressions analogous to (22) and (23), precisely
for all choices of the weights \(\varpi _i\) such that \(\varpi _1+\dots +\varpi _h>0\).
Thus, we obtain an explicit formula for the operator
which allows us to easily implement the power method (11) for the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). An efficient implementation of the operator \(J_\alpha \) for a norm \(\Vert \cdot \Vert _\alpha \) of the form (22) is provided by Algorithm 1. If we let \(\mathrm {nnz}(X)\) denote the number of nonzero entries in X and we assume arithmetic operations have unit cost, evaluating \(J_{\alpha }\) and \(J_{\beta ^*}\) via Algorithm 1 costs \({\mathcal {O}}\left( \sum _{i=1}^d\mathrm {nnz}(\omega _i)\right) \) and \(\mathcal O\left( \sum _{i=1}^h\mathrm {nnz}(\varpi _i)\right) \) operations, respectively. So, the total cost of evaluating \({\mathcal {S}}_A\), i.e. of each iteration of the power method in (11), is \(\mathcal O\big (C({\mathcal {S}}_A)\big )\) where
which boils down to \({\mathcal {O}}(dn+hn+n^2)\) when all the \(\omega _i\), \(\varpi _i\) and A are full.
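To make the evaluation of \(J_\alpha \) concrete, here is a minimal NumPy sketch following the formula in the proof of Lemma 13; the function names and the dense representation of the weights are ours (Algorithm 1 in the paper additionally exploits the sparsity of the \(\omega _i\)):

```python
import numpy as np

def weighted_p_norm(x, w, p):
    """||x||_{w,p} = (sum_i w_i |x_i|^p)^(1/p) for a nonnegative weight vector w."""
    return float((w * np.abs(x) ** p).sum() ** (1.0 / p))

def J_alpha(x, weights, ps, s):
    """J_alpha(x) = ||x||_alpha^(1-s) sum_k ||x||_{w_k,p_k}^(s-p_k) diag(w_k) Phi_{p_k}(x),
    the gradient of ||x||_alpha = ||(||x||_{w_1,p_1}, ..., ||x||_{w_d,p_d})||_s,
    where Phi_p(x) = sign(x) |x|^(p-1)."""
    norms = np.array([weighted_p_norm(x, w, p) for w, p in zip(weights, ps)])
    alpha = (norms ** s).sum() ** (1.0 / s)
    g = np.zeros_like(x, dtype=float)
    for w, p, nk in zip(weights, ps, norms):
        g += nk ** (s - p) * w * np.sign(x) * np.abs(x) ** (p - 1)
    return alpha ** (1 - s) * g
```

Since \(\Vert \cdot \Vert _\alpha \) is positively homogeneous of degree one, Euler's identity gives \(\langle J_\alpha (x),x\rangle =\Vert x\Vert _\alpha \), which is a convenient correctness check for any implementation.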
As a consequence, we have
Theorem 7
Let \(A\in \mathbb {R}^{m\times n}\) be a nonnegative matrix such that \(A^TA\) is irreducible. Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) be as in (22) and (25), respectively. Let
If \(\tau <1\) and \(\Vert \cdot \Vert _{\alpha }\), \(\Vert \cdot \Vert _{\beta }\) are differentiable, then \(\Vert A \Vert _{\beta ^*\rightarrow \alpha }\) can be approximated to \(\varepsilon \) precision in \(\mathcal {O}\big (C({\mathcal {S}}_A)\ln (1/\varepsilon )\big )\) arithmetic operations with the power sequence (11).
Proof
Besides the complexity bound, the result is a direct consequence of Theorem 4 and the upper bounds for \(\kappa _H(J_\alpha )\) and \(\kappa _H(J_{\beta ^*})\) obtained in Lemma 13. Let us now estimate the total number of operations required by the fixed point sequence (11). Let \({\widetilde{C}}\) be as in Theorem 4. We have \({\widetilde{C}}\tau ^k <\varepsilon \) if and only if \(k>(\ln (\varepsilon )-\ln (\widetilde{C}))/\ln (\tau )\). As \((\ln (\varepsilon )-\ln (\widetilde{C}))/\ln (\tau )\in \mathcal {O}(-\ln (\varepsilon ))\) for \(\varepsilon \rightarrow 0\), we deduce that \(\Vert A \Vert _{\beta \rightarrow \alpha }-\varepsilon \le \Vert Ax_k \Vert _{\alpha }\) after \(\mathcal {O}(\ln (\varepsilon ^{-1}))\) iterations of \({\mathcal {S}}_{A}\), leading to a total complexity of \(\mathcal {O}(C(\mathcal S_A)\ln (\varepsilon ^{-1}))\). \(\square \)
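In the special case of \(\ell ^p\) norms, the fixed point sequence (11) reduces to Boyd's power method [4]. The following NumPy sketch is a minimal implementation of ours for that special case (the function names and the uniform positive starting point are our own choices):

```python
import numpy as np

def phi(y, p):
    """Entrywise duality map for the l^p norm: Phi_p(y) = sign(y) |y|^(p-1)."""
    return np.sign(y) * np.abs(y) ** (p - 1)

def matrix_norm_power_method(A, p, q, tol=1e-12, max_iter=10_000):
    """Power iteration for ||A||_{q->p} = max_{x != 0} ||Ax||_p / ||x||_q:
    x_{k+1} is proportional to Phi_{q*}(A^T Phi_p(A x_k)), normalized in l^q,
    where q* = q/(q-1) is the Hoelder conjugate of q."""
    n = A.shape[1]
    x = np.ones(n) / n ** (1.0 / q)      # positive starting point with ||x||_q = 1
    q_star = q / (q - 1.0)
    for _ in range(max_iter):
        z = phi(A.T @ phi(A @ x, p), q_star)
        x_new = z / np.linalg.norm(z, ord=q)
        done = np.linalg.norm(x_new - x, np.inf) < tol
        x = x_new
        if done:
            break
    return np.linalg.norm(A @ x, ord=p)
```

For \(p=q=2\) this is the classical power method on \(A^TA\); for instance, for \(A=\big [\begin{smallmatrix}2&1\\1&2\end{smallmatrix}\big ]\) it returns the spectral norm 3.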
We conclude the section by proving a number of corollaries of Theorem 7 that illustrate the richness of the class of problems that can be addressed via that theorem. For simplicity, in the statements we assume that the involved matrices are square and positive. However, more general statements involving irreducible and rectangular matrices can be easily derived by reproducing the proof of the corresponding corollary.
Corollary 3
Let \(A\in \mathbb {R}^{n\times n}\) be a positive matrix. Let \(\omega ,\varpi \in \mathbb {R}^n\) be positive weights and \(1<p,q<\infty \). Let
If \(\tau <1\), then \(\Vert A \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (\mathrm {nnz}(A)\ln (1/\varepsilon )\big )\) operations.
Proof
As \(d=h=1\), \(C({\mathcal {S}}_A)=\mathrm {nnz}(A)\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _{\omega ,p}\) and \(\Vert x \Vert _{\beta ^*}=\Vert x \Vert _{\varpi ^*,q^*}\) in Theorem 7. \(\square \)
Corollary 4
Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices. Further, let \(1<p,q,r<\infty \),
If \(\tau <1\), then \(\left\| \bigg [\begin{matrix}A\\ B\end{matrix}\bigg ] \right\| _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).
Proof
Let \(d=2,h=1\), \(\omega _i=2^p,\varpi _i=3^q\) for \(i=1,\ldots ,n\), and \(\Vert x \Vert _{\beta ^*}=\Vert x \Vert _{r^*}\), \(\Vert (y,z) \Vert _{\alpha }=\Vert \Vert y \Vert _{\omega ,p},\Vert z \Vert _{\varpi ,q} \Vert _1\) in Theorem 7. Also note that \(\mathcal O(C({\mathcal {S}}_A))={\mathcal {O}}(N)\). \(\square \)
Corollary 5
Let \(A\in \mathbb {R}^{n\times n}\) positive, \(1<p<\infty \), \(2\le q,r<\infty \),
If \(\tau <1\), then \(\Vert A \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (\mathrm {nnz}(A)\ln (1/\varepsilon )\big )\) operations.
Proof
Let \(d=1,h=2\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _p\), \(\Vert x \Vert _{\beta ^*}=\Vert \Vert x \Vert _{q^*},\Vert x \Vert _{r^*} \Vert _2\) in Theorem 7. \(\square \)
Corollary 6
Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices, \(1<s\le \theta \le p,q,r<\infty \),
If \(\tau <1\), then \(\Vert [A\ B] \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).
Proof
Let \(d=2,h=2\), \(\Vert (y,z) \Vert _{\alpha }=\Vert \Vert y \Vert _p,\Vert z \Vert _q \Vert _\theta \) and \(\Vert x \Vert _{\beta ^*}=\Vert \Vert x \Vert _{r^*},\Vert x \Vert _{s^*} \Vert _{\theta ^*}\) in Theorem 7. \(\square \)
Corollary 7
Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices and \(1<p,q,r<\infty \), let
If \(\tau = \frac{p-1}{q-1}+\frac{p-1}{r-1}<1\), then \(\phi \) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).
Proof
Let \(M=\Big [\begin{matrix}A&{} A &{}0 \\ 0 &{} B &{} B \end{matrix}\Big ]\), \(d=2,h=2\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _{p}\) and \(\Vert (x,y,z) \Vert _{\beta ^*}=\Vert \Vert (x,z) \Vert _{r^*},\Vert (y,z) \Vert _{s^*} \Vert _{1}\) in Theorem 7. Note that \(\kappa _H(M)=1\) by Theorem 3. \(\square \)
Corollary 8
Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices and \(1<p,q,r<\infty \). Let \(\sigma _p:\mathbb {R}^n\rightarrow \mathbb {R}^n_+\) be defined as \(\sigma _p(x)=(|x_1|^p,\ldots ,|x_n|^p)^T\) and let
If \(\tau <1\) then \(\Vert B \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).
Proof
As \(x\mapsto A\sigma _p(Bx)\) is positively homogeneous of degree p, we have
Let \(\Vert \cdot \Vert _{\beta ^*}=\Vert \cdot \Vert _{r^*}\) and \(\Vert x \Vert _{\alpha }=\Vert A\sigma _p(x) \Vert _q^{1/p}\). Then, \(\Vert Bx \Vert _{\alpha }=\Vert A\sigma _{p}(Bx) \Vert _q^{1/p}\) and, with \(\omega _i = (A_{i,1},\ldots ,A_{i,n})\), it holds \(\Vert x \Vert _{\alpha }=\Vert (\Vert x \Vert _{\omega _1,p}, \ldots ,\Vert x \Vert _{\omega _n,p}) \Vert _{pq}\) for every x. The proof is now a direct consequence of Theorem 7 with \(d=n\) and \(h=1\). \(\square \)
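The norm identity used in this proof is easy to verify numerically. The sketch below, with randomly generated illustrative data of ours, checks that \(\Vert A\sigma _p(x) \Vert _q^{1/p}\) coincides with \(\Vert (\Vert x \Vert _{\omega _1,p},\ldots ,\Vert x \Vert _{\omega _n,p}) \Vert _{pq}\) when \(\omega _i\) is the i-th row of A:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, (4, 4))    # positive matrix
x = rng.uniform(-1.0, 1.0, 4)
p, q = 2.5, 3.0

# Left-hand side: ||A sigma_p(x)||_q^(1/p), with sigma_p(x) = (|x_1|^p, ..., |x_n|^p)
lhs = np.linalg.norm(A @ np.abs(x) ** p, ord=q) ** (1.0 / p)

# Right-hand side: ||(||x||_{omega_1,p}, ..., ||x||_{omega_n,p})||_{pq},
# where omega_i is the i-th row of A, so ||x||_{omega_i,p} = (A sigma_p(x))_i^(1/p)
row_norms = (A @ np.abs(x) ** p) ** (1.0 / p)
rhs = np.linalg.norm(row_norms, ord=p * q)

assert abs(lhs - rhs) < 1e-10
```

Both sides equal \(\big (\sum _i (A\sigma _p(x))_i^{\,q}\big )^{1/(pq)}\), which is why the equality holds exactly and not just up to rounding.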
6 Numerical Experiments
In this section we illustrate the numerical behaviour of the power sequence (11) on some example matrices and some choices of the norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \). In particular, we consider both a dense and a sparse matrix example.
6.1 Dense Matrix Example: Tightness of the Convergence Bound
We verify the convergence bound of Theorem 4 for the family of matrix norms analyzed in Sect. 4.2, that is we verify the bound of Theorem 4 on the convergence of the power sequence (11) for the computation of \(\Vert A_{\varepsilon } \Vert _{q\rightarrow p}\) for various \(1<p,q<\infty \) and \(A_{\varepsilon }\) defined as in (13).
By Lemma 7, we have \(\tau _{p,q,\varepsilon }=\kappa _H({\mathcal {S}}_{A_\varepsilon })= \kappa _H(A_{\varepsilon })^2\frac{p-1}{q-1}\). Moreover, with \(x^+ = 2^{-1/q}(1,1)^T\) it holds \({\mathcal {S}}_{A_\varepsilon }(x^+)=x^+\), hence the power sequence converges to \(x^+\) when \(\tau _{p,q,\varepsilon }<1\). By Theorem 4, if \(\tau _{p,q,\varepsilon }<1\), then
We use \(\delta = 10^{-12}\) and \(x_0 = (\delta ,1-\delta )^T\) in our experiments. This choice of \(x_0\) is motivated by the fact that it is far from the limit point \(x^+\) in the Hilbert metric, so as to model a worst-case scenario. In Fig. 2, we plot the true error \(\Vert x_k-x^+ \Vert _{\infty }\) against the number of iterations k and compare it with the upper bound \((\tau _{p,q,\varepsilon })^kC\), for the choice \(\varepsilon = 1/3\), \(q=2\) and five increasing values of p chosen so that \(\tau _{p,q,\varepsilon }=\kappa _H({\mathcal {S}}_{A_\varepsilon })<1\). We observe that the method converges linearly, as expected, and that the upper bound captures the decay slope well. Moreover, even though larger values of p yield larger values of the contraction constant \(\tau _{p,q,\varepsilon }\), the upper bound still tracks the true error as p grows, up to a multiplicative constant.
6.2 Sparse Matrix Example
For this experiment, we consider two families of matrices with growing size which are not irreducible but satisfy the requirement of Theorem 4, that is \(A^TA\) is irreducible. More precisely, let \(A_1\in \mathbb {R}^{3\times 3}\) and \(B_1 \in \mathbb {R}^{4\times 2}\) be given by
Clearly, neither \(A_1\) nor \(B_1\) is irreducible, however \(A_1^TA_1\) and \(B_1^TB_1\) have strictly positive entries and are therefore irreducible. Then, for \(s\ge 2\), we consider the matrices \(A_s\in \mathbb {R}^{3^s\times 3^s}\), \(B_s\in \mathbb {R}^{4^s \times 2^s}\) obtained by taking s times the Kronecker product of \(A_1\), resp. \(B_1\), with itself, i.e. \(A_s = A_1\otimes A_{s-1}\) and \(B_s = B_1 \otimes B_{s-1}\). Note that for all \(s\ge 1\), \(A_s\) has at least one full row of only zero entries and therefore cannot be irreducible. On the other hand, it holds \(A_s^TA_s = (A_1^TA_1)\otimes \ldots \otimes (A_1^TA_1)\) and thus \(A_s^TA_s\) has positive entries since it is the Kronecker product of s positive matrices. The same observation holds for the sequence \(B_s\). Furthermore, Theorem 3 implies that \(\kappa _H(A_s)=\kappa _H(B_s) = 1\) for all \(s\ge 1\). Hence, we have \(\tau ({\mathcal {S}}_{A_s})=\tau ({\mathcal {S}}_{B_s})=\frac{p-1}{q-1}\) for all s.
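The construction is easy to reproduce. In the sketch below, \(A_1\) is a hypothetical stand-in of our own choosing (the paper's \(A_1\) is given in the display above, which is not reproduced here); any \(A_1\) with a zero row and entrywise positive \(A_1^TA_1\) exhibits the same behaviour:

```python
import numpy as np

# A hypothetical A_1 with the stated properties: A_1 is reducible (it has a
# zero row), yet A_1^T A_1 is entrywise positive, hence irreducible.
A1 = np.array([[1.0, 1.0, 1.0],
               [1.0, 1.0, 1.0],
               [0.0, 0.0, 0.0]])

# A_3 = A_1 (x) A_1 (x) A_1 via repeated Kronecker products
A = A1.copy()
for _ in range(2):
    A = np.kron(A1, A)

assert A.shape == (27, 27)
assert (A.sum(axis=1) == 0).any()    # A_s keeps a full zero row: reducible
assert ((A.T @ A) > 0).all()         # but A_s^T A_s is entrywise positive
```

The two assertions mirror the argument in the text: the zero row survives every Kronecker product, while \(A_s^TA_s\) is a Kronecker product of positive matrices and is therefore positive.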
In our experiments we analyze the number of iterations until convergence of the power sequences associated with the computation of \(\Vert A_s \Vert _{q\rightarrow p}\) and \(\Vert B_s \Vert _{q\rightarrow p}\), where p, q are fixed so that \(\tau = \tau ({\mathcal {S}}_{A_s})=\tau ({\mathcal {S}}_{B_s})=3/4\). For each fixed p, q and s, we try 5000 different starting points drawn uniformly from \((0,1)^n\) with \(n=3^s\) or \(n=2^s\). The boxplots in Fig. 3 show the number of iterations required until the stopping criterion
is met, for both \(A_s\) (the two panels in the top row) and \(B_s\) (the two panels in the row at the bottom), and for \(\delta = 10^{-10}\). Note that, due to Theorem 4, if (27) holds for k then we are guaranteed to be \(\delta \)-close to the true solution \(x^+\), i.e. the computed approximation \(x_k\) is such that \(\Vert x_k-x^+\Vert _\infty <\delta \). Moreover, since \(\Vert x\Vert _p\le n^{1/p}\Vert x\Vert _\infty \) for all \(x\ne 0\), we have
for both \(M=A_s\) and \(M=B_s\). While Fig. 3 shows the steps required to guarantee approximation to the true solution, we emphasize that in practice the required number of steps to reach floating point precision on two consecutive iterates is typically much smaller.
7 Conclusions
On top of being a classical problem in numerical analysis, computing the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is a problem that appears in many recent applications in data mining and optimization. However, except for a few choices of \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \), computing such a matrix norm to an arbitrary precision is generally infeasible for large matrices, as this is known to be an NP-hard problem. The situation is different when the matrix has nonnegative entries, in which case \(\Vert A\Vert _{q\rightarrow p}\) is known to be computable for \(\ell ^p\) norms such that \(q\le p\). In this paper we have both (a) refined this result, by showing that the condition \(q\le p\) is not necessarily required, and (b) extended it to much more general vector norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) than \(\ell ^p\) norms. In particular, we have shown how to compute matrix norms induced by monotonic norms of the form \(\Vert x \Vert _{\alpha }=\Vert \big (\Vert x \Vert _{\alpha _1},\ldots ,\Vert x \Vert _{\alpha _d}\big ) \Vert _{\gamma }\), where we also allow \(\Vert x \Vert _{\alpha _i}\) to measure only a subset of the coordinates of x. Using these kinds of norms we can globally solve in polynomial time quite sophisticated nonconvex optimization problems, as we discuss in the example corollaries at the end of Sect. 5.
References
Allen-Zhu, Z., Gelashvili, R., Razenshteyn, I.: Restricted isometry property for general \(p\)-norms. IEEE Trans. Inf. Theory 62, 5839–5854 (2016)
Barak, B., Brandao, F.G., Harrow, A.W., Kelner, J., Steurer, D., Zhou, Y.: Hypercontractivity, sum-of-squares proofs, and their applications. In: Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, STOC’12, pp. 307–326. ACM (2012)
Bhaskara, A., Vijayaraghavan, A.: Approximating matrix \(p\)-norms. In: Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 497–511. SIAM (2011)
Boyd, D.W.: The power method for \(\ell ^p\) norms. Linear Algebra Appl. 9, 95–101 (1974)
Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York (2010)
Candes, E.J.: The restricted isometry property and its implications for compressed sensing. C.R. Math. 346, 589–592 (2008)
Cioranescu, I.: Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems, vol. 62. Springer, New York (2012)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Classics in Applied Mathematics. SIAM (1990)
Combettes, P.L., Salzo, S., Villa, S.: Regularized learning schemes in feature Banach spaces. Anal. Appl. 16(01), 1–54 (2018)
Drakakis, K., Pearlmutter, B.A.: On the calculation of the \(\ell ^2\rightarrow \ell ^1\) induced matrix norm. Int. J. Algebra 3, 231–240 (2009)
Duchenne, O., Bach, F., Kweon, I.S., Ponce, J.: A tensor-based algorithm for high-order graph matching. IEEE PAMI 33(12), 2383–2395 (2011)
Eveson, S.P., Nussbaum, R.D.: An elementary proof of the Birkhoff–Hopf theorem. Math. Proc. Camb. Philos. Soc. 117, 31–54 (1995)
Fletcher, R.: Practical Methods of Optimization. Wiley, New York (2013)
Friedland, S., Lim, L.H.: The computational complexity of duality. SIAM J. Optim. 26, 2378–2393 (2016)
Friedland, S., Gaubert, S., Han, L.: Perron–Frobenius theorem for nonnegative multilinear forms and extensions. Linear Algebra Appl. 438, 738–749 (2013)
Friedland, S., Lim, L.H., Zhang, J.: Grothendieck constant is norm of Strassen matrix multiplication tensor. Numer. Math. 143(4), 905–922 (2019)
Gaubert, S., Zheng, Q.: Dobrushin ergodicity coefficient for Markov operators on cones, and beyond (2013). https://hal.inria.fr/hal-00935272. arXiv:1302.5226
Gautier, A., Hein, M.: Tensor norm and maximal singular vectors of nonnegative tensors—a Perron–Frobenius theorem, a Collatz–Wielandt characterization and a generalized power method. Linear Algebra Appl. 505, 313–343 (2016)
Gautier, A., Tudisco, F.: The contractivity of cone-preserving multilinear mappings. Nonlinearity 32, 4713 (2019)
Gautier, A., Nguyen, Q.N., Hein, M.: Globally optimal training of generalized polynomial neural networks with nonlinear spectral methods. In: Advances in Neural Information Processing Systems (NIPS) (2016)
Gautier, A., Tudisco, F., Hein, M.: The Perron–Frobenius theorem for multihomogeneous mappings. SIAM J. Matrix Anal. Appl. 40(3), 1179–1205 (2019)
Gautier, A., Tudisco, F., Hein, M.: A unifying Perron–Frobenius theorem for nonnegative tensors via multihomogeneous maps. SIAM J. Matrix Anal. Appl. 40(3), 1206–1231 (2019)
Hendrickx, J.M., Olshevsky, A.: Matrix \(p\)-norms are NP-hard to approximate if \(p\ne 1,2,\infty \). SIAM J. Matrix Anal. Appl. 31, 2802–2812 (2010)
Higham, N.J.: Experience with a matrix norm estimator. SIAM J. Sci. Stat. Comput. 11, 804–809 (1990)
Higham, N.J.: Estimating the matrix \(p\)-norm. Numer. Math. 62, 539–555 (1992)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms. SIAM (2002)
Higham, N.J., Relton, S.D.: Estimating the largest elements of a matrix. SIAM J. Sci. Comput. 38, C584–C601 (2016)
Johnson, C.R., Nylen, P.: Monotonicity properties of norms. Linear Algebra Appl. 148, 43–58 (1991)
Khamsi, M.A., Kirk, W.A.: An Introduction to Metric Spaces and Fixed Point Theory. Wiley-Interscience, New York (2001)
Khot, S., Naor, A.: Grothendieck-type inequalities in combinatorial optimization. Commun. Pure Appl. Math. 65, 992–1035 (2012)
Lemmens, B., Nussbaum, R.D.: Nonlinear Perron–Frobenius Theory, vol. 189. Cambridge University Press, Cambridge (2012)
Lewis, A.D.: A top nine list: Most popular induced matrix norms (2010)
Lim, L.: Singular values and eigenvalues of tensors: a variational approach. In: 1st IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing CAMSAP’05, pp. 129–132 (2005). https://doi.org/10.1109/CAMAP.2005.1574201
Nguyen, Q., Tudisco, F., Gautier, A., Hein, M.: An efficient multilinear optimization framework for hypergraph matching. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1054–1075 (2017)
Nussbaum, R.D.: Finsler structures for the part metric and Hilbert’s projective metric and applications to ordinary differential equations. Differ. Integral Equ. 7, 1649–1707 (1994)
Rohn, J.: Computing the norm \(\Vert {A}\Vert _{\infty, 1}\) is NP-hard. Linear Multilinear Algebra 47, 195–204 (2000)
Seneta, E.: Inhomogeneous Products of Non-negative Matrices, pp. 80–111. Springer, New York (1981)
Steinberg, D.: Computation of matrix norms with applications to robust optimization (2005)
Tao, P.D.: Convergence of a subgradient method for computing the bound norm of matrices (in French). Linear Algebra Appl. 62, 163–182 (1984)
Tudisco, F., Cardinali, V., Di Fiore, C.: On complex power nonnegative matrices. Linear Algebra Appl. 471, 449–468 (2015)
Zhang, H., Zha, Z.J., Yan, S., Wang, M., Chua, T.S.: Robust non-negative graph embedding: Towards noisy data, unreliable graphs, and noisy labels. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2464–2471 (2012)
Funding
Open access funding provided by Gran Sasso Science Institute - GSSI within the CRUI-CARE Agreement.
The work of A.G. and M.H. has been funded by the ERC Grant ‘NOLEPRO’ Number 307793. The work of F.T. was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska–Curie individual fellowship “MAGNET” Grant agreement No. 744014.
Gautier, A., Hein, M. & Tudisco, F. The Global Convergence of the Nonlinear Power Method for Mixed-Subordinate Matrix Norms. J Sci Comput 88, 21 (2021). https://doi.org/10.1007/s10915-021-01524-w