1 Introduction

Optimization with spherical constraints \(\left\| x\right\| _2 = 1\) or norm constraints \(r_{\min } \le \left\| x\right\| _2 \le r_{\max }\) arises in a number of scientific fields. Many optimization problems, such as eigenvalue problems [29], dimensionality reduction [21] and compressed sensing [5], are naturally posed on or in the unit sphere, while norm constraints are often useful for regularization, e.g. in general nonlinear [7] or robust optimization [20]. In this paper we consider the solution of norm-constrained problems of the form

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} f(x) \,{:=}\, \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} r_{\min } \le \left\| x\right\| _2 \le r_{\max } \\ &{} Ax \le b \end{array} \end{aligned}$$
(P)

where \(x \in {\mathbb {R}}^n\) is the decision variable, \(P \in {\mathbb {S}}^{n}\), i.e. an \(n \times n\) symmetric matrix, \(A = [a_1^T \dots a_m^T] \in {\mathbb {R}}^{m \times n}\), \(q \in {\mathbb {R}}^{n}\), \(b = [b_1 \dots b_m]^T \in {\mathbb {R}}^m\) and \(r_{\min }, r_{\max }\) non-negative scalars.

It is generally difficult to solve problems of the form (P) due to the nonconvexity of the lower bounding norm constraint and the potential indefiniteness of the matrix P, which renders many, but certainly not all [20], of these problems intractable. Even finding a feasible point for (P), or testing whether a first-order critical point of (P) is a local minimizer, can be intractable (see Proposition 4 and [16]). As a result, we restrict our attention to the search for first-order critical points, and we assume that a feasible point is given or can be computed.

A specific tractable variation of (P) is the famous trust-region subproblem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} \left\| x\right\| _2 = r \\ &{} Ax = b. \end{array} \end{aligned}$$
(TRS)

Perhaps surprisingly, the TRS can be solved to global optimality despite its nonconvexity, and its global solution has been widely studied in the literature. Specific approaches include exact semidefinite reformulations, gradient steps, truncated conjugate-gradient and truncated Lanczos steps, and Newton root-finding, with associated software packages. The reader can find details in [7] and [27, Chapter 4].

We suggest an active-set algorithm for the solution of (P) that, as we proceed to describe, takes advantage of the existing efficient global solution methods for the TRS. Using such an approach, (P) is optimized by solving a series of equality-constrained subproblems. Each subproblem is a modification of (P) where a subset of its inequalities (called the working set) is replaced by equalities while the rest are ignored. If a subproblem does not include a norm constraint then it is an Equality-constrained Quadratic Problem (EQP); otherwise it is a Trust Region Subproblem (TRS). As described earlier, the global solution of the TRS, and also of the EQP [14], has been widely studied in the literature, so we can efficiently address the global solution of each subproblem. However, a complication exists for the TRS because, unlike the EQP, it can exhibit local-nonglobal minimizers, i.e. local minimizers that are not globally optimal, which complicate the analysis. Indeed, if the optimal working set includes a norm constraint and the global solutions of the respective TRS subproblem are infeasible for (P), then the solution of (P) must be obtained from a local-nonglobal optimizer of that TRS subproblem.

Thus, an algorithm for computing local-nonglobal minimizers of the TRS is required for our active-set algorithm. Unfortunately, algorithms for detecting the presence of, and computing, local-nonglobal solutions of (TRS) are significantly less mature than their global counterparts. A notable exception is the work of Martinez [25], which proved that there exists at most one local-nonglobal minimizer and gave conditions for its existence. Moreover, [25] presented a root-finding algorithm for the computation of the Lagrange multiplier associated with the local-nonglobal minimizer, which entails the computation of the two smallest eigenvalues of the Hessian and the solution of a series of indefinite linear systems.

More recent work is based on a result from [1], which shows that each KKT point of the TRS, in the absence of linear equality constraints \(Ax = b\) (i.e. with a norm constraint on x only), can be extracted from an eigenpair of

$$\begin{aligned} M {:=}\begin{bmatrix} -P &{}\quad q q^T/r^2 \\ I &{}\quad -P \end{bmatrix}. \end{aligned}$$

This result was used in [30] to calculate the local-nonglobal minimizer under the assumption that it exists. We extend these results by proving that both checking if the local-nonglobal minimizer exists and computing it, in the case where it exists, can be performed at the same time by simply calculating the two rightmost eigenpairs of M. Crucially, and similarly to the results of [1] for the global minimizer, this allows the use of efficient factorization-free methods such as the Arnoldi method for the detection and computation of the local-nonglobal minimizer. Furthermore, we show that this approach can efficiently handle equality constraints \(Ax = b\) in the Trust Region Subproblem, without the need for variable reduction, by means of a projected Arnoldi method.

The paper is organised as follows. In Sect. 2 we give an introduction to the TRS and present novel results for the detection/computation of its local-nonglobal minimizer and for the incorporation of linear equality constraints. In Sect. 3 we introduce an active-set algorithm for solving (P), starting from the special case where \(r_{\min } = r_{\max }\) (Sect. 3.1) and then generalizing to any \(r_{\min }, r_{\max }\) (Sect. 3.2). Finally, in Sect. 4, we conclude with a series of numerical results.

Notation used: \({\mathbb {S}}^n\) denotes the set of symmetric \(n\times n\) real matrices. Given a vector (or matrix) x, \(x^T\) denotes its transpose and \(x^H\) its conjugate transpose.

2 The trust-region subproblem

This section concerns the computation of global as well as local-nonglobal minimizers for (TRS). Although this section only considers solutions on the boundary \(\left\| x\right\| = r\), the results can be trivially extended to the interior \(\left\| x\right\| \le r\). Indeed, if an interior solution exists then the TRS is essentially convex (P is positive semidefinite in the nullspace of A [15, §1]), thus no local-nonglobal solution exists and the interior solution is obtained by solving an equality constrained quadratic problem. The presence of a (necessarily global) interior solution can be detected by checking the sign of the Lagrange Multiplier of the global boundary solution(s) [25, Lemma 2.2].

For clarity of exposition, we will first assume that there are no linear equality constraints in (TRS), resulting in the following problem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} \left\| x\right\| _2 = r, \\ \end{array} \end{aligned}$$
(T)

where \(x \in {\mathbb {R}}^n\) is the decision variable, and \(P \in {\mathbb {S}}^n\), \(q \in {\mathbb {R}}^n, r \in {\mathbb {R}}_{+}\) are the problem data. For the rest of the section we will assume that \(n > 1\), excluding the trivial one-dimensional case where the feasible set consists of two points, and that \(r > 0\). We will then show in Sect. 2.1 how to extend the results of this section to allow for the inclusion of linear equality constraints. Finally, in Sect. 2.3 we prove the main result of this section; the proof is placed in a separate subsection to ease the exposition.

In the sequel we will make frequent use of the eigendecomposition \(P = W \varLambda W^T\), defined by an orthonormal matrix \(W {:=}[w_1 \ldots w_n]\) and \(\varLambda {:=}\, \mathrm {diag}(\lambda _1, \ldots , \lambda _n)\), where \(\lambda _1 \le \dots \le \lambda _n\).

Every KKT point of (T) satisfies

$$\begin{aligned} P x + q + \mu x = 0 \Rightarrow x = -(P + \mu I)^{-1} q, \end{aligned}$$
(1)

where \(\mu \) is a Lagrange multiplier associated with the norm constraint. Equation (1) is well defined when \(\mu \ne -\lambda _i, \; i = 1, \dots , n\). For clarity of the subsequent introduction, which follows [27, §4], we will assume that this holds; the conclusions of this section, however, do not depend on this assumption.

Using the feasibility of x, we have

$$\begin{aligned} \left\| x\right\| _2^2 = r^2 \Leftrightarrow \left\| (P + \mu I)^{-1}q\right\| _2^2 - r^2 = 0. \end{aligned}$$
(2)

Then, noting that

$$\begin{aligned} \left\| (P + \mu I)^{-1}q\right\| _2^2&= \left\| (W(\varLambda + \mu I)W^T)^{-1} q\right\| _2^2 = \left\| W(\varLambda + \mu I)^{-1} W^T q\right\| _2^2 \\&= \left\| (\varLambda + \mu I)^{-1} W^T q\right\| _2^2 = \sum _{i=1}^n{\frac{(w_i^T q)^2}{(\lambda _i + \mu )^2}}, \end{aligned}$$

we can express the rightmost condition in (2) as

$$\begin{aligned} s(\mu ) {:=}\sum _{i=1}^{n}\frac{(w_i^T q)^2}{(\lambda _i + \mu )^2} - r^2 = 0. \end{aligned}$$
(3)

Determining the KKT points of (T) is therefore equivalent to finding the roots of s, which is often referred to as the secular equation [27]. We depict s for a particular choice of \(P, q, r\) in Fig. 1. As might be expected from the tight connection between polynomial root-finding problems and eigenproblems, solving \(s(\mu ) = 0\) is equivalent to the following eigenproblem [1, Equation (22)]:

$$\begin{aligned} \underbrace{ \begin{bmatrix} -P &{}\quad q q^T/r^2 \\ I &{}\quad -P \end{bmatrix}}_{{:=}M} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \mu \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}. \end{aligned}$$
(4)

This elegant result was first noted more than 50 years ago [9, Equation (2.21)], only to be later disregarded as inefficient by some of the same authors [10], and then rediscovered by [1] who highlighted its great applicability owing to the remarkable efficacy of modern eigensolvers [24].
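To make this equivalence concrete, the following minimal Julia sketch (our own illustration, using a small random instance and a dense eigensolver purely for exposition) builds M as in (4) and verifies that its rightmost eigenvalue is a root of the secular equation (3):

```julia
using LinearAlgebra

n, r = 5, 1.0
P = Symmetric(randn(n, n)); q = randn(n)

# Secular equation s(μ) from (3), built from the eigendecomposition P = WΛWᵀ
Λ, W = eigen(P)
s(μ) = sum(((W' * q) .^ 2) ./ ((Λ .+ μ) .^ 2)) - r^2

# The 2n × 2n matrix M from (4)
M = [-P                  q*q'/r^2;
     Matrix(1.0I, n, n)  -P]

# Sort the eigenvalues of M by real part (dense solve here; an Arnoldi method in practice)
vals = sort(eigvals(M), by = real, rev = true)
μg = real(vals[1])    # rightmost eigenvalue: always real (Theorem 1)
@show s(μg)           # ≈ 0 up to roundoff
```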

Fig. 1: A typical example of the secular equation \(s(\mu )\). The dashed vertical lines mark the locations of the eigenvalues of \(-P\), i.e. \(-\lambda _1, \dots , -\lambda _n\). The Lagrange multiplier of the global solution of (T) is the rightmost root of \(s(\mu )\) (marked by a cross). The TRS associated with this figure also exhibits a local-nonglobal minimizer whose Lagrange multiplier corresponds to the second-rightmost root of \(s(\mu )\) (marked by a circle)

The relation between the spectrum of M and the Lagrange multipliers of (T) is formally stated below:

Lemma 1

The spectrum of M includes the Lagrange multiplier of every KKT point of (T).

Proof

This is a consequence of Proposition 2 of Sect. 2.3 and [9, Theorem 4.1]. \(\square \)

Note that the eigenvalues of M are in general complex since M is not symmetric. Furthermore, Lemma 1 and all subsequent results allow for potential “degenerate” KKT points with \(\mu = -\lambda _i\).

We will focus on the KKT points that are minimizers of (T) rather than maximizers or saddle points. The characterization of global minimizers is widely known in the literature. The following theorem shows that the Lagrange multiplier \(\mu ^g\) of the global minimizer corresponds to the rightmost eigenvalue of M in \({\mathbb {C}}\), which is always real:

Theorem 1

A KKT point of (T) with Lagrange multiplier \(\mu ^g\) is a global minimizer if and only if \(\mu ^g\) is the rightmost eigenvalue of M. Furthermore, we necessarily have that \(\mu ^g \in [-\lambda _1, \infty )\).

Proof

See [1, Theorem 3.2]. \(\square \)

In general, (T) is guaranteed to possess a unique global optimizer, except perhaps in a special case that has deservedly been given a special name:

Definition

(Hard case) Problem (T) belongs to the hard case when \(\mu ^g = -\lambda _1\).

We will see that in the hard case (T) is still guaranteed to have a global minimizer, but it is not necessarily unique. Furthermore, the computation of the minimizer(s) can be more challenging than in the “standard” case.

Moreover, due to the nonconvexity of (T) there can exist a local minimizer that is not global. We derive an analogous result for this minimizer, which is called “local-nonglobal” [25]:

Theorem 2

Problem (T) has a second-order sufficient local-nonglobal minimizer if and only if it does not belong to the hard case and the second-rightmost eigenvalue of M is real and simple. Furthermore, if such a minimizer exists, then its Lagrange multiplier \(\mu ^{\ell }\) is equal to this second-rightmost eigenvalue.

Proof

See Sect. 2.3. \(\square \)

The Lagrange multipliers for the global and local-nonglobal minimizers of (T) can therefore be identified by calculating the two rightmost eigenvalues of M, which can be done efficiently, e.g. with the Arnoldi method.
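This computation is factorization-free. Assuming the Arpack.jl wrapper of ARPACK (the package used in the experiments of Sect. 4) and the matrix M built in the sketch above, the two eigenvalues of largest real part can be requested directly, for instance as follows:

```julia
using Arpack   # assumes the Arpack.jl wrapper of ARPACK

# Two eigenvalues of M with largest real part (:LR) and their eigenvectors.
# Here M is the dense matrix built above; in a matrix-free implementation one
# would instead supply an operator that only implements products with M.
vals, vecs = eigs(M; nev = 2, which = :LR)
```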

Furthermore, for each of these multipliers a corresponding minimizer can be extracted directly from the respective eigenvector \([z_1^*;\; z_2^*]\) of M as

$$\begin{aligned} x^* = -\text {sign}(q^T z_2^*) r \frac{z_1^*}{\left\| z_1^*\right\| _2}, \end{aligned}$$
(5)

unless \(z_1^* = 0\). Indeed, this follows from the next Proposition:

Proposition 1

Every eigenpair \((\mu ^*, [z_1^*; \; z_2^*])\) of M gives a KKT point \((\mu ^*, x^*)\) for (T) where \(x^*\) is defined by (5), unless \(z_1^* = 0\).

Proof

This proof is based on [1, §3.3] that shows the result only for the rightmost eigenpair of M.

We will first show the result for \(y^* = \frac{-r^2}{q^T z_2^*}z_1^*\) and then show that \(y^* = x^*\). Note that using \(x^*\) is preferred over \(y^*\) because it avoids unnecessary numerical errors [1, §3.3]. We also emphasize that the entire proof assumes that \(z_1^*\) is nonzero.

For \(y^*\), we have to show that it is well defined, feasible and stationary. Regarding the first two points, note that (4) (first row) gives

$$\begin{aligned} -P z_1^* + q q^Tz_2^*/r^2 = \mu ^* z_1^* \Rightarrow (z_2^*)^T (P + \mu ^* I)z_1^* = (z_2^*)^T q q^T z_2^*/r^2, \end{aligned}$$

or since \((P + \mu ^* I)z_2^* = z_1^*\) due to (4) (second row), we have \(\left\| z_1^*\right\| ^2 = {(q^Tz_2^*/r)}^2\), which implies that \(q^T z_2^* \ne 0\) because \(z_1^* \ne 0\) by assumption. Thus:

$$\begin{aligned} \left\| z_1^*\right\| ^2_2 = (q^Tz_2^*/r)^2 \Rightarrow \left\| -\frac{r^2}{q^T z_2^*}z_1^*\right\| _2^2 = r^2 \Rightarrow \left\| y^*\right\| _2 = r. \end{aligned}$$

To show stationarity, note that the first row of (4) also gives:

$$\begin{aligned}&-P z_1^* + qq^T z_2^*/r^2 = \mu ^* z_1^* \Rightarrow P \frac{-z_1^*r^2}{q^T z_2^*} + q + \mu ^* \frac{-z_1^*r^2}{q^T z_2^*} = 0 \\&\quad \Rightarrow P y^* + q + \mu ^* y^* = 0. \end{aligned}$$

Finally, we show that \(y^* = x^*\):

$$\begin{aligned} y^* = y^* \frac{r}{\left\| y^*\right\| _2} = \frac{-r^2z_1^*}{q^T z_2^*} \cdot \frac{|q^T z_2^*|\, r}{\left\| r^2 z_1^*\right\| _2} = -\text {sign}(q^T z_2^*)\, r \frac{z_1^*}{\left\| z_1^*\right\| _2} = x^*. \end{aligned}$$

\(\square \)

Dealing with the hard case: Extracting a solution \(x^*\) via (5) is not possible when \(z_1^* = 0\), since (5) is not well defined in that case. In this case, \((\mu ^*, z_2^*)\) is an eigenpair of \(-P\) due to (4). Note that, according to [25, Lemma 3.3], this can never happen for the local-nonglobal minimizer; thus we can always extract the local-nonglobal minimizer with (5). However, it can happen for global minimizers in the hard case, i.e. when \(\mu ^* = -\lambda _1\). In the hard case, the necessary and sufficient conditions for global optimality are, due to the KKT conditions of (T) and Theorem 1, the following:

$$\begin{aligned}&(P - \lambda _1 I) x + q = 0 \end{aligned}$$
(6)
$$\begin{aligned}&\left\| x\right\| _2 = r. \end{aligned}$$
(7)

Note that \(P - \lambda _1 I\) is singular, and the above system of equations represents the intersection of an affine subspace with a sphere. This intersection must be non-empty, since (T) necessarily has a global minimizer, as it arises from the minimization of a smooth function over a compact subset of \({\mathbb {R}}^n\). One can then solve (6)–(7) by computing the minimum-length solution \(y_{\text {min}}\) of \((P - \lambda _1 I) x + q = 0\) and returning \(x^* = y_{\text {min}} + \alpha \upsilon \), where \(\alpha \) is a scalar such that \(\left\| x^*\right\| _2 = \left\| y_{\text {min}} + \alpha \upsilon \right\| _2 = r\) and \(\upsilon \) is any null-vector of \(P - \lambda _1I\). Interestingly, when \(z_1^*=0\), \(z_2^*\) is a null vector of \(P - \lambda _1I\) due to (4), thus the only additional computation for extracting a solution when \(z_1^* = 0\) is finding the minimum-length solution of a symmetric linear system. This can be achieved e.g. with MINRES-QLP [6], or with the Conjugate Gradient method [1, Theorem 4.3].
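The following Julia sketch summarizes the extraction step (the helper name and the tolerance for deciding \(z_1^* = 0\) are arbitrary choices of ours, and a dense pseudoinverse stands in for MINRES-QLP/CG purely for exposition):

```julia
using LinearAlgebra

# Minimal sketch (our own helper): extract a KKT point of (T) from an eigenpair
# (μ, [z1; z2]) of M via (5), falling back to the hard-case recipe (6)-(7) when z1 = 0.
function extract_solution(P, q, r, μ, z1, z2; tol = 1e-10)
    z1r, z2r = real.(z1), real.(z2)        # real in exact arithmetic; drop roundoff
    if norm(z1r) > tol * norm(z2r)
        return -sign(dot(q, z2r)) * r * z1r / norm(z1r)        # equation (5)
    else
        # Hard case: minimum-length solution of (P - λ₁I)x = -q (here -λ₁ = μ),
        # plus a step along a null vector of P - λ₁I (z2 itself is such a null vector).
        y = -pinv(P + real(μ) * I) * q
        v = z2r / norm(z2r)
        return y + sqrt(max(r^2 - norm(y)^2, 0.0)) * v         # ‖y + αv‖₂ = r since y ⟂ v
    end
end
```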

2.1 Equality-constrained trust-region subproblems

We will now extend the preceding results of this section to also account for the presence of linear equality constraints, \(Ax = b\), in the TRS. We will show that, given an operator that projects an \(n-\)dimensional vector onto the nullspace of A, we can calculate global as well as local-nonglobal minima of (TRS) by applying a projected Arnoldi method to M.

For simplicity, and without loss of generality, we will assume that \(b = 0\), thus solving the following problem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} \left\| x\right\| _2 = r \\ &{} Ax = 0 \end{array} \end{aligned}$$
(8)

where \(x \in {\mathbb {R}}^n\) is the decision variable, \(P \in {\mathbb {S}}^{n}\), i.e. an \(n \times n\) symmetric matrix, A is an \(m \times n\) matrix of full row rank and q is an \(n-\)dimensional vector. Similarly to the preceding results for (T), we assume that \(n - m> 1\) and \(r > 0\).

In principle, (8) can be solved with the “eigenproblem approach” described in this section if we reduce (8) to a smaller TRS via a nullspace elimination procedure [27, §15.3], obtaining

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}y^T {\tilde{P}} y + {\tilde{q}}^T y\\ \text {subject to} &{} \left\| y\right\| _2 = r \end{array} \end{aligned}$$
(9)

where we define Z to be an \(n \times (n-m)\) orthonormal matrix with \(\mathcal {R}(Z) = \mathcal {N}(A)\) and

$$\begin{aligned} {\tilde{P}} {:=}Z^T P Z, \quad {\tilde{q}} {:=}Z^Tq. \end{aligned}$$

According to the preceding results of this section, global and local-nonglobal minimizer(s) \(y^*\) of (9) can be computed by calculating the eigenvectors corresponding to the two rightmost eigenvalues of

$$\begin{aligned} {\tilde{M}} {:=}\begin{bmatrix} -{\tilde{P}} &{}\quad {\tilde{q}} {\tilde{q}}^T/r^2 \\ I &{}\quad -{\tilde{P}} \end{bmatrix}. \end{aligned}$$

The respective solution(s) \(x^*\) of (8) can then be recovered as

$$\begin{aligned} x^* = Zy^*. \end{aligned}$$
(10)

Some disadvantages of this approach are that a nullspace basis is required and that the structure or sparsity of the problem might be destroyed.

We will present an alternative method that avoids these issues, mirroring standard approaches for solving equality constrained quadratic programs [14]. To this end, note that

$$\begin{aligned} {\tilde{M}} = \begin{bmatrix} -{\tilde{P}} &{}\quad {\tilde{q}} {\tilde{q}}^T/ r^2 \\ I &{}\quad -{\tilde{P}} \end{bmatrix} = \begin{bmatrix} Z^T &{}\quad 0\\ 0 &{}\quad Z^T\\ \end{bmatrix} \begin{bmatrix} -P &{}\quad q q^T / r^2 \\ I &{}\quad -P \end{bmatrix} \begin{bmatrix} Z &{}\quad 0\\ 0 &{}\quad Z\\ \end{bmatrix}. \end{aligned}$$

Since Z is an orthonormal matrix, every eigenvalue-eigenvector pair \(\{\mu , [{\tilde{z}}_1;\; {\tilde{z}}_2] \}\) of \({\tilde{M}}\) corresponds to an eigenvalue-eigenvector pair \(\{\mu , [z_1;\; z_2] \}\) of the matrix

$$\begin{aligned} \begin{bmatrix} ZZ^T &{}\quad 0\\ 0 &{}\quad ZZ^T\\ \end{bmatrix} \underbrace{ \begin{bmatrix} -P &{}\quad q q^T/ r^2 \\ I &{}\quad -P \end{bmatrix}}_{=M} \begin{bmatrix} ZZ^T &{}\quad 0\\ 0 &{}\quad ZZ^T\\ \end{bmatrix} \end{aligned}$$
(11)

with \(Z [{\tilde{z}}_1;\; {\tilde{z}}_2] = [z_1;\; z_2]\). Note that (11) also has an extra eigenvalue at zero with multiplicity 2m.

Recalling that Z is an orthonormal matrix spanning \(\mathcal {N}(A)\), we note that multiplication with \(ZZ^T\) is equivalent to projection onto the nullspace of A. In practice, eigenvalues of (11) can therefore be calculated using the Arnoldi method on M where we project the starting vector and every requested matrix-vector product with M onto the nullspace of \(\begin{bmatrix}A &{}\quad 0 \\ 0&{}\quad A \end{bmatrix}\).
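The following dense Julia sketch (our own, for illustration only; a practical implementation would never form these matrices, but would instead apply the projection and the product with M inside each Arnoldi matrix-vector product) spells out the operator in (11):

```julia
using LinearAlgebra

n, m, r = 6, 2, 1.0
P = Symmetric(randn(n, n)); q = randn(n)
A = randn(m, n)                         # assumed to have full row rank

Π  = I - A' * ((A * A') \ A)            # orthogonal projector onto N(A), i.e. ZZᵀ
M  = [-P                  q*q'/r^2;
      Matrix(1.0I, n, n)  -P]
Π2 = [Π  zeros(n, n); zeros(n, n)  Π]   # block-diagonal projector diag(ZZᵀ, ZZᵀ)
Mp = Π2 * M * Π2                        # the matrix in (11); carries 2m extra zero eigenvalues

vals = sort(eigvals(Mp), by = real, rev = true)   # sorted by real part
```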

Note that the two rightmost eigenvalues of \({\tilde{M}}\) correspond to the two rightmost eigenvalues of (11) if at least two eigenvalues of \({\tilde{M}}\) have positive real parts. We will show that we can always transform (8) so that this holds. Indeed, if we shift P to \(P - \alpha I\) with an appropriate \(\alpha \in {\mathbb {R}}\) so that \({\tilde{P}}\) becomes negative definite, then \({\tilde{M}}\) has at least two eigenvalues with positive real part:

Lemma 2

If \({\tilde{P}} \in {\mathbb {S}}^{n - m}\) is negative definite then \({\tilde{M}}\) has at least two eigenvalues with positive real part, unless we trivially have \(n - m = 1\).

Proof

To avoid redefining the notation used in the introduction of §2, we will show the equivalent result for the pair (P, M), i.e. that M has at least two eigenvalues with positive real part when P is negative definite and \(n > 1\).

To do so, it suffices to show that s has at most one root with non-positive real part. This is because M has \(2n \ge 4\) eigenvalues and the spectrum of M matches the roots of s on the left half of the complex plane (shown in Proposition 2, which we prove in Sect. 2.3, given that P is negative definite).

To show this, note that s has at most one (simple) real root with nonpositive real part since

$$\begin{aligned} s'(\alpha ) = -2\sum _{j=1}^{n}\frac{(w_j^Tq)^2}{(\lambda _j + \alpha )^3} \end{aligned}$$
(12)

is positive for any \(\alpha \le 0\) since \(\lambda _1, \dots , \lambda _n < 0\) by assumption. To conclude the proof, it suffices to show that all the roots of s with nonpositive real part are real. Indeed, note that for any \(a, b \in {\mathbb {R}}\) we have

$$\begin{aligned} \text {Im}(s(a + bi)) = -\sum _{j=1}^{n}\frac{2(\lambda _j + a)b}{((\lambda _j + a)^2 + b^2)^2}(w_j^Tq)^2, \end{aligned}$$

where i is the imaginary unit. A detailed derivation of the above equation is provided in Lemma 4. Since \(\lambda _1, \dots , \lambda _n < 0\) by assumption, for any \(a \le 0\) we have \(2(\lambda _j + a) < 0\), \(j = 1, \dots , n\). Thus, for any \(a \le 0\) and \(b \ne 0\), \(\text {Im}(s(a + bi)) \ne 0\) and hence \(s(a + bi) \ne 0\). \(\square \)

Such a shift does not affect the (local/global) optimizers of (8), as

$$\begin{aligned} \frac{1}{2}x^T (P - \alpha I) x + q^T x = \frac{1}{2}x^T P x + q^T x - \frac{\alpha }{2} ||x||_2^2 \end{aligned}$$

or, since \(\left\| x\right\| _2 = r\),

$$\begin{aligned} \frac{1}{2}x^T (P - \alpha I) x + q^T x&= \frac{1}{2}x^T P x + q^T x - \frac{\alpha }{2} r^2, \end{aligned}$$

where \(\alpha \in {\mathbb {R}}\). Thus, we can always transform (8) so that the two rightmost eigenvalues of \({\tilde{M}}\) are equal to the two rightmost eigenvalues of (11).

Having computed the two rightmost eigenvalues/eigenvectors with a projected Arnoldi method applied to M, global/local solutions of (8) can be obtained from the respective eigenvectors. Indeed, recall that according to (5) and (10), an appropriate (rightmost/second-rightmost) eigenvector \([{\tilde{z}}_1^*;\; {\tilde{z}}_2^*]\) of (11) gives a solution \(x^*\) to (8) as follows

$$\begin{aligned} x^*&= Zy^* = -Z\text {sign}({\tilde{q}}^T {\tilde{z}}_2^*)r\frac{{\tilde{z}}_1^*}{\left\| {\tilde{z}}_1^*\right\| _2} = -\text {sign}(q^T (Z {\tilde{z}}_2^*))r\frac{Z {\tilde{z}}_1^*}{\left\| Z{\tilde{z}}_1^*\right\| _2} \end{aligned}$$
(13)
$$\begin{aligned}&= -\text {sign}(q^T z_2^*)r\frac{z_1^*}{\left\| z_1^*\right\| _2} \end{aligned}$$
(14)

where \([z_1^*;\; z_2^*]\) is an eigenvector of M. Thus, \(x^*\) can be extracted solely from the eigenvector of M, unless \(z_1^* = 0\).

Dealing with the hard case: In the case where \(z_1^* = 0\), which can appear in the hard case, one would have to calculate the minimum length solution of

$$\begin{aligned} ({\tilde{P}} + \mu ^* I) y + {\tilde{q}} = 0 \end{aligned}$$
(15)

and then construct a global solution \(x^*\) as \(x^* = Z(y + \alpha {\tilde{z}}_2^*) = Zy + \alpha z_2^*\), where \(\alpha \) is a scalar such that \(\left\| x^*\right\| _2 = r\). However, the minimum length solution of (15) can be obtained by the minimum length solution of

$$\begin{aligned}&(P + \mu ^* I) x + q = 0 \end{aligned}$$
(16)
$$\begin{aligned}&Ax = 0 \end{aligned}$$
(17)

which can be calculated via projected MINRES-QLP or a projected CG algorithm [1, Theorem 4.3] [14].
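In the same dense, illustration-only spirit as before (the helper name and the use of a pseudoinverse in place of projected MINRES-QLP/CG are ours), the hard-case recovery just described reads:

```julia
using LinearAlgebra

# Hard-case recovery in the equality-constrained setting.
# μ and z2 are taken from the computed eigenpair of (11) that has z1 = 0.
function hard_case_solution(P, q, A, r, μ, z2)
    μr, z2r = real(μ), real.(z2)                  # real in exact arithmetic
    Z = nullspace(A)                              # orthonormal basis of N(A)
    y = -pinv(Z' * (P + μr * I) * Z) * (Z' * q)   # minimum-length solution of (15)
    v = z2r / norm(z2r)                           # unit null-vector direction, already in N(A)
    return Z * y + sqrt(max(r^2 - norm(y)^2, 0.0)) * v   # ‖·‖₂ = r since Zy ⟂ v and ‖Zy‖ = ‖y‖
end
```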

2.2 An algorithm for computing global and local solutions of the TRS

Using the ideas presented in this section, we now present Algorithm 1, a practical algorithm for computing global and second-order sufficient local-nonglobal solutions of equality constrained trust-region subproblems. In the remainder of this section, we establish the correctness of Algorithm 1.

Algorithm 1

Algorithm 1 uses the results of this section (in particular Theorems 1 and 2, Equation (13) and the paragraph “Dealing with the hard case” of Sect. 2.1) to compute the minimizers of (8). The only novel point is the way we detect the presence of the local-nonglobal minimizer, which we proceed to show is valid.

If the TRS is in the hard case, then the algorithm should return only global minimizer(s). Indeed, this is the case:

  1. If \(z_1^g=0\), then the algorithm terminates at line 16, according to the remarks of the paragraph “Dealing with the hard case” of Sect. 2.1.

  2. If \(z_1^g\ne 0\), then there must exist another eigenvector associated with \(\mu ^g\) besides \([z_1^g; \; z_2^g]\). This is because \(\mu ^g = -\lambda _1\), thus according to Proposition 2 of Sect. 2.3 there exists a vector \(u \in \mathcal {N}(P - \lambda _1 I)\) that is orthogonal to q, which makes \((\mu ^g, [0;\; u])\) an eigenpair of M due to (4). Thus \(\mu ^g=\mu ^{\ell }\), which implies that \(\mu ^{\ell }\) is not simple. As a result the algorithm will only return a global minimizer using (5) at line 11.

If the TRS is not in the hard case, then we must have \(z_1^g \ne 0\) (see remarks of the paragraph “Dealing with the hard case” of Sect. 2). Thus the algorithm will return at line 9 or 11, correctly detecting the presence of the local-nonglobal minimizer according to Theorem 2.

Finally, we emphasize that in the hard case the TRS can have an infinite number of solutions, but only one or two of them are returned by Algorithm 1.

2.3 Proof of Theorem 2

In this subsection we provide a proof of our main theoretical result, Theorem 2, which shows that (T) has a second-order sufficient local-nonglobal minimizer if and only if it does not belong to the hard case and the second-rightmost eigenvalue of the matrix

$$\begin{aligned} M {:=}\begin{bmatrix} -P &{}\quad q q^T/r^2 \\ I &{}\quad -P \end{bmatrix}, \end{aligned}$$

counted with algebraic multiplicity, is real and simple. Furthermore, we show that if such a minimizer exists, its Lagrange multiplier \(\mu ^{\ell }\) is equal to this second-rightmost eigenvalue.

The proof is based on the following Lemma, which follows from [25]:

Lemma 3

A KKT point \(x^{\ell }\) of (T) with Lagrange multiplier \(\mu ^{\ell }\) is a second-order sufficient local-nonglobal minimizer iff \(q \not \perp w_1\), \(\mu ^{\ell } \in (-\lambda _2, -\lambda _1)\) and \(s'(\mu ^{\ell }) > 0\).

Proof

See [25, Theorem 3.1]. \(\square \)

Before we use Lemma 3 we will need the following two ancillary results:

Lemma 4

For any complex number \(a + bi \in {\mathbb {C}}\) where ab are real numbers such that a is in the domain of s and \(b \ne 0\), we have:

$$\begin{aligned} \mathrm {Re}\left( s(a + bi)\right) < s(a). \end{aligned}$$
(18)

Proof

Recall the definition of s:

$$\begin{aligned} s(a + bi) = \sum _{j=1}^{n}\frac{(w_j^T q)^2}{(\lambda _j + a + bi)^2} - r^2 \end{aligned}$$

and define, for convenience, \(\kappa _j = w_j^T q\), \(j = 1 ,\dots , n\). Thus

$$\begin{aligned} s(a + bi)&= -r^2 + \sum _{j=1}^{n}\frac{\kappa _j^2}{(\lambda _j + a + bi)^2}\\&= -r^2 + \sum _{j=1}^{n}\kappa _j^2\frac{(\lambda _j + a)^2 - b^2}{((\lambda _j + a)^2 + b^2)^2} - \kappa _j^2\frac{2(\lambda _j + a)b}{((\lambda _j + a)^2 + b^2)^2}i, \end{aligned}$$

hence

$$\begin{aligned} \mathrm {Re}(s(a + bi))&= -r^2 + \sum _{j=1}^{n}\kappa _j^2\frac{(\lambda _j + a)^2 - b^2}{((\lambda _j + a)^2 + b^2)^2} \\&< -r^2 + \sum _{j=1}^{n}\kappa _j^2\frac{(\lambda _j + a)^2}{((\lambda _j + a)^2)^2}, \end{aligned}$$

or, equivalently, \(\mathrm {Re}(s(a + bi)) < s(a)\). \(\square \)

Lemma 5

If \(s(x) < 0\) for some real x in the domain of \(s: {\mathbb {C}} \mapsto {\mathbb {C}}\) then s has an equal number of poles and zeros with real parts in \([x, \infty )\).

Proof

We will prove this result with the argument principle. We encircle the half-plane

$$\begin{aligned} S(x) {:=}\{{c \in {\mathbb {C}}}|{c = a + bi, \text { where } a \ge x, b \in {\mathbb {R}}}\} \end{aligned}$$

with the contour of Fig. 2 and show that it maps to a contour lying strictly in the left half-plane. The suggested closed contour C intersects the real axis at x and at \(\infty \). It consists of two segments: \(C_1 = \{x + \beta i \mid \beta \in {\mathbb {R}} \}\), i.e. a segment parallel to the imaginary axis, and \(C_2\), a half-circle with infinite radius.

Fig. 2: Suggested closed contour for encircling S(x)

By the argument principle [2] we have \(\frac{1}{2\pi i}\int _C\frac{s'(z)}{s(z)}dz=Z-P\), where Z and P are the numbers of zeros and poles of s in the domain enclosed by C. It follows that the number of poles in S equals the number of zeros in S if \(\int _C\frac{s'(z)}{s(z)}dz=0\), which holds if there are no poles or zeros of s on C and s(C) does not enclose the origin. By examining (3) we conclude that s has only real, finite poles; thus the contour C does not intersect any pole of s. It remains to show that there exist no zeros of s on C and that s(C) does not enclose the origin. We show both by proving that every point in \(s(C) = s(C_1) \cup s(C_2)\) has negative real part. For \(s(C_2)\) this is immediate, since \(s(C_2) = \{ -r^2\}\) because \(\lim _{|w| \rightarrow \infty } s(w) = -r^2\). Regarding \(s(C_1)\), note that \(s(x) < 0\), and thus Lemma 4 gives \(\mathrm {Re}(s(x + \beta i)) \le s(x) < 0 \; \forall \beta \in {\mathbb {R}}\), i.e. every point in \(s(C_1)\) has negative real part. \(\square \)

We will now connect the second-rightmost root of the secular equation s with the results of Lemma 3. Note that, due to Lemma 4, if s has a real root \(\mu \) then it cannot also have a non-real root with real part \(\mu \), and vice versa. Thus, the concept of “the second-rightmost root of s” is well defined as long as we assume (or are prepared to check) that this root is real.

Lemma 6

  (i) Suppose that \(q \not \perp w_1\) and \(\mu \in (-\lambda _2, -\lambda _1)\) is a simple root of s with \(s'(\mu ) > 0\). Then, \(\mu \) is the second-rightmost root of s in \({\mathbb {C}}\).

  (ii) Suppose that \(q \not \perp w_1, w_2\), \(\lambda _1 \ne \lambda _2\), \(\mu \in {\mathbb {R}}\) is the second-rightmost root of s in \({\mathbb {C}}\) and \(\mu \) is simple. Then, \(\mu \ge -\lambda _2\).

Proof

Note that in both cases \(q \not \perp w_1\), thus (T) does not belong to the hard case [1]. Hence s has a real root in \((-\lambda _1, \infty )\) (Theorem 1), which we denote by \(\mu ^g\). Moreover, \(\mu ^g\) is simple because \(s'(\mu ^g) = -2\sum _{j=1}^{n}(w_j^Tq)^2/(\lambda _j + \mu ^g)^3\) is negative, since \(\mu ^g> -\lambda _1 \ge \dots \ge -\lambda _n\).

In the subsequent analysis, we will frequently refer to the poles of s which are

$$\begin{aligned} \{{-\lambda _i}|{i = 1, \dots , n \text { such that } q^Tw_i \ne 0}\}, \end{aligned}$$

(each with double multiplicity), and to the following parametric set

$$\begin{aligned} S(x) {:=}\{{c \in {\mathbb {C}}}|{c = a + bi, \text { where } a \ge x, b \in {\mathbb {R}}}\} \end{aligned}$$

defined for any \(x \in {\mathbb {R}}\).

We will begin with claim (i). Since \(\mu< -\lambda _1 < \mu ^g\) we conclude that \(\mu \) is not the rightmost root of s. In order to show that it is the second rightmost, it suffices to show that there are at most two roots of s in \(S(\mu )\).

Since \(s(\mu ) = 0\) and \(s'(\mu ) > 0\), we have \(s(\mu - \epsilon ) < 0\) for a sufficiently small \(\epsilon > 0\). According to Lemma 5, s has equal number of zeros and poles in \(S(\mu - \epsilon )\). By assumption \(q \not \perp w_1\) and \(\mu > -\lambda _2\), thus \(-\lambda _1\) is the only (double) pole of s in \(S(\mu - \epsilon )\). Hence, the number of zeros in \( S(\mu - \epsilon ) \supset S(\mu )\) must be two. This concludes the proof of claim (i).

We will now prove claim (ii). Assume the contrary, i.e. that \(\mu < -\lambda _2\). Consider first the case where \(s'(\mu ) > 0\). Since \(s(\mu ) = 0\) we can choose a sufficiently small \(\epsilon > 0\) such that \(s(\mu - \epsilon ) < 0\). Thus s has equal number of zeros and poles in \(S(\mu - \epsilon )\) (Lemma 5). By assumption \(\mu < -\lambda _2\) and \(q \not \perp w_1, w_2\), and thus the double poles \(-\lambda _1\) and \(-\lambda _2\) are in \(S(\mu - \epsilon )\). Thus s has (at least) four zeros in \(S(\mu - \epsilon )\), and since \(\epsilon \) can be arbitrarily small, the same holds for \(S(\mu )\).

We conclude that there should be extra roots for s in \(S(\mu )\), besides the simple roots \(\mu ^g\) and \(\mu \). Moreover, these extra roots cannot have real part equal to \(\mu \) since, according to Lemma 4, \(\mathrm {Re}(s(\mu + b i)) < s(\mu ) = 0\) for any real \(b \ne 0\). Thus their real parts must be in \((\mu , \infty )\). This means that \(\mu \) is not the second-rightmost root of s, which is a contradiction.

Finally, if \(s'(\mu ) < 0\), then, using the same arguments as before, we conclude that there exist four roots in \(S(\mu + \epsilon )\) for any sufficiently small \(\epsilon > 0\). Thus, there exist extra roots (besides the simple root \(\mu ^g\)) with real part in \((\mu , \infty )\), which, again, contradicts the assumption that \(\mu \) is the second-rightmost root of s. \(\square \)

Lemma 6, in combination with Lemma 3, provides an “almost” iff relationship between the second-rightmost root of s and the local-nonglobal minimizer of (T). However, the quantity of primary interest is the second-rightmost eigenvalue of M instead of the second-rightmost root of s. The following result characterizes the spectrum of M and its tight connection with the roots of s:

Proposition 2

The spectrum of M consists of:

  • the roots of s with matched algebraic multiplicity; and

  • all of the eigenvalues \(-\lambda _i\) of \(-P\) that are non-simple for \(-P\) or have \(w_i\) (the \(i-\)th eigenvector of P) orthogonal to q.

Furthermore, every \(-\lambda _i\) that is in the spectrum of M is non-simple for M.

Proof

We will prove the result by deriving a closed form expression for the characteristic polynomial of M. Since \(-I\) and \(P + \mu I\) commute, we have [31, Theorem 3]:

$$\begin{aligned} \det (\mu I - M) = \det \left( \begin{bmatrix} P + \mu I &{}\quad -qq^T/r^2 \\ -I &{}\quad P + \mu I \end{bmatrix} \right) = \det ( (P + \mu I)^2 - q q^T/r^2) \end{aligned}$$

or, using the matrix determinant lemma,

$$\begin{aligned} \det (\mu I - M)&= -r^{-2}\left( q^T (P + \mu I)^{-2}q - r^2 \right) \det ((P + \mu I)^2) \end{aligned}$$

for all \(\mu \ne -\lambda _1, \dots -\lambda _n\). Thus, recalling (2)–(3) we have

$$\begin{aligned} \det (\mu I - M)&= -r^{-2} \left( \sum _{i=1}^{n}\frac{(w_i^T q)^2}{(\lambda _i + \mu )^2} - r^2\right) \prod _{i = 1}^n(\lambda _i + \mu )^2 \end{aligned}$$
(19)
$$\begin{aligned}&= -r^{-2}\sum _{i = 1}^{n} (w_i^T q)^2 \prod _{j \ne i} (\lambda _j + \mu )^2 + \prod _{i = 1}^{n} (\lambda _i + \mu )^{2}, \end{aligned}$$
(20)

for all \(\mu \ne -\lambda _1, \dots -\lambda _n\). Thus, (20) and the characteristic polynomial of M coincide on all of \({\mathbb {C}}\) except perhaps at 2n points. It follows by continuity that (20) is in fact the characteristic polynomial of M.

By examining the factored form (19) and the expansion (20), we conclude that the eigenvalues of M include all the roots of s with matched algebraic multiplicity. The secular equation s has 2n roots whenever the eigenvalues of P are distinct and \(w_i^T q \ne 0\) for all \(i = 1, \dots , n\). Otherwise, i.e. for every i with \(\lambda _i\) non-simple or \(w_i^T q = 0\), \(M \in {\mathbb {R}}^{2n \times 2n}\) has extra non-simple eigenvalues at these \(-\lambda _i\), which constitute the only differences between the spectrum of M and the roots of s. \(\square \)

The following result follows directly from the above Proposition and will be useful for the rest of this section.

Corollary 1

No simple eigenvalue of M is in the spectrum of \(-P\).

We are now ready to prove the main result of this section, Theorem 2. The proof will occupy the rest of this subsection. Note, again, that, due to Lemma 4 and Proposition 2, if M has a real eigenvalue \(\mu \) then it cannot also have a non-real eigenvalue with real part \(\mu \), and vice versa. Thus, the concept of “the second-rightmost eigenvalue of M” is well defined as long as we assume (or are prepared to check) that this eigenvalue is real.

“only-if” In this case we know that \(\mu ^{\ell }\) is the Lagrange multiplier of a second-order sufficient local-nonglobal minimizer and we have to show that

  (i) (T) is not in the hard case.

  (ii) \(\mu ^{\ell }\) is a simple eigenvalue of M.

  (iii) \(\mu ^{\ell }\) is the second-rightmost eigenvalue of M.

Points (i) and (ii) follow directly from Lemma 3 and Proposition 2 (note that \(q \not \perp w_1\) implies that (T) is not in the hard case [1]). In order to prove (iii) we will first show that the spectrum (counted with algebraic multiplicities) of M with real part larger than \(-\lambda _2\) coincides with the roots of s in the same region. This follows from Proposition 2, because \(\lambda _1\) is simple (due to \(\mu ^{\ell } \in (-\lambda _2, -\lambda _1)\)) and \(q^T w_1 \ne 0\). Point (iii) then follows from Lemma 6(i), which shows that \(\mu ^{\ell }\) is the second-rightmost root of s, and thus the second-rightmost eigenvalue of M.

“if” In this case, we know that (T) is not in the hard case and that \(\mu ^{\ell }\) is the second-rightmost eigenvalue of M which is real and simple, and we want to show that \(\mu ^{\ell }\) is the Lagrange multiplier of the second-order sufficient local-nonglobal minimizer of (T).

Note that, by assumption, \(\mu ^{\ell }\) is in the spectrum of M but, due to Corollary 1, not in that of \(-P\). Thus \(\mu ^{\ell }\) is a root of s due to Proposition 2. As a result, it suffices to show that \(\mu ^{\ell }\) is in \((-\lambda _2, -\lambda _1)\) and \(s'(\mu ^{\ell }) > 0\) (Lemma 3).

We will first show that \(\mu ^{\ell } \in (-\lambda _2, -\lambda _1)\). Since we are not in the hard case, M has a real eigenvalue in \((-\lambda _1, \infty )\) (Theorem 1) which is simple and unique in \((-\lambda _1, \infty )\) due to Proposition 2 and the fact that \(s'(\mu ) = -2\sum _{j=1}^{n}(w_j^Tq)^2/(\lambda _j + \mu )^3\) is negative in that region. Thus, \(-\lambda _1\) is not in the spectrum of M as otherwise it would be its second-rightmost eigenvalue (which by assumption is real) and it would imply that \(\mu ^{\ell }\) is in the spectrum of \(-P\), which we have already excluded. Thus, \(\mu ^{\ell } \in (-\infty , -\lambda _1)\), and \(\mu ^{\ell } \ne -\lambda _2\).

It remains to show that \(\mu ^{\ell } \ge -\lambda _2\). If \(q^T w_2 = 0\), then \((-\lambda _2, [0;\; w_2])\) is an eigenpair of M; thus we must have \(\mu ^{\ell } > -\lambda _2\) so as to avoid \(\mu ^{\ell }\) (the second-rightmost eigenvalue of M) being in the spectrum of \(-P\). Consider now the case where \(q^T w_2 \ne 0\). Note that \(\lambda _1 \ne \lambda _2\) and \(q \not \perp w_1\), as otherwise \(-\lambda _1\) would be in the spectrum of M (Proposition 2), which, according to the paragraph above, is a contradiction. Thus, \(\lambda _1 \ne \lambda _2\) and \(q \not \perp w_1, w_2\), resulting in \(\mu ^{\ell } \ge -\lambda _2\) due to Lemma 6(ii).

Finally we show that \(s'(\mu ^{\ell }) > 0\), thereby concluding the proof. Note that \(s'(\mu ^{\ell }) \ne 0\) since by assumption \(\mu ^{\ell }\) is a simple root of s. Assume the contrary, i.e. \(s'(\mu ^{\ell }) < 0\). Since s is convex in \((-\lambda _2, -\lambda _1)\) [25, (3.12)] and \(\lim _{\mu \rightarrow -\lambda _1}s(\mu ) = \infty \) we conclude that there must exist another root of s in \((\mu ^{\ell }, -\lambda _1)\). This contradicts the assumption that \(\mu ^{\ell }\) is the second-rightmost root of s.

3 An active-set algorithm for (P)

We are now in a position to present the main contribution of this paper, an active-set algorithm for solving (P). We will first present an algorithm for the special case where \(r_{\min } = r_{\max } {:=}r\) and then describe how to generalize to any \(r_{\min }, r_{\max }\).

3.1 Solving (P) when \(r_{\min } = r_{\max }{:=}r\)

In this section we introduce a primal active-set approach for the optimization of (P) when \(r_{\min } = r_{\max } {:=}r\). It will be useful in the subsequent analysis to recall the Karush-Kuhn-Tucker (KKT) conditions of (P), which are

$$\begin{aligned} \nabla f(x) + A^T\kappa + \mu x&= 0 \end{aligned}$$
(21)
$$\begin{aligned} \kappa&\ge 0 \end{aligned}$$
(22)
$$\begin{aligned} \kappa _i( a_i^T x - b_i)&= 0, \quad i = 1, \dots , m \end{aligned}$$
(23)
$$\begin{aligned} A x&\le b, \quad \quad x^Tx = r^2. \end{aligned}$$
(24)

As is typical of a primal active-set approach, our algorithm starts from a given feasible point of (P) and generates iterates that remain feasible for (P) and have non-increasing objective values. At every iteration we treat a subset of the inequality constraints \(Ax \le b\) as equalities. We refer to this subset as the working set \(\mathcal {W}_k\) and define

$$\begin{aligned} {\bar{A}}_k = [a_i]_{i \in \mathcal {W}_k}, \quad {\bar{b}}_k = [b_i]_{i \in \mathcal {W}_k}, \end{aligned}$$

where \([\cdot ]\) denotes vertical (row-wise) concatenation. To simplify the analysis, we will assume that one of the simplest and most common constraint qualifications holds for every iterate of the algorithm:

Assumption 1

Linear Independence Constraint Qualification [LICQ]

The LICQ holds for every iterate of the suggested Algorithm, i.e. \(\begin{bmatrix} {\bar{A}}_k \\ x_k^T \end{bmatrix}\) has full row rank.

We will first give a brief, schematic description of our algorithm. At every iteration k of our active-set based minimization procedure, we use Algorithm 1 to compute minimizers of the subproblem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} \left\| x\right\| _2 = r \\ &{} {\bar{A}}_k x = {\bar{b}}_k. \end{array} \end{aligned}$$
(SP)

We then attempt to move towards those minimizers in that hope that we either (i) “hit” a new constraint or (ii) set \(x_{k + 1}\) to a (potentially local) minimizer of (SP). Unfortunately due to the nonconvexity of the problem, we might achieve neither (i) nor (ii) by simply moving towards the minimizer of (SP), as the objective of (P) is not guaranteed to be non-increasing along these moves. In that case, which we will show can happen only when (SP) has a unique minimizer, we perform a projected gradient descent followed (if necessary) by a move towards the global minimizer of (SP) that is guaranteed to achieve either (i) or (ii)Footnote 6. Finally, at the end of each iteration, we check for termination and update the working set.

We then attempt to move towards those minimizers in the hope that we either (i) “hit” a new constraint or (ii) set \(x_{k + 1}\) to a (potentially local) minimizer of (SP). Unfortunately, due to the nonconvexity of the problem, we might achieve neither (i) nor (ii) by simply moving towards the minimizer of (SP), as the objective of (P) is not guaranteed to be non-increasing along these moves. In that case, which we will show can happen only when (SP) has a unique minimizer, we perform a projected gradient descent followed (if necessary) by a move towards the global minimizer of (SP) that is guaranteed to achieve either (i) or (ii). Finally, at the end of each iteration, we check for termination and update the working set.

Our overall active-set procedure is shown in Algorithm 2. The following subsections describe each of the algorithm’s steps in detail. For the rest of this section, we will say that a point is “feasible” when it is in the feasible region of (P). Also, given a point x, we will refer to the projection of \(\nabla f(x)\) onto the nullspace of the constraint normals of (SP), \(\begin{bmatrix} {\bar{A}}_k \\ x^T \end{bmatrix}\), as the “projected gradient of \(f(\cdot )\) at x”.
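For concreteness, the projected gradient just defined can be computed as in the following sketch (the helper name and the dense solve are ours; the implementation instead updates a QR factorization of the working set, cf. Sect. 4):

```julia
using LinearAlgebra

# Projection of ∇f(x) = Px + q onto the nullspace of the constraint normals [Ā_k; xᵀ];
# Abar stands for Ā_k, the rows of A indexed by the working set.
function projected_gradient(P, q, x, Abar)
    N = [Abar; x']                         # stacked constraint normals of (SP)
    g = P * x + q
    return g - N' * ((N * N') \ (N * g))   # assumes [Ā_k; xᵀ] has full row rank (LICQ)
end
```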

Algorithm 2

3.1.1 Move towards the minimizers of (SP)

Notice that Algorithm 1 always returns a global minimizer \(x^g\) of (SP) and possibly a second minimizer \(x^*\).

Let us consider first the case where two distinct minimizers are returned, which is treated in lines 5-6 of Algorithm 2. In this case it is always possible to either set \(x_{k+1}\) to a minimizer of (SP) or “hit” a new constraint, as follows. Consider the circle defined by the intersection of the sphere \(\left\| x\right\| _2 = r\) and the plane defined by \(x_k\) and the two returned minimizers. When (SP) is restricted to this circle, it reduces to a two-dimensional TRS. As such, we will show that it has at most 2 maximizers and 2 minimizers, except perhaps when \(x_k\) is a global minimizer of (SP), in which case we can trivially set \(x_{k+1} = x_{k}\).

Proposition 3

A two dimensional TRS has at most 2 minimizers and 2 maximizers except when its objective is constant across its feasible region.

Proof

It suffices to show the result only for the minimizers, as negating the TRS’s objective gives the same result for its maximizers. Suppose that the TRS does not belong to the hard case. Then it has a unique global minimizer and at most one local-nonglobal minimizer [25]. If it belongs to the hard case, then only global minimizers can exist. These are intersections of the affine subspace (6) and the sphere (7). Note that this intersection is either a single point, two points or a circle. In the latter case, every feasible point of (T) is a global minimizer. This completes the proof. \(\square \)

We can identify the two minimizers of this two-dimensional TRS as \(x^g\) and \(x^*\). It follows that in at least one of the circular arcs connecting \(x_k\) with \(x^g\) or \(x^*\) the objective value is always less than \(f(x_k)\). Thus, by moving along that circular arc one will either end up at \(x^g\) or \(x^*\) or “hit” a new constraint.

On the other hand, if a single global minimizer \(x^g\) was returned, which corresponds to lines 7-8 of Algorithm 2, then we consider the circle defined by the intersection of the sphere \(\left\| x\right\| _2 = r\) and the plane defined by \(x_k\), \(x^g\) and the projected gradient of \(f(\cdot )\) at \(x_k\). Obviously, if \(x^g\) is feasible for (P) then we choose \(x_{k+1} = x^g\), but otherwise we are not guaranteed to “hit” a new constraint. This is because the associated two-dimensional TRS might possess a second minimizer, which is not necessarily a stationary point of (SP), but might be the best feasible solution on this circle.
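The arc moves above rely on parametrizing the circle obtained by intersecting the sphere with the plane through three given points lying on it. A minimal Julia sketch of this construction (helper names are ours; the actual arc search additionally tracks the objective and the inequality constraints) is:

```julia
using LinearAlgebra

# Circle obtained by intersecting the sphere ‖x‖₂ = r with the affine plane
# through three non-collinear points p1, p2, p3 that all lie on the sphere.
# Returns the circle's center c, an orthonormal basis U of the plane's directions,
# and a parametrization θ ↦ x(θ) of the circle.
function sphere_plane_circle(p1, p2, p3, r)
    U = Matrix(qr(hcat(p2 - p1, p3 - p1)).Q)[:, 1:2]  # orthonormal basis of the plane's directions
    c = p1 - U * (U' * p1)                            # projection of the origin onto the plane
    ρ = sqrt(r^2 - norm(c)^2)                         # radius of the circle
    circle(θ) = c + ρ * (cos(θ) * U[:, 1] + sin(θ) * U[:, 2])
    return c, U, circle
end

# Angle parameter of a point p lying on that circle (used to walk from x_k towards a minimizer).
angle_on_circle(p, c, U) = atan(dot(U[:, 2], p - c), dot(U[:, 1], p - c))
```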

3.1.2 Perform projected gradient descent (if necessary)

This subsection concerns the case where the procedure described in the previous subsection could neither set \(x_{k+1}\) to a minimizer of (SP) nor “hit” a new constraint. In this case (lines 10-14 of Algorithm 2), we proceed by performing projected gradient descent (PGD) on (SP), starting from the current iterate of the active-set algorithm. This is guaranteed to converge to a KKT point \(x^s\) with \(f(x^s) \le f(x_k)\) [4, Theorem 4.5]. Suppose that \(x^s\) is not feasible for (P). Since the feasible region of (P) is closed and \(x^s\) lies outside it, the PGD iterates exit the feasible region after finitely many steps. By stopping the projected gradient descent just before it exits the feasible region, we can find a point \(x_{k+1}\) with a newly “hit” constraint.

Consider now the case where \(x^s\) is feasible for (P). If the minimum eigenvalue \(\lambda _{\min }\) of the projected Hessian of \(f(\cdot )\) at \(x^s\) is nonnegative (lines 12-14 of Algorithm 2), then we set \(x_{k+1} = x^s\) and we proceed to the steps outlined in the next subsection. Otherwise \(\lambda _{\min } < 0\), and there exists a sequence \(\{z_k\}\) that is feasible w.r.t. (SP) and converges to \(x^s\) with an appropriate limiting direction d

$$\begin{aligned} d {:=}\lim _{k \rightarrow \infty } \frac{z_k - x^s}{\left\| z_k - x^s\right\| _2}, \end{aligned}$$
(25)

such that

$$\begin{aligned} f(z_k) = f(x^s) + \frac{1}{2}\lambda _{\min }\left\| z_k - x^s\right\| ^2 + o\left( \left\| z_k - x^s\right\| ^2\right) . \end{aligned}$$
(26)

In practice, we can choose d equal to some projected eigenvector associated with \(\lambda _{\min } < 0\). Consider now the circle defined by the intersection of the sphere \(\left\| x\right\| _2 = r\) and the plane defined by \(x^s, x^g\) and the limiting direction d. As before, this can be reduced to a two-dimensional TRS that possesses at most 2 minimizers and 2 maximizers (Proposition 3). We can identify two of them: \(x^g\), which is a global minimizer, and \(x^s\), which, according to (26), is a local maximizer. It follows that in at least one of the two circular arcs connecting \(x^s\) with \(x^g\) the objective value is always less than \(f(x^s)\). Thus, by moving along that circular arc, and since \(x^s\) is feasible for (P) and \(x^g\) infeasible, we can identify a suitable feasible point \(x_{k+1}\) such that \(f(x^g) \le f(x_{k+1}) \le f(x^s)\) for which a new constraint, not in the current working set, becomes active.

3.1.3 Update the working set and check for termination

If a new constraint was identified in the steps above, then we update the working set and proceed to the next iteration (lines 15-16 of Algorithm 2). Otherwise, \(x_{k+1}\) was set to a stationary point of (SP) satisfying the second-order necessary conditions, and we proceed as follows (lines 17-18 of Algorithm 2). Stationarity of \(x_{k+1}\) gives

$$\begin{aligned} \nabla f(x_{k+1}) + \mu x_{k+1} + \sum _{i \in \mathcal {W}_k} a_i \kappa _i = 0 \end{aligned}$$

for some Lagrange multipliers \(\mu \) and \(\kappa _i \; \forall i \in \mathcal {W}_k\). Thus, \((x_{k+1}, \mu , \kappa )\) satisfies the first KKT condition (21) if we define \(\kappa _i = 0 \; \forall i \not \in \mathcal {W}_k\). It follows from the feasibility of \(x_{k+1}\) and the definition of \(\kappa \) that the third and fourth KKT conditions (23)-(24) also hold. If we also have \(\kappa _i \ge 0 \; \forall i \in \mathcal {W}_k\), then \((x_{k+1}, \mu , \kappa )\) satisfies the KKT conditions of (P) and we terminate the algorithm (lines 23-24 of Algorithm 2). If, on the other hand, \(\kappa _i < 0\) for some \(i \in \mathcal {W}_k\) (lines 25-27 of Algorithm 2), then (22) is not satisfied and \(x_{k+1}\) is not a local minimizer for (P). In fact, the objective \(f(\cdot )\) may be decreased by dropping one of the constraints corresponding to a negative Lagrange multiplier [27, Section 12.3].
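A sketch of this multiplier check (our own helper, using a dense least-squares solve; Abar again stands for \({\bar{A}}_k\)) is:

```julia
using LinearAlgebra

# Recover (μ, κ) at a stationary point x of (SP) from ∇f(x) + μx + Ā_kᵀκ = 0,
# then either declare that the KKT conditions of (P) hold or pick a constraint to drop.
function multipliers_and_drop(P, q, x, Abar)
    g = P * x + q
    λ = -([x Abar'] \ g)               # solve for [μ; κ] (exact under LICQ)
    μ, κ = λ[1], λ[2:end]
    if isempty(κ) || minimum(κ) >= 0
        return μ, κ, nothing           # KKT conditions of (P) hold: terminate
    else
        return μ, κ, argmin(κ)         # index (within the working set) of a constraint to drop
    end
end
```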

3.1.4 Finite termination

We will show that Algorithm 2 terminates in finitely many outer iterations, assuming that \(x_{k+1} \ne x_k\) for every \(k = 1, 2, \dots \). Indeed, at every iteration the algorithm either

  • Moves to a stationary point of the current TRS subproblem; or

  • Activates a new constraint.

Since there can be at most m constraints in the working set, it follows that \(x_k\) visits a stationary point of the \(k-\)th TRS subproblem periodically (at least once every m iterations). Furthermore, note that every TRS subproblem, resulting from a particular working set, has at most 2n sets of stationary points sharing the same objective value [9, Theorem 4.1 and subsequent comments]. Since there is a finite number of different working-set configurations, it follows that there exist finitely many such sets. On the other hand, every constraint that is deleted from the working set has an associated negative Lagrange multiplier; thus no iterate \(x_k\) is a stationary point of the subproblem of the next iteration. Furthermore, note that if \(x_k\) is not a stationary point for the \(k+1\) subproblem, then our algorithm generates \(x_{k+1}\) with \(f(x_{k+1}) < f(x_{k})\), unless \(x_k = x_{k+1}\), which is excluded by assumption. This means that once the algorithm visits one of these sets of stationary points with equal objective value, it can never visit it again. Hence, the algorithm terminates after finitely many iterations.

We believe that it is not possible to bound the number of iterations by a polynomial in m and n. Indeed, even finding an initial feasible point is NP-complete, as we show in Sect. 3.2.1.

3.1.5 Further remarks

Similarly to active-set algorithms for quadratic programs, we can always update the working set such that \(\bar{A}_k\) has full row rank. However, LICQ might still fail to be satisfied when the gradient of the spherical constraint lies in \(\mathcal {R}(\bar{A}^T_k)\). In these cases we terminate the algorithm without guarantees about local optimality.

3.2 Solving (P) for any \(r_{\min }, r_{\max }\)

We are now ready to present an active-set algorithm that solves (P) for any \(r_{\min }\) and \(r_{\max }\). At every iteration of the suggested active-set algorithm, the spherical inequality constraint will either be in the working set or not. If it is, then we iterate as described in Algorithm 2; otherwise we proceed with a generic (nonconvex) quadratic programming active-set algorithm [27, §16.8] [12, 18]. We switch between the two algorithms when an iterate of the QP algorithm hits the spherical boundary or when the Lagrange multiplier \(\mu \) of the norm constraint in Algorithm 2 is negative and less than every other Lagrange multiplier \(\kappa _i\).

In the special case where \(r_{\min } = 0\), the norm constraint reduces to \(\left\| x\right\| _2 \le r_{\max }\). In this case, when a switch from Algorithm 2 to the QP algorithm happens, the projected Hessian of the Lagrangian of \(f(\cdot )\) at \(x_k\) is positive semidefinite, i.e.

$$\begin{aligned} Q^T (P + \mu I)Q \succeq 0 \end{aligned}$$
(27)

where Q is an orthonormal basis for the nullspace of \(\begin{bmatrix} {\bar{A}}_k \\ x_k^T \end{bmatrix}\), obtained e.g. from a QR decomposition of \([{\bar{A}}_k^T \; x_k]\), and \(Q_1 {:=}[Q \; q_2]\) extends it, by an appropriate vector \(q_2\), to an orthonormal basis for the nullspace of \({\bar{A}}_k\). Recall that for a switch to happen, we need \(\mu < 0\); hence \(Q^TPQ\) is positive definite. Thus, \(Q_1^TPQ_1\), i.e. the projected Hessian of the next iteration that is to be handled by the generic QP algorithm, has at most one nonpositive eigenvalue [29, Corollary 4.1]. This means that even the popular, but less flexible, class of “inertia controlling” QP algorithms can be used as part of the suggested active-set algorithm for solving (P) when \(r_{\min } = 0\).

3.2.1 Finding an initial point

Algorithm 2 is a primal active-set algorithm and, as such, it requires an initial feasible point. Unfortunately, finding such a point is, in general, NP-complete as we formally establish at the end of this subsection. Nevertheless, we can find a feasible point for (P) with standard tools, albeit in exponential worst-case complexity, as we proceed to show. Indeed, define the following problems:

$$\begin{aligned} \begin{array}{ll} \text {maximize} &{} x^T x\\ \text {subject to} &{} Ax \le b \\ &{} \left\| x\right\| _{\infty } \le r_{\max } \end{array} \qquad \qquad \begin{array}{ll} \text {minimize} &{} x^T x\\ \text {subject to} &{} Ax \le b \end{array} \end{aligned}$$

i.e. a convex maximization and a convex minimization problem. Denote by \(x^*_{\min }\) and \(x^*_{\max }\) the solutions to the convex minimization and maximization problems respectively. Then, (P) is feasible iff \(\left\| x^*_{\min }\right\| _2 \le r_{\max }\) and \(\left\| x^*_{\max }\right\| _2 \ge r_{\min }\), in which case a feasible point for (P) can be found by interpolating between \(x^*_{\min }\) and \(x^*_{\max }\).

The convex minimization problem above can be solved in polynomial time with e.g. interior point or first-order [32] methods. On the other hand, the convex maximization problem can have exponential worst-case complexity, but it can be solved to local or even global optimality with standard commercial solvers, e.g. IBM CPLEX.
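Given \(x^*_{\min }\) and \(x^*_{\max }\), the interpolation step mentioned above can be carried out by a simple bisection, as in the following sketch (our own helper; it exploits the fact that the polytope \(Ax \le b\) is convex and that the norm is continuous along the segment):

```julia
using LinearAlgebra

# Return a point on the segment between x_min and x_max (both satisfying Ax ≤ b)
# with r_min ≤ ‖x‖₂ ≤ r_max, assuming ‖x_min‖₂ ≤ r_max and ‖x_max‖₂ ≥ r_min.
function initial_point(x_min, x_max, r_min, r_max)
    norm(x_min) >= r_min && return x_min       # x_min is already feasible for (P)
    lo, hi = 0.0, 1.0                          # invariant: ‖x(lo)‖ < r_min ≤ ‖x(hi)‖
    x(t) = (1 - t) * x_min + t * x_max
    for _ in 1:100                             # bisection on t ↦ ‖x(t)‖ - r_min
        mid = (lo + hi) / 2
        norm(x(mid)) < r_min ? (lo = mid) : (hi = mid)
    end
    return x(hi)                               # ‖x(hi)‖ ≈ r_min ≤ r_max
end
```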

Finally, we formally show that determining if (P) is feasible is NP-complete:

Proposition 4

Determining if there is a solution to

$$\begin{aligned} \begin{array}{ll} \text {find} &{} x \\ \text {subject to} &{} Ax \le b \\ &{} r_{\min } \le \left\| x\right\| _2 \le r_{\max } \\ \end{array} \end{aligned}$$
(F)

is NP-complete.

Proof

Determining if (F) has a solution is in NP, since we can easily check whether a candidate point x is feasible for (F). Furthermore, it can be decomposed into the following two independent problems: (i) “is there a point in the polytope \(Ax \le b\) such that \(\left\| x\right\| _2 \le r_{\max }\)?” and (ii) “is there a point in the polytope \(Ax \le b\) such that \(\left\| x\right\| _2 \ge r_{\min }\)?” Problem (i) can be answered in polynomial time with e.g. interior point methods. Thus determining if (F) has a solution is reducible to problem (ii), which is known to be NP-complete [26, Problem 3]. \(\square \)

4 Applications and experiments

In this section we present numerical results of the suggested active-set algorithms on a range of numerical optimization problems. A Julia implementation of the algorithm, along with code for generating the results of this section, is available at:

https://github.com/oxfordcontrol/QPnorm.jl

As described in the previous sections, the algorithm is based on a TRS solver and a general (nonconvex) quadratic programming solver. We implemented these dependencies as separate packages, which we also release publicly, as listed below.

  • TRS.jl: A Julia package for the computation of global and local-nonglobal minimizers of

    $$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \frac{1}{2}x^T P x + q^T x \\ \text {subject to} &{} \left\| x\right\| _2 = r \quad \text {or} \quad \left\| x\right\| _2 \le r \\ &{} Ax = b, \end{array} \end{aligned}$$

    essentially implementing in Julia the theoretical results of Sect. 2 and [1]. Available at:

    https://github.com/oxfordcontrol/TRS.jl

  • GeneralQP.jl: A Julia implementation of [13], i.e. an “inertia controlling” active-set solver for nonconvex, dense quadratic problems, with efficient and numerically stable routines for updating QR decompositions of the working set and \(LDL^T\) factorizations of the projected Hessian. Available at:

    https://github.com/oxfordcontrol/GeneralQP.jl

    This solver is useful as a part of the suggested algorithm for solving (P) according to the remarks of Sect. 3.2.

For simplicity, our implementations of the active-set algorithms are based on dense linear algebra, using a QR factorization to compute/update a nullspace basis for every working set. Thus the presented results are limited to dense problems, except for Sect. 4.3, where the special structure of the constraints of (32) results in trivial, sparse QR factorizations of the nullspace bases.
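As a concrete illustration of this step, the following sketch (simplified, assuming the working-set constraint gradients have full row rank; the function name is hypothetical and not an API of the released packages) computes a dense nullspace basis and the corresponding projected Hessian from a full QR factorization:

```julia
using LinearAlgebra

# Aw: matrix whose rows are the gradients of the working-set constraints (full row rank)
function nullspace_basis_and_projected_hessian(P, Aw)
    m, n = size(Aw)
    F = qr(Matrix(Aw'))                 # QR factorization of the transposed working set
    Q = F.Q * Matrix(I, n, n)           # materialize the full n x n orthogonal factor
    Z = Q[:, m+1:end]                   # columns m+1..n span the nullspace of Aw
    return Z, Symmetric(Z' * P * Z)     # nullspace basis and projected Hessian Z'PZ
end
```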

Before presenting the results we will discuss some practical considerations regarding the suggested active-set algorithm. The eigenproblems (4) associated with the trust-region subproblems of each working set are solved with ARPACK [24]. In the occasional case where ARPACK fails, a direct eigensolver is used. Finally, before solving the subproblem (SP) of every iteration, we perform a few projected gradient steps in the hope of quickly activating a new constraint.

We now proceed with the results, starting with the simplest case of dense random problems.

4.1 Random dense constant norm QPs

We compared the performance of our algorithm against the state-of-the-art nonlinear solver Ipopt [33] with its default parameters. We use the Julia interface of Ipopt, Ipopt.jl, which exhibits negligible interface overhead due to the excellent interfacing capabilities of Julia with C++.

We consider a set of randomly generated problems with a varying number of variables n, generated as described in the caption of Fig. 3. A feasible point for each problem instance is calculated for our algorithm as described in Sect. 3.2.1. The time required to compute the initial feasible points is included in the subsequent results.

Figure 3 (left) shows the execution times. We observe that our algorithm is significantly faster than Ipopt, by a factor of up to 50. Both solvers achieve practically identical objectives (relative difference less than \(10^{-9}\)) in all of the problem instances, except in two cases where there is a considerable difference because the solvers converged to different local minimizers. Finally, Fig. 3 (right) shows the feasibility violation of the returned solutions, where we observe that our solver returns solutions with significantly smaller (i.e. better) violations.

Fig. 3

Timing results (left) and maximum feasibility violation (right) of the solution of random problems with varying size n generated with the following parameters: \(P =\) Symmetric(randn(n, n)), \(q =\) randn(n), \(A =\) randn(1.5n, n), \(b =\) randn(1.5n) and \(r = 100\). Hardware used: Intel E5-2640v3, 64GB memory. The maximum feasibility violation is defined as \(\max ((Ax^* - b)_1, \dots , (Ax^* - b)_m, |\left\| x^*\right\| _2^2 - r^2|, 0)\) for a solution \(x^*\)
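For reference, the generation recipe of the caption and the reported violation metric correspond to the following sketch (the helper names are ours; n is assumed even so that 1.5n is an integer, and \(r_{\min } = r_{\max } = r\)):

```julia
using LinearAlgebra

function random_instance(n; r = 100.0)
    @assert iseven(n)                   # so that m = 1.5n is an integer
    m = Int(1.5n)
    P = Symmetric(randn(n, n))
    q = randn(n)
    A = randn(m, n)
    b = randn(m)
    return P, q, A, b, r
end

# Maximum feasibility violation of a candidate x for ||x||_2 = r, Ax <= b (Fig. 3, right)
feasibility_violation(A, b, x, r) = max(maximum(A * x .- b), abs(dot(x, x) - r^2), 0.0)
```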

4.2 Computing search directions for sequential quadratic programming

Sequential Quadratic Programming (SQP) is a powerful and popular algorithm that aims to solve the problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} g(x) \\ \text {subject to} &{} h(x) \le 0, \end{array} \end{aligned}$$
(28)

where \(g : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) and \(h : {\mathbb {R}}^n \mapsto {\mathbb {R}}^m\) are general (nonlinear) functions. At every iteration, SQP minimizes the nonlinear objective along a search direction. This search direction can, for example, be obtained by optimizing a quadratic approximation of the original objective subject to a linear approximation of the original constraints. It is common to introduce a norm constraint \(\left\| \delta x\right\| _2 \le r\) that ensures that the solution lies inside a confidence or trust region of this approximation, thus resulting in a problem of the form (P). This is particularly useful at points where the Hessian of the nonlinear objective is singular or indefinite, as the search directions computed by the respective QPs might otherwise be unrealistically large or unbounded.

Typically, SQP algorithms solve problems simpler than (P) by e.g. not including the inequality constraints of (P) in explicit form for the calculation of the search directions [27, §18.2]. We will show, however, that our solver is capable of computing search directions from (P) directly, without the need for these simplifications.

We demonstrate this on all the problems of the CUTEst collection [17] that have linear constraints and fewer than 2000 variables. For each of these problems we consider the quadratic approximation

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} f(\delta x){:=}\frac{1}{2} \delta x^T \nabla ^2g(x_0) \delta x + \delta x^T \nabla g(x_0) + c \\ \text {subject to} &{} \left\| \delta x\right\| _2^2 \le 1 \\ &{} \nabla h(x_0) \delta x + h(x_0) \le 0 \end{array} \end{aligned}$$
(29)

where \(x_0\) is the closest feasible point (in the 2-norm) to the initial point suggested by CUTEst, which we compute with GUROBI [19], and \(\delta x {:=}x - x_0\). We discard problems where GUROBI fails to calculate an initial feasible point. Furthermore, we do not consider problems for which (29) is convex, since these problems can be solved to global optimality in polynomial time with standard solvers such as MOSEK or COSMO.jl [11]. Finally, some problems in the CUTEst collection are parametric; if the default parameters lead to a problem with more than 2000 variables then we choose a parameter that, if possible, leads to a number of variables close to, but not exceeding, 2000. Table 4 lists all 63 problems considered, along with any non-default parameters.
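For concreteness, the data of (29) at a given feasible point \(x_0\) can be assembled as in the following sketch; this is an illustrative assumption, since the experiments use the derivatives provided by the CUTEst interface rather than ForwardDiff.

```julia
using ForwardDiff

# g: scalar-valued objective, h: vector-valued constraint function, x0: feasible point
function sqp_subproblem_data(g, h, x0)
    H  = ForwardDiff.hessian(g, x0)     # ∇²g(x0)
    c  = ForwardDiff.gradient(g, x0)    # ∇g(x0)
    J  = ForwardDiff.jacobian(h, x0)    # ∇h(x0), one row per constraint
    h0 = h(x0)
    # (29): minimize ½ δx' H δx + c' δx  subject to  J δx + h0 ≤ 0,  ||δx||₂ ≤ 1
    return H, c, J, h0
end
```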

We compare the performance of our algorithm against Ipopt with its default options. For some problems, Ipopt terminates without any indication of failure but returns a low-quality solution. We therefore consider a problem as “solved” using the same criteria as [8]. That is, we require that the overall error of the KKT conditions is less than \(10^{-4}\), where the error is defined as

$$\begin{aligned} \text {error}&{:=}\max (\epsilon _{\text {primal inf}}, \epsilon _{\text {dual inf}}, \epsilon _{\text {stationarity}}, \epsilon _{\text {compl}}) \end{aligned}$$
(30)

with

$$\begin{aligned} \epsilon _{\text {primal inf}}&{:=}\max \left( 0, \max _i \left( \nabla h(x_0) \delta x + h(x_0)\right) _i, \left\| \delta x\right\| _2 - 1\right) \\ \epsilon _{\text {dual inf}}&{:=}-\min (0, \kappa _1, \dots , \kappa _m, \mu ) \\ \epsilon _{\text {stationarity}}&{:=}\left\| \nabla ^2g(x_0) \delta x + \nabla g(x_0) + \nabla h(x_0)^T \kappa + 2 \mu \delta x\right\| _{\infty } \\ \epsilon _{\text {compl}}&{:=}\max \left( \max _i \min \left( \kappa _i, \left| \left( \nabla h(x_0) \delta x + h(x_0)\right) _i\right| \right) , \min (\mu , |\left\| \delta x\right\| _2^2 - 1|)\right) , \end{aligned}$$

and \(\kappa , \mu \) are the Lagrange multipliers corresponding to the linear inequality and norm constraints of (29). Figure 4 shows the performance profile for the problems considered. The performance profile suggests that our algorithm significantly outperforms Ipopt on this set of problems, especially in reliability, in the sense that we consistently obtain high-quality solutions. Note that, unlike the dense implementation of our solver, Ipopt can exploit the sparsity of the problems efficiently. A sparse implementation could bring further speedups to our algorithm and might allow its use on large-scale sparse problems. Moreover, as an active-set solver, our algorithm can efficiently exploit warm starting, which can be highly desirable in SQP, where repeated solution of problems of the form (29) is required as part of the procedure for minimizing (28). This is in contrast to Ipopt, whose interior-point nature makes warm starting very difficult to exploit.
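For completeness, the error (30) can be evaluated for a candidate \((\delta x, \kappa , \mu )\) as in the following sketch (an assumed helper, not part of the released code):

```julia
using LinearAlgebra

# H = ∇²g(x0), c = ∇g(x0), J = ∇h(x0), h0 = h(x0); dx, kappa, mu as in (29)-(30)
function kkt_error(H, c, J, h0, dx, kappa, mu)
    resid  = J * dx + h0                                     # linearized inequality residuals
    primal = max(0.0, maximum(resid), norm(dx) - 1)
    dual   = -min(0.0, minimum(kappa), mu)
    stat   = norm(H * dx + c + J' * kappa + 2 * mu * dx, Inf)
    compl  = max(maximum(min.(kappa, abs.(resid))), min(mu, abs(dot(dx, dx) - 1)))
    return max(primal, dual, stat, compl)
end
```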

Fig. 4

Performance profile of the suggested algorithm against Ipopt for the calculation of SQP search directions on all 63 problems of the CUTEst library listed in Table 4. Matrices are passed to Ipopt in a sparse format. Hardware used: Intel E5-2640v3, 64GB memory

4.3 Sparse principal component analysis

Principal Component Analysis (PCA) is a standard, widely used dimensionality reduction algorithm. It has applications in numerous fields, including statistics, machine learning, bioinformatics, genetics, meteorology and others. Given a \(k \times n\) data matrix D that consists of k points of n variables, PCA suggests a few linear combinations of these variables, which we call principal vectors, that explain as much variance of the data as possible. Standard PCA amounts to the following problem

$$\begin{aligned} \begin{array}{ll} \text {maximize} &{} x^T \varSigma x \\ \text {subject to} &{} \left\| x\right\| _2 \le 1 \end{array} \end{aligned}$$

where \(\varSigma \) is the covariance matrix of the data. We will assume for the rest of this section that the data matrix D is “centered”, i.e. that it has column-wise zero mean. We can then consider \(\varSigma \) to be the empirical covariance matrix \(\varSigma {:=}\frac{D^T D}{k-1} \in {\mathbb {S}}_+^n\). The above problem is essentially an eigenvalue problem on \(\varSigma \) or a singular value problem on D and, as such, can be solved with standard linear algebra tools.
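In Julia, for instance, the leading principal vector of a centered data matrix D is simply its top right singular vector; the one-liner below is shown only for contrast with the sparse variants discussed next.

```julia
using LinearAlgebra

# Leading principal vector of a centered k × n data matrix D:
# the top right singular vector of D, equivalently the top eigenvector of Σ = D'D/(k-1).
leading_pc(D) = svd(D).V[:, 1]
```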

In general, each principal vector is a linear combination of all the variables, i.e. typically all components of the principal vectors are non-zero. This can pose an issue for the interpretability of the reduced dimensions, as it is often desirable to express the principal vector as a combination of only a few variables, especially when the variables are associated with a user-interpretable meaning. To alleviate this problem, sparsity has to be enforced in the original PCA problem, resulting in a new optimization problem that aims to identify a small set of variables whose linear combination can hopefully still explain a significant proportion of the variance of the data.

Unfortunately, enforcing sparsity in PCA results in a combinatorial optimization problem. Various remarkably efficient and scalable heuristics have been suggested to avoid this intractability [23, 34]. We will focus on one of the most popular heuristics, which is based on a lasso-type constraint originally introduced by [22], resulting in the following optimization problem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} f(x) {:=}-x^T \varSigma x \\ \text {subject to} &{} \left\| x\right\| _2 \le 1 \\ &{} \left\| x\right\| _1 \le \gamma , \end{array} \end{aligned}$$
(31)

where \(x \in {\mathbb {R}}^n\) is the decision variable and \(\gamma \ge 1\) is a parameter that controls the sparsity of the solution.

We will now show how Problem (31) can be solved with our algorithm. Note that (31) is not in the standard norm-constrained form (P) that we address. However, we show in the “Appendix” that it is equivalent to

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} g(w) {:=}-(w_1 - w_2)^T \varSigma (w_1 - w_2) \\ \text {subject to} &{} \left\| w\right\| _2^2 \le 1 \\ &{} {\mathbf {1}}^T w \le \gamma \\ &{} w \ge 0, \end{array} \end{aligned}$$
(32)

where \([w_1^T\; w_2^T] {:=}w^T \in {\mathbb {R}}^{2n}\), in the sense that (31) and (32) have the same optimal value and every minimizer \({\bar{w}}\) of (32) defines a minimizer \({\bar{x}} = {\bar{w}}_1 - {\bar{w}}_2\) of (31). Moreover, the optimal \({\bar{w}}\) has the same cardinality as the respective optimal \({\bar{x}} = {\bar{w}}_1 - {\bar{w}}_2\).
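A minimal sketch of this mapping, written densely for clarity (the experiments instead exploit the sparse structure of these constraints, and the helper name is ours), casts (32) as an instance of (P):

```julia
using LinearAlgebra

function sparse_pca_as_P(Sigma::AbstractMatrix, gamma::Real)
    n = size(Sigma, 1)
    # g(w) = -(w1 - w2)' Σ (w1 - w2) = ½ w' P w  with  P = -2 [Σ -Σ; -Σ Σ]
    P = -2 .* [Sigma -Sigma; -Sigma Sigma]
    q = zeros(2n)
    # Linear constraints of (32):  1'w ≤ γ  and  -w ≤ 0
    A = [ones(1, 2n); -Matrix(1.0I, 2n, 2n)]
    b = [gamma; zeros(2n)]
    return P, q, A, b, 0.0, 1.0             # r_min = 0, r_max = 1 for ||w||_2 ≤ 1
end

# A minimizer w̄ of this instance yields the sparse principal vector x̄ = w̄[1:n] - w̄[n+1:2n].
```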

The reader might suspect that solving (32) with an active-set method requires a solve time that grows polynomially with the total number of variables, thus limiting its scalability compared to simpler gradient-based methods that are scalable but might have inferior accuracy and, potentially, weaker convergence guarantees. This is not the case, however, since our algorithm takes advantage of the sparsity of the iterates. Indeed, excluding the lasso constraint, the working set \(\mathcal {W}_k\) of Algorithm 2 applied to Problem (32) can be interpreted as the set of variables \(w_i\) that are fixed to zero. Thus every subproblem can be trivially reduced to k dimensions, where k is the number of variables \(w_i\) that are not included in the current working set, as sketched below. This property is particularly attractive when the user is interested in a very sparse solution and has been recognized in the literature on \(\ell _1\)-penalized active-set algorithms [3, §6].
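Concretely, the reduction amounts to nothing more than restricting the subproblem data to the free coordinates, as in the following sketch (an assumed helper, not an actual API of the released packages):

```julia
# Variables in the working set are fixed at zero, so each TRS subproblem of (32)
# only involves the k free coordinates.
function reduced_subproblem(P, q, working_set)
    free = setdiff(1:length(q), working_set)   # the k variables not fixed at zero
    return P[free, free], q[free], free
end
```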

We will demonstrate the scalability and flexibility of our algorithm by applying it to a large “bag of words” dataset from articles of the New York Times newspaper, obtained from the University of California, Irvine (UCI) Machine Learning RepositoryFootnote 7. In this dataset, every row of the associated data matrix D corresponds to a word used in any of the documents and every column to a document. The entries of the matrix are the number of occurrences of a word in a document. The dataset contains \(m \approx 300{,}000\) articles that consist of \(n \approx 102{,}660\) words (or keywords), with \(\approx 70\) million non-zeros in the dataset matrix D. We seek to calculate a few sparse principal vectors, each of which will hopefully correspond to a user-interpretable category, like politics, economics, the arts etc.

The usefulness of Sparse PCA on this dataset has already been demonstrated in the literature. Both [28] and [35] have generated a set of five sparse principal vectors, each of which is a linear combination of five words. The results of [28] are listed in Table 1. The five resulting principal vectors have distinct, user-interpretable associated meanings that correspond to sports, economics, politics, education, and US foreign policy. The resulting words can be used to create user-interpretable categories of the documents [35]. However, this interpretability is in general lost when we increase the cardinality of the sparse principal vectors, i.e. when we require more representative words for each category. For example, the result of applying TPower [34] (a state-of-the-art algorithm for Sparse PCA) to the NYtimes dataset to generate a sparse principal vector that consists of 30 words, listed in Table 2, does not have a distinct associated meaning. For reference, TPower runs in 4.91 s on a standard laptop with an Intel i7-5557U CPU and 8 GB of memory and achieves a variance of 58.79. Another popular algorithm, \(\ell _1\)-GPower, runs in 355.86 s and results in a variance of 56.66. The results of our active-set algorithm are better than those of \(\ell _1\)-GPowerFootnote 8 and comparable (slightly inferior) to TPower: our algorithm runs in 5.22 s and achieves a variance of 58.23. However, our algorithm is more general, in the sense that it can solve any problem of the form (P), which, as we proceed to show, allows for variants of (31) that generate principal vectors with a more meaningful interpretation.

Table 1 Results of [28] on applying sparse PCA on the NYtimes dataset
Table 2 Results of applying sparse PCA on the NYtimes dataset with TPower, allowing weights with mixed signs
Table 3 Results of applying non-negative sparse PCA on the NYtimes dataset with the suggested active-set algorithm

Indeed, improving the interpretability of the results of Sparse PCA can be achieved by incorporating additional constraints in the underlying optimization problem. Unfortunately, most of the algorithms that are tailored to sparse PCA do not allow for the presence of extra constraints. An exception is perhaps GPower [23], which is a generic framework for minimizing a concave function f over a compact subset \(\mathcal {F}\). The iterates of GPower are generated as

$$\begin{aligned} \begin{array}{rll} x_{k+1} = &{}\text {argmin} &{} x^T f'(x_k) \\ &{}\text {subject to} &{} x \in \mathcal {F} \end{array} \end{aligned}$$
(33)

where \(f'(\cdot )\) is a subgradient of \(f(\cdot )\). It was shown in [23] that, for the Sparse PCA problem (31), the above optimization problem reduces to two matrix multiplications, with D and \(D^T\). However, in the presence of general linear inequality constraints, solving (33) can be at least as hard as solving a linear program (LP), which is considerably more computationally demanding than two matrix multiplications.

In contrast, due to the generality of our approach, imposing any linear constraint is straightforward for our algorithm. To demonstrate this, we impose a non-negativity constraint on the elements of the principal vector to avoid cases where the principal vectors consist of elements with opposing meanings, which happens, for example, in the results of TPower listed in Table 2. Table 3 lists five sparse principal vectors obtained with this approach, which take 24.37 s to compute on a standard laptop with an Intel i7-5557U CPU and 8GB of memory. Note that, unlike the results of Table 2, the principal vectors of Table 3 have a clear associated meaning that can be labeled as sports, economics, politics, US foreign policy and education. These words could, for example, be used to organize the documents in a user-interpretable way [35].
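One natural way to impose this non-negativity, stated here as an assumption since the exact formulation used for Table 3 is not restated above, is to note that with \(x \ge 0\) the lasso constraint becomes \({\mathbf {1}}^T x \le \gamma \), so the non-negative variant of (31) is already an instance of (P) in the original n variables:

```julia
using LinearAlgebra

function nonneg_sparse_pca_as_P(Sigma::AbstractMatrix, gamma::Real)
    n = size(Sigma, 1)
    P = -2 .* Matrix(Sigma)                 # minimize -x'Σx = ½ x'Px
    q = zeros(n)
    A = [ones(1, n); -Matrix(1.0I, n, n)]   # 1'x ≤ γ  and  -x ≤ 0
    b = [gamma; zeros(n)]
    return P, q, A, b, 0.0, 1.0             # r_min = 0, r_max = 1
end
```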

5 Conclusion

This paper introduced an active-set algorithm for the minimization of (possibly nonconvex) quadratic functions subject to linear inequality constraints and a two-sided norm constraint. The suggested algorithm is based on the repeated solution of the TRS, for which we derived novel theoretical results regarding its local-nonglobal minimizer. The usefulness of the suggested algorithm was demonstrated on a range of real-world applications and numerical experiments.