1 Introduction

1.1 Primal-dual pair of convex quadratic programming problems

In this paper, we consider the following primal-dual pair of linearly constrained convex quadratic programming problems, in the standard form:

$$\begin{aligned} \text {min}_{x} \ \left( c^Tx + \frac{1}{2}x^T Q x \right) , \ \ \text {s.t.} \ Ax = b, \ x \ge 0, \end{aligned}$$
(P)
$$\begin{aligned} \text {max}_{x,y,z} \ \left( b^Ty - \frac{1}{2}x^T Q x\right) , \ \ \text {s.t.}\ -Qx + A^Ty + z = c,\ z\ge 0, \end{aligned}$$
(D)

with \(c,x,z \in {\mathbb {R}}^n\), \(b,y \in {\mathbb {R}}^m\), \(A \in {\mathbb {R}}^{m\times n}\), and symmetric positive semi-definite \(Q \in {\mathbb {R}}^{n \times n}\) (i.e. \(Q \succeq 0\)). Without loss of generality we assume that \(m \le n\). Duality between the problems (P) and (D) arises by introducing the Lagrangian function of the primal problem, using \(y \in {\mathbb {R}}^m\) and \(z \in {\mathbb {R}}^n,\ z \ge 0\), as the Lagrange multipliers for the equality and inequality constraints, respectively. Hence, we obtain:

$$\begin{aligned} {\mathcal {L}}(x,y,z) :=c^T x + \frac{1}{2}x^T Q x - y^T(Ax - b) - z^T x. \end{aligned}$$
(1)

Using the Lagrangian function, one can formulate the first-order optimality conditions [known as Karush–Kuhn–Tucker (KKT) conditions] for this primal-dual pair. In particular, we define the vector \(w = (x^T,y^T,z^T)^T\), and compute the gradient of \({\mathcal {L}}(w)\). Using \(\nabla {\mathcal {L}}(w)\), as well as the complementarity conditions, we may define a function \(F(w): {\mathbb {R}}^{2n + m} \mapsto {\mathbb {R}}^{2n+m}\), with which we write the KKT conditions as follows:

$$\begin{aligned} F(w) :=\begin{bmatrix} c + Qx - A^T y - z \\ Ax - b\\ XZe_n\\ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0\\ \end{bmatrix}, \qquad (x,z) \ge (0,0), \end{aligned}$$
(2)

where \(e_n\) denotes the vector of ones of size n, while \(X,\ Z \in {\mathbb {R}}^{n \times n}\) denote the diagonal matrices satisfying \(X^{ii} = x^i\) and \(Z^{ii} = z^i\), for all \(i \in \{1,\ldots ,n\}\). For the rest of this manuscript, superscripts will denote the components of a vector/matrix. For simplicity of exposition, when referring to convex quadratic programming problems, we implicitly assume that the problems are linearly constrained. If both (P) and (D) are feasible problems, it can easily be verified that there exists an optimal primal-dual triple \((x,y,z)\), satisfying the KKT optimality conditions of this primal-dual pair (see for example [7, Proposition 2.3.4]).
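
For concreteness, the following minimal sketch (in Python/NumPy, with hypothetical toy data; not part of the original exposition) evaluates the residual \(F(w)\) of (2) at a given triple \(w = (x,y,z)\):

```python
import numpy as np

def kkt_residual(A, b, c, Q, x, y, z):
    """Evaluate F(w) from (2) at w = (x, y, z)."""
    dual_res = c + Q @ x - A.T @ y - z      # c + Qx - A^T y - z
    primal_res = A @ x - b                  # Ax - b
    compl = x * z                           # XZe_n (component-wise products)
    return np.concatenate([dual_res, primal_res, compl])

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
Q = G @ G.T                                 # symmetric positive semi-definite
x, z = rng.random(n) + 0.1, rng.random(n) + 0.1
y = rng.standard_normal(m)
b, c = A @ x, -Q @ x + A.T @ y + z          # (x, y, z) satisfies both equality blocks
print(np.linalg.norm(kkt_residual(A, b, c, Q, x, y, z)))  # equals ||XZe_n||_2 here
```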

1.2 A primal-dual interior point method

Primal-dual Interior Point Methods (IPMs) are a popular class of methods for solving (P) and (D) simultaneously and iteratively. As indicated in the name, primal-dual IPMs act on both primal and dual variables. There are numerous variants of IPMs and the reader is referred to [17] for an extended literature review. In this paper, an infeasible primal-dual IPM is presented. Such methods are called infeasible because they allow intermediate iterates of the method to be infeasible for (P)–(D), in contrast to feasible IPMs, which require intermediate iterates to be strictly feasible.

Interior point methods handle the non-negativity constraints of the problems with logarithmic barriers in the objective. That is, at each iteration, we choose a barrier parameter \(\mu \) and form the logarithmic barrier primal-dual pair:

$$\begin{aligned}&\text {min}_{x} \ \left( c^Tx + \frac{1}{2}x^T Q x - \mu \sum _{j=1}^n \ln x^j \right) \ \ \text {s.t.} \ Ax = b, \end{aligned}$$
(3)
$$\begin{aligned}&\text {max}_{x,y,z} \ \left( \ b^Ty - \frac{1}{2}x^T Q x + \mu \sum _{j=1}^n \ln z^j \right) \ \ \text {s.t.} \ -Qx + A^Ty + z = c, \end{aligned}$$
(4)

in which non-negativity constraints \(x>0\) and \(z>0\) are implicit. We form the KKT optimality conditions of the pair (3)–(4), by introducing the Lagrangian of the primal barrier problem:

$$\begin{aligned} {\mathcal {L}}^{\text {IPM}}_{\mu }(x,y) :=c^T x + \frac{1}{2} x^T Q x -y^T (Ax - b) - \mu \sum _{j=1}^n \ln x^j. \end{aligned}$$

Equating the gradient of the previous function to zero, gives the following conditions:

$$\begin{aligned} \begin{aligned} \nabla _x {\mathcal {L}}^{\text {IPM}}_{\mu } (x,y) =&\ c + Qx - A^T y -\mu X^{-1} e_n = 0, \\ \nabla _y {\mathcal {L}}^{\text {IPM}}_{\mu } (x,y) =&\ Ax - b = 0.\\ \end{aligned} \end{aligned}$$

Using the variable \(z = \mu X^{-1}e_n\), the final conditions read as follows:

$$\begin{aligned} \begin{aligned} c + Qx - A^T y - z&= 0\\ Ax - b&= 0\\ XZe_n - \mu e_n&= 0. \end{aligned} \end{aligned}$$

At each IPM iteration, we want to approximately solve the following mildly non-linear system:

$$\begin{aligned} F^{\text {IPM}}_{\sigma ,\mu }(w) :=\begin{bmatrix} c + Qx - A^Ty - z \\ b - Ax \\ XZe_n-\sigma \mu e_n\\ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}, \end{aligned}$$
(5)

where \(F^{\text {IPM}}_{\sigma ,\mu }(w) = 0\) is a slightly perturbed form of the previously presented optimality conditions. In particular, \(\sigma \in (0,1)\) is a centering parameter which determines how fast \(\mu \) will be forced to decrease at the next IPM iteration. For \(\sigma = 1\), we recover the barrier optimality conditions, while for \(\sigma = 0\) we recover the initial problems’ optimality conditions (2). The efficiency of the method depends heavily on the choice of \(\sigma \). In fact, various improvements of the traditional IPM schemes have been proposed in the literature, which solve the previous system for multiple carefully chosen values of the centering parameter \(\sigma \) and of the right-hand side, at each IPM iteration. These are the so-called predictor-corrector schemes, proposed for the first time in [23]. Various extensions of such methods have been proposed and analyzed in the literature (e.g. see [16, 35] and references therein). However, for simplicity of exposition, we will follow the traditional approach; \(\sigma \) is chosen heuristically at each iteration, and the previous system is solved only once.

In order to approximately solve the system \(F^{\text {IPM}}_{\sigma ,\mu }(w) = 0\) for each value of \(\mu \), the most common approach is to apply Newton’s method. Newton’s method is favored for systems of this form, due to the self-concordance of the function \(\ln (\cdot )\). For more details on this subject, the interested reader is referred to [25, Chapter 2]. At the beginning of the kth iteration of the IPM, for \(k \ge 0\), we have available an iterate \(w_k = (x_k^T,y_k^T,z_k^T)^T\), and a barrier parameter \(\mu _k\), defined as \(\mu _k = \frac{x_k^Tz_k}{n}\). We choose a value for the centering parameter \(\sigma _k \in (0,1)\) and form the Jacobian of \(F^{\text {IPM}}_{\sigma _k, \mu _k}(\cdot )\), evaluated at \(w_k\). Then, the Newton direction is determined by solving the following system of equations:

$$\begin{aligned} \begin{bmatrix} -\,Q &{}\quad A^T &{}\quad I\\ A &{}\quad 0 &{}\quad 0\\ Z_k &{}\quad 0 &{}\quad X_k \end{bmatrix} \begin{bmatrix} \varDelta x_k\\ \varDelta y_k\\ \varDelta z_k \end{bmatrix} = - \begin{bmatrix} -\,c - Qx_k + A^T y_k + z_k\\ Ax_k - b\\ X_kZ_ke_n - \sigma _{k} \mu _k e_n \end{bmatrix}. \end{aligned}$$
(6)

Notice that as \(\mu _k \rightarrow 0\), the optimal solution of (3)–(4) converges to the optimal solution of (P)–(D). Polynomial convergence of such methods has been established multiple times in the literature (see for example [36, 37] and the references therein).
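
As an illustration of one such step, the sketch below (dense NumPy linear algebra with hypothetical helper names; a practical implementation would use sparse factorizations) assembles and solves the Newton system (6):

```python
import numpy as np

def ipm_newton_step(A, b, c, Q, x, y, z, sigma):
    """Assemble and solve the (unregularized) Newton system (6) at (x, y, z)."""
    m, n = A.shape
    mu = x @ z / n                                    # barrier parameter mu_k
    K = np.block([[-Q,          A.T,               np.eye(n)        ],
                  [A,           np.zeros((m, m)),  np.zeros((m, n)) ],
                  [np.diag(z),  np.zeros((n, m)),  np.diag(x)       ]])
    rhs = -np.concatenate([
        -c - Q @ x + A.T @ y + z,                     # -(c + Qx - A^T y - z)
        A @ x - b,                                    # Ax - b
        x * z - sigma * mu * np.ones(n)])             # XZe_n - sigma*mu*e_n
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]               # (dx, dy, dz)
```

A step-length keeping \((x_k,z_k)\) strictly positive would then be chosen before updating the iterate.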

1.3 Proximal point methods

1.3.1 Primal proximal point method

One possible method for solving the primal problem (P), is the so called proximal point method. Given an arbitrary starting point \(x_0\), the kth iteration of the method is summarized by the following minimization problem:

$$\begin{aligned} x_{k+1} = \text {arg}\min _x \left\{ c^T x + \frac{1}{2}x^T Q x + \frac{\mu _k}{2}\Vert x - x_k\Vert _2^2,\ \ \text {s.t.} \ Ax = b,\ x \ge 0\right\} , \end{aligned}$$

where \(\mu _k > 0\) is a non-increasing sequence of penalty parameters, and \(x_k\) is the current estimate for an optimal solution of (P) (the use of \(\mu _k\) is not a mistake; as will become clearer in Sect. 2, we intend to use the barrier parameter \(\mu _k\) to control both the IPM and the proximal method). Notice that such an algorithm is not practical, since we have to solve a sequence of sub-problems of similar difficulty to that of (P). Nevertheless, the proximal method contributes the \(\mu _k I\) term to the Hessian of the objective, and hence the sub-problems are strongly convex. This method is known to achieve a linear convergence rate (and possibly superlinear if \(\mu _k \rightarrow 0\) at a suitable rate), as shown in [13, 31], among others, even in the case where the minimization sub-problems are solved approximately. Various extensions of this method have been proposed in the literature. For example, one could employ different penalty functions other than the typical 2-norm penalty presented previously (e.g. see [9, 11, 19]).
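
To make the iteration concrete, the following sketch applies it to the equality-constrained part of (P) only (the constraint \(x \ge 0\) is dropped purely for illustration, and the data are assumed given as NumPy arrays), so that each strongly convex sub-problem reduces to a single linear saddle-point solve:

```python
import numpy as np

def proximal_point_eq(Q, c, A, b, x0, mu0=1.0, iters=20, shrink=0.9):
    """Proximal point iteration for min c^T x + 0.5 x^T Q x s.t. Ax = b
    (non-negativity dropped for illustration). The mu_k*I term makes every
    sub-problem strongly convex, so each step is one KKT linear solve."""
    m, n = A.shape
    x, mu = x0.copy(), mu0
    for _ in range(iters):
        K = np.block([[Q + mu * np.eye(n), A.T],
                      [A, np.zeros((m, m))]])
        rhs = np.concatenate([-c + mu * x, b])     # proximal term centred at x_k
        x = np.linalg.solve(K, rhs)[:n]            # discard the multiplier block
        mu = max(shrink * mu, 1e-8)                # non-increasing penalty sequence
    return x
```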

Despite the fact that such algorithms are not practical, they have served as a powerful theoretical tool and a basis for other important methods. For an extensive review of the applications of standard primal proximal point methods, we refer the reader to [28].

1.3.2 Dual proximal point method

It is a well-known fact, observed for the first time in [31], that the application of the proximal point method on the dual problem, is equivalent to the application of the augmented Lagrangian method on the primal problem, which was proposed for the first time in [18, 30]. In view of the previous fact, we will present the derivation of the augmented Lagrangian method for the primal problem (P), while having in mind that an equivalent reformulation of the model results in the proximal point method for the dual (D) (see [8, Chapter 3.4.4] or [12, 31]).

We start by defining the augmented Lagrangian function of (P). Given an arbitrary starting guess \(y_0\) for the dual variable, the augmented Lagrangian function is defined at the kth iteration as:

$$\begin{aligned} {\mathcal {L}}^{\text {ALM}}_{\mu _k}(x;y_k) :=c^Tx + \frac{1}{2}x^T Q x -y_k^T (Ax - b) + \frac{1}{2\mu _k}\Vert Ax-b\Vert _2^2, \end{aligned}$$
(7)

where \(\mu _k > 0\) is a non-increasing sequence of penalty parameters and \(y_k\) is the current estimate of an optimal Lagrange multiplier vector. The augmented Lagrangian method (ALM) is summarized below:

$$\begin{aligned} \begin{aligned} x_{k+1} =&\ \text {arg}\min _{x} \big \{ {\mathcal {L}}^{\text {ALM}}_{\mu _k}(x;y_k), \ \ \text {s.t.}\ x \ge 0\big \}, \\ y_{k+1} =&\ y_k - \frac{1}{\mu _k} (Ax_{k+1} - b). \end{aligned} \end{aligned}$$

Notice that the update of the Lagrange multiplier estimates can be interpreted as the application of the dual ascent method. A different type of update would be possible; however, the presented update is cheap and effective, due to the strong concavity of the proximal (“regularized”) dual problem (since \(\mu _k > 0\)). Convergence and iteration complexity of such methods have been established multiple times (see for example [21, 31]). There is a vast literature on the subject of augmented Lagrangian methods. For a plethora of technical results and references, the reader is referred to [6]. For convergence results of various extended versions of the method, the reader is referred to [11].

It is worth noting here that a common issue, arising when using augmented Lagrangian methods, is that of the choice of the penalty parameter. To the authors’ knowledge, an optimal tuning for this parameter is still unknown.

1.3.3 Proximal method of multipliers

In [31], the author presented, for the first time, the proximal method of multipliers (PMM). The method consists of applying both primal and dual proximal point methods for solving (P)–(D). For an arbitrary starting point \((x_0,y_0)\), and using the already defined augmented Lagrangian function, given in (7), PMM can be summarized by the following iteration:

$$\begin{aligned} \begin{aligned} x_{k+1} =&\ \text {arg}\min _x \left\{ {\mathcal {L}}^{\text {ALM}}_{\mu _k}(x;y_k) + \frac{\mu _k}{2}\Vert x-x_k\Vert _2^2,\ \text {s.t.}\ x \ge 0\right\} ,\\ y_{k+1} =&\ y_k - \frac{1}{\mu _k} (Ax_{k+1} - b). \end{aligned} \end{aligned}$$
(8)

One can observe that at every iteration of the method, the primal problem that we have to solve is strongly convex, while its dual is strongly concave. As shown in [31], the addition of the term \(\frac{\mu _k}{2}\Vert x - x_k\Vert _2^2\) in the augmented Lagrangian method does not affect its convergence rate, while ensuring better numerical behaviour of its iterates. An extension of this algorithm, allowing for the use of more general penalties, can be found in [11].
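
A minimal sketch of iteration (8) is given below; the bound-constrained sub-problem is handed to a generic solver (SciPy's L-BFGS-B) purely for illustration, whereas the method proposed in this paper will instead approximate each sub-problem with interior point iterations:

```python
import numpy as np
from scipy.optimize import minimize

def pmm_step(Q, c, A, b, x_k, y_k, mu_k):
    """One PMM iteration (8): minimize the augmented Lagrangian plus the primal
    proximal term over x >= 0, then take the multiplier (dual ascent) step."""
    def obj(x):
        r = A @ x - b
        return (c @ x + 0.5 * x @ (Q @ x) - y_k @ r
                + 0.5 / mu_k * (r @ r) + 0.5 * mu_k * np.sum((x - x_k) ** 2))
    def grad(x):
        r = A @ x - b
        return c + Q @ x - A.T @ y_k + A.T @ r / mu_k + mu_k * (x - x_k)
    res = minimize(obj, np.maximum(x_k, 1e-3), jac=grad,
                   bounds=[(0.0, None)] * len(x_k), method="L-BFGS-B")
    x_next = res.x
    y_next = y_k - (A @ x_next - b) / mu_k
    return x_next, y_next
```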

We can write (8) equivalently by making use of the maximal monotone operator \(T_{{\mathcal {L}}}: {\mathbb {R}}^{n+m} \rightrightarrows {\mathbb {R}}^{n+m}\) associated to (P)–(D) (see [31, 32]), that is defined as:

$$\begin{aligned} T_{{\mathcal {L}}}(x,y) :=\{(u,v): v \in Qx + c - A^Ty + \partial \delta _+(x), u = Ax-b \}, \end{aligned}$$
(9)

where \(\delta _+(\cdot )\) is an indicator function defined as:

$$\begin{aligned} \delta _+(x) :={\left\{ \begin{array}{ll} \infty &{}\quad \text {if } \exists \ j :x^j < 0, \\ 0 &{}\quad \text {otherwise,} \\ \end{array}\right. } \end{aligned}$$
(10)

and \(\partial (\cdot )\) denotes the sub-differential of a function. In our case, we have that (see [33, Section 23]):

$$\begin{aligned} z \in \partial \delta _+(x) \Leftrightarrow {\left\{ \begin{array}{ll} z^j = 0 &{} \text {if } x^j > 0,\\ z^j \le 0 &{} \text {if } x^j = 0. \end{array}\right. },\ \forall \ j \in \{1,\ldots ,n\}. \end{aligned}$$

By convention, if there exists a component j such that \(x^j < 0\), we have that \(\partial \delta _+(x) = \emptyset \). Given a bounded pair \((x^*,y^*)\) such that \(0 \in T_{{\mathcal {L}}}(x^*,y^*)\), we can retrieve a vector \(z^* \in \partial \delta _+(x^*)\), using which \((x^*,y^*,-z^*)\) is an optimal solution for (P)–(D). By letting:

$$\begin{aligned} P_k :=\left( I_{(m+n)} + \frac{1}{\mu _k}T_{{\mathcal {L}}}\right) ^{-1}, \end{aligned}$$
(11)

we can express (8) as:

$$\begin{aligned} (x_{k+1},y_{k+1}) = P_k(x_k,y_k), \end{aligned}$$
(12)

and it can be shown that \(P_k\) is single valued and non-expansive (see [32]).

1.4 Regularization in interior point methods

In the context of interior point methods, it is often beneficial to include a regularization in order to improve the spectral properties of the system matrix in (6). For example, notice that if the constraint matrix A is rank-deficient, then the matrix in (6) might not be invertible. The latter can be immediately addressed by the introduction of a dual regularization, say \(\delta > 0\), ensuring that \(\text {rank}([A\ |\ \delta I_m] ) = m\). For a detailed analysis of the effect of dual regularization on such systems, the reader is referred to [2]. On the other hand, the most common and efficient approach in practice is to eliminate the variables \(\varDelta z\) from system (6), and solve the following symmetric reduced (augmented) system instead:

$$\begin{aligned} \begin{bmatrix} -(Q+\varTheta _k^{-1}) &{}\quad A^T \\ A &{}\quad 0 \end{bmatrix} \begin{bmatrix} \varDelta x_k\\ \varDelta y_k \end{bmatrix} = \begin{bmatrix} c + Qx_k - A^T y_k - \sigma _{k} \mu _k X_k^{-1} e_n\\ b-Ax_k \end{bmatrix}, \end{aligned}$$
(13)

where \(\varTheta _k = X_k Z_k^{-1}\). Since \(X_k z_k \rightarrow 0\) close to optimality, one can observe that the matrix \(\varTheta _k\) will contain some very large and some very small elements. Hence, the matrix in (13) will be increasingly ill-conditioned, as we approach optimality. The introduction of a primal regularization, say \(\rho > 0\), can ensure that the matrix \(Q+\varTheta _k^{-1} + \rho I_n\) will have eigenvalues that are bounded away from zero, and hence a significantly better worst-case conditioning than that of \(Q+\varTheta _k^{-1}\). In other words, regularization is commonly employed in IPMs, as a means of improving robustness and numerical stability (see [1]). As we will discuss later, by introducing regularization in IPMs, one can also gain efficiency. This is because regularization transforms the matrix in (13) into a quasi-definite one. It is known that such matrices can be factorized efficiently (see [34]).
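
The effect on the (1,1) block of (13) can be observed even on a toy example; the following hypothetical NumPy sketch mimics the behaviour of \(\varTheta _k\) near optimality with a singular \(Q\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 8, 1e-2
G = rng.standard_normal((n, 2))
Q = G @ G.T                                   # positive semi-definite but singular

# Near optimality Theta_k = X_k Z_k^{-1} has both huge and tiny entries,
# so Theta_k^{-1} contributes both tiny and huge diagonal terms.
theta_inv = np.concatenate([np.full(n // 2, 1e-8), np.full(n - n // 2, 1e8)])

H     = Q + np.diag(theta_inv)                # (1,1) block of (13), unregularized
H_reg = H + rho * np.eye(n)                   # with primal regularization rho*I

print("cond(Q + Theta^-1)         = %.2e" % np.linalg.cond(H))
print("cond(Q + Theta^-1 + rho*I) = %.2e" % np.linalg.cond(H_reg))
```

The regularized block has its smallest eigenvalue bounded below by \(\rho \), while the unregularized one inherits the smallest entries of \(\varTheta _k^{-1}\).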

In view of the previous discussion, one can observe that a very natural way of introducing primal regularization to problem (P) is through the application of the primal proximal point method. Similarly, dual regularization can be incorporated through the application of the dual proximal point method. This is a well-known fact. The authors in [1] presented a primal-dual regularized IPM, and interpreted this regularization as the application of the proximal point method. Subsequently, the authors in [12] developed a primal-dual regularized IPM, which applies PMM to solve (P), employing a single IPM iteration to approximate the solution of each PMM sub-problem. There, global convergence of the method was proved, under some assumptions. A variation of the method proposed in [12] is given in [29], where general non-diagonal regularization matrices are employed, as a means of further improving factorization efficiency.

Similar ideas have been applied in the context of IPMs for general non-linear programming problems. For example, the authors in [3] presented an interior point method combined with the augmented Lagrangian method, and proved that, under some general assumptions, the method converges at a superlinear rate. Similar approaches can be found in [4, 5, 14], among others.

In this paper, we develop a path-following primal-dual regularized IPM for solving convex quadratic programming problems. The algorithm is obtained by applying one or a few iterations of the infeasible primal-dual interior point method in order to solve sub-problems arising from the primal-dual proximal point method. Under standard assumptions, we prove polynomial complexity of the algorithm and provide global convergence guarantees. To our knowledge, this is the first polynomial complexity result for a general primal-dual regularized IPM scheme. Notice that a complexity result is given for a primal regularized IPM for linear complementarity problems in [38]. However, the authors significantly alter the Newton directions, making their method very difficult to generalize and hard to implement efficiently in practice. An important feature of the presented method is that it makes use of only one penalty parameter, namely the barrier parameter of the logarithmic penalty. The aforementioned penalty has been extensively studied and understood, and as a result, IPMs achieve fast and reliable convergence in practice. This is not the case for the penalty parameter of the proximal methods. In other words, IP-PMM inherits the fast and reliable convergence properties of IPMs, as well as the strong convexity of the PMM sub-problems, hence improving the conditioning of the Newton system solved at each IPM iteration, while providing a reliable tuning for the penalty parameter, independently of the problem at hand. The proposed approach is implemented and its reliability is demonstrated through extensive experimentation. The implementation slightly deviates from the theory; however, most of the theoretical results are verified in practice. The main purpose of this paper is to provide a reliable method that can be used for solving general convex quadratic problems, without the need for pre-processing or prior knowledge about the problems. The implemented method is supported by a novel theoretical result, indicating that regularization alleviates various issues arising in IPMs, without affecting their important worst-case polynomial complexity. As a side-product of the theory, an implementable infeasibility detection mechanism is also derived and tested in practice.

Before completing this section, let us introduce some notation that is used in the rest of this paper. An iteration of the algorithm is denoted by \(k \in {\mathbb {N}}\). Given an arbitrary matrix (vector) A (x, respectively), \(A_k\) (or \(x_k\)) denotes that A (resp., x) depends on the iteration k. An optimal solution to the pair (P)–(D) will be denoted as \((x^*,y^*,z^*)\). Optimal solutions of different primal-dual pairs will be denoted using an appropriate subscript, in order to distinguish them. For example, we use the notation \((x_r^*,y_r^*,z_r^*)\), for representing an optimal solution for a PMM sub-problem. The subscript is employed for distinguishing the “regularized” solution, from the solution of the initial problem, that is \((x^*,y^*,z^*)\). Any norm (semi-norm respectively) is denoted by \(\Vert \cdot \Vert _{\nu }\), where \(\nu \) is used to distinguish between different norms. For example, the 2-norm (Euclidean norm) will be denoted as \(\Vert \cdot \Vert _2\). When a given scalar is assumed to be independent of n and m, we mean that this quantity does not depend on the problem dimensions. Given two vectors x, y of the same size, \(x \ge y\) denotes that the inequality holds component-wise. Given two logical statements \(T_1,\ T_2\), the condition \(T_1 \wedge T_2\) is true only when both \(T_1\) and \(T_2\) are true. Let an arbitrary matrix A be given. The maximum (minimum) singular value of A is denoted as \(\eta _{\max }(A)\) (\(\eta _{\min }(A)\)). Similarly, if A is square, its maximum eigenvalue is denoted as \(\nu _{\max }(A)\). Given a set of indices, say \({\mathcal {B}} \subset \{1,\ldots ,n\}\), and an arbitrary vector (or matrix) \(x \in {\mathbb {R}}^n\) (\(A \in {\mathbb {R}}^{n \times n}\)), \(x^{{\mathcal {B}}}\) (\(A^{{\mathcal {B}}}\), respectively) denotes the sub-vector (sub-matrix) containing the elements of x (columns and rows of A) whose indices belong to \({\mathcal {B}}\). Finally, the cardinality of \({\mathcal {B}}\) is denoted as \(|{\mathcal {B}}|\).

The rest of the paper is organized as follows. In Sect. 2, we provide the algorithmic framework of the method. Then, in Sect. 3, we prove polynomial complexity of the algorithm and establish global convergence. In Sect. 4, a necessary condition for infeasibility is derived, which is later used to construct an infeasibility detection mechanism. Numerical results of the implemented method are presented and discussed in Sect. 5. Finally, we derive some conclusions in Sect. 6.

2 Algorithmic framework

In the previous section, we presented all the necessary tools for deriving the proposed Interior Point-Proximal Method of Multipliers (IP-PMM). Effectively, we would like to merge the proximal method of multipliers with the infeasible interior point method. For that purpose, assume that, at some iteration k of the method, we have available an estimate \(\lambda _k\) for a Lagrange multiplier vector. Similarly, we denote by \(\zeta _k\) the estimate of a primal solution. We define the proximal penalty function that has to be minimized at the kth iteration of PMM, for solving (P), given the estimates \(\lambda _k,\ \zeta _k\), as:

$$\begin{aligned} {\mathcal {L}}^{PMM}_{\mu _k} (x;\zeta _k, \lambda _k) :=c^Tx + \frac{1}{2}x^T Q x -\lambda _k^T (Ax - b) + \frac{1}{2\mu _k}\Vert Ax-b\Vert _2^2 + \frac{\mu _k}{2}\Vert x-\zeta _k\Vert _2^2, \end{aligned}$$
(14)

with \(\mu _k > 0\) some sequence of non-increasing penalty parameters. In order to solve the PMM sub-problem (8), we will apply one (or a few) iterations of the previously presented infeasible IPM. To do that, we alter (14), by including logarithmic barriers, that is:

$$\begin{aligned} {\mathcal {L}}^{IP-PMM}_{\mu _k} (x;\zeta _k, \lambda _k) :={\mathcal {L}}^{PMM}_{\mu _k} (x;\zeta _k, \lambda _k) - \mu _k \sum _{j=1}^n \ln x^j, \end{aligned}$$
(15)

and we treat \(\mu _k\) as the barrier parameter. In order to form the optimality conditions of this sub-problem, we equate the gradient of \({\mathcal {L}}^{IP-PMM}_{\mu _k}(\cdot ;\lambda _k, \zeta _k)\) to the zero vector, i.e.:

$$\begin{aligned} c + Qx - A^T \lambda _k + \frac{1}{\mu _k}A^T(Ax - b) + \mu _k (x - \zeta _k) - \mu _k X^{-1}e_n = 0. \end{aligned}$$

Following the developments in [3], we can define the variables \(y = \lambda _k - \frac{1}{\mu _k}(Ax - b)\) and \(z = \mu _k X^{-1}e_n\), to get the following (equivalent) system of equations (first-order optimality conditions):

$$\begin{aligned} \begin{bmatrix} c + Qx - A^T y - z + \mu _k (x-\zeta _k)\\ Ax + \mu _k (y - \lambda _k) - b\\ Xz - \mu _k e_n \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}. \end{aligned}$$
(16)

Let us introduce the following notation, that will be used later.

Definition 1

Let two real-valued positive functions be given: \(T(x) : {\mathbb {R}}_+ \mapsto {\mathbb {R}}_+,\ f(x): {\mathbb {R}}_+ \mapsto {\mathbb {R}}_+\), with \({\mathbb {R}}_+ :=\{x \in {\mathbb {R}} : x \ge 0 \}\). We say that:

  • \(T(x) = O(f(x))\) (as \(x \rightarrow \infty \)) if and only if, there exist constants \(c>0,\ x_0 \ge 0\), such that:

    $$\begin{aligned} T(x) \le c f(x),\ \ \text {for all}\ x \ge x_0. \end{aligned}$$
  • \(T(x) = \varOmega (f(x))\) if and only if, there exist constants \(c>0,\ x_0 \ge 0\), such that:

    $$\begin{aligned} T(x) \ge c f(x), \ \ \text {for all}\ x \ge x_0. \end{aligned}$$
  • \(T(x) = \varTheta (f(x))\) if and only if, \(T(x) = O(f(x))\) and \(T(x) = \varOmega (f(x)).\)

Next, given two arbitrary vectors \(b \in {\mathbb {R}}^m,\ c \in {\mathbb {R}}^n\), we define the following semi-norm:

$$\begin{aligned} \Vert (b,c)\Vert _{{\mathcal {A}}} :=\min _{x,y,z}\big \{\Vert (x,z)\Vert _2\ :\ Ax = b, -Qx + A^Ty + z = c\big \}. \end{aligned}$$
(17)

This semi-norm has been used before in [24], as a way to measure infeasibility for the case of linear programming problems (\(Q = 0\)). For a discussion on the properties of the aforementioned semi-norm, as well as how one can evaluate it (using the QR factorization of A), the reader is referred to [24, Section 4].
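
One straightforward (dense and not particularly efficient) way to evaluate this semi-norm is to solve the underlying equality-constrained least-norm problem through its KKT system. The sketch below assumes NumPy data and that A has full row rank, and is only meant to illustrate definition (17); [24, Section 4] describes an evaluation based on a QR factorization of A:

```python
import numpy as np

def infeasibility_seminorm(A, Q, b_vec, c_vec):
    """Evaluate ||(b_vec, c_vec)||_A as defined in (17): the minimum of ||(x, z)||_2
    over all (x, y, z) with Ax = b_vec and -Qx + A^T y + z = c_vec, obtained here
    by solving the KKT system of that equality-constrained least-norm problem."""
    m, n = A.shape
    M = np.block([[A,  np.zeros((m, m)), np.zeros((m, n))],
                  [-Q, A.T,              np.eye(n)       ]])
    # Hessian of 0.5*||(x, z)||_2^2 in the variables (x, y, z): diag(I_n, 0_m, I_n)
    W = np.diag(np.concatenate([np.ones(n), np.zeros(m), np.ones(n)]))
    K = np.block([[W, M.T],
                  [M, np.zeros((m + n, m + n))]])
    rhs = np.concatenate([np.zeros(2 * n + m), b_vec, c_vec])
    sol = np.linalg.solve(K, rhs)
    x, z = sol[:n], sol[n + m:2 * n + m]
    return np.sqrt(x @ x + z @ z)
```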

Starting point. At this point, we are ready to define the starting point for IP-PMM. For that, we set \((x_0,z_0) = \rho (e_n,e_n)\), for some \(\rho > 0\). We also set \(y_0\) to some arbitrary vector (e.g. \(y_0 = e_m\)), such that \(\Vert y_0\Vert _{\infty } = O(1)\) (i.e. the absolute value of its entries is independent of n and m), and \(\mu _0 = \frac{x_0^T z_0}{n}\). Then we have:

$$\begin{aligned} Ax_0 = b + {\bar{b}},\ -Qx_0 + A^Ty_0 + z_0 = c + {\bar{c}},\ \zeta _0 = x_0,\ \lambda _0 = y_0, \end{aligned}$$
(18)

for some vectors \({\bar{b}},\ {\bar{c}}\).

Neighbourhood. We mentioned earlier that we develop a path-following method. Hence, we have to describe a neighbourhood in which the iterates of the method should lie. However, unlike typical path-following methods, we define a family of neighbourhoods that depend on the PMM sub-problem parameters.

Given the starting point in (18), penalty \(\mu _k\), and estimates \(\lambda _k, \zeta _k\), we define the following regularized set of centers:

$$\begin{aligned} {\mathcal {P}}_{\mu _k}(\zeta _k,\lambda _k) :=\big \{(x,y,z)\in {\mathcal {C}}_{\mu _k}(\zeta _k,\lambda _k)\ :\ (x,z) > (0,0),\ Xz = \mu _k e_n \big \}, \end{aligned}$$

where

$$\begin{aligned} {\mathcal {C}}_{\mu _k}(\zeta _k,\lambda _k) :=\bigg \{(x,y,z)\ :\ \begin{matrix} Ax + \mu _k(y-\lambda _k) = b + \frac{\mu _k}{\mu _0} {\bar{b}},\\ -Qx + A^Ty + z - \mu _k (x- \zeta _k) = c + \frac{\mu _k}{\mu _0}{\bar{c}} \end{matrix} \bigg \}, \end{aligned}$$

and \({\bar{b}},\ {\bar{c}}\) are as in (18). The term set of centers originates from [24], where a similar set is studied.

In order to enlarge the previous set, we define the following set:

$$\begin{aligned} \tilde{{\mathcal {C}}}_{\mu _k}(\zeta _k,\lambda _k) :=\Bigg \{(x,y,z)\ :\ \begin{matrix} Ax + \mu _k(y-\lambda _k) = b + \frac{\mu _k}{\mu _0} ({\bar{b}}+{\tilde{b}}_{k}),\\ -Qx + A^Ty + z - \mu _k (x- \zeta _k) = c + \frac{\mu _k}{\mu _0}({\bar{c}}+{\tilde{c}}_{k})\\ \Vert ({\tilde{b}}_{k},{\tilde{c}}_{k})\Vert _2 \le C_N,\ \Vert ({\tilde{b}}_{k},{\tilde{c}}_{k})\Vert _{{\mathcal {A}}} \le \gamma _{{\mathcal {A}}} \rho \end{matrix} \Bigg \}, \end{aligned}$$

where \(C_N > 0\) is a constant, \(\gamma _{{\mathcal {A}}} \in (0,1)\), and \(\rho >0\) is as defined in the starting point. The vectors \({\tilde{b}}_{k}\) and \({\tilde{c}}_{k}\) represent the current scaled (by \(\frac{\mu _0}{\mu _k}\)) infeasibility and vary depending on the iteration k. In particular, these vectors can be formally defined recursively, depending on the iterations of IP-PMM. However, such a definition is not necessary for the developments to follow. In essence, the only requirement is that these scaled infeasibility vectors are bounded above by some constants, with respect to the 2-norm as well as the semi-norm defined in (17). Using the latter set, we are now ready to define a family of neighbourhoods:

$$\begin{aligned} {\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k) :=&\big \{(x,y,z) \in \tilde{{\mathcal {C}}}_{\mu _k}(\zeta _k,\lambda _k)\ :\ (x,z) >(0,0),\ \nonumber \\&\quad x^i z^i \ge \gamma _{\mu } \mu _k,\ i \in \{1,\ldots ,n\}\big \}, \end{aligned}$$
(19)

where \(\gamma _{\mu } \in (0,1)\) is a constant preventing component-wise complementarity products from approaching zero faster than \(\mu _k = \frac{x_k^T z_k}{n}\). Obviously, the starting point defined in (18) belongs to the neighbourhood \({\mathcal {N}}_{\mu _0}(\zeta _0,\lambda _0)\), with \(({\tilde{b}}_{0},{\tilde{c}}_{0}) = (0,0)\). Notice from the definition of the neighbourhood, that it depends on the choice of the constants \(C_N\), \(\gamma _{{\mathcal {A}}}\), \(\gamma _{\mu }\). However, as the neighbourhood also depends on the parameters \(\mu _k,\ \lambda _k,\ \zeta _k\), we omit the dependence on the constants, for simplicity of notation.
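
For illustration, the membership test implied by (19) can be sketched as follows (hypothetical argument names; the semi-norm check may reuse, e.g., the infeasibility_seminorm sketch given earlier, and the vectors \({\tilde{b}}_{k},\ {\tilde{c}}_{k}\) are simply backed out of the two equality conditions):

```python
import numpy as np

def in_neighbourhood(A, Q, b, c, bbar, cbar, x, y, z, zeta, lam,
                     mu, mu0, gamma_mu, C_N, gamma_A, rho, seminorm):
    """Check whether (x, y, z) lies in N_mu(zeta, lam) as defined in (19).
    `seminorm(A, Q, b_vec, c_vec)` is any routine evaluating (17)."""
    if np.any(x <= 0) or np.any(z <= 0):
        return False
    if np.any(x * z < gamma_mu * mu):                 # x^i z^i >= gamma_mu * mu
        return False
    # Back out the scaled infeasibility vectors (b_tilde, c_tilde) from the
    # two equality conditions of the enlarged set.
    b_tilde = (mu0 / mu) * (A @ x + mu * (y - lam) - b) - bbar
    c_tilde = (mu0 / mu) * (-Q @ x + A.T @ y + z - mu * (x - zeta) - c) - cbar
    two_norm_ok = np.sqrt(b_tilde @ b_tilde + c_tilde @ c_tilde) <= C_N
    semi_norm_ok = seminorm(A, Q, b_tilde, c_tilde) <= gamma_A * rho
    return bool(two_norm_ok and semi_norm_ok)
```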

Newton system. At every IP-PMM iteration, we approximately solve a perturbed form of the conditions in (16), by applying a variation of Newton’s method. In particular, we form the Jacobian of the left-hand side of (16) and we perturb the right-hand side of the Newton equation as follows:

$$\begin{aligned} \begin{aligned}&\begin{bmatrix} -\,(Q + \mu _k I_n) &{}\quad A^T &{}\quad I\\ A &{}\quad \mu _k I_m &{}\quad 0\\ Z_k &{} \quad 0 &{}\quad X_k \end{bmatrix} \begin{bmatrix} \varDelta x_k\\ \varDelta y_k\\ \varDelta z_k \end{bmatrix} \\&\quad =- \begin{bmatrix} - (c + \frac{\sigma _k \mu _k}{\mu _0}{\bar{c}}) - Qx_k + A^T y_k + z_k -\sigma _k\mu _k (x_k - \zeta _k)\\ Ax_k +\sigma _k\mu _k (y_k - \lambda _k)- (b +\frac{\sigma _k \mu _k}{\mu _0}{\bar{b}}) \\ X_kZ_ke_n - \sigma _{k} \mu _k e_n \end{bmatrix}, \end{aligned} \end{aligned}$$
(20)

where \({\bar{b}},\ {\bar{c}}\) are as in (18). Notice that we perturb the right-hand side of the Newton system in order to ensure that the iterates remain in the neighbourhood (19), while trying to reduce the value of the penalty (barrier) parameter \(\mu _k\).
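
The linear algebra of a single IP-PMM step can be sketched as follows (dense NumPy with hypothetical names; in practice one would first eliminate \(\varDelta z_k\), as discussed for system (13)):

```python
import numpy as np

def ip_pmm_newton_step(A, b, c, Q, bbar, cbar, x, y, z, zeta, lam, mu, mu0, sigma):
    """Assemble and solve the regularized, perturbed Newton system (20)."""
    m, n = A.shape
    K = np.block([[-(Q + mu * np.eye(n)), A.T,              np.eye(n)       ],
                  [A,                     mu * np.eye(m),   np.zeros((m, n))],
                  [np.diag(z),            np.zeros((n, m)), np.diag(x)      ]])
    rd = -(c + sigma * mu / mu0 * cbar) - Q @ x + A.T @ y + z - sigma * mu * (x - zeta)
    rp = A @ x + sigma * mu * (y - lam) - (b + sigma * mu / mu0 * bbar)
    rc = x * z - sigma * mu * np.ones(n)
    d = np.linalg.solve(K, -np.concatenate([rd, rp, rc]))
    return d[:n], d[n:n + m], d[n + m:]               # (dx, dy, dz)
```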

We are now able to derive Algorithm IP-PMM, summarizing the proposed interior point-proximal method of multipliers. We will prove polynomial complexity of this scheme in the next section, under standard assumptions.

Algorithm IP-PMM (pseudocode figure)

Notice, in Algorithm IP-PMM, that we force \(\sigma \) to be less than 0.5. This value is set, without loss of generality, for simplicity of exposition. Similarly, in the choice of the step-length, we require that \(\mu _k(\alpha ) \le (1-0.01 \alpha )\mu _k\). The constant 0.01 is chosen for ease of presentation. It depends on the choice of the maximum value of \(\sigma \). The constants \(C_N,\ \gamma _{{\mathcal {A}}},\ \gamma _{\mu }\) are used in the definition of the neighbourhood in (19). Their values can be considered to be arbitrary. The input \(\text {tol}\) represents the error tolerance (chosen by the user). The terminating conditions require the Euclidean norm of primal and dual infeasibility, as well as complementarity, to be less than this tolerance. In such a case, we accept the iterate as a solution triple. The estimates \(\lambda ,\ \zeta \) are updated only if both primal and dual infeasibility have been sufficiently decreased; otherwise, we keep the estimates constant while continuing to decrease the penalty parameter \(\mu _k\). Following the usual practice with proximal and augmented Lagrangian methods, we accept a new estimate when the respective residual is sufficiently decreased. However, the algorithm requires the evaluation of the semi-norm defined in (17), at every iteration. While this is not practical, it can be achieved in polynomial time with respect to the size of the problem. For a detailed discussion on this, the reader is referred to [24, Section 4].

3 Convergence analysis of IP-PMM

In this section we prove polynomial complexity and global convergence of Algorithm IP-PMM. The proof follows by induction on the iterations of IP-PMM. That is, given an iterate \((x_k,y_k,z_k)\) at an arbitrary iteration k, we prove that the next iterate belongs to the appropriate neighbourhood required by the algorithm. In turn, this allows us to prove global and polynomial convergence of IP-PMM. An outline of the proof can be briefly explained as follows:

  • Initially, we present some technical results in Lemmas 1–3 which are required for the analysis throughout this section.

  • In turn, we prove boundedness of the iterates \((x_k,y_k,z_k)\) of IP-PMM in Lemma 4. In particular we show that \(\Vert (x_k,y_k,z_k)\Vert _2 = O(n)\) and \(\Vert (x_k,z_k)\Vert _1 = O(n)\) for every \(k \ge 0\).

  • Then, we prove boundedness of the Newton direction computed at every IP-PMM iteration in Lemma 5. More specifically, we prove that \(\Vert (\varDelta x_k,\varDelta y_k,\varDelta z_k)\Vert _2 = O(n^3)\) for every \(k \ge 0\).

  • In Lemma 6 we prove the existence of a positive step-length \({\bar{\alpha }}\) so that the new iterate of IP-PMM, \((x_{k+1},y_{k+1},z_{k+1})\), belongs to the updated neighbourhood \({\mathcal {N}}_{\mu _{k+1}}(\zeta _{k+1},\lambda _{k+1})\), for every \(k \ge 0\). In particular, we show that \({\bar{\alpha }} \ge \frac{{\bar{\kappa }}}{n^4}\), where \({\bar{\kappa }}\) is a constant independent of n and m.

  • Q-linear convergence of the barrier parameter \(\mu _k\) (to zero) is established in Theorem 1.

  • The polynomial complexity of IP-PMM is then proved in Theorem 2, showing that it converges to an \(\epsilon \)-accurate solution in at most \(O(n^4|\log \big (\frac{1}{\epsilon }\big )|)\) steps.

  • Finally, global convergence to an optimal solution of (P)–(D) is established in Theorem 3.

For the rest of this section, we will make use of the following two assumptions, which are commonly employed when analyzing the complexity of an IPM.

Assumption 1

There exists an optimal solution \((x^*,y^*,z^*)\) for the primal-dual pair (P)–(D), such that \(\Vert x^*\Vert _{\infty } \le C^*\), \(\Vert y^*\Vert _{\infty } \le C^*\) and \(\Vert z^*\Vert _{\infty } \le C^*\), for some constant \(C^* \ge 0\), independent of n and m.

Assumption 2

The constraint matrix of (P) has full row rank, that is \(\text {rank}(A) = m\). Furthermore, we assume that there exist constants \(C_A^1 > 0\), \(C_A^2 > 0\), \(C_Q >0\) and \(C_r > 0\), independent of n and m, such that:

$$\begin{aligned} \eta _{\min }(A) \ge C_A^1,\quad \eta _{\max }(A) \le C_A^2,\quad \nu _{\max }(Q) \le C_Q,\quad \Vert (c,b)\Vert _{\infty }\le C_r. \end{aligned}$$

Note that the independence of the previous constants from the problem dimensions is assumed for simplicity of exposition; this is a common practice when analyzing the complexity of interior point methods. If these constants depend polynomially on n (or m), the analysis still holds by suitably altering the worst-case polynomial bound for the number of iterations of the algorithm.

Let us now use the properties of the proximal operator defined in (11).

Lemma 1

Given Assumption 1, and for all \(\lambda \in {\mathbb {R}}^m\), \(\zeta \in {\mathbb {R}}^n\) and \(0 \le \mu < \infty \), there exists a unique pair \((x_r^*,y_r^*)\), such that \((x_r^*,y_r^*) = P(\zeta ,\lambda ),\) and

$$\begin{aligned} \Vert (x_r^*,y_r^*)-(x^*,y^*)\Vert _2 \le \Vert (\zeta ,\lambda )-(x^*,y^*)\Vert _2, \end{aligned}$$
(21)

where \(P(\cdot )\) is defined as in (11) and \((x^*,y^*)\) is the same as in Assumption 1.

Proof

We know that \(P(\cdot ,\cdot )\) is single-valued and non-expansive ([32]), and hence there exists a unique pair \((x_r^*,y_r^*)\), such that \((x_r^*,y_r^*) = P(\zeta ,\lambda ),\) for all \(\lambda ,\ \zeta \) and \(0 \le \mu < \infty \). Given the optimal triple of Assumption 1, we can use the non-expansiveness of \(P(\cdot )\) in (11), to show that:

$$\begin{aligned} \Vert P(\zeta ,\lambda ) - P(x^*,y^*)\Vert _2 = \Vert (x_r^*,y_r^*)-(x^*,y^*)\Vert _2 \le \Vert (\zeta ,\lambda )-(x^*,y^*)\Vert _2, \end{aligned}$$

where we used the fact that \(P(x^*,y^*) = (x^*,y^*)\), which follows directly from (9), as we can see that \((0,0) \in T_{{\mathcal {L}}}(x^*,y^*)\). This completes the proof. \(\square \)

Lemma 2

Given Assumptions 1–2, there exists a triple \((x_{r_k}^*,y_{r_k}^*,z_{r_k}^*)\), satisfying:

$$\begin{aligned} \begin{aligned} A x_{r_k}^* + \mu (y_{r_k}^*-\lambda _k) -b&= 0\\ -c-Qx_{r_k}^* + A^T y_{r_k}^* + z_{r_k}^* - \mu (x_{r_k}^* - \zeta _k)&= 0,\\ (x_{r_k}^*)^T(z_{r_k}^*)&= 0, \end{aligned} \end{aligned}$$
(22)

with \(\Vert (x_{r_k}^*,y_{r_k}^*,z_{r_k}^*)\Vert _2 = O(\sqrt{n})\), for all \(\lambda _k \in {\mathbb {R}}^m\), \(\zeta _k \in {\mathbb {R}}^n\) produced by Algorithm IP-PMM, and any \(\mu \in [0,\infty )\). Moreover, we have that \(\Vert (\zeta _k,\lambda _k)\Vert _2 = O(\sqrt{n})\), for all \(k \ge 0\).

Proof

We prove the claim by induction on the iterates, \(k \ge 0\), of Algorithm IP-PMM. At iteration \(k = 0\), we have that \(\lambda _0 = y_0\) and \(\zeta _0 = x_0\). But from the construction of the starting point in (18), we know that \(\Vert (x_0,y_0)\Vert _2 = O(\sqrt{n})\). Hence, \(\Vert (\zeta _0,\lambda _0)\Vert _2 = O(\sqrt{n})\) (assuming that \(n > m\)). From Lemma 1, we know that there exists a unique pair \((x_{r_0}^*,y_{r_0}^*)\) such that:

$$\begin{aligned} (x_{r_0}^*,y_{r_0}^*) = P_0(\zeta _0,\lambda _0),\quad \text {and} \quad \Vert (x_{r_0}^*,y_{r_0}^*) - (x^*,y^*)\Vert _2 \le \Vert (\zeta _0,\lambda _0)-(x^*,y^*)\Vert _2. \end{aligned}$$

Using the triangle inequality, and combining the latter inequality with our previous observations, as well as Assumption 1, yields that \(\Vert (x_{r_0}^*,y_{r_0}^*)\Vert _2 = O(\sqrt{n})\). From the definition of the operator in (12), we know that:

$$\begin{aligned} \begin{aligned} -c - Qx_{r_0}^* + A^T y_{r_0}^* - \mu (x_{r_0}^* - \zeta _0)&\ \in \partial \delta _+(x_{r_0}^*),\\ Ax_{r_0}^* + \mu (y_{r_0}^*-\lambda _0) - b&\ = 0, \end{aligned} \end{aligned}$$

where \(\partial (\delta _+(\cdot ))\) is the sub-differential of the indicator function defined in (10). Hence, we know that there must exist \(-z_{r_0}^* \in \partial \delta _+(x_{r_0}^*)\) [and hence, \(z_{r_0}^* \ge 0\), \((x_{r_0}^*)^T(z_{r_0}^*) = 0\)], such that:

$$\begin{aligned} z_{r_0}^* = c + Qx_{r_0}^* - A^T y_{r_0}^* + \mu (x_{r_0}^* - \zeta _0),\ (x^*_{r_0})^T(z^*_{r_0}) = 0,\ \Vert z_{r_0}^*\Vert _2 = O(\sqrt{n}), \end{aligned}$$

where \(\Vert z_{r_0}^*\Vert _2 = O(\sqrt{n})\) follows from Assumption 2, combined with \(\Vert (x_0,y_0)\Vert _2 = O(\sqrt{n})\).

Let us now assume that at some iteration k of Algorithm IP-PMM, we have \(\Vert (\zeta _k,\lambda _k)\Vert _2 = O(\sqrt{n})\). There are two cases for the subsequent iteration:

1. The proximal estimates are updated, that is \((\zeta _{k+1},\lambda _{k+1}) = (x_{k+1},y_{k+1})\), or

2. the proximal estimates stay the same, i.e. \((\zeta _{k+1},\lambda _{k+1}) = (\zeta _k,\lambda _k)\).

Case 1. We know by construction that this case can only occur if the following condition is satisfied:

$$\begin{aligned} \Vert (r_p,r_d)\Vert _2 \le C_N \frac{\mu _{k+1}}{\mu _0}, \end{aligned}$$

where \(r_p,\ r_d\) are defined in Algorithm IP-PMM. However, from the neighbourhood conditions in (19), we know that:

$$\begin{aligned} \Vert \big (r_p + \mu _{k+1}(y_{k+1}-\lambda _k), r_d + \mu _{k+1}(x_{k+1}-\zeta _k)\big )\Vert _2 \le C_N \frac{\mu _{k+1}}{\mu _0}. \end{aligned}$$

Combining the last two inequalities by applying the triangle inequality, and using the inductive hypothesis (\(\Vert (\lambda _k,\zeta _k)\Vert _2 = O(\sqrt{n})\)), yields that \(\Vert (x_{k+1},y_{k+1})\Vert _2 = O(\sqrt{n})\). Hence, \(\Vert (\zeta _{k+1},\lambda _{k+1})\Vert _2 = O(\sqrt{n})\). Then, we can invoke Lemma 1, with \(\lambda = \lambda _{k+1}\), \(\zeta = \zeta _{k+1}\) and \(\mu = \mu _{k+1}\), which gives that:

$$\begin{aligned} \Vert (x_{r_{k+1}}^*,y_{r_{k+1}}^*) - (x^*,y^*)\Vert _2 \le \Vert (\zeta _{k+1},\lambda _{k+1})-(x^*,y^*)\Vert _2. \end{aligned}$$

A simple manipulation yields that \(\Vert (x_{r_{k+1}}^*,y_{r_{k+1}}^*)\Vert _2 = O(\sqrt{n})\). As before, we use (12) alongside Assumption 2 to show the existence of \(-z_{r_{k+1}}^* \in \partial \delta _+(x_{r_{k+1}}^*)\), such that the triple \((x_{r_{k+1}}^*,y_{r_{k+1}}^*,z_{r_{k+1}}^*)\) satisfies (22) with \(\Vert z_{r_{k+1}}^*\Vert _2 = O(\sqrt{n})\).

Case 2. In this case, we have \((\zeta _{k+1},\lambda _{k+1}) = (\zeta _k,\lambda _k)\), and hence the inductive hypothesis gives us directly that: \(\Vert (\zeta _{k+1},\lambda _{k+1})\Vert _2 = O(\sqrt{n})\). The same reasoning as before implies the existence of a triple \((x_{r_{k+1}}^*,y_{r_{k+1}}^*,z_{r_{k+1}}^*)\) satisfying (22), with \(\Vert (x_{r_{k+1}}^*,y_{r_{k+1}}^*,z_{r_{k+1}}^*)\Vert _2 = O(\sqrt{n})\). \(\square \)

Lemma 3

Given Assumptions 1–2, \(\lambda _k\) and \(\zeta _k\) produced at an arbitrary iteration \(k \ge 0\) of Algorithm IP-PMM and any \(\mu \in [0,\infty )\), there exists a triple \(({\tilde{x}},{\tilde{y}},{\tilde{z}})\) which satisfies the following system of equations:

$$\begin{aligned} \begin{aligned} A {\tilde{x}} + \mu {\tilde{y}}&= b + {\bar{b}} + \mu \lambda _k + {\tilde{b}}_k,\\ -Q{\tilde{x}} + A^T {\tilde{y}} + {\tilde{z}} - \mu {\tilde{x}}&= c + {\bar{c}} - \mu \zeta _k + {\tilde{c}}_k,\\ {\tilde{X}}{\tilde{z}}&= \theta e_n, \end{aligned} \end{aligned}$$
(23)

for some arbitrary \(\theta > 0\ (\theta = \varTheta (1))\), with \(({\tilde{x}},{\tilde{z}}) \ge \xi (e_n,e_n)\ (\xi = \varTheta (1))\) and \(\Vert ({\tilde{x}},{\tilde{y}},{\tilde{z}})\Vert _2 = O(\sqrt{n})\), where \({\tilde{b}}_{k},\ {\tilde{c}}_{k}\) are defined in (19), while \({\bar{b}},\ {\bar{c}}\) are defined with the starting point in (18).

Proof

Let \(k \ge 0\) denote an arbitrary iteration of Algorithm IP-PMM. Let also \({\bar{b}},\ {\bar{c}}\) be as defined in (18), and \({\tilde{b}}_{k},\ {\tilde{c}}_{k}\) be as defined in the neighbourhood conditions in (19). Given an arbitrary positive constant \(\theta > 0\), we consider the following barrier primal-dual pair:

$$\begin{aligned}&\text {min}_{x} \ \left( (c+{\bar{c}} + {\tilde{c}}_k)^Tx + \frac{1}{2}x^T Q x -\theta \sum _{j=1}^n \ln x^j \right) , \ \ \text {s.t.} \ Ax = b + {\bar{b}} + {\tilde{b}}_k, \end{aligned}$$
(24)
$$\begin{aligned}&\text {max}_{x,y,z} \ \left( (b + {\bar{b}} + {\tilde{b}}_k)^Ty - \frac{1}{2}x^T Q x+\theta \sum _{j=1}^n \ln z^j\right) , \ \ \text {s.t.}\ -Qx + A^Ty + z = c+{\bar{c}} + {\tilde{c}}_k, \end{aligned}$$
(25)

Let us now define the following triple:

$$\begin{aligned} ({\hat{x}},{\hat{y}},{\hat{z}}) :=\arg \min _{(x,y,z)} \big \{\Vert (x,z)\Vert _2: Ax = {\tilde{b}}_k,\ -Qx + A^T y + z = {\tilde{c}}_k \big \}. \end{aligned}$$

From the neighbourhood conditions (19), we know that \(\Vert ({\tilde{b}}_k,{\tilde{c}}_k)\Vert _{{\mathcal {A}}} \le \gamma _{{\mathcal {A}}}\rho \), and from the definition of the semi-norm in (17), we have that: \(\Vert ({\hat{x}},{\hat{z}})\Vert _2 \le \gamma _{{\mathcal {A}}} \rho \). Using (17) alongside Assumption 2, we can also show that \(\Vert {\hat{y}}\Vert _2 = \varTheta (\Vert ({\hat{x}},{\hat{z}})\Vert _2)\). On the other hand, from the definition of the starting point, we have that: \((x_0,z_0) = \rho (e_n,e_n)\). By defining the following auxiliary point:

$$\begin{aligned} ({\bar{x}},{\bar{y}},{\bar{z}}) = (x_0,y_0,z_0) + ({\hat{x}},{\hat{y}},{\hat{z}}), \end{aligned}$$

we have that \(({\bar{x}},{\bar{z}}) \ge (1-\gamma _{{\mathcal {A}}})\rho (e_n,e_n)\). By construction, the triple \(({\bar{x}},{\bar{y}},{\bar{z}})\) is a feasible solution for the primal-dual pair in (24)–(25), giving bounded primal and dual objective values, respectively.

Using our previous observations, alongside the fact that \(\text {rank}(A) = m\) (Assumption 2), we can confirm that there must exist a large constant \(M > 0\), and a triple \((x_s^*,y_s^*,z_s^*)\) solving (24)–(25), such that \(\Vert (x_s^*,z_s^*)\Vert _{\infty } \le M \Rightarrow \Vert (x_s^*,y_s^*,z_s^*)\Vert _2 = O(\sqrt{n})\).

Let us now apply the proximal method of multipliers to (24)–(25), given the estimates \(\zeta _k,\ \lambda _k\). We should note at this point, that the proximal operator used here is different from that in (11), since it is based on a different maximal monotone operator from that in (9). In particular, we associate the following maximal monotone operator to (24)–(25):

$$\begin{aligned} {\tilde{T}}_{{\mathcal {L}}}(x,y) :=\{(u,v): v = Qx + (c + {\bar{c}} + {\tilde{c}}_k) - A^Ty - \theta X^{-1}e_n,\ u = Ax-(b+{\bar{b}}+{\tilde{b}}_k) \}. \end{aligned}$$

As before, the proximal operator is defined as \({\tilde{P}} :=(I_{m+n} + {\tilde{T}}_{{\mathcal {L}}})^{-1}\), and is single-valued and non-expansive. We let \(\mu \in [0,\infty )\) be arbitrary and define the following penalty function:

$$\begin{aligned} \begin{aligned} \tilde{{\mathcal {L}}}_{\mu ,\theta }(x;\zeta _k,\lambda _k):=&\ (c + {\bar{c}} + {\tilde{c}}_k)^T x + \frac{1}{2}x^TQx\ \\&\ +\frac{1}{2}\mu \Vert x-\zeta _k\Vert _2^2 + \frac{1}{2\mu }\Vert Ax-(b+{\bar{b}}+{\tilde{b}}_k)\Vert _2^2 \\&\ - (\lambda _k)^T(Ax - (b+{\bar{b}}+{\tilde{b}}_k))-\theta \sum _{j=1}^n \ln x^j. \end{aligned} \end{aligned}$$

By defining the variables \(y = \lambda _k - \frac{1}{\mu }(Ax - (b+{\bar{b}}+{\tilde{b}}_k))\) and \(z = \theta X^{-1}e_n\), we can see that the optimality conditions of this PMM sub-problem are exactly those stated in (23). Equivalently, we can find a pair \(({\tilde{x}},{\tilde{y}})\) such that \(({\tilde{x}},{\tilde{y}}) = {\tilde{P}}(\zeta _k,\lambda _k)\) and set \({\tilde{z}} = \theta {\tilde{X}}^{-1}e_n\). Then, we can use the non-expansiveness of \({\tilde{P}}\), as in Lemma 1, to obtain:

$$\begin{aligned} \Vert ({\tilde{x}},{\tilde{y}})-(x_s^*,y_s^*)\Vert _2 \le \Vert (\zeta _k,\lambda _k)-(x_s^*,y_s^*)\Vert _2. \end{aligned}$$

But we know, from Lemma 2, that \(\Vert (\zeta _k,\lambda _k)\Vert _2 = O(\sqrt{n})\), for all \(k \ge 0\). Combining this with our previous observations yields that \(\Vert ({\tilde{x}},{\tilde{y}})\Vert _2 = O(\sqrt{n})\). Setting \({\tilde{z}} = \theta {\tilde{X}}^{-1}e_n\), gives a triple \(({\tilde{x}},{\tilde{y}},{\tilde{z}})\) that satisfies (23), while \(\Vert ({\tilde{x}},{\tilde{y}},{\tilde{z}})\Vert _2 = O(\sqrt{n})\). To conclude the proof, let us notice that the value of \(\tilde{{\mathcal {L}}}_{\mu ,\theta }(x;\zeta _k,\lambda _k)\) will grow unbounded as \(x \rightarrow 0\) or \(x \rightarrow \infty \). Hence, there must exist a constant \({\tilde{M}} > 0\), such that the minimizer of this function must satisfy \(\frac{1}{{\tilde{M}}} \le {\tilde{x}}^j \le {\tilde{M}}\), for all \(j \in \{1,\ldots ,n\}\). The relation \({\tilde{X}}{\tilde{z}} = \theta e_n\) then implies that \(\frac{\theta }{{\tilde{M}}} \le {\tilde{z}}^j \le \theta {\tilde{M}}\). Hence \(({\tilde{x}},{\tilde{z}}) \ge \xi (e_n,e_n)\), where \(\xi > 0\) and \(\xi = \varTheta (1)\). \(\square \)

In the following lemma, we derive boundedness of the iterates of Algorithm IP-PMM.

Lemma 4

Given Assumptions 1 and 2, the iterates \((x_k,y_k,z_k)\) produced by Algorithm IP-PMM, for all \(k \ge 0\), are such that:

$$\begin{aligned} \Vert (x_k,z_k)\Vert _1 = O(n),\ \Vert (x_k,y_k,z_k)\Vert _2 = O(n). \end{aligned}$$

Proof

Let an iterate \((x_k,y_k,z_k) \in {\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\), produced by Algorithm IP-PMM during an arbitrary iteration \(k \ge 0\), be given. Firstly, we invoke Lemma 3, from which we have a triple \(({\tilde{x}},{\tilde{y}},{\tilde{z}})\) satisfying (23), for \(\mu = \mu _k\). Similarly, by invoking Lemma 2, we know that there exists a triple \((x_{r_k}^*,y_{r_k}^*,z_{r_k}^*)\) satisfying (22), with \(\mu = \mu _k\). Consider the following auxiliary point:

$$\begin{aligned}&\left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}} - x_k,\ \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^*\right. \nonumber \\&\left. +\frac{\mu _k}{\mu _0} {\tilde{y}} - y_k,\ \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}} - z_k\right) . \end{aligned}$$
(26)

Using (26) and (22)–(23) (for \(\mu = \mu _k\)), one can observe that:

$$\begin{aligned} \begin{aligned}&A\left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}} - x_k\right) + \mu _k \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{y}} - y_k\right) \\&\quad =\left( 1-\frac{\mu _k}{\mu _0}\right) \left( Ax_{r_k}^* + \mu _k y_{r_k}^*\right) + \frac{\mu _k}{\mu _0} \left( A{\tilde{x}}+ \mu _k {\tilde{y}}\right) - Ax_k -\mu _k y_k \\&\quad =\left( 1-\frac{\mu _k}{\mu _0}\right) \left( b + \mu _k \lambda _k\right) + \frac{\mu _k}{\mu _0} (b + \mu _k \lambda _k + {\tilde{b}} +{\bar{b}}) - Ax_k - \mu _k y_k \\&\quad =b +\mu _k \lambda _k + \frac{\mu _k}{\mu _0}({\tilde{b}}+{\bar{b}}) - Ax_k - \mu _k y_k = 0, \end{aligned} \end{aligned}$$

where the last equality follows from the definition of the neighbourhood \({\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\). Similarly:

$$\begin{aligned}&-\left( Q+\mu _k I_n\right) \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}} - x_k\right) + A^T\left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* +\frac{\mu _k}{\mu _0} {\tilde{y}} - y_k\right) \\&\quad + \left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}} - z_k\right) = 0. \end{aligned}$$

By combining the previous two relations, we have:

$$\begin{aligned} \begin{aligned}&\left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}} - x_k\right) ^T\left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}} - z_k\right) \\&\quad = \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0}{\tilde{x}} - x_k\right) ^T\left( Q+\mu _k I\right) \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}} - x_k\right) \ \\&\qquad +\mu _k \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{y}} - y_k\right) ^T\left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{y}} - y_k\right) \ge \ 0. \end{aligned} \end{aligned}$$
(27)

From (27), it can be seen that:

$$\begin{aligned} \begin{aligned}&\left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}}\right) ^T z_k + \left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0}{\tilde{z}}\right) ^T x_k \\&\quad \le \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}}\right) ^T\left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}}\right) + x_k^T z_k. \end{aligned} \end{aligned}$$

However, from Lemmas 2 and 3, we have that: \(({\tilde{x}},{\tilde{z}}) \ge \xi (e_n,e_n)\), for some positive constant \(\xi = \varTheta (1)\), while \(\Vert (x_{r_k}^*,z_{r_k}^*)\Vert _2 = O(\sqrt{n})\), and \(\Vert ({\tilde{x}},{\tilde{z}})\Vert _2 = O(\sqrt{n})\). Furthermore, by definition we have that \( n \mu _k = x_k^Tz_k\). By combining all the previous, we obtain:

$$\begin{aligned} \begin{aligned}&\frac{\mu _k}{\mu _0} \xi (e^T x_k + e^T z_k) \\&\quad \le \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}}\right) ^T z_k + \left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}}\right) ^T x_k \\&\quad \le \left( \left( 1-\frac{\mu _k}{\mu _0}\right) x_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{x}}\right) ^T\left( \left( 1-\frac{\mu _k}{\mu _0}\right) z_{r_k}^* + \frac{\mu _k}{\mu _0} {\tilde{z}}\right) + x_k^T z_k \\&\quad =\frac{\mu _k}{\mu _0}\left( 1-\frac{\mu _k}{\mu _0}\right) \left( x_{r_k}^*\right) ^T {\tilde{z}} + \frac{\mu _k}{\mu _0} \left( 1-\frac{\mu _k}{\mu _0}\right) {\tilde{x}}^T z_{r_k}^* + \left( \frac{\mu _k}{\mu _0}\right) ^2 {\tilde{x}}^T {\tilde{z}} \\&\qquad + x_k^T z_k = \ O(\mu _k n), \end{aligned} \end{aligned}$$
(28)

where we used (22) (\((x_{r_k}^*)^T(z_{r_k}^*) = 0\)). Hence, (28) implies that:

$$\begin{aligned} \Vert (x_k,z_k)\Vert _1 = O(n). \end{aligned}$$

From equivalence of norms, we have that \(\Vert (x_k,z_k)\Vert _2 \le \Vert (x_k,z_k)\Vert _1\). Finally, from the neighbourhood conditions we know that:

$$\begin{aligned} c+ Qx_k - A^T y_k - z_k + \mu _k(x_k - \zeta _k) + \frac{\mu _k}{\mu _0} ({\tilde{c}}_k + {\bar{c}}) = 0. \end{aligned}$$

All terms above (except for \(y_k\)) have a 2-norm that is O(n) [note that \(\Vert ({\bar{c}},{\bar{b}})\Vert _2 = O(\sqrt{n})\) using Assumption 2 and the definition in (18)]. Hence, using Assumption 2 again yields that \(\Vert y_k\Vert _2 = O(n)\), which completes the proof. \(\square \)

As in a typical IPM convergence analysis, we proceed by bounding some components of the scaled Newton direction. The proof uses similar arguments to those presented in [36, Lemma 6.5]. Combining this result with Assumption 2 allows us to also bound the unscaled Newton direction.

Lemma 5

Given Assumptions 1 and 2, and the Newton direction \((\varDelta x_k, \varDelta y_k, \varDelta z_k)\) obtained by solving system (20) during an arbitrary iteration \(k \ge 0\) of Algorithm IP-PMM, we have that:

$$\begin{aligned} \Vert D_k^{-1}\varDelta x_k\Vert _2 = O(n^{2}\mu _k^{\frac{1}{2}}),\ \Vert D_k \varDelta z_k \Vert _2 = O(n^{2}\mu _k^{\frac{1}{2}}),\ \Vert (\varDelta x_k,\varDelta y_k,\varDelta z_k)\Vert _2 = O(n^{3}), \end{aligned}$$

with \(D_k^2 :=X_k Z_k^{-1}\).

Proof

Consider an arbitrary iteration k of Algorithm IP-PMM. We invoke Lemmas 2–3, for \(\mu = \sigma _k \mu _k\). That is, there exists a triple \((x_{r_k}^*,y_{r_k}^*,z_{r_k}^*)\) satisfying (22), and a triple \(({\tilde{x}},{\tilde{y}},{\tilde{z}})\) satisfying (23), for \(\mu = \sigma _k \mu _k\). Using the centering parameter \(\sigma _k\), we define the following vectors:

$$\begin{aligned}&{\hat{c}} :=-\left( \sigma _k \frac{{\bar{c}}}{\mu _0} - (1-\sigma _k)\left( x_k - \zeta _k + \frac{\mu _k}{\mu _0}({\tilde{x}}-x_{r_k}^*)\right) \right) ,\ \nonumber \\&{\hat{b}} :=-\left( \sigma _k \frac{{\bar{b}}}{\mu _0} + (1-\sigma _k)\left( y_k - \lambda _k +\frac{\mu _k}{\mu _0}({\tilde{y}}-y_{r_k}^*)\right) \right) , \end{aligned}$$
(29)

where \({\bar{b}},\ {\bar{c}},\ \mu _0\) are defined in (18). Using Lemmas 2–4 and Assumption 2, we know that \(\Vert ({\hat{c}},{\hat{b}})\Vert _2 = O(n)\). Then, by applying Assumption 2 again, we know that there must exist a vector \({\hat{x}}\) such that: \(A{\hat{x}} = {\hat{b}},\ \Vert {\hat{x}}\Vert _2 = O(n)\), and by setting \({\hat{z}} = {\hat{c}} + Q{\hat{x}} + \mu _k {\hat{x}}\), we have that \(\Vert {\hat{z}}\Vert _2 = O(n)\) and:

$$\begin{aligned} A{\hat{x}} = {\hat{b}},\ -Q{\hat{x}}+ {\hat{z}} - \mu _k {\hat{x}} = {\hat{c}}. \end{aligned}$$
(30)

Using \((x_{r_k}^*,y_{r_k}^*,z_{r_k}^*)\), \(({\tilde{x}},{\tilde{y}},{\tilde{z}})\), as well as the triple \(({\hat{x}},0,{\hat{z}})\), where \(({\hat{x}},{\hat{z}})\) is defined in (30), we can define the following auxiliary triple:

$$\begin{aligned} ({\bar{x}},{\bar{y}},{\bar{z}}) = (\varDelta x_k, \varDelta y_k, \varDelta z_k) + \frac{\mu _k}{\mu _0} ({\tilde{x}}, {\tilde{y}}, {\tilde{z}}) - \frac{\mu _k}{\mu _0} (x_{r_k}^*, y_{r_k}^*,z_{r_k}^*) + \mu _k ({\hat{x}},0,{\hat{z}}). \end{aligned}$$
(31)

Using (31), (29), (22)–(23) (with \(\mu = \sigma _k \mu _k\)), and the second block equation of (20), we have:

$$\begin{aligned} \begin{aligned} A{\bar{x}} + \mu _k {\bar{y}} =&\ (A \varDelta x_k + \mu _k \varDelta y_k) + \frac{\mu _k}{\mu _0}((A{\tilde{x}} + \mu _k {\tilde{y}})- (Ax_{r_k}^*+ \mu _k y_{r_k}^*)) + \mu _k A{\hat{x}}\\ =&\ \left( b + \sigma _k\frac{\mu _k}{\mu _0}{\bar{b}}-Ax_k - \sigma _k \mu _k(y_k-\lambda _k)\right) \\&\quad + \frac{\mu _k}{\mu _0}((A{\tilde{x}} + \mu _k {\tilde{y}})- (Ax_{r_k}^*+ \mu _k y_{r_k}^*))\\&\ - \mu _k \left( \sigma _k \frac{{\bar{b}}}{\mu _0} + (1-\sigma _k)(y_k-\lambda _k)\right) - \frac{\mu _k}{\mu _0}(1-\sigma _k)\mu _k({\tilde{y}}-y_{r_k}^*)\\ =&\ \left( b + \sigma _k\frac{\mu _k}{\mu _0}{\bar{b}}-Ax_k - \sigma _k \mu _k(y_k-\lambda _k)\right) + \frac{\mu _k}{\mu _0}(b+\sigma _k\mu _k \lambda _k+{\bar{b}}+{\tilde{b}}_k)\\&\ - \frac{\mu _k}{\mu _0} (\sigma _k \mu _k \lambda _k + b) - \mu _k \left( \sigma _k \frac{{\bar{b}}}{\mu _0} + (1-\sigma _k)(y_k-\lambda _k)\right) \\ =&\ b + \frac{\mu _k}{\mu _0}({\bar{b}}+{\tilde{b}}_k) - Ax_k - \mu _k (y_k-\lambda _k)\\ =&\ 0, \end{aligned} \end{aligned}$$

where the last equation follows from the neighbourhood conditions (i.e. \((x_k,y_k,z_k) \in {\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\)). Similarly, we can show that:

$$\begin{aligned} -Q{\bar{x}} + A^T{\bar{y}} + {\bar{z}}-\mu _k {\bar{x}} = 0. \end{aligned}$$

The previous two equalities imply that:

$$\begin{aligned} {\bar{x}}^T {\bar{z}} = {\bar{x}}^T(Q{\bar{x}} - A^T {\bar{y}}+\mu _k {\bar{x}}) = {\bar{x}}^T (Q + \mu _k I){\bar{x}} + \mu _k {\bar{y}}^T {\bar{y}} \ge 0. \end{aligned}$$
(32)

On the other hand, using the last block equation of the Newton system (20), we have:

$$\begin{aligned} Z_k{\bar{x}} + X_k {\bar{z}} = - X_kZ_ke_n + \sigma _k \mu _k e_n + \frac{\mu _k}{\mu _0} Z_k({\tilde{x}}-x_{r_k}^*)+\frac{\mu _k}{\mu _0} X_k({\tilde{z}}- z_{r_k}^*) + \mu _k Z_k {\hat{x}} + \mu _k X_k {\hat{z}}. \end{aligned}$$

Let \(W_k = (X_kZ_k)^{\frac{1}{2}}\). By multiplying both sides of the previous equation by \(W_k^{-1}\), we get:

$$\begin{aligned} D_k^{-1}{\bar{x}} + D_k{\bar{z}}= & {} - W_k^{-1}(X_kZ_ke_n- \sigma _k \mu _ke_n) + \frac{\mu _k}{\mu _0} \big (D_k^{-1}({\tilde{x}}-x_{r_k}^*) + D_k({\tilde{z}}-z_{r_k}^*)\big )\nonumber \\&+\, \mu _k \big (D_k^{-1} {\hat{x}} + D_k {\hat{z}}\big ). \end{aligned}$$
(33)

But, from (32), we know that \({\bar{x}}^T {\bar{z}} \ge 0\) and hence:

$$\begin{aligned} \Vert D_k^{-1}{\bar{x}} + D_k {\bar{z}}\Vert _2^2 \ge \Vert D_k^{-1} {\bar{x}}\Vert _2^2 + \Vert D_k {\bar{z}}\Vert _2^2. \end{aligned}$$

Combining (33) with the previous inequality, gives:

$$\begin{aligned} \begin{aligned} \Vert D_k^{-1}{\bar{x}}\Vert _2^2 + \Vert D_k{\bar{z}}\Vert _2^2 \le&\ \left\{ \Vert W_k^{-1}\Vert _2\Vert X_kZ_ke_n-\sigma _k \mu _k e_n\Vert _2 \right. \\&\left. \ +\, \frac{\mu _k}{\mu _0} \big (\Vert D_k^{-1}({\tilde{x}}-x_{r_k}^*)\Vert _2 + \Vert D_k({\tilde{z}}-z_{r_k}^*)\Vert _2\big ) \right. \\&\left. +\, \mu _k\big (\Vert D_k^{-1} {\hat{x}}\Vert _2+\Vert D_k {\hat{z}}\Vert _2 \big ) \right\} ^2. \end{aligned} \end{aligned}$$

We isolate one of the two terms on the left-hand side of the previous inequality, take square roots on both sides, use (31), and apply the triangle inequality to obtain:

$$\begin{aligned} \begin{aligned} \Vert D_k^{-1} \varDelta x_k \Vert _2 \le&\ \Vert W_k^{-1}\Vert _2 \Vert X_kZ_ke_n -\sigma _k \mu _k e_n\Vert _2\\&\ + \frac{\mu _k}{\mu _0}\big ( 2\Vert D_k^{-1} ({\tilde{x}}-x_{r_k}^*)\Vert _2\\&\ +\Vert D_k({\tilde{z}}-z_{r_k}^*)\Vert _2\big )+ \mu _k\big (2\Vert D_k^{-1} {\hat{x}}\Vert _2+\Vert D_k {\hat{z}}\Vert _2 \big ). \end{aligned} \end{aligned}$$
(34)

We now bound the terms on the right-hand side of (34). Firstly, notice from the neighbourhood conditions (19) that \(\gamma _{\mu } \mu _k \le x^i_k z^i_k\). This in turn implies that:

$$\begin{aligned} \Vert W_k^{-1}\Vert _2 = \max _i \frac{1}{(x^i_k z^i_k)^{\frac{1}{2}}} \le \frac{1}{(\gamma _{\mu } \mu _k)^{\frac{1}{2}}}. \end{aligned}$$

On the other hand, we have that:

$$\begin{aligned} \begin{aligned} \Vert X_kZ_ke_n-\sigma _k \mu _k e_n\Vert _2^2 =&\ \Vert X_kZ_ke_n\Vert _2^2 - 2\sigma _k \mu _k x_k^T z_k + \sigma _k^2\mu _k^2 n\\ \le&\ \Vert X_kZ_ke_n\Vert _1^2 - 2\sigma _k \mu _k x_k^T z_k + \sigma _k^2\mu _k^2 n\\ =&\ (\mu _k n)^2 - 2\sigma _k \mu _k^2 n + \sigma _k^2 \mu _k^2 n\\ \le&\ \mu _k^2 n^2 . \end{aligned} \end{aligned}$$

Hence, combining the previous two relations yields:

$$\begin{aligned} \Vert W_k^{-1}\Vert _2\Vert X_kZ_ke_n-\sigma _k \mu _k e_n\Vert _2 \le \frac{n}{\gamma _{\mu }^{\frac{1}{2}}} \mu _k^{\frac{1}{2}} = O(n \mu ^{\frac{1}{2}}). \end{aligned}$$

We proceed by bounding \(\Vert D_k^{-1}\Vert _2\). For that, using Lemma 4, we have:

$$\begin{aligned} \Vert D_k^{-1}\Vert _2&= \max _i |(D_k^{ii})^{-1}| = \Vert D_k^{-1}e_n\Vert _{\infty }\\&= \Vert W_k^{-1}Z_k e_n\Vert _{\infty } \le \Vert W_k^{-1}\Vert _2\Vert z_k\Vert _1 = O\left( \frac{n}{\mu _k^{\frac{1}{2}}} \right) . \end{aligned}$$

Similarly, we have that:

$$\begin{aligned} \Vert D_k\Vert _2 = O\left( \frac{n}{\mu _k^{\frac{1}{2}}}\right) . \end{aligned}$$

Hence, using the previous bounds, as well as Lemmas 23, we obtain:

$$\begin{aligned} \begin{aligned}&2\frac{\mu _k}{\mu _0}\Vert D_k^{-1}({\tilde{x}}-x_{r_k}^*)\Vert _2 +\frac{\mu _k}{\mu _0}\Vert D_k({\tilde{z}}-z_{r_k}^*)\Vert _2 \\&\quad \le \ 2\frac{\mu _k}{\mu _0}(\Vert D_k^{-1}\Vert _2 + \Vert D_k\Vert _2)\max \{\Vert {\tilde{x}}-x_{r_k}^*\Vert _2,\Vert {\tilde{z}}-z_{r_k}^*\Vert _2\}\\&\quad = \ O\left( n^{\frac{3}{2}}\mu _k^{\frac{1}{2}}\right) , \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \mu _k\big (2\Vert D_k^{-1} {\hat{x}}\Vert _2+\Vert D_k {\hat{z}}\Vert _2\big ) \le 2 \mu _k(\Vert D_k^{-1}\Vert _2 + \Vert D_k\Vert _2)\max \{\Vert {\hat{x}}\Vert _2,\Vert {\hat{z}}\Vert _2\} = O(n^2 \mu _k^{\frac{1}{2}}). \end{aligned} \end{aligned}$$

Combining all the previous bounds yields the claimed bound for \(\Vert D_k^{-1}\varDelta x_k\Vert _2\). One can bound \(\Vert D_k \varDelta z_k\Vert _2\) in the same way. The latter is omitted for ease of presentation.

Finally, we have that:

$$\begin{aligned} \Vert \varDelta x_k\Vert _2 = \Vert D_k D_k^{-1} \varDelta x_k\Vert _2 \le \Vert D_k\Vert _2\Vert D_k^{-1} \varDelta x_k\Vert _2 = O(n^3). \end{aligned}$$

Similarly, we can show that \(\Vert \varDelta z_k \Vert _2 = O(n^3)\). From the first block equation of the Newton system in (20), alongside Assumption 2, we can show that \(\Vert \varDelta y_k\Vert _2 = O(n^3)\), which completes the proof. \(\square \)

We are now able to prove that, at every iteration of Algorithm IP-PMM, there exists a step-length \(\alpha _k > 0\) such that the new iterate satisfies the conditions required by the algorithm.

Lemma 6

Given Assumptions 1 and 2, there exists a step-length \({\bar{\alpha }} \in (0,1)\), such that for all \(\alpha \in [0,{\bar{\alpha }}]\) and for all iterations \(k \ge 0\) of Algorithm IP-PMM, the following relations hold:

$$\begin{aligned} (x_k + \alpha \varDelta x_k)^T(z_k + \alpha \varDelta z_k)\ge & {} (1-\alpha (1-\beta _1))x_k^T z_k, \end{aligned}$$
(35)
$$\begin{aligned} (x^i_k + \alpha \varDelta x^i_k)(z^i_k + \alpha \varDelta z^i_k)\ge & {} \frac{\gamma _{\mu }}{n}(x_k + \alpha \varDelta x_k)^T(z_k + \alpha \varDelta z_k),\ \text {for all } i \in \{1,\ldots ,n\},\nonumber \\ \end{aligned}$$
(36)
$$\begin{aligned} (x_k + \alpha \varDelta x_k)^T(z_k + \alpha \varDelta z_k)\le & {} (1-\alpha (1-\beta _2))x_k^T z_k, \end{aligned}$$
(37)

where, without loss of generality, \(\beta _1 = \frac{\sigma _{\min }}{2}\) and \(\beta _2 = 0.99\). Moreover, \({\bar{\alpha }} \ge \frac{{\bar{\kappa }}}{n^4}\) for all \(k\ge 0\), where \({\bar{\kappa }} > 0\) is independent of n and m, and if \((x_k,y_k,z_k) \in {\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\), then letting:

$$\begin{aligned} (x_{k+1},y_{k+1},z_{k+1})=&\ {} (x_k + \alpha \varDelta x_k,y_k + \alpha \varDelta y_k, z_k + \alpha \varDelta z_k),\\ \mu _{k+1} =&\ {} \frac{x_{k+1}^Tz_{k+1}}{n},\ \ \text{ for } \text{ all }\ \ \alpha \in (0,{\bar{\alpha }}] \end{aligned}$$

gives \((x_{k+1},y_{k+1},z_{k+1}) \in {\mathcal {N}}_{\mu _{k+1}}(\zeta _{k+1},\lambda _{k+1})\), where \(\lambda _k,\ \zeta _k\) are updated as in Algorithm IP-PMM.

Proof

In order to prove the first three inequalities, we follow the developments in [36, Chapter 6, Lemma 6.7]. From Lemma 5, we have that there exists a constant \(C_{\varDelta } >0\), such that:

$$\begin{aligned} (\varDelta x_k)^T \varDelta z_k = (D_k^{-1} \varDelta x_k)^T (D_k \varDelta z_k) \le \Vert D_k^{-1} \varDelta x_k\Vert _2 \Vert D_k \varDelta z_k\Vert _2 \le C_{\varDelta }^2 n^4 \mu _k. \end{aligned}$$

Similarly, it is easy to see that:

$$\begin{aligned} |\varDelta x^i_k \varDelta z^i_k| \le C_{\varDelta }^2 n^4 \mu _k. \end{aligned}$$

On the other hand, by summing over all n components of the last block equation of the Newton system (20), we have:

$$\begin{aligned} z_k^T \varDelta x_k + x_k^T \varDelta z_k = e_n^T(Z_k \varDelta x_k + X_k \varDelta z_k) = e_n^T(-X_kZ_ke_n+ \sigma _k \mu _k e_n) = (\sigma _k - 1) x_k^T z_k, \end{aligned}$$
(38)

while the components of the last block equation of the Newton system (20) can be written as:

$$\begin{aligned} z^i_k\varDelta x^i_k + x^i_k \varDelta z^i_k = -x^i_k z^i_k + \sigma _k \mu _k. \end{aligned}$$
(39)

We proceed by proving (35). Using (38), we have:

$$\begin{aligned} \begin{aligned}&(x_k + \alpha \varDelta x_k)^T (z_k + \alpha \varDelta z_k) - (1-\alpha (1 -\beta _1))x_k^T z_k \\&\quad =x_k^T z_k +\alpha (\sigma _k - 1)x_k^T z_k + \alpha ^2 \varDelta x_k^T \varDelta z_k - (1-\alpha )x_k^T z_k -\alpha \beta _1 x_k^T z_k \\&\quad \ge \alpha (\sigma _k - \beta _1) x_k^Tz_k - \alpha ^2 C_{\varDelta }^2 n^4 \mu _k \ge \alpha \left( \frac{\sigma _{\min }}{2}\right) n \mu _k - \alpha ^2 C_{\varDelta }^2 n^4 \mu _k, \end{aligned} \end{aligned}$$

where we set (without loss of generality) \(\beta _1 = \frac{\sigma _{\min }}{2}\). The rightmost side of the previous inequality will be non-negative for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{\sigma _{\min }}{2 C_{\varDelta }^2 n^3}. \end{aligned}$$

In order to prove (36), we will use (39) and the fact that from the neighbourhood conditions we have that \(x^i_k z^i_k \ge \gamma _{\mu } \mu _k\). In particular, we obtain:

$$\begin{aligned} \begin{aligned} (x^i_k + \alpha \varDelta x^i_k)(z^i_k + \alpha \varDelta z^i_k) \ge&\ (1-\alpha )x^i_k z^i_k + \alpha \sigma _k \mu _k - \alpha ^2 C_{\varDelta }^2 n^4 \mu _k \\ \ge&\ \gamma _{\mu }(1-\alpha )\mu _k + \alpha \sigma _k \mu _k - \alpha ^2 C_{\varDelta }^2 n^4 \mu _k. \end{aligned} \end{aligned}$$

By combining all the previous, we get:

$$\begin{aligned} \begin{aligned}&(x^i_k + \alpha \varDelta x^i_k)(z^i_k + \alpha \varDelta z^i_k) - \frac{\gamma _{\mu }}{n}(x_k + \alpha \varDelta x_k)^T(z_k + \alpha \varDelta z_k) \\&\quad \ge \alpha \sigma _k (1-\gamma _{\mu })\mu _k - \left( 1 + \frac{\gamma _{\mu }}{n}\right) \alpha ^2 C_{\varDelta }^2 n^4\mu _k \\&\quad \ge \alpha \sigma _{\min }(1-\gamma _{\mu })\mu _k - 2\alpha ^2 C_{\varDelta }^2 n^4\mu _k. \end{aligned} \end{aligned}$$

In turn, the rightmost side of the previous relation will be non-negative for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{\sigma _{\min }(1-\gamma _{\mu })}{2C_{\varDelta }^2 n^4}. \end{aligned}$$

Finally, to prove (37), we set (without loss of generality) \(\beta _2 = 0.99\). We know, from Algorithm IP-PMM, that \(\sigma _{\max } \le 0.5\). With the previous two remarks in mind, we have:

$$\begin{aligned} \begin{aligned}&\frac{1}{n}(x_k + \alpha \varDelta x_k)^T (z_k + \alpha \varDelta z_k) - (1-0.01\alpha )\mu _k \\&\quad \le (1-\alpha )\mu _k + \alpha \sigma _k \mu _k + \alpha ^2 \frac{C_{\varDelta }^2 n^4}{n}\mu _k - (1-0.01 \alpha )\mu _k \\&\quad \le -0.99\alpha \mu _k + 0.5\alpha \mu _k + \alpha ^2 \frac{C_{\varDelta }^2 n^4}{n} \mu _k \\&\quad =-0.49\alpha \mu _k +\alpha ^2\frac{C_{\varDelta }^2 n^4}{n}\mu _k. \end{aligned} \end{aligned}$$

The rightmost side will be non-positive for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{0.49 }{C_{\varDelta }^2 n^3}. \end{aligned}$$

By combining all the previous bounds on the step-length, we have that (35)–(37) will hold for every \(\alpha \in (0,\alpha ^*]\), where:

$$\begin{aligned} \alpha ^* :=\min \left\{ \frac{\sigma _{\min }}{2 C_{\varDelta }^2 n^3},\ \frac{\sigma _{\min }(1-\gamma _{\mu })}{2C_{\varDelta }^2 n^4},\ \frac{0.49 }{C_{\varDelta }^2 n^3},\ 1\right\} . \end{aligned}$$
(40)

Next, we would like to find the maximum \({\bar{\alpha }} \in (0,\alpha ^*]\), such that:

$$\begin{aligned} (x_k(\alpha ),y_k(\alpha ),z_k(\alpha )) \in {\mathcal {N}}_{\mu _k(\alpha )}(\zeta _k,\lambda _k),\ \text {for all}\ \alpha \in (0,{\bar{\alpha }}], \end{aligned}$$

where \(\mu _k(\alpha ) = \frac{x_k(\alpha )^T z_k(\alpha )}{n}\) and:

$$\begin{aligned} (x_k(\alpha ),y_k(\alpha ),z_k(\alpha )) = (x_k + \alpha \varDelta x_k,y_k + \alpha \varDelta y_k,z_k + \alpha \varDelta z_k). \end{aligned}$$

Let:

$$\begin{aligned} {\tilde{r}}_p(\alpha ) = Ax_k(\alpha )+ \mu _k(\alpha )(y_k(\alpha )-\lambda _k) - \left( b + \frac{\mu _k(\alpha )}{\mu _0}{\bar{b}}\right) , \end{aligned}$$

and

$$\begin{aligned} {\tilde{r}}_d(\alpha ) = -Qx_k(\alpha ) + A^Ty_k(\alpha ) + z_k(\alpha ) - \mu _k(\alpha )(x_k(\alpha )- \zeta _k) - \left( c + \frac{\mu _k(\alpha )}{\mu _0}{\bar{c}}\right) . \end{aligned}$$

In other words, we need to find the maximum \({\bar{\alpha }} \in (0,\alpha ^*]\), such that:

$$\begin{aligned} \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _2 \le C_N \frac{\mu _k(\alpha )}{\mu _0},\ \ \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _{{\mathcal {A}}} \le \gamma _{{\mathcal {A}}} \rho \frac{\mu _k(\alpha )}{\mu _0} ,\ \text {for all}\ \alpha \in (0,{\bar{\alpha }}]. \end{aligned}$$
(41)

If the latter two conditions hold, then \((x_k(\alpha ),y_k(\alpha ),z_k(\alpha )) \in {\mathcal {N}}_{\mu _k(\alpha )}(\zeta _k,\lambda _k),\ \text {for all}\ \alpha \in (0,{\bar{\alpha }}]\). Then, if Algorithm IP-PMM updates \(\zeta _k,\ \lambda _k\), it does so only when similar conditions [as in (41)] hold for the new parameters. If the parameters are not updated, then the new iterate lies in the desired neighbourhood because of (41), alongside (35)–(37).

We start by rearranging \({\tilde{r}}_p(\alpha )\). Specifically, we have that:

$$\begin{aligned} \begin{aligned} {\tilde{r}}_p(\alpha ) =&\ A(x_k + \alpha \varDelta x_k) +\left( \mu _k + \alpha (\sigma _k-1)\mu _k + \alpha ^2\frac{\varDelta x_k^T \varDelta z_k}{n}\right) \\&\ \ \cdot \ \left( (y_k + \alpha \varDelta y_k -\lambda _k)-\frac{{\bar{b}}}{\mu _0} \right) -b \\ =&\ \left( Ax_k +\mu _k (y_k -\lambda _k)-b -\frac{\mu _k}{\mu _0}{\bar{b}}\right) + \alpha (A \varDelta x_k + \mu _k \varDelta y_k)\ \\&\ +\ \left( \alpha (\sigma _k-1)\mu _k + \alpha ^2\frac{\varDelta x_k^T \varDelta z_k}{n}\right) \left( (y_k - \lambda _k + \alpha \varDelta y_k) - \frac{{\bar{b}}}{\mu _0}\right) \\ =&\ \frac{\mu _k}{\mu _0}{\tilde{b}}_k + \alpha \left( b- Ax_k - \sigma _k\mu _k\left( (y_k-\lambda _k)-\frac{{\bar{b}}}{\mu _0} \right) + \mu _k \left( (y_k-\lambda _k)-\frac{{\bar{b}}}{\mu _0} \right) \ \right. \\&\left. \ -\ \mu _k \left( (y_k-\lambda _k)-\frac{{\bar{b}}}{\mu _0} \right) \right) + \left( \alpha (\sigma _k-1)\mu _k + \alpha ^2\frac{\varDelta x_k^T \varDelta z_k}{n}\right) \\&\cdot \ \left( (y_k - \lambda _k + \alpha \varDelta y_k) - \frac{{\bar{b}}}{\mu _0}\right) , \end{aligned} \end{aligned}$$

where we used that \(\mu _k(\alpha ) = \left( \mu _k + \alpha (\sigma _k-1)\mu _k + \alpha ^2 \frac{\varDelta x_k^T \varDelta z_k}{n}\right) \), which can be derived from (38), as well as the neighbourhood conditions (19), and the third block equation of the Newton system (20). By using the neighbourhood conditions again, and then cancelling opposite terms in the previous equation, we obtain:

$$\begin{aligned} \begin{aligned} {\tilde{r}}_p(\alpha ) =&\ (1-\alpha )\frac{\mu _k}{\mu _0}{\tilde{b}}_k + \alpha ^2(\sigma _k - 1)\mu _k \varDelta y_k + \alpha ^2\frac{\varDelta x_k^T \varDelta z_k}{n}\left( y_k - \lambda _k + \alpha \varDelta y_k - \frac{{\bar{b}}}{\mu _0} \right) . \end{aligned} \end{aligned}$$
(42)

Similarly, we can show that:

$$\begin{aligned} {\tilde{r}}_d(\alpha ) = (1-\alpha )\frac{\mu _k}{\mu _0}{\tilde{c}}_k - \alpha ^2(\sigma _k-1)\mu _k \varDelta x_k - \alpha ^2\frac{\varDelta x_k^T \varDelta z_k}{n}\big (x_k - \zeta _k + \alpha \varDelta x_k + \frac{{\bar{c}}}{\mu _0} \big ). \end{aligned}$$
(43)

Define the following two quantities:

$$\begin{aligned} \begin{aligned} \xi _2 :=&\ \mu _k \Vert (\varDelta y_k,\varDelta x_k)\Vert _2 \\&+ C_{\varDelta }^2n^3 \mu _{k}\bigg (\Vert (y_k - \lambda _k,x_k-\zeta _k)\Vert _2 + \alpha ^* \Vert (\varDelta y_k,\varDelta x_k)\Vert _2 + \bigg \Vert \bigg (\frac{{\bar{b}}}{\mu _0},\frac{{\bar{c}}}{\mu _0}\bigg )\bigg \Vert _2\bigg ),\\ \xi _{{\mathcal {A}}} :=&\ \mu _k \Vert (\varDelta y_k,\varDelta x_k)\Vert _{{\mathcal {A}}} \\&+ C_{\varDelta }^2n^3 \mu _{k}\bigg (\Vert (y_k - \lambda _k,x_k-\zeta _k)\Vert _{{\mathcal {A}}} + \alpha ^* \Vert (\varDelta y_k,\varDelta x_k)\Vert _{{\mathcal {A}}} + \bigg \Vert \bigg (\frac{{\bar{b}}}{\mu _0},\frac{{\bar{c}}}{\mu _0}\bigg )\bigg \Vert _{{\mathcal {A}}}\bigg ), \end{aligned} \end{aligned}$$
(44)

where \(\alpha ^*\) is given by (40). Using the definition of the starting point in (18), as well as the results in Lemmas 4 and 5, we can observe that \(\xi _2 = O(n^4 \mu _k)\). On the other hand, using Assumption 2, we know that for every pair \((r_1,r_2) \in {\mathbb {R}}^{m+n}\), if \(\Vert (r_1,r_2)\Vert _2 = \varTheta (f(n))\), where \(f(\cdot )\) is a positive polynomial function of n, then \(\Vert (r_1,r_2)\Vert _{{\mathcal {A}}} = \varTheta (f(n))\). In other words, we have that \(\xi _{{\mathcal {A}}} = O(n^4 \mu _k)\). Using the quantities in (44), equations (42), (43), as well as the neighbourhood conditions, we have that:

$$\begin{aligned} \begin{aligned} \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _2 \le&\ (1-\alpha )C_N \frac{\mu _k}{\mu _0} + \alpha ^2 \mu _k \xi _2,\\ \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _{{\mathcal {A}}} \le&\ (1-\alpha )\gamma _{{\mathcal {A}}}\rho \frac{\mu _k}{\mu _0} + \alpha ^2 \mu _k \xi _{{\mathcal {A}}}, \end{aligned} \end{aligned}$$

for all \(\alpha \in (0,\alpha ^*]\), where \(\alpha ^*\) is given by (40). On the other hand, we know from (35), that:

$$\begin{aligned} \mu _k(\alpha ) \ge (1-\alpha (1-\beta _1))\mu _k,\ \text {for all}\ \alpha \in (0,\alpha ^*]. \end{aligned}$$

By combining the last two inequalities, we get that:

$$\begin{aligned} \begin{aligned} \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _2 \le \frac{\mu _k(\alpha )}{\mu _0} C_N,\qquad \text {for all}\ \alpha \in \bigg (0, \min \left\{ \alpha ^*,\frac{\beta _1 C_N}{\xi _2 \mu _0}\right\} \bigg ]. \end{aligned} \end{aligned}$$

Similarly,

$$\begin{aligned} \begin{aligned} \Vert {\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha )\Vert _{{\mathcal {A}}} \le \frac{\mu _k(\alpha )}{\mu _0} \gamma _{{\mathcal {A}}} \rho ,\qquad \text {for all}\ \alpha \in \bigg (0, \min \left\{ \alpha ^*,\frac{\beta _1 \gamma _{{\mathcal {A}}} \rho }{\xi _{{\mathcal {A}}} \mu _0}\right\} \bigg ]. \end{aligned} \end{aligned}$$

Hence, we have that:

$$\begin{aligned} {\bar{\alpha }} :=\min \left\{ \alpha ^*,\frac{\beta _1 C_N}{\xi _2 \mu _0}, \frac{\beta _1 \gamma _{{\mathcal {A}}} \rho }{\xi _{{\mathcal {A}}} \mu _0} \right\} , \end{aligned}$$
(45)

where \(\beta _1 = \frac{\sigma _{\min }}{2}\). Since \({\bar{\alpha }} = \varOmega \big (\frac{1}{n^4}\big )\), we know that there must exist a constant \({\bar{\kappa }} > 0\), independent of n, m and of the iteration k, such that \({\bar{\alpha }} \ge \frac{{\bar{\kappa }}}{n^4}\) for all \(k \ge 0\), and this completes the proof. \(\square \)

The following theorem summarizes our results.

Theorem 1

Given Assumptions 1 and 2, the sequence \(\{\mu _k\}\) generated by Algorithm IP-PMM converges Q-linearly to zero, and the sequences of regularized residual norms

$$\begin{aligned}&\left\{ \left\| Ax_k + \mu _k (y_k-\lambda _k) - b-\frac{\mu _k}{\mu _0}{\bar{b}}\right\| _2\right\} , \quad \text{ and }\\&\left\{ \left\| -Qx_k + A^T y_k + z_k - \mu _k(x_k - \zeta _k) - c - \frac{\mu _k}{\mu _0}{\bar{c}}\right\| _2\right\} \end{aligned}$$

converge R-linearly to zero.

Proof

From (37) we have that:

$$\begin{aligned} \mu _{k+1} \le (1-0.01\alpha _k)\mu _k, \end{aligned}$$

while, from (45), we know that \(\text {for all}\ k \ge 0\), there exists \({\bar{\alpha }} \ge \frac{{\bar{\kappa }}}{n^4}\) such that \(\alpha _k \ge {\bar{\alpha }}\). Hence, we can easily see that \(\mu _k \rightarrow 0\). On the other hand, from the neighbourhood conditions, we know that for all \(k \ge 0\):

$$\begin{aligned} \left\| Ax_k + \mu _k (y_k-\lambda _k) - b - \frac{\mu _k}{\mu _0}{\bar{b}}\right\| _2 \le C_N \frac{\mu _k}{\mu _0} \end{aligned}$$

and

$$\begin{aligned} \left\| -Qx_k + A^T y_k + z_k - \mu _k(x_k - \zeta _k) - c- \frac{\mu _k}{\mu _0}{\bar{c}}\right\| _2 \le C_N \frac{\mu _k}{\mu _0}. \end{aligned}$$

This completes the proof. \(\square \)

Theorem 2

Let \(\epsilon \in (0,1)\) be a given error tolerance. Choose a starting point for Algorithm IP-PMM as in (18), such that \(\mu _0 \le \frac{C}{\epsilon ^{\omega }}\) for some positive constants \(C,\ \omega \). Given Assumptions 1 and 2, there exists an index K with:

$$\begin{aligned} K = O\left( n^4\left| \log \left( \frac{1}{\epsilon }\right) \right| \right) \end{aligned}$$

such that the iterates \(\{w_k\} = \{(x_k^T,y_k^T,z_k^T)^T\}\) generated from Algorithm IP-PMM satisfy:

$$\begin{aligned} \mu _k \le \epsilon ,\quad \text {for all}\ \ k\ge K. \end{aligned}$$

Proof

The proof follows the developments in [36, 37] and is only provided here for completeness. Without loss of generality, we can choose \(\sigma _{\max } \le 0.5\) and then, from Lemma 6, we know that there is a constant \({\bar{\kappa }}\), independent of n, such that \({\bar{\alpha }} \ge \frac{{\bar{\kappa }}}{n^4}\), where \({\bar{\alpha }}\) is the worst-case step-length. Given the latter, we know that the new iterate lies in the neighbourhood \({\mathcal {N}}_{\mu _{k+1}}(\zeta _{k+1},\lambda _{k+1})\) defined in (19). We also know, from (37), that:

$$\begin{aligned} \mu _{k+1} \le (1 - 0.01{\bar{\alpha }})\mu _k \le \left( 1-0.01\frac{{\bar{\kappa }}}{n^4}\right) \mu _k,\ \ \ \ k = 0,1,2,\ldots \end{aligned}$$

By taking logarithms on both sides in the previous inequality, we get:

$$\begin{aligned} \log (\mu _{k+1}) \le \log \left( 1 - \frac{{\tilde{\kappa }}}{n^4}\right) + \log (\mu _k), \end{aligned}$$

where \({\tilde{\kappa }} = 0.01{\bar{\kappa }}\). By applying repeatedly the previous formula, and using the fact that \(\mu _0 \le \frac{C}{\epsilon ^{\omega }}\), we have:

$$\begin{aligned} \log (\mu _k) \le k \log \left( 1 - \frac{{\tilde{\kappa }}}{n^4}\right) + \log (\mu _0) \le k \log \left( 1 - \frac{{\tilde{\kappa }}}{n^4}\right) + \omega \log \left( \frac{1}{\epsilon }\right) + \log (C). \end{aligned}$$

We use the fact that \(\log (1+\beta ) \le \beta ,\ \ \text {for all}\ \beta > -1\) to obtain:

$$\begin{aligned} \log (\mu _k) \le k\left( -\frac{{\tilde{\kappa }}}{n^4}\right) + \omega \log \left( \frac{1}{\epsilon }\right) +\log (C). \end{aligned}$$

Hence, convergence is attained if:

$$\begin{aligned} k\left( -\frac{{\tilde{\kappa }}}{n^4}\right) + \omega \log \left( \frac{1}{\epsilon }\right) +\log (C) \le \log (\epsilon ). \end{aligned}$$

The latter holds for all k satisfying:

$$\begin{aligned} k \ge K = \frac{n^4}{{\tilde{\kappa }}}\left( (1+\omega )\log \left( \frac{1}{\epsilon }\right) +\log (C)\right) , \end{aligned}$$

which completes the proof. \(\square \)

Finally, we present the global convergence guarantee of Algorithm IP-PMM.

Theorem 3

Suppose that Algorithm IP-PMM terminates when a limit point is reached. Then, if Assumptions 1 and 2 hold, every limit point of \(\{(x_k,y_k,z_k)\}\) determines a primal-dual solution of the non-regularized pair (P)–(D).

Proof

From Theorem 1, we know that \(\{\mu _k\} \rightarrow 0\), and hence, there exists a sub-sequence \({\mathcal {K}} \subseteq {\mathbb {N}}\), such that:

$$\begin{aligned}&\left\{ Ax_k + \mu _k (y_k - \lambda _k) -b -\frac{\mu _k}{\mu _0}{\bar{b}}\right\} _{{\mathcal {K}}} \rightarrow 0,\\&\left\{ -Qx_k + A^T y_k + z_k - \mu _k (x_k - \zeta _k)-c-\frac{\mu _k}{\mu _0}{\bar{c}}\right\} _{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

However, since Assumptions 1 and 2 hold, we know from Lemma 4 that \(\{(x_k,y_k,z_k)\}\) is a bounded sequence. Hence, we obtain that:

$$\begin{aligned} \big \{ Ax_k - b\big \}_{{\mathcal {K}}} \rightarrow 0,\ \big \{-Qx_k + A^Ty_k +z_k -c\big \}_{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

One can readily observe that the limit point of the algorithm satisfies the conditions given in (2), since \(\mu _k = \frac{x_k^Tz_k}{n}\). \(\square \)

Remark 1

Let us notice that the above analysis required Assumption 2, and hence that \(\text {rank}(A) = m\). However, one motivation for using regularization is that the resulting algorithm is able to solve rank deficient problems. While the complexity result would not hold in this case, we are able to perform an analysis following exactly the developments in [12], which would guarantee the global convergence of IP-PMM, given certain assumptions (see [12]). The latter is omitted for brevity of presentation.

4 Infeasible problems

Let us now drop Assumptions 1 and 2, in order to analyze the behaviour of the algorithm in the case where an infeasible problem is tackled. Let us employ the following two premises:

Premise 1

During the iterations of Algorithm IP-PMM, the sequences \(\{\Vert y_k - \lambda _k\Vert _2\}\) and \(\{\Vert x_k - \zeta _k\Vert _2\}\) remain bounded.

Premise 2

There does not exist a primal-dual triple, satisfying the KKT conditions for the primal-dual pair (P)–(D).

The following analysis is based on the developments in [2] and [12]. However, in these papers such an analysis is proposed in order to derive convergence of an IPM, while here, we use it as a tool in order to construct a reliable and implementable infeasibility detection mechanism. In what follows, we show that Premises 1 and 2 are contradictory. In other words, if Premise 2 holds (which means that our problem is infeasible), then we will show that Premise 1 cannot hold, and hence the negation of Premise 1 is a necessary condition for infeasibility.

Lemma 7

Given Premise 1, and by assuming that \(x_k^Tz_k > \epsilon \), for some \(\epsilon >0\), for all iterations k of Algorithm IP-PMM, the Newton direction produced by (20) is uniformly bounded by a constant dependent only on n.

Proof

Let us use a variation of Theorem 1 given in [2]. This theorem states that if the following conditions are satisfied,

1. \(\mu _k > 0\),

2. there exists \({\bar{\epsilon }} > 0\) such that \(x_k^i z_k^i \ge {\bar{\epsilon }}\), for all \(i \in \{1,2,\ldots ,n\}\) and for all \(k\ge 0\),

3. the matrix \(H_k = \mu _k I + Q + X_k^{-1}Z_k +\frac{1}{\mu _k}A^T A\) is positive definite,
then the Jacobian matrix in (20) is non-singular and has a uniformly bounded inverse. Note that conditions (1) and (3) are trivial to verify, based on our assumption that \(x_k^T z_k = n \mu _k > \epsilon \). Condition (2) follows since we know that our iterates lie in \({\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\), while we have \(x_k^T z_k > \epsilon \), by assumption. Indeed, from the neighbourhood conditions (19), we have that \(x_k^i z_k^i \ge \frac{\gamma _{\mu }}{n} x_k^T z_k\). Hence, there exists \({\bar{\epsilon }} = \frac{\gamma _{\mu } \epsilon }{n} > 0\) such that \(x_k^i z_k^i > {\bar{\epsilon }}\), for all \(k \ge 0\) and for all \(i \in \{1,\ldots ,n\}\).

Finally, we have to show that the right hand side of (20) is uniformly bounded. To that end, we bound the right-hand side of the second block equation of (20) as follows:

$$\begin{aligned} \begin{aligned}&\left\| Ax_k + \sigma _k \mu _k (y_k - \lambda _k) - b - \frac{\sigma _k \mu _k}{\mu _0}{\bar{b}} + \mu _k \left( y_k - \lambda _k -\frac{{\bar{b}}}{\mu _0}\right) -\mu _k \left( y_k - \lambda _k -\frac{{\bar{b}}}{\mu _0}\right) \right\| _2 \\&\quad \le \frac{\mu _k}{\mu _0}\Vert {\tilde{b}}_k\Vert _2 + \mu _k\left\| y_k - \lambda _k - \frac{{\bar{b}}}{\mu _0}\right\| _2, \end{aligned} \end{aligned}$$

where we used the neighbourhood conditions (19). Boundedness follows from Premise 1. A similar reasoning applies for bounding the right-hand side of the first block equation, while the right-hand side of the third block equation is bounded directly from the neighbourhood conditions. Combining the previous completes the proof. \(\square \)

In the following lemma, we prove by contradiction that the parameter \(\mu _k\) of Algorithm IP-PMM converges to zero, given that Premise 1 holds. The proof is based on the developments in [12, 20] and is only partially given here, for ease of presentation.

Lemma 8

Given Premise 1, and a sequence \((x_k,y_k,z_k) \in {\mathcal {N}}_{\mu _k}(\zeta _k,\lambda _k)\) produced by Algorithm IP-PMM, the sequence \(\{\mu _k\}\) converges to zero.

Proof

Assume, for the sake of contradiction, that \(\mu _k > \epsilon \), \(\text {for all}\ k \ge 0\). Then, we know that the Newton direction obtained by the algorithm at every iteration, after solving (20), will be uniformly bounded by a constant dependent only on n, that is, there exists a positive constant \(C^{\dagger }\), such that \(\Vert (\varDelta x_k,\varDelta y_k,\varDelta z_k)\Vert _2 \le C^{\dagger }\). As in Lemma 6, we define:

$$\begin{aligned} {\tilde{r}}_p(\alpha ) = Ax_k(\alpha ) + \mu _k(\alpha ) (y_k(\alpha ) - \lambda _{k}) - b - \frac{\mu _k(\alpha )}{\mu _0}{\bar{b}}, \end{aligned}$$

and

$$\begin{aligned} {\tilde{r}}_d(\alpha ) = - Qx_k(\alpha ) + A^T y_k(\alpha ) + z_k(\alpha ) - \mu _k(\alpha ) (x_k(\alpha ) - \zeta _{k}) - c - \frac{\mu _k(\alpha )}{\mu _0}{\bar{c}}, \end{aligned}$$

for which we know that equalities (42) and (43) hold, respectively. Take any \(k \ge 0\) and define the following functions:

$$\begin{aligned} \begin{aligned} f_1(\alpha ) :=&(x_k+\alpha \varDelta x_k)^T(z_k + \alpha \varDelta z_k) - \left( 1 -\alpha \left( 1-\frac{\sigma _{\min }}{2}\right) \right) x_k^T z_k,\\ f_2^i(\alpha ) :=&(x_k^i+ \alpha \varDelta x_k^i)(z_k^i + \alpha \varDelta z_k^i) - \gamma _{\mu } \mu _k(\alpha ),\quad i=1,\ldots ,n,\\ f_3(\alpha ) :=&(1-0.01\alpha )x_k^Tz_k - (x_k(\alpha ))^T (z_k(\alpha )),\\ g_{2}(\alpha ) :=&\frac{\mu _k(\alpha )}{\mu _0}C_N - \Vert ({\tilde{r}}_p(\alpha ),{\tilde{r}}_d(\alpha ))\Vert _2, \end{aligned} \end{aligned}$$

where \(\mu _k(\alpha ) = \frac{(x_k + \alpha \varDelta x_k)^T (z_k + \alpha \varDelta z_k)}{n}\), \((x_k(\alpha ),y_k(\alpha ),z_k(\alpha )) = (x_k + \alpha \varDelta x_k,y_k + \alpha \varDelta y_k,z_k + \alpha \varDelta z_k)\). We would like to show that there exists \(\alpha ^* > 0\), such that:

$$\begin{aligned} f_1(\alpha ) \ge 0,\ f_2^i(\alpha ) \ge 0,\ \text {for all}\ i = 1,\ldots ,n,\ f_3(\alpha ) \ge 0,\ g_2(\alpha ) \ge 0,\ \text {for all}\ \alpha \in (0,\alpha ^*]. \end{aligned}$$

These conditions model the requirement that the next iterate of Algorithm IP-PMM must lie in the updated neighbourhood \({\mathcal {N}}_{\mu _{k+1}}(\zeta _k,\lambda _{k})\). Note that Algorithm IP-PMM updates the parameters \(\lambda _k,\ \zeta _k\) only if the selected new iterate belongs to the new neighbourhood, defined using the updated parameters. Hence, it suffices to show that \((x_{k+1},y_{k+1},z_{k+1}) \in {\mathcal {N}}_{\mu _{k+1}}(\zeta _k,\lambda _{k})\). It is important to observe that we do not require that the scaled infeasibilities are bounded with respect to the semi-norm defined in (17). This is because the aforementioned norm requires that \(\text {rank}(A) = m\). In other words, we are using a slightly different neighbourhood, and this does not affect the global convergence of the method (but only its complexity).

Proving the existence of \(\alpha ^* > 0\), such that each of the aforementioned functions is non-negative, follows exactly the developments in Lemma 6, with the only difference being that the bounds on the directions are not explicitly specified in this case. Using the same methodology as in Lemma 6, while keeping in mind our assumption, namely \(x_k^T z_k > \epsilon \), and hence \( x_k^i z_k^i > {\bar{\epsilon }}\), we can show that:

$$\begin{aligned} \alpha ^* :=\min \bigg \{1,\frac{\sigma _{\min }\epsilon }{2(C^{\dagger })^2}, \frac{(1-\gamma _{\mu })\sigma _{\min }{\bar{\epsilon }}}{2(C^{\dagger })^2},\frac{0.49\epsilon }{2(C^{\dagger })^2}, \frac{\sigma _{\min }C_N \epsilon }{2\mu _0(\xi _2)} \bigg \}, \end{aligned}$$
(46)

where \(\xi _2\) is a bounded constant dependent on \(C^{\dagger }\), and defined as in (44). However, using the inequality:

$$\begin{aligned} \mu _{k+1} \le (1-0.01 \alpha )\mu _k,\ \text {for all}\ \alpha \in [0,\alpha ^*], \end{aligned}$$

we get that \(\mu _k \rightarrow 0\), which contradicts our assumption that \(\mu _k > \epsilon ,\ \text {for all}\ k\ge 0\), and completes the proof. \(\square \)

Finally, using the following theorem, we derive a necessary condition for infeasibility.

Theorem 4

Suppose that Premise 2 holds, i.e. there does not exist a KKT triple for the pair (P)–(D). Then Premise 1 fails to hold.

Proof

For the sake of contradiction, assume that Premise 1 holds. In Lemma 8, we proved that given Premise 1, Algorithm IP-PMM produces iterates that belong to the neighbourhood (19) and \(\mu _k \rightarrow 0\). But from the neighbourhood conditions we can observe that:

$$\begin{aligned} \Vert Ax_k + \mu _k(y_k - \lambda _k) - b - \frac{\mu _k}{\mu _0}{\bar{b}} \Vert _2 \le C_N\frac{\mu _k}{\mu _0}, \end{aligned}$$

and

$$\begin{aligned} \Vert -Qx_k + A^Ty_k + z_k - \mu _k(x_k - \zeta _k)-c-\frac{\mu _k}{\mu _0}{\bar{c}}\Vert _2 \le C_N \frac{\mu _k}{\mu _0}. \end{aligned}$$

Hence, we can choose a sub-sequence \({\mathcal {K}} \subseteq {\mathbb {N}}\), for which:

$$\begin{aligned}&\left\{ Ax_k + \mu _k(y_k - \lambda _k) - b - \frac{\mu _k}{\mu _0}{\bar{b}} \right\} _{{\mathcal {K}}} \rightarrow 0,\quad \text{ and } \\&\left\{ -Qx_k + A^Ty_k + z_k - \mu _k(x_k - \zeta _k)-c-\frac{\mu _k}{\mu _0}{\bar{c}}\right\} _{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

But since \(\Vert y_k-\lambda _k\Vert _2\) and \(\Vert x_k - \zeta _k\Vert _2\) are bounded, while \(\mu _k \rightarrow 0\), we have that:

$$\begin{aligned} \{Ax_k - b\}_{{\mathcal {K}}} \rightarrow 0,\ \{c + Qx_k - A^T y_k - z_k\}_{{\mathcal {K}}} \rightarrow 0,\ \text {and}\ \{x_k^T z_k\}_{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

This contradicts Premise 2, i.e. that the pair (P)-(D) does not have a KKT triple, and completes the proof. \(\square \)

In the previous theorem we proved that the negation of Premise 1 is a necessary condition for infeasibility, since otherwise we arrive at a contradiction. Nevertheless, this does not mean that the condition is also sufficient. In order to obtain a more reliable test for infeasibility that uses the previous result, we have to exploit the properties of Algorithm IP-PMM. In particular, notice that if the primal-dual problem is infeasible, then the PMM sub-problem will stop being updated after a finite number of iterations. In that case, we know from Theorem 4 that the sequence \(\Vert (x_k-\zeta _k,y_k - \lambda _k)\Vert _2\) will grow unbounded. Hence, we can define a maximum number of iterations per PMM sub-problem, say \(K^{\dagger } \ge 0\), as well as a very large constant \(C^{\dagger } \ge 0\). Then, if \(\Vert (x_k-\zeta _k,y_k - \lambda _k)\Vert _2 > C^{\dagger }\) and \(k \ge K^{\dagger }\), the algorithm is terminated. The specific choices for these constants will be given in the next section.
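A minimal MATLAB sketch of this test is given below. The function and variable names are ours and hypothetical: k_pmm stands for the number of iterations performed on the current PMM sub-problem, while K_max and C_max play the roles of \(K^{\dagger }\) and \(C^{\dagger }\).

```matlab
function infeasible = suspected_infeasible(x, zeta, y, lambda, k_pmm, K_max, C_max)
% Necessary-condition test for infeasibility: the proximal estimates have not
% been updated for at least K_max iterations and the displacement from them
% has grown beyond the (very large) threshold C_max.
    displacement = norm([x - zeta; y - lambda], 2);   % ||(x_k - zeta_k, y_k - lambda_k)||_2
    infeasible   = (k_pmm >= K_max) && (displacement > C_max);
end
```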

5 Computational experience

In this section, we provide some implementation details and present computational results of the method, over a set of small to large scale linear and convex quadratic programming problems. The code was written in MATLAB and can be found in the following link:

https://www.maths.ed.ac.uk/ERGO/software.html

5.1 Implementation details

Our implementation deviates from the theory, in order to gain some additional control, as well as computational efficiency. Nevertheless, the theory has served as a guideline to tune the code reliably. There are two major differences between the practical implementation of IP-PMM and its theoretical algorithmic counterpart. Firstly, our implementation uses different penalty parameters for the proximal terms and the logarithmic barrier term. In particular, we define a primal proximal penalty \(\rho \), a dual proximal penalty \(\delta \) and the barrier parameter \(\mu \). Using the previous, the PMM Lagrangian function in (14), at an arbitrary iteration k of the algorithm, becomes:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}^{PMM}_{\mu _k,\delta _k,\rho _k} (x;\zeta _k, \lambda _k) = c^Tx + \frac{1}{2}x^T Q x -\lambda _k^T (Ax - b) + \frac{1}{2\delta _k}\Vert Ax-b\Vert _2^2 + \frac{\rho _k}{2}\Vert x-\zeta _k\Vert _2^2, \end{aligned} \end{aligned}$$

and (15) is altered similarly. The second difference lies in the fact that, in order to gain efficiency, we do not require the iterates of the method to lie in the neighbourhood defined in (19). In what follows, we provide further details concerning our implementation choices.

5.1.1 Free variables

The method takes as input problems in the following form:

$$\begin{aligned} \text {min}_{x} \ \left( c^Tx + \frac{1}{2}x^T Q x \right) , \ \ \text {s.t.} \ Ax = b, \ x^{I} \ge 0,\ x^{F}\ \text {free}, \end{aligned}$$

where \(I = \{1,\ldots ,n\} \backslash F\) is the set of indices indicating the non-negative variables. In particular, if a problem instance has only free variables, no logarithmic barrier is employed and the method reduces to a standard proximal method of multipliers. Of course in this case, the derived complexity result does not hold. Nevertheless, a global convergence result holds, as shown in [31]. In general, convex optimization problems with only equality constraints are usually easy to deal with, and the proposed algorithm behaves very well when solving such problems in practice.

5.1.2 Constraint matrix scaling

In the pre-processing stage, we check if the constraint matrix is well scaled, i.e. if:

$$\begin{aligned} \bigg (\max _{i \in \{1,\ldots ,m\},j \in \{1,\ldots ,n\}}(|A^{ij}|) < 10\bigg ) \wedge \bigg (\min _{i \in \{1,\ldots ,m\},j \in \{1,\ldots ,n\} :|A^{ij}|> 0}(|A^{ij}|) > 0.1\bigg ). \end{aligned}$$

If the previous is not satisfied, we apply geometric scaling in the rows of A, that is, we multiply each row of A by a scalar of the form:

$$\begin{aligned} d^i = \frac{1}{\sqrt{\underset{j \in \{1,\ldots ,n\}}{\max }(|A^{i,j}|) \cdot \underset{j \in \{1,\ldots ,n\} :|A^{ij}| > 0}{\min }(|A^{i,j}|)}},\quad \text {for all}\ i \in \{1,\ldots ,m\}. \end{aligned}$$

However, for numerical stability, we find the largest integer \(p^i\), such that \(2^{p^i} \le d^i\) and we set \(d^i = 2^{p^i}\), \(\text {for all}\ i \in \{1,\ldots ,m\}\). Hence, the scaling factors are powers of two. Based on the IEEE representation of floating point numbers, multiplying by a power of 2 translates into an addition of this power to the exponent, without affecting the mantissa. This scaling technique is based on the one used within the GLPK solver (see [15]).
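The following MATLAB sketch illustrates this scaling step under the stated conventions; it is a simplified illustration rather than the actual code, and the function name is ours. In practice, the scaling is applied only when the well-scaledness check above fails.

```matlab
function [As, d] = scale_rows_geometric(A)
% Geometric row scaling with factors rounded down to powers of two.
    m = size(A, 1);
    d = ones(m, 1);
    for i = 1:m
        nz = abs(nonzeros(A(i, :)));              % nonzero magnitudes of row i
        if ~isempty(nz)
            di   = 1 / sqrt(max(nz) * min(nz));   % geometric scaling factor d^i
            d(i) = 2^floor(log2(di));             % largest power of two not exceeding d^i
        end
    end
    As = spdiags(d, 0, m, m) * A;                 % row-scaled constraint matrix
end
```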

5.1.3 Starting point, Newton-step computation and step-length

We use a starting point based on the developments in [23]. To construct it, we try to solve the pair of problems (P)–(D), ignoring the non-negativity constraints:

$$\begin{aligned} {\tilde{x}} = A^T(AA^T)^{-1}b,\ \ {\tilde{y}} = (AA^T)^{-1}A(c+Q{\tilde{x}}), \ \ {\tilde{z}} = c - A^T {\tilde{y}} + Q{\tilde{x}}. \end{aligned}$$

However, in order to ensure stability and efficiency, we regularize the matrix \(AA^T\) and employ the Preconditioned Conjugate Gradient (PCG) method to solve these systems (in order to avoid forming \(AA^T\)). We use the classical Jacobi preconditioner to accelerate PCG, i.e. \(P = \text {diag}(AA^T) + \delta I\), where \(\delta = 8\) is set as the regularization parameter.

Then, to guarantee positivity and sufficient magnitude of \(x_I,z_I\), we compute \(\delta _x = \text {max}\{-1.5\cdot \text {min}({\tilde{x}}^{I}),0\}\) and \(\delta _z = \text {max}\{-1.5 \cdot \text {min}({\tilde{z}}^{I}),0\}\) and we obtain:

$$\begin{aligned} \tilde{\delta _x} = \delta _x + 0.5\frac{({\tilde{x}}^I+ \delta _x e_{|I|})^T({\tilde{z}}^I+ \delta _z e_{|I|})}{\sum _{i=1}^{|I|} ({\tilde{z}}^{I(i)} + \delta _z)}, \quad \tilde{\delta _z} = \delta _z + 0.5\frac{({\tilde{x}}^I+ \delta _x e_{|I|})^T({\tilde{z}}^I+ \delta _z e_{|I|})}{\sum _{i=1}^{|I|} ({\tilde{x}}^{I(i)} + \delta _x)}, \end{aligned}$$

where \(e_{|I|}\) is the vector of ones of appropriate dimension. Finally, we set:

$$\begin{aligned} y_0 = {\tilde{y}},\ \ z_0^{I} = {\tilde{z}}^{I} + \tilde{\delta _z}e_{|I|},\ \ z_0^{F} = 0,\ \ x_0^{I} = {\tilde{x}}^{I} + \tilde{\delta _x}e_{|I|},\ \ x_0^{F} = {\tilde{x}}^F. \end{aligned}$$
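A possible MATLAB realization of this construction is sketched below; the function name, the PCG tolerance and the iteration limit are our assumptions, and the index set I (non-negative variables) is assumed to be non-empty.

```matlab
function [x0, y0, z0] = starting_point(A, Q, b, c, I)
% Sketch of the starting point construction: solve the regularized normal
% equations with Jacobi-preconditioned PCG, then shift the constrained
% components of x and z away from zero.
    [m, n] = size(A);
    delta  = 8;                                              % regularization of A*A'
    P      = spdiags(full(sum(A.^2, 2)) + delta, 0, m, m);   % Jacobi preconditioner
    NE     = @(v) A*(A'*v) + delta*v;                        % (A*A' + delta*I)*v, without forming A*A'
    xt     = A' * pcg(NE, b, 1e-8, 200, P);                  % approximates A'(AA')^{-1} b
    yt     = pcg(NE, A*(c + Q*xt), 1e-8, 200, P);            % approximates (AA')^{-1} A (c + Q*xt)
    zt     = c - A'*yt + Q*xt;
    % Shift the components indexed by I away from zero (Mehrotra-like heuristic).
    dx  = max(-1.5*min(xt(I)), 0);    dz  = max(-1.5*min(zt(I)), 0);
    xI  = xt(I) + dx;                 zI  = zt(I) + dz;
    dxt = dx + 0.5*(xI'*zI)/sum(zI);  dzt = dz + 0.5*(xI'*zI)/sum(xI);
    x0 = xt;           x0(I) = xt(I) + dxt;   % free components keep their computed values
    z0 = zeros(n, 1);  z0(I) = zt(I) + dzt;   % dual slacks of free variables are set to zero
    y0 = yt;
end
```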

In order to find the Newton step, we employ a widely used predictor–corrector method. The practical implementation deviates from the theory at this point, in order to gain computational efficiency. We provide the algorithmic scheme in Algorithm PC for completeness, but the reader is referred to [23] for an extensive review of the method. It is important to notice that the centering parameter (\(\sigma > 0\)) is not present in Algorithm PC. Solving two different systems serves as a way of decreasing the infeasibility and allowing \(\mu _k\) (and hence \(\delta _k,\ \rho _k\)) to decrease.

[Algorithm PC]

We solve the systems (47) and (48) using the built-in MATLAB symmetric decomposition (i.e. ldl). Such a decomposition always exists, with D diagonal, for the aforementioned systems, since after introducing the regularization, the system matrices are guaranteed to be quasi-definite; a class of matrices known to be strongly factorizable [34]. In order to exploit that, we change the default pivot threshold of ldl to a value slightly lower than the minimum allowed regularization value (\(\text {reg}_{thr}\); specified in Sect. 5.1.4). Using such a small pivot threshold guarantees that no \(2\times 2\) pivots will be employed during the factorization process.
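As an illustration, a hedged sketch of such a factorization-and-solve step is given below. The matrix shown is the standard regularized augmented system, so it may differ in detail from systems (47)–(48); the function name and right-hand-side arguments are ours, and A and Q are assumed to be sparse.

```matlab
function [dx, dy] = solve_augmented(A, Q, x, z, rho, delta, rd, rp, reg_thr)
% Factorize a quasi-definite augmented system with MATLAB's ldl, using a pivot
% threshold slightly below reg_thr so that only 1x1 (diagonal) pivots are used.
    [m, n] = size(A);
    K = [-(Q + spdiags(z./x, 0, n, n) + rho*speye(n)),  A';
          A,                                            delta*speye(m)];
    [L, D, p] = ldl(K, 0.9*reg_thr, 'vector');   % quasi-definite => strongly factorizable
    rhs   = [rd; rp];
    dw    = zeros(n + m, 1);
    dw(p) = L' \ (D \ (L \ rhs(p)));             % forward and backward substitution
    dx = dw(1:n);
    dy = dw(n+1:end);
end
```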

5.1.4 PMM parameters

At this point, we discuss how we update the PMM sub-problems in practice, as well as how we tune the penalty parameters (regularization parameters) \(\delta ,\ \rho \). Notice that in Sect. 3 we set \(\delta _k = \rho _k = \mu _k\). While this is beneficial in theory, as it gives us a reliable way of tuning the penalty parameters of the algorithm, it is not very beneficial in practice, as it does not allow us to control the regularization parameters in the cases of extreme ill-conditioning. Hence, we allow the use of different penalty parameters connected to the PMM sub-problems, while enforcing both parameters (\(\delta _k,\ \rho _k\)) to decrease at the same rate as \(\mu _k\) (based on the theoretical results in Sects. 3 and 4).

On the other hand, the algorithm is more optimistic in practice than it is in theory. In particular, we do not consider the semi-norm (17) of the infeasibility, and we allow the updates of the estimates \(\lambda _k,\ \zeta _k\) to happen independently. Specifically, the former is updated when the 2-norm of the primal infeasibility is sufficiently reduced, while the latter is updated based on the dual infeasibility.

More specifically, at the beginning of the optimization, we set: \(\delta _0 = 8,\ \rho _0 = 8\), \(\lambda _0 = y_0\), \(\zeta _0 = x_0\). Then, at the end of every iteration, we employ the algorithmic scheme given in Algorithm PEU. In order to ensure numerical stability, we do not allow \(\delta \) or \(\rho \) to become smaller than a suitable positive value, \(\text {reg}_{thr}\). We set: \(\text {reg}_{thr} = \max \big \{\frac{\text {tol}}{\max \{\Vert A\Vert ^2_{\infty },\Vert Q\Vert ^2_{\infty }\}},10^{-10}\big \}\). This value is based on the developments in [29], in order to ensure that we introduce a controlled perturbation in the eigenvalues of the non-regularized linear system. The reader is referred to [29] for an extensive study on the subject. If the factorization fails, we increase the regularization parameters by a factor of 10 and repeat the factorization. If the factorization fails while either \(\delta \) or \(\rho \) has reached its minimum allowed value (\(\text {reg}_{thr}\)), then we also increase this value by a factor of 10. If this occurs 5 consecutive times, the method is terminated with a message indicating ill-conditioning.

[Algorithm PEU]
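The regularization safeguards described above can be summarized by the following (slightly simplified) MATLAB sketch. Here build_augmented and try_factorize are hypothetical helpers standing in for the system assembly and the ldl call of the previous subsection, with try_factorize assumed to return ok = false when the factorization fails.

```matlab
function [L, D, p, delta, rho, reg_thr] = factorize_with_safeguards(A, Q, x, z, delta, rho, tol)
% Enforce the minimum regularization value reg_thr and retry the factorization
% with increased regularization whenever it fails; give up after 5 attempts.
    reg_thr = max(tol / max(norm(A, inf)^2, norm(Q, inf)^2), 1e-10);
    delta   = max(delta, reg_thr);
    rho     = max(rho,   reg_thr);
    ok = false;
    for attempt = 1:5
        K = build_augmented(A, Q, x, z, rho, delta);   % hypothetical assembly routine
        [L, D, p, ok] = try_factorize(K, reg_thr);     % hypothetical ldl wrapper
        if ok, break; end
        if min(delta, rho) <= reg_thr
            reg_thr = 10*reg_thr;                      % relax the minimum allowed value
        end
        delta = 10*delta;   rho = 10*rho;              % strengthen the regularization and retry
    end
    if ~ok
        error('IP-PMM: terminating due to persistent ill-conditioning.');
    end
end
```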

5.1.5 Termination criteria

There are four possible termination criteria. They are summarized in Algorithm TC. In the aforementioned algorithm, tol represents the error tolerance chosen by the user. Similarly, \(\text {IP}_{\text {maxit}}\) is the maximum allowed IPM iterations, also chosen by the user. On the other hand, \(\text {PMM}_{\text {maxit}}\) is a threshold indicating that the PMM sub-problem needs too many iterations before being updated (that is if \(k_{\text {PMM}} > \text {PMM}_{\text {maxit}}\)). We set \(\text {PMM}_{\text {maxit}} = 5\). When either \(\lambda _k\) or \(\zeta _k\) is updated, we set \(k_{\text {PMM}} =0\).

Let us now support the proposed infeasibility detection mechanism. In particular, notice that as long as the penalty parameters do not converge to zero, every PMM-subproblem must have a solution, even in the case of infeasibility. Hence, we expect convergence of the regularized primal (dual, respectively) infeasibility to zero, while from Sect. 4, we know that a necessary condition for infeasibility is that the sequence \(\Vert (x_k-\zeta _k,y_k - \lambda _k)\Vert _2\) diverges. If this behaviour is observed, while the PMM parameters \(\lambda _k,\ \zeta _k\) are not updated (which is not expected to happen in the feasible but rank deficient case), then we can conclude that the problem under consideration is infeasible.

[Algorithm TC]

5.2 Numerical results

At this point, we present the computational results obtained by solving a set of small to large scale linear and convex quadratic problems. In order to stress the importance of regularization, we compare IP-PMM with a non-regularized IPM. The latter implementation follows exactly from that of IP-PMM, simply by fixing \(\delta \) and \(\rho \) to zero. In the non-regularized case, the minimum pivot of the \(\texttt {ldl}\) function is restored to its default value, in order to avoid numerical instability. Throughout all of the presented experiments, we set the maximum number of iterations to 200. It should be noted here that we expect IP-PMM to require more iterations to converge, as compared to the non-regularized IPM. In turn, the Newton systems arising in IP-PMM have better numerical properties (accelerating the factorization process), while overall the method is expected to be significantly more stable. In what follows, we demonstrate that this increase in the number of iterations is benign, in that it does not make the resulting method inefficient. In contrast, we provide computational evidence that the acceleration of the factorization process more than compensates for the increase in the number of iterations. The experiments were conducted on a PC with a 2.2GHz Intel Core i7 processor (hexa-core), 16GB RAM, running under the Windows 10 operating system. The MATLAB version used was R2018b.

5.2.1 Linear programming problems

Let us compare the proposed method with the respective non-regularized implementation over the Netlib collection [27]. The test set consists of 96 linear programming problems. We set the desired tolerance to \(\text {tol} = 10^{-6}\). Firstly, we compare the two methods without using the pre-solved version of the problem collection (i.e. allowing rank-deficient matrices). In this case, the non-regularized IPM converged for only 66 out of the total 96 problems. On the other hand, IP-PMM successfully solved the whole set in 160 s (and a total of 2609 IPM iterations). Hence, one of the benefits of regularization, that of alleviating rank deficiency of the constraint matrix, becomes immediately obvious.

However, in order to explore more potential benefits of regularization, we ran the algorithm on the pre-solved Netlib library. In the pre-solved set, the non-regularized IPM converged for 93 out of 96 problems. The three failures occurred due to instability of the Newton system. The overall time spent was 353 s (and a total of 1871 IPM iterations). On the other hand, IP-PMM solved the whole set in 161 s (and a total of 2367 iterations). Two more benefits of regularization become obvious here. Firstly, we can observe that numerical stability can be a problem in a standard IPM, even if we ensure that the constraint matrices have full row rank. Secondly, notice that despite the fact that IP-PMM required 26% more iterations, it still solved the whole set in 55% less CPU time. This is because in IP-PMM only diagonal pivots are allowed during the factorization. We could enforce the same condition on the non-regularized IPM, but then significantly more problems (22/96) would fail to converge due to numerical instability.

In Table 1, we collect statistics from the runs of the two methods over some medium scale instances of the pre-solved Netlib test set.

Table 1 Medium-scale Netlib problems

From Table 1, it becomes obvious that the factorization efficiency is significantly improved by the introduction of the regularization terms. In all of the presented instances, IP-PMM converged needing more iterations, but requiring less CPU time.

In order to summarize the comparison of the two methods, we include Fig. 1, which contains the performance profiles of the two methods over the pre-solved Netlib set. IP-PMM is represented by the green line (consisting of triangles), while the non-regularized IPM by the blue line (consisting of stars). In Fig. 1a, we present the performance profile with respect to the time required for convergence, while in Fig. 1b, the performance profile with respect to the number of iterations. In both figures, the horizontal axis is in logarithmic scale, and it represents the ratio with respect to the best performance achieved by one of the two methods, for each problem. The vertical axis shows the percentage of problems solved by each method, for different values of the performance ratio. Robustness is “measured” by the maximum achievable percentage, while efficiency by the rate of increase of each of the lines (a faster increase translates to better efficiency). For more information about performance profiles, we refer the reader to [10], where this benchmarking was proposed. One can see that all of our previous observations are verified in Fig. 1.

Fig. 1 Performance profiles over the pre-solved Netlib test set
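For reference, a generic MATLAB sketch of how such profiles can be computed from a table of run statistics is given below; this is standard Dolan–Moré-style plotting code in the sense of [10], not the authors' own benchmarking script, and the function name is ours.

```matlab
function performance_profile(T, labels)
% T is a (problems x solvers) matrix of run times, with NaN marking failures;
% labels is a cell array of solver names, e.g. {'IP-PMM', 'non-regularized IPM'}.
    T(isnan(T)) = Inf;                          % failed runs never attain a finite ratio
    best = min(T, [], 2);                       % best time recorded for each problem
    R    = T ./ best;                           % performance ratios
    taus = sort(unique(R(isfinite(R))));        % ratios at which a profile can jump
    for s = 1:size(T, 2)
        frac = arrayfun(@(t) mean(R(:, s) <= t), taus);   % fraction solved within ratio t
        semilogx(taus, frac); hold on;
    end
    hold off;
    legend(labels, 'Location', 'southeast');
    xlabel('performance ratio (log scale)');
    ylabel('fraction of problems solved');
end
```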

5.2.2 Infeasible problems

In order to assess the accuracy of the proposed infeasibility detection criteria, we attempt to solve 28 infeasible problems coming from the Netlib infeasible collection [27]. For 22 out of the 28 problems, the method was able to recognize that the problem under consideration is infeasible and exit before the maximum number of iterations was reached. There were 4 problems for which the method terminated after reaching the maximum number of iterations. For one problem, the method was terminated early due to numerical instability. Finally, there was one problem for which the method converged to the least-squares solution, which satisfied the optimality conditions for a tolerance of \(10^{-6}\). Overall, IP-PMM ran all 28 infeasible problems in 34 s (and a total of 1813 IPM iterations). The proposed infeasibility detection mechanism had a 78% success rate over the infeasible test set, while no feasible problem was misclassified as infeasible. A more accurate infeasibility detection mechanism could be possible; however, the proposed approach is easy to implement and cheap from the computational point of view. Nevertheless, the interested reader is referred to [4, 26] and the references therein for various other infeasibility detection methods.

5.2.3 Quadratic programming problems

Next, we present the comparison of the two methods over the Maros–Mészáros test set [22], which comprises 122 convex quadratic programming problems. Notice that in the previous experiments we used the pre-solved version of the collection. However, we do not have a pre-solved version of this test set available. Since the focus of the paper is not on the pre-solve phase of convex problems, we present the comparison of the two methods over the set without applying any pre-processing. As a consequence, the non-regularized IPM failed to solve 27 out of the total 122 problems. However, only 11 out of 27 failed due to rank deficiency. The remaining 16 failures occurred due to numerical instability. On the contrary, IP-PMM solved the whole set successfully in 127 s (and a total of 3014 iterations). As before, the required tolerance was set to \(10^{-6}\).

In Table 2, we collect statistics from the runs of the two methods over some medium scale instances of the Maros–Mészáros collection.

Table 2 Medium-scale Maros–Mészáros problems

In order to summarize the comparison of the two methods, we include Fig. 2, which contains the performance profiles of the two methods over the Maros–Mészáros test set. IP-PMM is represented by the green line (consisting of triangles), while the non-regularized IPM by the blue line (consisting of stars). In Fig. 2a, we present the performance profile with respect to the time required for convergence, while in Fig. 2b, the performance profile with respect to the number of iterations.

Fig. 2 Performance profiles over the Maros–Mészáros test set

Similar remarks can be made here as those given when summarizing the linear programming experiments. One can readily observe the importance of the stability introduced by the regularization. On the other hand, the overhead in the number of iterations introduced by the regularization is acceptable, due to the acceleration of the factorization (since we are guaranteed to have a quasi-definite augmented system).

5.2.4 Verification of the theory

We have already presented the benefits of using regularization in interior point methods. A question arises as to whether a regularized IPM can actually find an exact solution of the problem under consideration. Theoretically, we have proven this to be the case. However, in practice one is not allowed to decrease the regularization parameters indefinitely, since ill-conditioning will become a problem. Based on the theory of augmented Lagrangian methods, one knows that sufficiently small regularization parameters suffice for exactness (see [6], among others). In what follows, we provide a “table of robustness” of IP-PMM. We ran the method over the Netlib and the Maros–Mészáros collections, for decreasing values of the required tolerance, and report the number of problems that converged.

Table 3 Table of robustness

One can observe from Table 3 that IP-PMM is sufficiently robust. Even in the case where a 10 digit accurate solution is required, the method is able to maintain a success rate larger than 91%.

5.2.5 Large scale problems

All of our previous experiments were conducted on small to medium scale linear and convex quadratic programming problems. We have shown (both theoretically and practically) that the proposed method is reliable. However, it is worth mentioning the limitations of the current approach. Since we employ exact factorization during the iterations of the IPM, we expect the method to be limited in terms of the size of the problems it can solve. The main bottleneck arises from the factorization, which does not scale well in terms of processing time and memory requirements. In order to explore these limitations, in Table 4 we provide the statistics of the runs of the method over a small set of large scale problems. The table contains the number of non-zeros of the constraint matrices, as well as the time needed to solve each problem. The tolerance used in these experiments was \(10^{-6}\).

Table 4 Large-scale problems

From Table 4, it can be observed that as the dimension of the problem grows, the time to convergence increases significantly. This increase in time is disproportionate for some problems. The reason is that the required memory can exceed the available RAM, in which case the swap file is activated. Access to the swap memory is extremely slow, and hence the time can increase disproportionately. On the other hand, we observe two failures due to lack of available memory. The previous issues could potentially be addressed by the use of iterative methods. Such methods, embedded in the IP-PMM framework, could significantly relax the memory as well as the processing requirements, at the expense of providing inexact directions. Combining IP-PMM (which is stable and reliable) with such an inexact scheme (which could accelerate the IPM iterations) seems to be a viable and competitive alternative and will be addressed in future work.

6 Conclusions

In this paper, we present an algorithm suitable for solving convex quadratic programs. It arises from the combination of an infeasible interior point method with the proximal method of multipliers (IP-PMM). The method is interpreted as a primal-dual regularized IPM, and we prove that it is guaranteed to converge in a polynomial number of iterations, under standard assumptions. As the algorithm relies only on one penalty parameter, we use the well-known theory of IPMs to tune it. In particular, we treat this penalty as a barrier parameter, and hence the method is well-behaved independently of the problem under consideration. Additionally, we derive a necessary condition for infeasibility and use it to construct an infeasibility detection mechanism. The algorithm is implemented, and the reliability of the method is demonstrated. At the expense of some extra iterations, regularization improves the numerical properties of the interior point iterations. The increase in the number of iterations is benign, since factorization efficiency as well as stability is gained. Not only does the method remain efficient, but it also outperforms a similar non-regularized IPM scheme.

We have observed the limitations of the current approach, which stem from the cost of the factorization; embedding iterative methods in the current scheme is expected to further improve the scalability of the algorithm, at the expense of inexact directions. Since the reliability of IP-PMM has been demonstrated, it seems reasonable to allow for approximate Newton directions and still expect fast convergence. Hence, a future goal is to extend the theory as well as the implementation, in order to allow the use of iterative methods for solving the Newton system.