An efficient Gauss–Newton algorithm for solving regularized total least squares problems

Zare, Hossein; Hajarian, Masoud

doi:10.1007/s11075-021-01145-2

An efficient Gauss–Newton algorithm for solving regularized total least squares problems

Original Paper
Published: 18 June 2021

Volume 89, pages 1049–1073, (2022)
Cite this article

Download PDF

Numerical Algorithms Aims and scope Submit manuscript

An efficient Gauss–Newton algorithm for solving regularized total least squares problems

Download PDF

Hossein Zare¹ &
Masoud Hajarian²

1050 Accesses
5 Citations
Explore all metrics

Abstract

The total least squares (TLS) method is a well-known technique for solving an overdetermined linear system of equations Ax ≈ b, that is appropriate when both the coefficient matrix A and the right-hand side vector b are contaminated by some noise. For ill-posed TLS poblems, regularization techniques are necessary to stabilize the computed solution; otherwise, TLS produces a noise-dominant output. In this paper, we show that the regularized total least squares (RTLS) problem can be reformulated as a nonlinear least squares problem and can be solved by the Gauss–Newton method. Due to the nature of the RTLS problem, we present an appropriate method to choose a good regularization parameter and also a good initial guess. Finally, the efficiency of the proposed method is examined by some test problems.

A Gauss–Newton iteration for Total Least Squares problems

Article 29 July 2017

Dario Fasino & Antonio Fazzi

Gauss–Newton–Kurchatov Method for the Solution of Nonlinear Least-Squares Problems

Article 04 April 2020

S. М. Shakhno

A Gauss–Newton method for mixed least squares-total least squares problems

Article 01 March 2024

Qiaohua Liu, Shan Wang & Yimin Wei

1 Introduction

Many problems in data fitting and application areas such as signal and image processing give rise to an inexact linear system of equations of the form

$$ Ax\approx b,~ A\in \mathbb{R}^{m\times n},~ m\ge n,~ b\in \mathbb{R}^{m}, $$

(1.1)

where the matrix A or the vector b is contaminated by some unknown errors which may stem from modeling, measurement inaccuracies, or problem discretization. In the classical least squares (LS) approach [4, 31] to (1.1), the matrix A is assumed to be free of error, and all errors are confined to the vector b. However, in practical situations, this assumption is frequently unrealistic and the coefficient matrix A is also affected by the errors. For example, in many data fitting problems, not only the right-hand side b but also the entries of A are obtained by measurements. Another case occurs when the matrix A is a discrete approximation of the underlying continuous operator. In such cases, it is more appropriate to consider the total least squares (TLS) problem to find an accurate solution to the system. The TLS problem is defined as follows.

Definition 1

For given an overdetermined linear system of equations Ax ≈ b, where $A\in \mathbb {R}^{m\times n}$, $b\in \mathbb {R}^{m}$, and m ≥ n, the TLS problem is to find a matrix $E\in {{\mathbb {R}}^{m\times n}}$ and a vector $r\in {\mathbb {R}}^{m}$ such that b + r ∈Range(A + E) and $\left \|[E,r] \right \|_{F}^{2}=\left \|E \right \|_{F}^{2}+\left \|r\right \|^{2}$ is minimal, i.e.,

$$ {\mathop{\min_{E,r}}}\left\|[E,r] \right\|_{F}^{2}~~\text{subject~to}~~(A+E)x=b+r. $$

(1.2)

If a minimizing pair (E,r) is found, then any x satisfying (A + E)x = b + r is called a TLS solution.

Throughout this paper, $\left \| \cdot \right \|_{F}$ denotes the Frobenius norm, while $\left \| \cdot \right \|$ denotes the Euclidean norm. It should be noted that the TLS problem may not have a solution and, when the solution exists, it may not be unique [4, 20]. However, in most problems, it has a unique solution which can be obtained by a simple scaling of the right singular vector of [A,b] corresponding to its smallest singular value, as is described in the following theorem [19, 47].

Theorem 1

Let C = UΣV^T be the singular value decomposition of the matrix $C=[A,b]\in {\mathbb {R}}^{m\times (n+1)}$, where

$${\Sigma}=\text{diag}(\sigma_{1},\dots,\sigma_{n},\sigma_{n+1}) \quad \text{and} \quad \sigma_{1} \geq {\dots} \geq \sigma_{n} \geq \sigma_{n+1}>0.$$

Let v_n+ 1 be the last column of V and let v_i,n+ 1 be the i-th component of v_n+ 1. If v_{n+ 1,n+ 1}≠ 0 and σ_n > σ_n+ 1 then the TLS problem (1.2) has a unique solution x_TLS, which is given by

$$ {{x}_{TLS}}=-\frac{1}{v_{n+1,n+1}}\begin{bmatrix}v_{1,n+1}, \dots, v_{n,n+1}\end{bmatrix}^{T}, $$

with the corresponding correction matrix $[E,r]=-\sigma _{n+1} u_{n+1} v_{n+1}^{T}$ for which $\left \|[E,r] \right \|_{F}^{2}=\sigma _{n+1}^{2}$.

In this theorem, the condition v_{n+ 1,n+ 1}≠ 0 implies that a TLS solution exists, and σ_n > σ_n+ 1, which means the smallest singular value σ_n+ 1 is simple, guarantees its uniqueness. It can be seen that these two conditions together are equivalent to $\sigma _{n}^{\prime }>\sigma _{n+1}$, where $\sigma _{n}^{\prime }$ is the smallest singular value of A; see [47, Corollary 3.4]. By using the above theorem, it can also be easily seen that the solution x_TLS satisfies the following system of equations

$$ \begin{bmatrix}A^{T}A & A^{T}b \\ b^{T}A & b^{T}b\end{bmatrix}\begin{bmatrix}x \\-1\end{bmatrix} =\sigma_{n+1}^{2} \begin{bmatrix}x \\-1\end{bmatrix}, $$

and hence it can be determined from $(A^{T}A-\sigma _{n+1}^{2} I)x=A^{T}b$. From the above equation, the vector $\hat {v}=[x_{TLS}^{T},-1]^{T}$ is an eigenvector corresponding to the smallest eigenvalue $\lambda _{\min \limits }=\sigma _{n+1}^{2}$ of the matrix M = [A,b]^T[A,b]. Therefore, $\hat {v}$ minimizes the Rayleigh quotient v^TMv/v^Tv of the matrix M. Hence, it follows that the unique solution x_TLS to the TLS problem minimizes the following “orthogonal distance” function

$$ F(x)=\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}. $$

The TLS problem was firstly introduced to the numerical linear algebra community by Golub and Van Loan [19] in 1980. However, under the names orthogonal regression and errors-in-variables, this fitting method has a long history in the statistical literature. The univariate problem, i.e., n = 1, was already studied in 1877 by Adcock [1]. See [36, 47] for more historical notes. So far, the TLS problems have been used in various fields of applied science such as signal processing, automatic control, system theory, statistics, physics, economics, biology, and medicine [47]. See also [35] for a comprehensive list of references about the total least squares problem and related methods and their applications.

The goal of this paper is to find the TLS solution of problems where the matrix A is severely ill-conditioned, i.e., the singular values of A decay gradually to zero without a significant gap and the ratio between the largest and the smallest nonzero singular values is large. Such problems arise, for instance, from the discretization of ill-posed problems, such as Fredholm integral equations of the first kind [13, 23]. In these problems, due to ill-conditioning of the matrix A and the presence of noise in the problem’s data, straightforward methods such as least squares, total least squares, or other classical methods often provide unreliable solutions. Therefore, regularization techniques are necessary to obtain an acceptable solution.

There are two main approaches to regularization of ill-posed TLS problems in the literature. The first approach is based on truncating the small singular values of the augmented matrix [A,b] which is called truncated TLS (TTLS) method. In [16], Fierro et al. studied filtering properties of the truncated TLS solution and they proposed an iterative algorithm based on Lanczos bidiagonalization for computing truncated TLS solutions. The second approach reduces the effect of the noise on the computed solution by adding a quadratic constraint as ${{\left \| Lx \right \|}^{2}}\!\le \! \delta ^{2} $ to the TLS problem [18]. By using this approach, the regularized TLS (RTLS) problem is given by

$$ \mathop{\min_{E,r} }\left\|[E,r] \right\|_{F}^{2}~~ \mathrm{subject~to~}~(A+E)x=b+r,~\left\| Lx \right\|^{2}\le \delta^{2}. $$

(1.3)

Here, $L\in \mathbb {R}^{k\times n} (k\leq n)$ is the regularization matrix, which defines a (semi)norm on the solution, and δ > 0 is the regularization parameter, which controls the size of the solution. Typically, L is chosen as the identity matrix or a discrete approximation of the first- or second-order derivative operators. We say that the RTLS problem is in the standard form if L = I, where I is the identity matrix, and in general form if L≠I. As shown before, the problem of finding the solution x_TLS to the TLS problem is equivalent to solve the following minimization problem:

$$ \mathop{\min_{x\in \mathbb{R}^{n}}}F(x)=\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}. $$

(1.4)

Thus, the RTLS problem (1.3) can be reformulated as follows:

$$ \underset{x\in \mathbb{R}^{n}}{\min}\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}~~\text{subject~to}~{{\left\| Lx \right\|}^{2}}\le \delta^{2}. $$

(1.5)

A closely related problem, studied in [2], is Tikhonov total least squares problem where regularization of the TLS problem is done by adding the typical Tikhonov term $\lambda \left \| Lx \right \|^{2}$ to the TLS objective function (1.4):

$$ {\underset{x\in \mathbb{R}^{n}}{\min}}\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}+\lambda\left\| Lx \right\|^{2}. $$

(1.6)

By comparing the corresponding Lagrangian functions of problems (1.5) and (1.6), it can be easily seen that they are equivalent in the sense that for each value of λ > 0 there exists a corresponding value of δ > 0 such that the solutions of (1.5) and (1.6) are identical.

The following theorem gives a sufficient condition for the attainability of the minimum in (1.5) or (1.6).

Theorem 2

Let $N\in \mathbb {R}^{n\times k}$ be a matrix whose columns form an orthogonal basis for the nullspace of L. If it holds that $\sigma _{\min \limits }([AN, b])<\sigma _{\min \limits }(AN)$, then the minimum of the regularized total least squares problem (1.6) is attained with ${F}_{\lambda }^{*} \leq \sigma _{\min \limits }^{2}([AN, b])$, where

$${{F}_{\lambda}^{*}}={{\underset{x\in \mathbb{R}^{n}}{\min}}}\left\{ \frac{{{\left\| Ax-b \right\|}^{2}}}{1+{{\left\| x \right\|}^{2}}}+\lambda {{\left\| Lx \right\|}^{2}} \right\}.$$

Proof

See [2], Theorem 3.1. □

An important difference between the RTLS problem and the Tikhonov regularization of least squares problems, is that the RTLS problem and its different formulations do not have closed-form solutions. However, in [18], Golub et al. proved the following characterization of the RTLS solution.

Theorem 3

If the constraint $\left \| Lx \right \|^{2}\le \delta ^{2}$ of the RTLS problem (1.3) is active, then the RTLS solution x = x_RTLS satisfies the equation

$$ (A^{T}A+\lambda_{L} L^{T}L+\lambda_{I} I)x=A^{T}b, $$

where the parameters λ_L and λ_I are given by

$$ \lambda_{L}= \mu (1+\left\| x\right\|^{2}),\quad \lambda_{I}=-\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}, $$

and μ > 0 is the Lagrange multiplier. Moreover, we have

$$ \left\|[E,r] \right\|_{F}^{2}=-\lambda_{I}=b^{T}(b-Ax)-\lambda_{L} \delta^{2}. $$

The RTLS problem (1.3), and its different formulations, was extensively studied in the literature and is still an engaging source of attention among scholars [2, 3, 18, 21, 26,27,28, 30, 32, 33, 43, 45, 49]. In [18], Golub, Hansen, and O’Leary presented and analyzed the properties of the RTLS problem (1.3). They discussed its relation to the classical regularized least squares problem and also suggested a method for solving it. Utilizing an approach similar to Rayleigh quotient iteration for the TLS problem [5], Guo and Renaut [21, 43] derived an eigenproblem for the RTLS problem and solved it using the iterative inverse power method. Inspired by the fact that a quadratically constrained least squares problem can be solved via a quadratic eigenvalue problem [17], Sima, Van Huffel, and Golub [45] introduced an iterative method for solving the RTLS problem, where the solution is approximated in a few iterations, each consisting in solving a quadratic eigenvalue problem. Lampe and Voss [26, 27] studied the acceleration of Renaut and Guo’s algorithm and of the algorithm from Sima et al. and applied a nonlinear Arnoldi method, which projects the solution onto a lower dimension subspace determined by the algorithm. Lee, Fu, and Barlow [33] solved the RTLS problem by introducing a system of nonlinear equations and by applying a two-parameter Newton method for solving the nonlinear system. Later, in [32], Lee and Barlow applied two projection-based algorithms to this method, in order to solve the large-scale RTLS problems. Recently, Yang et al. [49] have reformulated the standard form of the RTLS problem as a zero finding problem of a univariate parametric function, and have applied the classical Newton method to the reformulated problem.

In this paper, we reformulate the RTLS problem (with a general regularization matrix) as a nonlinear least squares problem and then we apply the Gauss–Newton method to solve it. The rest of this paper is organized as follows. In Section 2, we review some preliminaries. In Section 3, we solve the RTLS problem via Gauss–Newton method. We also present a multi-objective optimization-based method for choosing an appropriate regularization parameter for the RTLS problem. In Section 4, we present some experimental results to show the efficiency of presented method. Finally, we give a conclusion in Section 5.

2 Preliminaries

2.1 The Gauss–Newton method for nonlinear least squares problems

In view of the formulation (1.4), the TLS problem can also be rewritten as the following nonlinear least squares problem:

$$ {{\underset{x\in \mathbb{R}^{n}}{\min}}}{\left\| f(x)\right\|}^{2},~~~f(x)=\frac{Ax-b}{\sqrt{1+x^{T}x}}. $$

(2.1)

This is a standard nonlinear optimization problem and thus any optimization technique such as a gradient descent or Newton’s method can be used here. Recently, Fasino and Fazzi [14] have proposed a Gauss–Newton iteration for solving the problem (2.1). The purpose of this paper is to solve the RTLS problems by the Gauss–Newton method. In this subsection, we briefly review the Gauss–Newton method for solving nonlinear least squares problems, then in the next section we will show how the Gauss–Newton method can be applied for solving the RTLS problems.

Consider the nonlinear least squares problem

$$ {{\underset{x\in \mathbb{R}^{n}}{\min}}}{F(x)},~~~F(x)={\left\| f(x)\right\|}^{2}, $$

(2.2)

where $f:\mathbb {R}^{n}\rightarrow \mathbb {R}^{m}$ is a twice continuously differentiable function with m ≥ n. The i-th component of the m-vector f is denoted by f_i. Let the Jacobian matrix of f be denoted by J, and let the matrix G_i denote the Hessian matrix of f_i. Then the gradient and Hessian matrix of F are given by

$$ g(x)=2J(x)^{T}f(x),\qquad H(x)=2(J(x)^{T}J(x)+Q(x)), $$

where $Q(x)={\sum }_{i=1}^{m}f_{i}(x)G_{i}(x)$. Note that the necessary conditions for x^∗ to be a local minimizer of F are that g(x^∗) = 0, and H(x^∗) is positive semi-definite. Moreover, if g(x^∗) = 0 and H(x^∗) is positive definite, then x^∗ is a strong local minimizer. In Newton’s method, for given iterate x_k, the next iterate x_k+ 1 is given by x_k+ 1 = x_k + h_k, where H(x_k)h_k = −g(x_k) or equivalently

$$ (J(x_{k})^{T}J(x_{k})+Q(x_{k}))h_{k}=-J(x_{k})^{T}f(x_{k}). $$

The principal advantage of Newton’s method is that it has a quadratic rate of convergence near the optimal solution. Unfortunately, in spite of this attractive feature, it has several serious drawbacks:

At the very least, it may not be defined because of the singularity of H(x_k) at a given point x_k;
It requires that H(x_k) be positive definite, otherwise the search direction h_k = −H(x_k)^− 1g(x_k) is not a descent direction;
We need to compute the full Hessian matrix. In many problems, the term Q(x) of the Hessian is either unavailable or inconvenient to obtain, and it is too expensive to approximate by finite differences;
It requires to solve a linear system at each iteration that is costly from a computational point of view.

The Gauss–Newton method [4, Section 9.2] is a modification of Newton’s method in which the term Q(x) is neglected in the Hessian. One motivation for this approach is that Q(x) vanishes for zero residual problems and therefore might be negligible for small residual problems. In the Gauss–Newton method, the next iterate is given by x_k+ 1 = x_k + h_k, where the direction h_k is obtained by the solution of the equations

$$ (J(x_{k})^{T}J(x_{k}))h_{k}=-J(x_{k})^{T}f(x_{k}). $$

(2.3)

It can be seen that the solution of (2.3) is a solution of the linear least squares problem

$$ \min_{h\in \mathbb{R}^{n}}\left\| f(x_{k})+J(x_{k})h\right\|^{2}, $$

(2.4)

and is unique if J(x_k) has full column rank. The linear least squares problem (2.4) can be solved efficiently by means of orthogonal transformations or iterative methods [4]. Algorithm 1 describes the Gauss–Newton method.

In practice, the Gauss–Newton method is used with the line search

$$ x_{k+1}=x_{k}+\alpha_{k} h_{k}, $$

which is the so-called damped Gauss–Newton method, where α_k is a step length. Here, α_k is computed by Armijo backtracking line search [38]. This strategy guarantees the descent of the objective function in each step and makes the Gauss–Newton algorithm locally convergent and often globally convergent^{Footnote 1}. The backtracking procedure is described in Algorithm 2, where the initial step length $\bar {\alpha }=1$, the contraction factor ρ = 1/2, and the constant c = 10^− 4 are usually chosen in practical implementations.

2.2 Some basic concepts of the multi-objective optimization

Multi-objective optimization is a useful tool to optimize a number of conflicting objectives. This approach which is also referred to as multi-criteria or vector optimization is widely used in scientific and engineering applications. In Section 3.2, we present a method for choosing the regularization parameter for RTLS problems that involves solving a multi-objective optimization problem. Therefore, in this section, we review some basic concepts of the multi-objective optimization. For more details on multi-objective optimization problems, see [9, 11, 12, 37].

A multi-objective optimization problem can be formulated as

$$ \min f(x)={[{f_{1}}(x),{f_{2}}(x),\dots,{f_{k}}(x)]^{T}} \quad \mathrm{subject~to} \quad x \in \mathcal{S}, $$

(2.5)

where $ \mathcal {S} \subseteq \mathbb {R}^{n}$ is a non-empty set and $f_{i}:\mathbb {R}^{n} \rightarrow \mathbb {R}, i=1,\dots ,k$, are the conflicting objective functions to be minimized simultaneously. The set $\mathcal {S}$ is called the feasible region and the set

$$\mathcal{Z}=\{ f(x)| x \in \mathcal{S}\}\subseteq \mathbb{R}^{k}$$

is called the feasible objective region.

As pointed out in [37, p. 11], multi-objective problems are in a sense ill-defined because there is no natural ordering in the objective space. Therefore, the concept of minimization in single-objective optimization does not apply directly in the multi-objective framework. Here, the concept of Pareto optimality has to be introduced. To describe the concept, we define the following partial order relation on the feasible objective region $\mathcal {Z}$: given two objective vectors z and $z^{\prime }$, we say that $z\leq z^{\prime }$ if and only if $ z_{i} \leq z_{i}^{\prime }$ for all $i=1,2,\dots ,k$. The strict partial order relation < on $\mathcal {Z}$ is defined in a similar way. Using these relations, the concept of dominance is defined as follows. Given two vectors x and $x^{\prime }$ in the feasible region $\mathcal {S}$, we say that:

$x\preceq x^{\prime }$ (x weakly dominates $x^{\prime }$) if and only if $f(x)\leq f(x^{\prime })$.
$x\prec x^{\prime }$ (x dominates $x^{\prime }$) if and only if $x\preceq x^{\prime }$ and at least one component of f(x) is strictly less than the corresponding one of $f(x^{\prime })$.
$x \sim x^{\prime }$ (x is indifferent to $x^{\prime }$) if neither x dominates $x^{\prime }$ nor $x^{\prime }$ dominates x.

We can now formally state the concept of Pareto optimality in the multi-objective optimization setting. A point $x^{*}\in \mathcal {S}$ is called a Pareto optimal or a Pareto minimizer for the problem (2.5), if there is no other point in $\mathcal {S}$ that dominates it. In other words, $x^{*}\in \mathcal {S}$ is Pareto optimal if there does not exist $x\in \mathcal {S}$ such that f_i(x) ≤ f_i(x^∗) for $i=1,2,\dots ,k$ and f_j(x) < f_j(x^∗) for at least one index j. The set of all the Pareto optimal solutions is denoted by $\mathcal {P}$. The set $\mathcal {F}=\{ f(x^{*})|x^{*}\in \mathcal {P}\}$, containing the image of all Pareto optimal solutions, is called the Pareto front. The objective vector $z^{*}\in \mathbb {R}^{k}$ is called the ideal objective vector if its components $z_{i}^{*}$ are obtained by minimizing each of the objective functions individually subject to the feasible region, that is by solving,

$$z_{i}^{*}=\min f_{i}(x) \quad \mathrm{subject~to} \quad x \in \mathcal{S},$$

for $i=1,\dots ,k$. In general, because of conflicts among the objectives, the ideal objective vector is not attainable (i.e., $z^{*} \notin \mathcal {Z}$). However, it can be used as a reference point. Unlike the ideal objective vector which represents the lower bound of each objective in the entire feasible region $\mathcal {S}$, the nadir objective vector, z^nad, represents the upper bound of each objective in the entire set of all the Pareto optimal solutions. In other words, the components of the nadir objective vector are given by

$$z_{i}^{\text{nad}}=\max f_{i}(x) \quad \mathrm{subject~to} \quad x \in \mathcal{P},$$

for $i=1,\dots ,k$.

3 The Gauss–Newton method for solving RTLS problems

3.1 Solving RTLS problems by the Gauss–Newton method

In this section, our discussion is based on the formulation (1.6) of the RTLS problem. We notice that the objective function of this problem is not convex, and thus a descent algorithm may fail to converge. However, if the starting point x₀ is sufficiently close to the optimal solution x^∗, and the regularization parameter λ > 0 is chosen appropriately, one can expect good performance by applying standard minimization techniques. Note that the RTLS problem (1.6) can be rewritten as the following nonlinear least squares problem:

$$ \min_{x\in \mathbb{R}^{n}}F_{\lambda} (x)=\left\| f_{\lambda} (x)\right\|^{2}, $$

where

$$ f_{\lambda} (x)=\left[\begin{array}{c}f(x) \\ \sqrt \lambda Lx \end{array}\right], \quad f(x)=\frac{{Ax - b}}{{\sqrt {1 + {x^{T}}x} }}, $$

and the Jacobian matrix of f_λ is given by

$$ J_{\lambda} (x)=\left[\begin{array}{cc}J(x) \\ \sqrt \lambda L \end{array}\right], \quad J(x)=\frac{{A}}{\sqrt{1+x^{T}x}}-\frac{(Ax-b)x^{T}}{\sqrt{(1 + x^{T}x)^{3}}}. $$

(3.1)

Thus, the Gauss–Newton method can also be applied to obtain the optimal solution of (1.6) if we can find an appropriate regularization parameter λ and a good initial guess x₀. In this section, we discuss how to choose these parameters appropriately.

For a fixed λ > 0, the first-order optimality condition for the RTLS problem (1.6) yields:

$$ \left( A^{T}A+\lambda_{L} L^{T}L + \lambda_{I}I\right)x=A^{T}b, $$

(3.2)

where

$$ \lambda_{L}=\lambda (1+\left\| x\right\|^{2}), \quad \text{and} \quad \lambda_{I} =-\frac{\left\| Ax-b\right\|^{2}}{1+\left\| x\right\|^{2}}. $$

(3.3)

From the set of solutions of (3.2), the solution that corresponds to the smallest value of |λ_I| is the solution x_RTLS of (1.6). Note that the matrix $\left (A^{T}A+\lambda _{L} L^{T}L + \lambda _{I}I\right )$ is symmetric positive definite if $\sigma _{\min \limits }(A^{T}A+\lambda _{L} L^{T}L)>|\lambda _{I}|$, so this condition ensures that the RTLS problem has a unique solution. Unfortunately, this condition does not hold in general, and, as noted in [30], the matrix $\left (A^{T}A+\lambda _{L} L^{T}L + \lambda _{I}I\right )$ is often indefinite. It can only be shown that at the solution x_RTLS this matrix is positive semi-definite (where in case of a singular matrix, the RTLS solution might be non-unique). Some results on non-unique RTLS solutions can be found in [27]. A sufficient condition for $\left (A^{T}A+\lambda _{L} L^{T}L + \lambda _{I}I\right )$ to be positive definite is given in [33, Theorem 2.1].

Let x(λ_L,λ_I) be a solution of (3.2). Clearly, to guarantee that x(λ_L,λ_I) is a stable solution, the parameters λ_L and λ_I should be adjusted in such a way that the matrix $\left (A^{T}A+\lambda _{L} L^{T}L + \lambda _{I}I\right )$ is well-conditioned. On the other hand, from the basic idea of Tikhonov regularization, the solution x(λ_L,λ_I) should provide at the same time a small fractional residual ${\left \|Ax-b\right \|}^{2}/({1+\left \|x\right \|^{2})}$ and a moderate value of the penalty term $\left \| Lx \right \|^{2}$. In the following, we use these two key properties to find a good regularization parameter λ. The main idea of our parameter-choice method is that if an appropriate approximation x(λ_L,λ_I) of the optimal regularized solution x^∗ (in the sense of a minimal distance to the exact solution) is available, then from (3.3) one can obtain an appropriate approximation of the optimal regularization parameter via the relation $\lambda = \lambda _{L}/(1+\left \|x(\lambda _{L},\lambda _{I})\right \|^{2})$. Moreover, x₀ = x(λ_L,λ_I) can be used as an initial guess for the Gauss–Newton method.

We denote by $x(\lambda _{L}^{*},\lambda _{I}^{*})$ the regularized solution for which $\|x(\lambda _{L}^{*},\lambda _{I}^{*})-x_{\text {true}}\|$ is smallest. The corresponding optimal regularization parameter is denoted by λ^∗. We assume that the matrix $\left (A^{T}A+\lambda _{L}^{*} L^{T}L + \lambda _{I}^{*}I\right )$ is positive definite and thus $x(\lambda _{L}^{*},\lambda _{I}^{*})$ is unique. We also assume that the error-free linear system A_truex = b_true associated with the available error-contaminated linear system Ax ≈ b is zero-residual (i.e., b_true ∈Range(A_true)). As also pointed out in [47, p. 200], the TLS problem typically applies to such cases. This assumption implies that if the perturbations in the input data are sufficiently small then the value of $|\lambda _{I}^{*}|$ is close to 0 (note that $|\lambda _{I}^{*}|$ is equal to the fractional residual ${\left \|Ax(\lambda _{L}^{*},\lambda _{I}^{*})-b\right \|}^{2}/({1+\left \|x(\lambda _{L}^{*},\lambda _{I}^{*})\right \|^{2})}$ associated with the solution $x(\lambda _{L}^{*},\lambda _{I}^{*})$).

Remark 1

The optimal regularization parameter λ^∗ can be accurately computed only in simulation examples, when x_true is known. In practice, the exact solution x_true is unknown, and thus any parameter-choice method can only give an approximation of λ^∗, using the available data.

We notice that when λ_I≠ 0, it has a deregularizing effect on the solution of (3.2). Therefore, it seems reasonable to keep the value of |λ_I| as small as possible. Noting that the value of $|\lambda _{I}^{*}|$ is expected to be close to 0, we may expect that the solutions $x(\lambda _{L}^{*},\lambda _{I}^{*})$ and $x(\lambda _{L}^{*})=x(\lambda _{L}^{*},0)$ to be quite close to each other. This is confirmed by the following theorem.

Theorem 4

Let $x(\lambda _{L}^{*})$ and $x(\lambda _{L}^{*},\lambda _{I}^{*})$ be the solutions of

$$\left( A^{T}A+\lambda_{L}^{*} L^{T}L \right)x=A^{T}b,$$

and

$$\left( A^{T}A+\lambda_{L}^{*} L^{T}L +\lambda_{I}^{*}I\right)x=A^{T}b,$$

respectively, where $\lambda _{L}^{*}$ and $\lambda _{I}^{*}$ are optimal parameters satisfying (3.3). Assume that Null(A) ∩Null(L) = {0} and that $\left (A^{T}A+\lambda _{L}^{*} L^{T}L + \lambda _{I}^{*}I\right )$ is positive definite. Then

$$ \frac{\left\| x(\lambda_{L}^{*},\lambda_{I}^{*})-x(\lambda_{L}^{*})\right\|}{\left\| x(\lambda_{L}^{*})\right\|} \leq \frac{|\lambda_{I}^{*}|}{\hat{\sigma}_{n}^{2}-|\lambda_{I}^{*}|}, $$

(3.4)

where $\hat {\sigma }_{n}$ is the smallest singular value of $\left [\begin {array}{cc} A \\ \sqrt {\lambda _{L}^{*}} L \end {array}\right ]$.

Proof

Since $x(\lambda _{L}^{*})$ is the solution of $\left (A^{T}A+\lambda _{L}^{*} L^{T}L \right )x=A^{T}b$, we have

$$ \left( A^{T}A+\lambda_{L}^{*} L^{T}L +\lambda_{I}^{*}I\right)x(\lambda_{L}^{*})=A^{T}b+\lambda_{I}^{*} x(\lambda_{L}^{*}). $$

(3.5)

Subtracting (3.5) from the equation $\left (A^{T}A+\lambda _{L}^{*} L^{T}L +\lambda _{I}^{*} I\right )x(\lambda _{L}^{*},\lambda _{I}^{*})=A^{T}b$, we can obtain the relation

$$ x(\lambda_{L}^{*},\lambda_{I}^{*})-x(\lambda_{L}^{*})= -\lambda_{I}^{*} \left( A^{T}A+\lambda_{L}^{*} L^{T}L +\lambda_{I}^{*}I\right)^{-1} x(\lambda_{L}^{*}), $$

and hence by taking norms of the both sides we have

$$ \frac{\left\| x(\lambda_{L}^{*},\lambda_{I}^{*})-x(\lambda_{L}^{*})\right\|}{\left\| x(\lambda_{L}^{*})\right\|} \leq |\lambda_{I}^{*}| \left\|\left( A^{T}A+\lambda_{L}^{*} L^{T}L +\lambda_{I}^{*}I\right)^{-1}\right\|. $$

Now, the result follows from the fact that

$$ \left\|\left( A^{T}A + \lambda_{L}^{*} L^{T}L + \lambda_{I}^{*}I\right)^{-1}\right\| = \frac{1}{\sigma_{\min}\left( A^{T}A + \lambda_{L}^{*} L^{T}L + \lambda_{I}^{*}I\right)} = \frac{1}{\hat{\sigma}_{n}^{2} + \lambda_{I}^{*}} = \frac{1}{\hat{\sigma}_{n}^{2} - |\lambda_{I}^{*}|}. $$

□

Remark 2

If Ax = b is inconsistent, i.e., b∉Range(A), then from (3.3) the value of $\lambda _{I}^{*}$ is negative and thus we have

$$ \begin{array}{@{}rcl@{}} \text{cond}_{2}(A^{T}A+\lambda_{L}^{*} L^{T}L+\lambda_{I}^{*} I)&=&\frac{\sigma_{\max}(A^{T}A+\lambda_{L}^{*} L^{T}L+\lambda_{I}^{*} I)}{\sigma_{\min}(A^{T}A+\lambda_{L}^{*} L^{T}L+\lambda_{I}^{*} I)}= \frac{\hat{\sigma}_{1}^{2} -|\lambda_{I}^{*}|}{\hat{\sigma}_{n}^{2} -|\lambda_{I}^{*}|} \\&>&\frac{\hat{\sigma}_{1}^{2}}{\hat{\sigma}_{n}^{2}}=\text{cond}_{2}(A^{T}A+\lambda_{L}^{*} L^{T}L), \end{array} $$

where $\hat {\sigma }_{1}$ and $\hat {\sigma }_{n}$ are the largest and smallest singular values of $\left [\begin {array}{cc} A \\ \sqrt {\lambda _{L}^{*}} L \end {array}\right ]$, respectively. We conclude that the linear system $\left (A^{T}A+\lambda _{L}^{*} L^{T}L \right )x=A^{T}b$ is better conditioned than the linear system $\left (A^{T}A+\lambda _{L}^{*} L^{T}L +\lambda _{I}^{*}I\right )x=A^{T}b$.

Remark 3

Under the assumption $|\lambda _{I}^{*}|\ll \hat {\sigma }_{n}^{2}$, it follows from (3.4) that the solutions $x(\lambda _{L}^{*},\lambda _{I}^{*})$ and $x(\lambda _{L}^{*})$ differ only by terms of order $O(|\lambda _{I}^{*}|)$. This result is similar to that for the difference of the LS and TLS solutions; see, e.g., [4, p. 180] and references therein.

The above discussion motivates us to find a good approximation of the optimal solution $x(\lambda _{L}^{*},\lambda _{I}^{*})$ among all regularized solutions x(λ_L). For this purpose, we find the parameter λ_L such that the regularized least squares (RLS) solution x(λ_L) = (A^TA + λ_LL^TL)^− 1A^Tb provides a good compromise between the smallness of the quantities ${\left \|Ax-b\right \|}^{2}/(1+\left \|x\right \|^{2})$ and $\left \| Lx \right \|^{2}$. Then, the original regularization parameter λ can be approximated by the relation $\lambda = \lambda _{L} /(1+\left \|x(\lambda _{L})\right \|^{2})$ and x(λ_L) can be used as an appropriate initial guess for the Gauss–Newton method. For ill-posed problems where the value of $|\lambda _{I}^{*}|$ is quite small, the initial guess x₀ = x(λ_L) obtained with this method is very close to the solution $x(\lambda _{L}^{*},\lambda _{I}^{*})$. Thus, only a few iterations of the Gauss–Newton method are typically required to find the optimal solution. Another advantage of this construction technique for x₀ is that the Jacobian J(x) in (3.1) can be approximated by $\tilde {J}(x)=A/\sqrt {1+x^{T}x}$. In fact, we have

$$ \left\|\tilde{J}(x)-J(x) \right\|= \left\|\frac{(Ax-b)x^{T}}{\sqrt{(1+x^{T}x)^{3}}} \right\|= \frac{\|x\|}{1+\|x\|^{2}} \|f(x)\|, $$

where f(x) is defined as in (2.1). This approximation may be useful in large-scale problems, since $\tilde {J}(x)$ is obtained by a cheap scalar-matrix multiplication. We summarize the resulting method in Algorithm 3.

3.2 Determination of λ _L

The purpose of this section is to present a method for finding an appropriate regularization parameter λ_L for line 1 of Algorithm 3. For regularized least squares problems (where only the right-hand side b is assumed to be contaminated by noise), there are several well-known methods in order to choose the regularization parameter, such as discrepancy principle, generalized cross-validation (GCV), and L-curve [23]. We refer the reader to [6, 41] and [23, Chapter 7] for surveys and comparative studies of the different parameter choice methods for regularized least squares problems. Some recent studies on choosing the regularization parameter are given in [15, 34, 39, 40, 44]. In principle, it is possible to apply all of the methods mentioned above in the RTLS problems, by considering the fractional residual ${\left \|Ax-b\right \|}^{2}/({1+\left \|x\right \|^{2})}$ instead of the ordinary residual $\left \|Ax-b\right \|^{2}$. However, the L-curve is perhaps the most widely used strategy for choosing the regularization parameter in the RTLS problems; see e.g., [29, 43]. This method was originally designed for choosing the regularization parameter for regularized least squares problems [25, 31]. It is based on a parametric plot of the (semi)norm ∥Lx_λ∥, versus the residual norm ∥Ax_λ − b∥, where x_λ is the regularized solution corresponding to the regularization parameter λ. The scale can be logarithmic or classical, and the norms might be 2-norm or general p-norms [10, 48]. The obtained plot often has a characteristic L-shaped appearance and the chosen parameter is the one which is the closest to the left bottom corner. We note that in the recent decades some other versions of the L-curve have also been introduced in the literature; see e.g., [7, 8, 42].

In [50], inspired by the idea behind the L-curve, a multi-objective optimization-based method was introduced for choosing the regularization parameter. The main idea of this method is to scale the residual norm $\left \| Ax_{\lambda }-b \right \|$ and the (semi)norm $\left \| Lx_{\lambda } \right \|$ by a suitable method and then minimizing the sum of them. Here, we use a same technique to find an appropriate regularization parameter λ_L for the line 1 of Algorithm 3.

Consider the multi-objective optimization problem

$$ (P):~\underset{\lambda_{L}>0}{\mathop{\min }} g(\lambda_{L})=\left[\begin{array}{cc} g_{1}(\lambda_{L}) \\ g_{2}(\lambda_{L}) \end{array} \right], $$

(3.6)

where $g_{1}(\lambda _{L})={\left \|Ax(\lambda _{L})-b\right \|}/\sqrt {1+\left \|x(\lambda _{L})\right \|^{2}}$, $g_{2}(\lambda _{L})=\left \| Lx(\lambda _{L}) \right \|$ and $x(\lambda _{L}) ={{(A^{T}A+\lambda _{L} L^{T}L)}^{-1}}{A^{T}}b$. We first investigate the impact of the regularization parameter λ_L on the objective functions g₁ and g₂ in the following theorem. This theorem is a straightforward generalization of the Theorem 2 in [50]. For simplicity, throughout this section, we denote the regularization parameter λ_L and the regularized solution x(λ_L) by β and x_β, respectively.

Theorem 5

Assume that Null(A) ∩Null(L) = {0}. Given two functions $g_{1}(\upbeta )={\left \|Ax_{\upbeta }-b\right \|}/\sqrt {1+\left \|x_{\upbeta }\right \|^{2}}$ and $g_{2}(\upbeta )=\left \| Lx_{\upbeta } \right \|$, where $x_{\upbeta } ={{(A^{T}A+\upbeta L^{T}L)}^{-1}}{A^{T}}b$ the following statements are hold:

(a)
For 0 < β₁ < β₂ we have g₁(β₁) < g₁(β₂) and g₂(β₂) < g₂(β₁). In other words, g₁ is a strictly increasing function of β whereas g₂ is a strictly decreasing function of β.
(b)
$g_{1}(\upbeta ) \rightarrow \left \| Ax_{LS} -b\right \|/\sqrt {1+\left \|x_{LS}\right \|^{2}}$ and $g_{2}(\upbeta ) \rightarrow \left \| Lx_{LS} \right \| $ as $\upbeta \rightarrow 0$.
(c)
$g_{2}(\upbeta ) \rightarrow 0$ and $g_{1}(\upbeta ) \rightarrow \left \|(A(AP_{L})^{\dagger }-I) b \right \|/\sqrt {1+ \left \|(AP_{L})^{\dagger }b\right \|^{2}}$ as $\upbeta \rightarrow \infty $, where P_L = I − L^‡L and ^‡ denotes the Moore–Penrose pseudoinverse.

Theorem 5 gives the lower and upper bounds of the two objectives g₁(β) and g₂(β). From this theorem, it can be seen that the vectors

$$ \left[ \frac{ \left\| Ax_{LS} -b\right\|}{\sqrt{1+\left\|x_{LS}\right\|^{2}}} {\text{ ,}} 0 \right]^{T} \quad \text{and} \quad \left[\frac{\left\|(A(AP_{L})^{\dagger}-I) b \right\|}{\sqrt{1+ \left\|(AP_{L})^{\dagger}b\right\|^{2}}}{\text{ ,}} \left\|Lx_{LS}\right\|\right]^{T}, $$

are the ideal objective vector and the nadir objective vector for the problem (3.6), respectively. They will be applied later to normalize the objectives g₁(β) and g₂(β) such that their values are of approximately the same magnitude. It can be easily shown that composing each objective of the problem (3.6) with a strictly increasing function does not change the set of its Pareto minimizers, as is described in the following theorem. This theorem coincides essentially with Theorem 3 in [50].

Theorem 6

Let

$$\hat{g}=\left[\begin{array}{cc} T_{1}(g_{1}) \\ T_{2}(g_{2}) \end{array} \right], $$

where $T_{1},T_{2}:\mathbb {R}\rightarrow \mathbb {R}$ are two strictly increasing real functions. Then, $\tilde {\upbeta }$ is a Pareto minimizer of $\hat {g}$ if and only if it is a Pareto minimizer of g, where g is the objective function of the problem (3.6). In other words, the two problems $\underset {\upbeta >0}{\mathop {{\min \limits } }} g(\upbeta )$ and $\underset {\upbeta >0}{\mathop {{\min \limits } }} \hat {g}(\upbeta )$ are equivalent.

One general approach in order to solve a multi-objective problem is converting the problem to a single-objective problem and then solving the single-objective problem by a standard optimization method. A well-known technique to convert a multi-objective problem to a single-objective problem, assuming that the components of the objective vector are non negative, is replacing the objective vector by the p-norm of the problem. Moreover, the objective function of the new problem, in the case where p is finite, is replaced by its p th power that is a differentiable function. For example, by using this method, the optimization problem (3.6) is replaced by:

$$ \underset{\upbeta>0}{\mathop{\min }} {\left\| g(\upbeta) \right\|_{p}^{p}}=g_{1}(\upbeta)^{p}+g_{2}(\upbeta)^{p}, \qquad p>0. $$

(3.7)

In particular, when p = 1, this method is equivalent to another well-known method which is called weighted-sum method when the weights are considered to be uniform [37]. It can be easily seen that if $\tilde {\upbeta }$ is a global minimizer for the single-objective problem (3.7), then $\tilde {\upbeta }$ is a Pareto minimizer for the multi-objective problem (3.6). The problem (3.7) is useful for our purpose because it minimizes the distance between the feasible objective region of the problem (3.6) and the reference point (0,0), which is near the ideal objective vector.

Now, we are ready to present a new method for choosing the regularization parameter β = λ_L. We start from this point that when the regularization parameter β is chosen very small, the value of $g_{2}(\upbeta )=\left \|Lx_{\upbeta }\right \|$ is very large whereas the value of $g_{1}(\upbeta )={\left \|Ax_{\upbeta }-b\right \|}/\sqrt {1+\left \|x_{\upbeta }\right \|^{2}}$ is very small. See Fig. 1, for example. Moreover, when β is gradually increasing, the value of g₁(β) increases slowly while the value of g₂(β) rapidly tends to zero. Also, the functions g₁ and g₂ have very different ranges. Therefore, it is not easy to find a Pareto minimizer of

$$ g(\upbeta)=\left[\begin{array}{cc} g_{1}(\upbeta) \\ g_{2}(\upbeta) \end{array} \right], $$

without suitable transformations.

To overcome the above difficulties, we try to (approximately) equalize the ranges of the functions g₁ and g₂ via introducing some suitable transformations. Let $T:\mathbb {R}^{+}\rightarrow \mathbb {R}$ be an increasing function and consider the following function:

$$ \hat{g}(\upbeta)=\left[\begin{array}{cc} \hat{g}_{1}(\upbeta) \\ \hat{g}_{2}(\upbeta) \end{array} \right]= \left[\begin{array}{cc} T(g_{1}(\upbeta)) \\ T(g_{2}(\upbeta)) \end{array} \right]. $$

According to Theorem 6, two problems $\underset {\upbeta >0}{\mathop {{\min \limits } }} g(\upbeta )$ and $\underset {\upbeta >0}{\mathop {{\min \limits } }} \hat {g}(\upbeta )$ are equivalent. Noting that generally it is necessary to reduce the range of g₂, the function $T(x)=\arctan (x)$ is a good choice for our purpose. In the next step, we scale $\hat {g}_{1}$ and $ \hat {g}_{2}$ such that the scaled functions have the same range of values. We define the scaled functions as

$$ \hat{g}_{s}(\upbeta)=\left[\begin{array}{cc} S(\hat{g}_{1}) \\ S(\hat{g}_{2}) \end{array} \right], $$

where S is the following scaling function

$$ S(\hat{g}_{i}(\upbeta)) =\frac{\hat{g}_{i}(\upbeta)-\min\hat{g}_{i}}{\max\hat{g}_{i}-\min\hat{g}_{i}}, $$

(3.8)

and $\min \limits \hat {g}_{i}$ and $\max \limits \hat {g}_{i}$ are lower and upper bounds, respectively, for the objective function $\hat {g}_{i}$. Note that by using this scaling, the values of $\hat {g}_{i}$ lie in the interval [0,1]. Again, according to Theorem 6, two problems $\underset {\upbeta >0}{\mathop {{\min \limits } }} g(\upbeta )$ and $\underset {\upbeta >0}{\mathop {{\min \limits } }} \hat {g}_{s}(\upbeta )$ are equivalent. Finally, we choose the regularization parameter β^∗ by finding the global minimizer of problem $\underset {\upbeta >0}{\mathop {{\min \limits } }} {\left \| \hat {g}_{s}(\upbeta ) \right \|_{p}^{p}}$. By considering $T(x)=\arctan (x)$, and p = 1, we need to minimize the following function

$$ K(\upbeta)=\frac{\arctan (g_{1}(\upbeta))}{\max\arctan (g_{1}(\upbeta))}+\frac{\arctan (g_{2}(\upbeta))}{\max\arctan (g_{2}(\upbeta))}. $$

(3.9)

Notice that $\min \limits \arctan (g_{1}(\upbeta ))$ and $\min \limits \arctan (g_{2}(\upbeta ))$ are considered as zero in the above equation because according to Theorem 5, $g_{1}(\upbeta ) \rightarrow \left \| Ax_{LS} -b\right \|^{2}/(1+\left \|x_{LS}\right \|^{2})\approx 0$ and $g_{2}(\upbeta ) \rightarrow 0$, as $\upbeta \rightarrow 0$ and $\upbeta \rightarrow \infty $ respectively. To minimize the objective function (3.9), we need the values of $\max \limits \arctan (g_{1}(\upbeta ))$ and $\max \limits \arctan (g_{2}(\upbeta ))$. Since g₁ is a strictly increasing function of β, by considering Theorem 5, the value of $\max \limits \arctan (g_{1}(\upbeta ))$ can be approximated by evaluating $\arctan (g_{1}(\upbeta ))$ at a sufficiently big number (say β = 10¹⁰⁰). In a similar way, since g₂ is a strictly decreasing function of β, one may get an approximation of $\max \limits \arctan (g_{2}(\upbeta ))$ by evaluating $\arctan (g_{2}(\upbeta ))$ at a sufficiently small number (say β = eps). One of the best characteristics of the proposed objective function (3.9) is its uni-modality over a sufficiently big interval containing the minimizer. See Fig. 2, for example. Therefore, the minimizer of (3.9) can be easily found by using a bracketing search procedure such as golden section search method [46].

To find an initial search interval [β_f,β_ℓ] for the golden section search method, suppose that the minimum and the maximum values of the regularization parameter are β^(min) and β^(max), respectively. We generate a finite number of regularization parameters between β^(min) and β^(max) such that they form a geometric progression. If the number of all parameters is N, then they are given by

$${\upbeta}_{j}={\upbeta}^{\mathrm{(min)}}q^{j-1}, \qquad j=1,\dots,N, \qquad q=\left( \frac{{\upbeta}^{\mathrm{(max)}}}{{\upbeta}^{\mathrm{(min)}}}\right)^{1/(N-1)}.$$

In the next step, we evaluate the function (3.9) at these parameters. If the minimum of these values is attained at β_j, then the initial search interval can be chosen as

$$ [{\upbeta}_{f},{\upbeta}_{\ell}]=[{\upbeta}_{\max\{1, j-1\}},{\upbeta}_{\min\{ j+1,N\}}].$$

4 Numerical experiments

In this section, we apply the GNRTLS algorithm to solve a set of test problems, taken from the Hansen’s “Regularization Tools” package [24]. We select the BAART, FOXGOOD, GRAVITY, HEAT, PHILLIPS, and SHAW problems for generating the matrix A, the vector b, and the exact solution x. Each problem is derived from discretization of an integral equation of the first kind. These problems can be found in the manual of the package. For each test problem, we consider the dimensions 400 × 400, 800 × 800, 1000 × 1000, and 1200 × 1200. Using the data, we generate the noise-contaminated matrix A and the noise-contaminated vector b as

$$ A\leftarrow A+\sigma (E/\left\|E \right\|_{F}) , \quad b \leftarrow b+\sigma(e/\left\|e \right\|), $$

where the elements of the noises E and e are created by the MATLAB “randn” function and σ is the level of the noise. In the examples we consider the noise level σ = 1e − 2. We also consider the regularization matrix L as the following approximations to the first and second derivative operators:

$$ L = L_{1} = \left[\!\begin{array}{cccc}1 & -1 & &\\ & {\ddots} & {\ddots} & \\ & & 1 & -1 \end{array}\!\right] \!\in\! \mathbb{R}^{(n-1)\times n},\! \qquad L = L_{2} = \left[\!\begin{array}{ccccc} & -2 & 1 & &\\ & {\ddots} & {\ddots} & {\ddots} & \\ & & 1 & -2 & 1 \end{array}\!\right] \!\in\! \mathbb{R}^{(n-2)\times n}. $$

For the GNRTLS algorithm, the regularization parameter λ and the initial guess x₀ are determined by the relations $\lambda = \lambda _{L}/(1+\left \|x(\lambda _{L})\right \|^{2})$, and x₀ = x(λ_L) = (A^TA + λ_LL^TL)^− 1A^Tb, respectively, where λ_L is computed by minimizing the function (3.9). We have illustrated a sample run of the presented method in Figs. 3, 4, 5, 6, 7, and 8. More comprehensive numerical results are reported in Tables 1 and 2. For the examples in these tables, we set ε = 1e − 6 and the maximum of iterations maxit = 10 for the GNRTLS algorithm. For finding an initial search interval for the golden section search algorithm, we choose β^(min) = 16eps, where eps is the machine epsilon, β^(max) = 100, and N = 20 in our numerical examples. Also, the iteration of the golden section search algorithm is terminated when |β_ℓ −β_f| < ε = 10^− 4. CPU times for computing the regularization parameter by the golden section search algorithm and computing the RTLS solution by the Gauss–Newton method are reported as an ordered pair (T_GSS,T_GN). All computations were performed in MATLAB R2015a on a Pentium(R) 2.2 GHz laptop with a 32-bit operating system and 4 GB RAM. Each experiment has been implemented 10 times, and the average of the results is reported. A sample comparison of the L-curve and the presented method for determining the regularization parameter is given in Table 3. To find the corner of the L-curve, the MATLAB routine l_corner in [24] is used. The table shows that the L-curve sometimes gives a starting point x₀ that is far from the optimal solution.

Table 1 Numerical results for the GNRTLS algorithm with L = L₁ and σ = 0.01

Full size table

Table 2 Numerical results for the GNRTLS algorithm with L = L₂ and σ = 0.01

Full size table

Table 3 Average relative errors in x₀ = x(λ_L), where σ = 0.01

Full size table

Now, we give an image deblurring example to show that the approximated Jacobian $\tilde {J}(x)=A/\sqrt {1+x^{T}x}$ can be useful in the implementation of the algorithm. We consider the blur test problem from “Regularization Tools” package [24] with the input arguments = 40, = 3 and = 0.7. Via considering $A\in \mathbb {R}^{1600\times 1600}$ (the blurring matrix) and $x_{\text {true}}\in \mathbb {R}^{1600}$ (the columnwise stacked version of the uncontaminated image) returned by this test problem, we construct the blurred and noisy image b as follows:

$$ b=\hat{A}x_{\text{true}}+\sigma(e/\left\|e \right\|), \quad \hat{A} = A+\sigma (E/\left\|E \right\|_{F}). $$

As before, the elements of the noises E and e are created by the MATLAB “randn” function and σ = 0.01 is the level of the noise. We use the following first-order discrete derivative operator for two space dimensions as the regularization matrix:

$$ L_{1,2D}=\left[\begin{array}{cc} L_{1}\otimes I_{N} \\ I_{N} \otimes L_{1} \end{array}\right], $$

where $L_{1}\in \mathbb {R}^{(N-1)\times N}$ is defined as before. Figure 9 shows the result of this example. Also, Fig. 10 shows the convergence history of this example when using J and $\tilde {J}$.

From Tables 1 and 2 and Fig. 9, it can be seen that if the parameter λ_L is determined by minimization of the objective function (3.9), then generally the GNRTLS algorithm provides an appropriate approximate solution. Our experiments on the problems with a small or moderate size show that such problems can be effectively solved by the GNRTLS algorithm. However, as the size of problems increases, the computational cost of the GNRTLS algorithm increases. Thus, for large-scale problems, it is necessary to use iterative or projection methods to reduce the requirements of computation and storage. It would be interesting to combine the proposed method with a Krylov subspace method or to use an iteratively regularized Gauss–Newton method for solving large-scale problems.

5 Conclusions

We showed that the Gauss–Newton method can be applied for solving RTLS problems if a good initial guess and a good regularization parameter are available. We presented a multi-objective–based method for finding a regularization parameter λ_L such that the regularized least squares (RLS) solution x(λ_L) = (A^TA + λ_LL^TL)^− 1A^Tb makes at the same time both ${\left \|Ax-b\right \|}^{2}/(1+\left \|x\right \|^{2})$ and $\left \| Lx \right \|^{2}$ small, to the extent possible. We showed that if such a λ_L is available, then the regularization parameter λ for the RTLS problem can be approximated by the relation $\lambda = \lambda _{L} /(1+\left \|x(\lambda _{L})\right \|^{2})$, and x(λ_L) can be used as an appropriate initial guess for the Gauss–Newton method. Finally, several numerical experiments showed that the presented method gives accurate approximate solutions for the RTLS problems.

Notes

Global convergence means convergence to a local minimum from any initial point [22].

References

Adcock, R. J.: Note on the method of least squares. Analyst 4 (6), 183–184 (1877)
Article Google Scholar
Beck, A., Ben-Tal, A.: On the solution of the Tikhonov regularization of the total least squares problem. SIAM J. Optim. 17(1), 98–118 (2006)
Article MathSciNet Google Scholar
Beck, A., Ben-Tal, A., Teboulle, M.: Finding a global optimal solution for a quadratically constrained fractional quadratic problem with applications to the regularized total least squares. SIAM J. Matrix Anal. Appl. 28(2), 425–445 (2006)
Article MathSciNet Google Scholar
Björck, Å.: Numerical methods for least squares problems. SIAM (1996)
Björck, Å., Heggernes, P.x, Matstoms, P.: Methods for large scale total least squares problems. SIAM J. Matrix Anal. Appl. 22(2), 413–429 (2000)
Article MathSciNet Google Scholar
Buccini, A., Park, Y., Reichel, L.: Comparison of a-posteriori parameter choice rules for linear discrete ill-posed problems. J. Comput. Appl. Math. 373, 112138 (2020)
Article MathSciNet Google Scholar
Calvetti, D., Golub, G. H., Reichel, L.: Estimation of the L-curve via lanczos bidiagonalization. BIT Numer. Math. 39(4), 603–619 (1999)
Article MathSciNet Google Scholar
Calvetti, D., Reichel, L., Shuibi, A.: L-Curve and curvature bounds for Tikhonov regularization. Numer. Algorithm. 35(2), 301–314 (2004)
Article MathSciNet Google Scholar
Coello, C. A. C., Lamont, G. B., Van Veldhuizen, D. A., et al.: Evolutionary Algorithms for Solving Multi-Objective Problems, vol. 5. Springer (2007)
Cortiella, A., Park, K. -C., Doostan, A.: Sparse identification of nonlinear dynamical systems via reweighted ℓ1-regularized least squares. Comput. Methods Appl. Mech. Eng. 376, 113620 (2021)
Article MathSciNet Google Scholar
Deb, K.: Multi-objective Optimization Using Evolutionary Algorithms, vol. 16. Wiley (2001)
Ehrgott, M.: Multicriteria Optimization, vol. 491. Springer Science & Business Media (2005)
Engl, H. W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, vol. 375. Springer Science & Business Media (2000)
Fasino, D., Fazzi, A.: A Gauss–Newton iteration for total least squares problems. BIT Numer. Math., 1–19 (2018)
Fenu, C., Reichel, L., Rodriguez, G., Sadok, H.: GCV for Tikhonov regularization by partial SVD. BIT Numer. Math. 57(4), 1019–1039 (2017)
Article MathSciNet Google Scholar
Fierro, R. D., Golub, G. H., Hansen, P. C., O’Leary, D. P.: Regularization by truncated total least squares. SIAM J. Sci. Comput. 18(4), 1223–1241 (1997)
Article MathSciNet Google Scholar
Gander, W., Golub, G. H., von Matt, U.: A constrained eigenvalue problem. Linear Algebra Appl. 114, 815–839 (1989)
Article MathSciNet Google Scholar
Golub, G. H., Hansen, P. C., O’Leary, D. P.: TIkhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)
Article MathSciNet Google Scholar
Golub, G. H., Van Loan, C. F.: An analysis of the total least squares problem. SIAM J. Numer. Anal. 17(6), 883–893 (1980)
Article MathSciNet Google Scholar
Golub, G. H., Van Loan, C. F.: Matrix Computations. Johns Hopkins University Press, Baltimore (2013)
Guo, H., Renaut, R. A.: A regularized total least squares algorithm. In Total Least Squares and Errors-in-Variables Modeling, pp. 57–66. Springer (2002)
Hansen, P., Pereyra, V., Scherer, G.: Least squares data fitting with applications. Johns Hopkins University Press (2013)
Hansen, P. C.: Rank-deficient and Discrete Ill-posed Problems: Numerical aspects of linear inversion. SIAM (1998)
Hansen, P. C.: Regularization tools version 4.0 for matlab 7.3. Numer. Algorithm. 46(2), 189–194 (2007)
Article MathSciNet Google Scholar
Hansen, P. C., O’Leary, D. P.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14(6), 1487–1503 (1993)
Article MathSciNet Google Scholar
Lampe, J., Voss, H.: On a quadratic eigenproblem occurring in regularized total least squares. Comput. Stat. Data Anal. 52(2), 1090–1102 (2007)
Article MathSciNet Google Scholar
Lampe, J., Voss, H.: A fast algorithm for solving regularized total least squares problems. Electron. Trans. Numer Anal. 31, 12–24 (2008)
MathSciNet MATH Google Scholar
Lampe, J., Voss, H.: Solving regularized total least squares problems based on eigenproblems. Taiwan. J. Math. 14(3A), 885–909 (2010)
Article MathSciNet Google Scholar
Lampe, J., Voss, H.: Efficient determination of the hyperparameter in regularized total least squares problems. Appl. Numer. Math. 62(9), 1229–1241 (2012)
Article MathSciNet Google Scholar
Lampe, J., Voss, H.: Large-scale Tikhonov regularization of total least squares. J. Comput. Appl. Math. 238, 95–108 (2013)
Article MathSciNet Google Scholar
Lawson, C. L., Hanson, R. J.: Solving Least Squares Problems, vol. 15. SIAM (1995)
Lee, G., Barlow, J. L.: Two projection methods for regularized total least squares approximation. Linear Algebra Appl. 461, 18–41 (2014)
Article MathSciNet Google Scholar
Lee, G., Fu, H., Barlow, J. L.: Fast high-resolution image reconstruction using Tikhonov regularization based total least squares. SIAM J. Sci. Comput. 35(1), B275–B290 (2013)
Article MathSciNet Google Scholar
Levin, E., Meltzer, A. Y.: Estimation of the regularization parameter in linear discrete ill-posed problems using the Picard parameter. SIAM J. Sci. Comput. 39(6), A2741–A2762 (2017)
Article MathSciNet Google Scholar
Markovsky, I.: Bibliography on total least squares and related methods. Stat. Interface 3(3), 329–334 (2010)
Article MathSciNet Google Scholar
Markovsky, I., Van Huffel, S.: Overview of total least-squares methods. Signal Process. 87(10), 2283–2302 (2007)
Article Google Scholar
Miettinen, K.: Nonlinear Multiobjective Optimization, vol. 12. Springer Science & Business Media (2012)
Nocedal, J., Wright, S.: Numerical optimization. Springer Science & Business Media (2006)
Park, Y., Reichel, L., Rodriguez, G., Yu, X.: Parameter determination for Tikhonov regularization problems in general form. J. Comput. Appl. Math. 343, 12–25 (2018)
Article MathSciNet Google Scholar
Reddy, G.: The parameter choice rules for weighted Tikhonov regularization scheme. Comput. Appl. Math. 37(2), 2039–2052 (2018)
Article MathSciNet Google Scholar
Reichel, L., Rodriguez, G.: Old and new parameter choice rules for discrete ill-posed problems. Numer. Algorithm. 63(1), 65–87 (2013)
Article MathSciNet Google Scholar
Reichel, L., Sadok, H.: A new L-curve for ill-posed problems. J. Comput. Appl. Math. 219(2), 493–508 (2008)
Article MathSciNet Google Scholar
Renaut, R. A., Guo, H.: Efficient algorithms for solution of regularized total least squares. SIAM J. Matrix Anal. Appl. 26(2), 457–476 (2005)
Article MathSciNet Google Scholar
Renaut, R. A., Horst, M., Wang, Y., Cochran, D., Hansen, J.: Efficient estimation of regularization parameters via downsampling and the singular value expansion. BIT Numer. Math. 57(2), 499–529 (2017)
Article Google Scholar
Sima, D. M., Van Huffel, S., Golub, G. H.: Regularized total least squares based on quadratic eigenvalue problem solvers. BIT Numer. Math. 44 (4), 793–812 (2004)
Article MathSciNet Google Scholar
Snyman, J. A.: Practical mathematical optimization. Springer (2005)
Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis, vol. 9. SIAM (1991)
Wang, F., Zhao, X. -L., Ng, M. K.: Multiplicative noise and blur removal by framelet decomposition and l₁-based L-curve method. IEEE Trans. Image Process. 25(9), 4222–4232 (2016)
Article MathSciNet Google Scholar
Yang, M., Xia, Y., Wang, J., Peng, J.: Efficiently solving total least squares with Tikhonov identical regularization. Comput. Optim. Appl. 70(2), 571–592 (2018)
Article MathSciNet Google Scholar
Zare, H., Hajarian, M.: Determination of regularization parameter via solving a multi-objective optimization problem. Appl. Numer. Math. 156, 542–554 (2020)
Article MathSciNet Google Scholar

Download references

Acknowledgements

The authors would like to express their heartfelt thanks to the editor and anonymous referees for their useful comments and constructive suggestions which substantially improved the quality and presentation of this paper.

Author information

Authors and Affiliations

Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
Hossein Zare
Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
Masoud Hajarian

Authors

Hossein Zare
View author publications
You can also search for this author in PubMed Google Scholar
Masoud Hajarian
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Masoud Hajarian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zare, H., Hajarian, M. An efficient Gauss–Newton algorithm for solving regularized total least squares problems. Numer Algor 89, 1049–1073 (2022). https://doi.org/10.1007/s11075-021-01145-2

Download citation

Received: 28 November 2020
Accepted: 25 May 2021
Published: 18 June 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11075-021-01145-2

Keywords

Mathematics Subject Classification (2010)

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

An efficient Gauss–Newton algorithm for solving regularized total least squares problems

Abstract

Similar content being viewed by others

A Gauss–Newton iteration for Total Least Squares problems

Gauss–Newton–Kurchatov Method for the Solution of Nonlinear Least-Squares Problems

A Gauss–Newton method for mixed least squares-total least squares problems

1 Introduction

Definition 1

Theorem 1

Theorem 2

Proof

Theorem 3

2 Preliminaries

2.1 The Gauss–Newton method for nonlinear least squares problems

2.2 Some basic concepts of the multi-objective optimization

3 The Gauss–Newton method for solving RTLS problems

3.1 Solving RTLS problems by the Gauss–Newton method

Remark 1

Theorem 4

Proof

Remark 2

Remark 3

3.2 Determination of λ L

Theorem 5

Theorem 6

4 Numerical experiments

5 Conclusions

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2010)

Search

Navigation

3.2 Determination of λ _L