1 Introduction

In theoretical physics, symmetry is an important guiding principle in the search for laws of nature. When numerically simulating a physical system that has a symmetry, it is preferable to retain that symmetry in the numerical algorithm. Hermitian matrices often appear in the analysis of physical systems, and spectral analysis is important for understanding their nature. In this case, the Hermitian symmetry must be taken into account in the spectral analysis and in the eigensolver algorithm. Developing eigensolver algorithms for spectral analysis is one of the major research areas in applied mathematics.

In physics, matrices possessing both Hermiticity and J-symmetry appear. Let A be a Hermitian and J-symmetric matrix in \({\mathbb {C}}^{n\times n}\). The J-symmetry of A is defined by

$$\begin{aligned} J A J^{-1}&= A^{{T}}, \quad J^{{T}} = -J,\quad J^{{T}}=J^{-1}, \end{aligned}$$
(1)

where J is a skew-symmetric matrix in \({\mathbb {R}}^{n\times n}\). We take this definition of the J-symmetry from [15]. Throughout this paper, we employ the following notation: \(^{{T}}\) for matrix transposition, \(^{{H}}\) for Hermitian conjugation, and \(^*\) for complex conjugation. We use I and \(O\) to denote the identity and null matrices of appropriate sizes, respectively. Owing to the J-symmetry, the eigenvalues of A are doubly degenerate, i.e., each eigenvalue has multiplicity two. Let x be an eigenvector of A associated with the eigenvalue \(\lambda\), satisfying \(A x = \lambda x\). The vector defined by

$$\begin{aligned} y \equiv J x^{*}, \end{aligned}$$
(2)

is also an eigenvector of A associated with \(\lambda\) because of the symmetry (1): indeed, (1) together with the Hermiticity of A gives \(AJ = JA^{*}\), so \(A y = J A^{*} x^{*} = J (A x)^{*} = \lambda ^{*} J x^{*} = \lambda y\), where \(\lambda\) is real because A is Hermitian.

Any Hermitian J-symmetric matrix A has the following block structure. Without loss of generality, J is

$$\begin{aligned} J&= \begin{bmatrix} O &{} -I\\ I &{} O \end{bmatrix}, \end{aligned}$$
(3)

where the size of each block is \((n/2)\times (n/2)\) and n is an even number. Based on this, the symmetry (1) requires

$$\begin{aligned} A&= \begin{bmatrix} A_{11} &{} A_{12} \\ A_{12}^{{H}}&{} A_{11}^{{T}} \end{bmatrix}, \end{aligned}$$
(4)

where \(A_{11}\) and \(A_{12}\) are \((n/2)\times (n/2)\) matrices satisfying \(A_{11}^{{H}}=A_{11}\) and \(A_{12}^{{T}}=-A_{12}\). In this block structure, the eigenvectors x and \(y=J x^*\) associated with an eigenvalue \(\lambda\) can also be constructed explicitly, and A can be diagonalized as

$$\begin{aligned} A&= U \begin{bmatrix} \varLambda &{} O \\ O &{} \varLambda \end{bmatrix} U^{{H}},\quad U = \begin{bmatrix} X_1 &{} -X_2^{*}\\ X_2 &{} X_1^{*} \end{bmatrix}, \end{aligned}$$
(5)

where \(\varLambda =\mathrm {diag}(\lambda _1,\lambda _2,\dots ,\lambda _{n/2})\), and \(X_1\) and \(X_2\) are \((n/2)\times (n/2)\) matrices satisfying \(X_1^{{H}}X_1+X_2^{{H}}X_2=I\) and \(X_1^{{T}}X_2-X_2^{{T}}X_1=O\).
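The block structure (4) and the resulting double degeneracy are easy to verify numerically. The following sketch, in Python/NumPy with an arbitrarily chosen illustrative size (it is not the implementation used in this paper), builds a random matrix of the form (4) and checks the symmetry (1) and the paired spectrum.

import numpy as np

rng = np.random.default_rng(0)
half = 20                                                       # n/2 (illustrative size)
Z = np.zeros((half, half))
J = np.block([[Z, -np.eye(half)], [np.eye(half), Z]])           # J as in (3)
B1 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
B2 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
A11 = 0.5 * (B1 + B1.conj().T)                                  # A11^H = A11
A12 = 0.5 * (B2 - B2.T)                                         # A12^T = -A12
A = np.block([[A11, A12], [A12.conj().T, A11.T]])               # structure (4)
assert np.allclose(J @ A @ J.T, A.T)                            # J-symmetry (1), using J^{-1} = J^T
ev = np.linalg.eigvalsh(A)                                      # sorted real eigenvalues
assert np.allclose(ev[0::2], ev[1::2])                          # every eigenvalue appears twice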

To investigate the spectrum of a large sparse matrix, iterative eigensolver algorithms are generally employed. Iterative eigensolvers in the literature, such as the Lanczos algorithm for Hermitian matrices, do not take the J-symmetry into account. Therefore, even after one member of a degenerate eigenvector pair has converged, the algorithm continues to search for the other eigenvector associated with the converged eigenvalue, even though that eigenvector can easily be reconstructed from the converged one. In this article, we restrict ourselves to improving the Lanczos algorithm for large Hermitian J-symmetric matrices by incorporating the J-symmetry.

The convergence of the Lanczos algorithm could be improved by restricting the Krylov subspace of A to the subspace spanned by one member of each doubly degenerate eigenvector pair, i.e., by imposing complete orthogonality to the subspace spanned by the paired eigenvectors. However, this strategy cannot be realized directly without knowing the invariant subspaces of A in advance. We find that the Krylov subspace \({\mathscr {K}}_k(A,v)=\mathrm {span}\{v,Av,\dots ,A^{k-1}v\}\) generated by the Lanczos algorithm with a starting vector v is orthogonal to the subspace \({\mathscr {K}}_k(A,Jv^*)=\mathrm {span}\{Jv^*,A Jv^*,\dots ,A^{k-1} Jv^*\}\) generated by the same Lanczos algorithm with the starting vector \(Jv^*\). On the basis of this property, we can obtain one half of the degenerate eigenvectors in \({\mathscr {K}}_k(A,v)\) and reconstruct the other half via (2). This realizes the strategy stated above.

Early attempts to incorporate the symmetry properties or structure of matrices into eigensolver algorithms were made in [7, 15], in which dense-matrix algorithms were developed for the Hermitian J-symmetric matrices that appear in quantum mechanical systems with time-reversal and inversion symmetry. These dense-matrix algorithms retain or respect the matrix structure (4) during the computation [7, 15]. Because we focus on algorithms for large sparse matrices, we do not discuss algorithms for dense matrices in this paper. An approach similar to ours that incorporates the symmetry properties of matrices was studied in [3,4,5, 13], in which Lanczos-type eigensolvers for complex J-skew-symmetric matrices or Hamiltonian matrices were investigated. Because the symmetry treated in [3,4,5, 13] differs from the Hermitian J-symmetry, their algorithms cannot be directly applied to Hermitian J-symmetric matrices, even though they are based on the Lanczos algorithm. See also [12] for numerical algorithms for structured eigenvalue problems.

This paper is organized as follows. In the next section, we prove the orthogonality between the Krylov subspace \({\mathscr {K}}_k(A,v)\) generated by the Lanczos iteration and \({\mathscr {K}}_k(A,Jv^*)\) generated by the same iteration. Having established the orthogonality, we describe the restarting method based on the Krylov–Schur transformation [19] and the thick-restart method [20, 21]. We then propose the thick-restart Lanczos algorithm for Hermitian J-symmetric matrices in Sect. 3 and estimate its computational cost in terms of matrix–vector multiplications. In Sect. 4, we test the proposed algorithm on two types of Hermitian J-symmetric matrices: an artificial, randomly generated matrix satisfying the structure of (4) and (5), and a matrix originating from quantum field theory. The convergence behavior and the computational cost are compared with those of the standard thick-restart Lanczos algorithm. We summarize this paper in the last section.

2 Lanczos iteration and J-symmetry

The Lanczos iteration transforms A to tridiagonal form by generating orthonormal basis vectors. The Lanczos decomposition after m iteration steps starting with a unit vector \(v_1\) is given by

$$\begin{aligned} A V_{m} = V_{m} T_{m} + \beta _{m} v_{m+1} e_{m}^{{T}}, \end{aligned}$$
(6)

where \(V_m=\left[ v_1,v_2,\dots ,v_m\right]\) with \(v_j \in {\mathbb {C}}^n\), and \(e_m\) is the m-dimensional unit vector pointing in the m-th direction. The basis vectors are orthonormal, \(V_{m+1}^{{H}}V_{m+1}=I\), and \(T_m\) denotes the \(m\times m\) tridiagonal matrix given by

$$\begin{aligned} T_m = \begin{bmatrix} \alpha _1 &{} \beta _1 &{} \\ \beta _1 &{} \alpha _2 &{} \beta _2 &{} \\ &{} \beta _2 &{} \alpha _3 &{} \beta _3 \\ &{} &{} \ddots &{} \ddots &{} \ddots \\ &{} &{} &{} \beta _{m-2} &{} \alpha _{m-1} &{} \beta _{m-1} \\ &{} &{} &{} &{} \beta _{m-1} &{} \alpha _m \end{bmatrix}. \end{aligned}$$
(7)

The approximate eigenpairs are obtained from the eigenpairs of \(T_m\). Because of the Hermiticity of A and the recurrence structure of the Lanczos iteration, all \(\alpha _j\) and \(\beta _j\) can be taken to be real, so that \(T_m\) becomes a real symmetric matrix. In the standard Lanczos iteration, the Hermitian symmetry of A is thus respected in the form of \(T_m\).
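For reference, a single-vector Lanczos iteration realizing the decomposition (6)–(7) can be sketched as follows (Python/NumPy; the helper name lanczos and the full reorthogonalization against all previous basis vectors are our illustrative choices, not the article's Fortran implementation).

import numpy as np

def lanczos(A, v1, m):
    # Returns V (n x (m+1)), alpha, beta such that
    # A V[:, :m] = V[:, :m] T_m + beta[m-1] V[:, m] e_m^T, with T_m = tridiag(beta, alpha, beta).
    n = v1.size
    V = np.zeros((n, m + 1), dtype=complex)
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        r = A @ V[:, j]
        alpha[j] = np.vdot(V[:, j], r).real                  # real because A is Hermitian
        r -= alpha[j] * V[:, j]
        if j > 0:
            r -= beta[j - 1] * V[:, j - 1]
        r -= V[:, :j + 1] @ (V[:, :j + 1].conj().T @ r)      # full reorthogonalization
        beta[j] = np.linalg.norm(r)
        V[:, j + 1] = r / beta[j]                            # assumes no breakdown (beta[j] > 0)
    return V, alpha, beta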

We now investigate the J-symmetry property of the decomposition (6). Taking the complex conjugate of (6) and multiplying it by J from the left-hand side, we have

$$\begin{aligned} JA^{*} V_m^* = J V_m^* T_m + \beta _{m} Jv_{m+1}^* e_m^{{T}}. \end{aligned}$$
(8)

Using the J-symmetry and Hermiticity, \(JA^*=A^{{H}}J=AJ\), and by defining \(w_j \equiv Jv_j^*\), we obtain

$$\begin{aligned} A W_m&= W_m T_m + \beta _{m} w_{m+1} e_m^{{T}}, \end{aligned}$$
(9)
$$\begin{aligned} W_m&= \left[ w_1, w_2, \dots , w_m\right] . \end{aligned}$$
(10)

The columns of \(W_{m+1}\) are also orthonormal, \(W_{m+1}^{{H}}W_{m+1}=I\). Consequently, the vectors \(W_{m+1}\) satisfy the same Lanczos decomposition as \(V_{m+1}\) when the iteration starts from \(w_1 = Jv_1^*\). Furthermore, we can prove the orthogonality between \(W_{m+1}\) and \(V_{m+1}\).

To implement the thick-restart method [20, 21], we show the J-symmetry property for a generalized decomposition similar to the Krylov–Schur decomposition [19], instead of for the Lanczos decomposition, in the remainder of this section. After preparing two lemmas related to the J-symmetry, we prove the main theorem on the orthogonality between \(W_{m+1}\) and \(V_{m+1}\) for the generalized decomposition. The orthogonality properties for the Lanczos decomposition and for the thick-restart method then follow as corollaries.

We first show the orthogonality properties between v and \(w=Jv^*\).

Lemma 1

Let v be an arbitrary vector in \({\mathbb {C}}^n\), and J and A be matrices satisfying (1). Then, the vector w defined by \(w \equiv J v^*\) satisfies

$$\begin{aligned} w^{{H}} v&= 0, \end{aligned}$$
(11)
$$\begin{aligned} w^{{H}} A v&= 0. \end{aligned}$$
(12)

Proof

From the definition of w, it follows that

$$\begin{aligned} w^{{H}} v = (J v^*)^{{H}} v = v^{{T}} J^{{H}} v = v^{{T}} J^{{T}} v = - v^{{T}} J v, \end{aligned}$$
(13)

where \(J^{{T}} = -J\) is used. The identity \(v^{{T}} Jv = v^{{T}} J^{{T}} v\) (a scalar equals its own transpose) and (13) yield

$$\begin{aligned} w^{{H}} v = -w^{{H}} v. \end{aligned}$$
(14)

Thus, \(w^{{H}} v = 0\). Similarly,

$$\begin{aligned} w^{{H}} A v = (J v^*)^{{H}} A v = v^{{T}} J^{{H}} A v = v^{{T}} J^{{T}} A v = -v^{{T}} J A v = - v^{{T}} A^{{T}} J v, \end{aligned}$$
(15)

where \(J^{{T}}=-J\) and \(JA=A^{{T}}J\) are used. The identity \(v^{{T}} A^{{T}} J v= v^{{T}} J^{{T}} A v\) and (15) yield

$$\begin{aligned} w^{{H}} A v = - w^{{H}} A v. \end{aligned}$$
(16)

Thus, \(w^{{H}} A v = 0\). \(\square\)
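Lemma 1 can be checked numerically in a few lines; the following sketch (Python/NumPy, with illustrative sizes, not part of the article's implementation) builds a Hermitian J-symmetric A of the form (4) and confirms that \(w=Jv^*\) is orthogonal and A-orthogonal to an arbitrary v.

import numpy as np

rng = np.random.default_rng(1)
half = 30
Z = np.zeros((half, half))
J = np.block([[Z, -np.eye(half)], [np.eye(half), Z]])
B1 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
B2 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
H11, S12 = 0.5 * (B1 + B1.conj().T), 0.5 * (B2 - B2.T)
A = np.block([[H11, S12], [S12.conj().T, H11.T]])    # Hermitian J-symmetric, cf. (4)
v = rng.standard_normal(2 * half) + 1j * rng.standard_normal(2 * half)
w = J @ v.conj()
print(abs(np.vdot(w, v)), abs(np.vdot(w, A @ v)))    # both are zero up to round-off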

We have the following lemma, which generalizes the relation between (6) and (9).

Lemma 2

Let \(v_1,v_2,\ldots ,v_k,v_{k+1}\) be vectors satisfying the following relation:

$$\begin{aligned} A V_k = V_k S_k + v_{k+1} b^{{T}}, \end{aligned}$$
(17)

where \(V_k=[v_1,\ldots ,v_k]\), \(S_k\) is a matrix in \({\mathbb {R}}^{k\times k}\), and b is a vector in \({\mathbb {R}}^k\) for a Hermitian J-symmetric matrix A satisfying (1). Then, the vectors \(w_j = J v_j^*\) \((j=1,\ldots ,k+1)\) satisfy the following decomposition:

$$\begin{aligned} A W_k = W_k S_k + w_{k+1} b^{{T}}, \end{aligned}$$
(18)

where \(W_k=JV_k^*=[w_1,\ldots ,w_k]\).

Proof

By taking the complex conjugate of (17) followed by multiplying it by J from the left-hand side, we obtain (18) using \(JA^{{T}}=AJ\) and \(A^* = A^{{T}}\). \(\square\)

Using Lemmas 1 and 2, we can show the orthogonality properties between \(W_{k+1}\) and \(V_{k+1}\) as follows.

Theorem 1

Let \(V_{k+1}=[v_1,\ldots ,v_{k+1}]\) be a matrix satisfying (17), and \(W_{k+1}=JV_{k+1}^*\). If the matrices \(V_{k}\) and \(W_{k}\) are orthogonal and A-orthogonal to each other: \(V_k^{{H}}W_k=O\) and \(V_k^{{H}}AW_k=O\), then the matrices \(V_{k+1}\) and \(W_{k+1}\) also satisfy the following orthogonality relations:

$$\begin{aligned} V_{k+1}^{{H}} W_{k+1}&= O, \end{aligned}$$
(19)
$$\begin{aligned} V_{k+1}^{{H}} A W_{k+1}&= O. \end{aligned}$$
(20)

Proof

The decomposition (18) follows from Lemma 2. Multiplying (17) by \(W_k^{{H}}\) and (18) by \(V_k^{{H}}\) from the left yields

$$\begin{aligned} W_k^{{H}} A V_k&= W_k^{{H}} V_k S_k + W_k^{{H}} v_{k+1} b^{{T}}, \end{aligned}$$
(21)
$$\begin{aligned} V_k^{{H}} A W_k&= V_k^{{H}} W_k S_k + V_k^{{H}} w_{k+1} b^{{T}}. \end{aligned}$$
(22)

From the premise, \(W_k^{{H}} A V_k=V_k^{{H}} A W_k=O\) and \(W_k^{{H}} V_k =V_k^{{H}} W_k=O\); hence the rank-one terms \(W_k^{{H}} v_{k+1} b^{{T}}\) and \(V_k^{{H}} w_{k+1} b^{{T}}\) in (21) and (22) must vanish, and, provided \(b\ne 0\) (no breakdown), it follows that

$$\begin{aligned} W_k^{{H}} v_{k+1}&= O, \end{aligned}$$
(23)
$$\begin{aligned} V_k^{{H}} w_{k+1}&= O. \end{aligned}$$
(24)

Together with Lemma 1 and the premise, we find

$$\begin{aligned} V_{k+1}^{{H}}W_{k+1}&= O. \end{aligned}$$
(25)

Multiplying (17) by \(w_{k+1}^{{H}}\) and (18) by \(v_{k+1}^{{H}}\) from the left yields

$$\begin{aligned} w_{k+1}^{{H}} A V_k&= w_{k+1}^{{H}} V_k S_k + w_{k+1}^{{H}} v_{k+1} b^{{T}}, \end{aligned}$$
(26)
$$\begin{aligned} v_{k+1}^{{H}} A W_k&= v_{k+1}^{{H}} W_k S_k + v_{k+1}^{{H}} w_{k+1} b^{{T}}. \end{aligned}$$
(27)

Because of (23) and (24) as well as Lemma 1, the right-hand sides of (26) and (27) vanish. Thus,

$$\begin{aligned} w_{k+1}^{{H}} A V_k&= O, \end{aligned}$$
(28)
$$\begin{aligned} v_{k+1}^{{H}} A W_k&= O. \end{aligned}$$
(29)

Together with Lemma 1 and the premise, we find

$$\begin{aligned} V_{k+1}^{{H}} A W_{k+1}&= O. \end{aligned}$$
(30)

Therefore, \(V_{k+1}\) and \(W_{k+1}\) are orthogonal and A-orthogonal to each other. \(\square\)

Using Theorem 1 as well as Lemmas 1 and 2, we can show that the Lanczos vectors \(V_{m+1}\) generated with (6) are orthogonal and A-orthogonal to \(W_{m+1}=JV_{m+1}^*\).

Corollary 1

The m-step Lanczos vectors \(V_{m+1}\) with \(m\ge 1\), generated from a unit vector \(v_1\) for a Hermitian J-symmetric matrix A, satisfy

$$\begin{aligned} V_{m+1}^{{H}} W_{m+1} = O,\quad V_{m+1}^{{H}} A W_{m+1} = O, \end{aligned}$$
(31)

with

$$\begin{aligned} W_{m+1} = J V_{m+1}^*, \end{aligned}$$
(32)

when no breakdown occurs.

Proof

Because the Lanczos decomposition (6) is a particular form of (17) with \(k\rightarrow m\), a real matrix \(S_m \rightarrow T_m\), and a real vector \(b \rightarrow \beta _m e_m\), Lemma 2 can be applied to obtain (9). Because \(w_1^{{H}} v_1=0\) and \(w_1^{{H}} A v_1=0\) hold by Lemma 1, we can apply Theorem 1 to (6) and (9) for \(m=1\). Moreover, the Lanczos decomposition retains a form to which Theorem 1 is applicable for any \(m>1\). Therefore, the corollary follows from Theorem 1 and Lemmas 1 and 2 by induction. \(\square\)
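Corollary 1 can also be observed numerically. The following self-contained sketch (Python/NumPy, illustrative sizes; not the article's code) runs a plain Lanczos recurrence, with no explicit reference to J, on a random Hermitian J-symmetric matrix and prints the largest entries of \(V_{m+1}^{{H}}W_{m+1}\) and \(V_{m+1}^{{H}}AW_{m+1}\); both stay at round-off level, growing only slowly through finite-precision effects as discussed below.

import numpy as np

rng = np.random.default_rng(3)
half, m = 40, 20
Z = np.zeros((half, half))
J = np.block([[Z, -np.eye(half)], [np.eye(half), Z]])
B1 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
B2 = rng.standard_normal((half, half)) + 1j * rng.standard_normal((half, half))
H11, S12 = 0.5 * (B1 + B1.conj().T), 0.5 * (B2 - B2.T)
A = np.block([[H11, S12], [S12.conj().T, H11.T]])        # Hermitian J-symmetric, cf. (4)
n = 2 * half
V = np.zeros((n, m + 1), dtype=complex)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
for j in range(m):                                       # plain Lanczos basis, no J-orthogonalization
    r = A @ V[:, j]
    r -= V[:, :j + 1] @ (V[:, :j + 1].conj().T @ r)      # Gram-Schmidt against V_j only
    V[:, j + 1] = r / np.linalg.norm(r)
W = J @ V.conj()
print(np.abs(V.conj().T @ W).max(), np.abs(V.conj().T @ A @ W).max())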

We cannot simultaneously find degenerate pairs of eigenvectors with the standard single-vector Lanczos process. This holds for computation in exact arithmetic. In finite precision arithmetic, however, the single-vector Lanczos process generates a small overlap with \(W_k\) through round-off errors, so that even after an eigenvector has converged, a late convergence to the paired eigenvector is possible. Because this behavior is rather accidental, a block-type Lanczos algorithm has to be applied to accelerate the convergence for degenerate eigenvectors [1, 2, 8, 18, 22].

We further investigate the structure of the Lanczos decomposition. According to Corollary 1 and the Lanczos decompositions (6) and (9), the block-type decomposition can be constructed as:

$$\begin{aligned} A V_{[m]} = V_{[m]} T_{[m]} + {\mathscr {V}}_{m+1} {\mathscr {B}}_m E_{[m]}^{{T}}, \end{aligned}$$
(33)

where we define

$$\begin{aligned} V_{[m]}&\equiv \begin{bmatrix} {\mathscr {V}}_1&{\mathscr {V}}_2&\cdots&{\mathscr {V}}_{m-1}&{\mathscr {V}}_m \end{bmatrix}, \quad T_{[m]} \equiv \begin{bmatrix} {\mathscr {A}}_1 &{} {\mathscr {B}}_1&{} \\ {\mathscr {B}}_1 &{} {\mathscr {A}}_2&{} {\mathscr {B}}_2&{} \\ &{} \ddots &{} \ddots &{} \ddots \\ &{} &{} {\mathscr {B}}_{m-2} &{} {\mathscr {A}}_{m-1} &{} {\mathscr {B}}_{m-1} \\ &{} &{} &{} {\mathscr {B}}_{m-1} &{} {\mathscr {A}}_m \end{bmatrix}, \nonumber \\ {\mathscr {V}}_j&\equiv \begin{bmatrix} v_j, w_j \end{bmatrix}, \quad {\mathscr {A}}_j \equiv \mathrm {diag}(\alpha _j,\alpha _j),\quad {\mathscr {B}}_j \equiv \mathrm {diag}(\beta _j ,\beta _j ),\quad E^{{T}}_{[m]} \equiv \begin{bmatrix} O&O&\cdots&O&I \end{bmatrix}, \end{aligned}$$
(34)

where the size of \(E^{{T}}_{[m]}\) is \(2\times 2m\). When \(m = n/2\), the decomposition terminates, because \(V_{[n/2]}\) completely block-tridiagonalizes A and the Krylov subspaces \({\mathscr {K}}_{n/2}(A,v)\) and \({\mathscr {K}}_{n/2}(A,Jv^*)\) together span the entire eigenspace of A. Because \({\mathscr {K}}_{n/2}(A,v)\) and \({\mathscr {K}}_{n/2}(A,Jv^*)\) are orthogonal to each other and (6) and (9) are independent iterations, the Lanczos iteration for (6) terminates at \(m=n/2\) regardless of the iteration for (9). We can therefore construct the eigenvectors from \({\mathscr {K}}_{n/2}(A,v)\) without the eigenvalue multiplicity associated with the J-symmetry. In other words, the standard Lanczos iteration in exact arithmetic is sufficient to find all the eigenvectors without multiplicity. However, this is impractical because it requires exact arithmetic. In finite precision, the orthogonality to \({\mathscr {K}}_{n/2}(A,Jv^*)\) is not maintained because of round-off errors, and eventually eigenvectors, including the multiplicity, could be extracted from the single-vector Lanczos iteration, as stated previously.

Using Corollary 1 and the above analysis, we can construct a Lanczos-type algorithm in which the orthogonality to \(W_k\) is enforced, so as to search for eigenvectors without the multiplicity of eigenvalues associated with the J-symmetry; the eigenvectors paired with them can then be easily reconstructed. However, for a practical Lanczos-type iteration, the iteration should terminate at a finite step, and a restarting mechanism is required [6, 20, 21]. The simplest yet effective restarting method is the so-called thick-restart method [20, 21], which is a specialization of the Krylov–Schur transformation [19] to Hermitian matrices. To incorporate the thick-restart method into the Lanczos algorithm with the J-symmetry, we have to prove the orthogonality between \(V_k\) and \(W_k\) after the Krylov–Schur transformation and restarting. To this end, we have the following corollary.

Corollary 2

Let \(V_{m+1}\) and \(W_{m+1}\) be the orthonormal matrices containing the basis vectors generated by the m-step Lanczos process, (6) and (9), and let \(Z_m\) be an orthogonal matrix in \({\mathbb {R}}^{m\times m}\). The Krylov–Schur transformation with \(Z_m\) on the Lanczos decomposition is defined by

$$\begin{aligned} A U_m&= U_m (Z_m^{-1} T_m Z_m) + v_{m+1} b_m^{{T}}, \end{aligned}$$
(35)
$$\begin{aligned} A Q_m&= Q_m (Z_m^{-1} T_m Z_m) + w_{m+1} b_m^{{T}}, \end{aligned}$$
(36)
$$\begin{aligned} U_m&\equiv V_m Z_m, \end{aligned}$$
(37)
$$\begin{aligned} Q_m&\equiv W_m Z_m, \end{aligned}$$
(38)
$$\begin{aligned} b_m^{{T}}&\equiv \beta _{m} e_m^{{T}} Z_m. \end{aligned}$$
(39)

Then, the matrices \(U_{m+1} = [ U_{m}, v_{m+1} ]\) and \(Q_{m+1} = [Q_{m}, w_{m+1} ]\) satisfy the orthogonality relations \(U_{m+1}^{{H}} Q_{m+1}=O\) and \(U_{m+1}^{{H}} A Q_{m+1}=O\).

Proof

Because \(U_m\) and \(Q_m\) satisfy \(U_m^{{H}} Q_m=O\) and \(Q_m^{{H}} A U_m=O\), and because the decompositions (35) and (36) are particular forms of the decomposition in Lemma 2, the orthogonality and A-orthogonality between \(U_{m+1}\) and \(Q_{m+1}\) follow directly from Theorem 1. \(\square\)

For the thick-restart method, \(Z_m\) is chosen to diagonalize \(T_m\), and the dimension of the decomposition is reduced from m to \(k<m\) with a selection criterion for the vectors, \(V_k \leftarrow V_m\). In the reduction, the last vectors are kept as \(v_{k+1} \leftarrow v_{m+1}\) and \(w_{k+1} \leftarrow w_{m+1}\) to retain the decomposition form properly. The orthogonality properties of the new basis \((U_{m+1},Q_{m+1})\) and of the reduced basis \((U_{k+1},Q_{k+1})\) still hold according to Corollary 2. After restarting, the Lanczos iteration keeps the decomposition in a form to which Theorem 1 is applicable. Because the structures of the Lanczos and Krylov–Schur decompositions for \(W_k\) and \(Q_k\) are the same as those for \(V_k\) and \(U_k\), respectively, we do not need to iterate the Lanczos algorithm for \(W_k\) and \(Q_k\) explicitly. Consequently, we can continue the thick-restart Lanczos cycle only in the subspace \({\mathscr {K}}_k(A,v)\), which is orthogonal and A-orthogonal to \({\mathscr {K}}_k(A,Jv^*)\). We note that, according to Corollaries 1 and 2, all the eigenpairs without multiplicity can be obtained with the thick-restart Lanczos algorithm in exact arithmetic. This is impractical, and we must enforce the orthogonality between \({\mathscr {K}}_k(A,v)\) and \({\mathscr {K}}_k(A,Jv^*)\) in a practical algorithm.
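As an illustration of this restart step, the following Python/NumPy sketch (the helper name thick_restart and the choice of keeping the k largest Ritz values are our illustrative assumptions, not the article's Algorithm 1) diagonalizes \(T_m\), keeps k Ritz vectors together with the last Lanczos vector, and returns the reduced decomposition \(A U_k = U_k\,\mathrm {diag}(\theta _1,\dots ,\theta _k) + v_{m+1} b_k^{{T}}\) of the form (35) and (39).

import numpy as np

def thick_restart(V, alpha, beta, k):
    # V: n x (m+1) Lanczos basis; alpha, beta: coefficients of T_m as in (7).
    m = alpha.size
    T = np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    theta, Zm = np.linalg.eigh(T)              # T_m = Z_m diag(theta) Z_m^T, Z_m real orthogonal
    keep = np.argsort(theta)[::-1][:k]         # e.g. keep the k largest Ritz values
    Zk = Zm[:, keep]
    Uk = V[:, :m] @ Zk                         # U_k = V_m Z_k, cf. (37)
    bk = beta[m - 1] * Zk[m - 1, :]            # b_k^T = beta_m e_m^T Z_k, cf. (39)
    Uk1 = np.hstack([Uk, V[:, m:m + 1]])       # append v_{m+1} as the (k+1)-th basis vector
    return Uk1, theta[keep], bk

The subsequent Lanczos steps are then continued from the vector obtained by orthogonalizing \(A v_{k+1}\) against \(U_{k+1}\) (and, in the TRLAN–JSYM of the next section, also against \(J U_{k+1}^*\)).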

3 Thick-restart Lanczos algorithm with J-symmetry

Based on Theorem 1 as well as Corollaries 1 and 2, we construct a thick-restart Lanczos algorithm for Hermitian J-symmetric matrices (TRLAN–JSYM), which efficiently searches for eigenvectors without the multiplicity of eigenvalues in \({\mathscr {K}}_k(A,v)\). Algorithm 1 shows the TRLAN–JSYM algorithm, and the Lanczos iteration with the J-symmetry is described in Algorithm 2. We include the invert mode for the small eigenvalues.

The main difference from the standard thick-restart Lanczos algorithm (TRLAN) lies in Algorithm 2, where we simultaneously construct \(w_{j}=Jv_j^*\) and enforce the orthogonality of \(V_{j+1}\) to \(W_j =J V_j^*\) to avoid contamination of the search space \({\mathscr {K}}_k(A,v)\) by \({\mathscr {K}}_k(A,Jv^*)\).
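A minimal stand-in for this step is sketched below in Python/NumPy (the function name lanczos_step_jsym is hypothetical and the dense J is for illustration only; Algorithm 2 itself is not reproduced here). Each new Lanczos vector is orthogonalized not only against the current basis \(V_j\) but also against \(W_j = J V_j^*\), which is formed on the fly rather than stored as an independent search space.

import numpy as np

def lanczos_step_jsym(A, J, V, j, alpha, beta):
    # One step of the J-orthogonalized Lanczos iteration:
    # V[:, :j+1] holds v_1, ..., v_{j+1}; the step appends v_{j+2}.
    r = A @ V[:, j]
    alpha[j] = np.vdot(V[:, j], r).real
    W = J @ V[:, :j + 1].conj()                     # w_i = J v_i^*, built on the fly
    for Q in (V[:, :j + 1], W):                     # Gram-Schmidt against V_j and W_j
        r -= Q @ (Q.conj().T @ r)
    beta[j] = np.linalg.norm(r)
    V[:, j + 1] = r / beta[j]                       # assumes no breakdown
    return alpha[j], beta[j]

By Theorem 1, the projection onto \(W_j\) vanishes in exact arithmetic; the purpose of the extra orthogonalization is to keep it at round-off level in finite precision.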

We will compare the efficiency of the TRLAN–JSYM and the standard TRLAN algorithms in Sect. 4. For the comparison, we estimate the computational cost of both algorithms as follows. We count the total number of matrix–vector multiplications \(Av_j\) or \(A^{-1}v_j\) contained in the Lanczos step, Algorithm 2. For the invert mode with a large sparse matrix, an iterative linear solver, with its associated cost, could be required to obtain \(A^{-1}v_j\). To focus the comparison on the two eigensolver algorithms, we assume that the cost of a single inversion is identical between them, do not count the cost of the inversion itself, and regard one inversion \(A^{-1}v_j\) as one matrix–vector multiplication in the invert mode. We also neglect the cost of explicitly computing the true residual at line 28 of Algorithm 1. For the first outer iteration, the count is m, and after each restart it is \(m-k\). The thickness k for restarting is defined by

$$\begin{aligned} k=\min (\mathrm {icnv}+\mathrm {mwin},m-1), \end{aligned}$$
(40)

as shown in line 48 of Algorithm 1, where \(\mathrm {icnv}\) is the number of converged eigenvectors and \(\mathrm {mwin}\) is the initial thickness for restarting. When all desired eigenvectors are obtained at outer iteration \(N_{\mathrm {conv}}\), the upper and lower bounds on the total number of matrix–vector multiplications \(N_{\mathrm {MV}}\) are estimated as

$$\begin{aligned} m + (m-\mathrm {mwin}-\mathrm {nev})(N_{\mathrm {conv}}-1)< N_{\mathrm {MV}} < m + (m-\mathrm {mwin} )(N_{\mathrm {conv}}-1), \end{aligned}$$
(41)

where \(\mathrm {nev}\) is the number of desired eigenvectors without the multiplicity of the J-symmetry. The inequality (41) follows from the fact that \(\mathrm {icnv}\) increases monotonically from zero to \(\mathrm {nev}\) toward convergence.
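For illustration only (the numbers here are hypothetical and are not taken from the experiments in Sect. 4): with \((\mathrm {nev},\mathrm {mwin},m)=(10,20,50)\) and \(N_{\mathrm {conv}}=21\), the bound (41) gives \(50+(50-20-10)\times 20=450< N_{\mathrm {MV}} <50+(50-20)\times 20=650\) matrix–vector multiplications.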

The same cost estimate can be derived for the standard TRLAN algorithm, which is obtained by removing \(W_m\) from Algorithms 1 and 2. Therefore, the cost bound for the TRLAN algorithm is identical to (41). However, to find all eigenvectors paired by the J-symmetry using the TRLAN algorithm, \(\mathrm {nev}\) for the TRLAN must be twice that of the TRLAN–JSYM, and it is natural to double all the parameters of the TRLAN algorithm relative to those of the TRLAN–JSYM. The upper and lower bounds on the total number of matrix–vector multiplications \(N_{\mathrm {MV}}\) of the TRLAN algorithm are therefore

$$\begin{aligned} 2(m + (m-\mathrm {mwin}-\mathrm {nev})(N'_{\mathrm {conv}}-1))< N_{\mathrm {MV}} < 2(m + (m-\mathrm {mwin} )(N'_{\mathrm {conv}}-1)), \end{aligned}$$
(42)

where the parameters \((\mathrm {nev},\mathrm {mwin},m)\) are those of the TRLAN–JSYM, and we introduce \(N'_{\mathrm {conv}}\) for the number of outer iterations because it could differ from that of the TRLAN–JSYM.

Although the computational cost of the Gram–Schmidt orthonormalization in the Lanczos step is minor compared with that of the matrix–vector multiplications, we briefly discuss it here. As shown in Algorithm 2, for the TRLAN–JSYM the number of vectors to orthonormalize against is 2m, and the cost scales as \(O((2m)^2)\). On the other hand, orthonormalizing \(V_m\) with the standard Lanczos algorithm scales as \(O(m^2)\). As described above, it is natural to double the parameters for the TRLAN, and the cost of orthonormalizing \(V_{2m}\) in its Lanczos part then becomes \(O((2m)^2)\). Therefore, the scaling of the cost is the same for both algorithms.

Our naive cost estimates are (41) and (42), where we assume that the parameters of the TRLAN are twice as large as those of the TRLAN–JSYM. Although \(N_{\mathrm {conv}}\) and \(N'_{\mathrm {conv}}\) depend on \((\mathrm {nev},\mathrm {mwin},m)\) and on the algorithm itself, if \(N_{\mathrm {conv}}\simeq N'_{\mathrm {conv}}\) holds, the TRLAN–JSYM algorithm performs better than the TRLAN algorithm. We examine whether the condition \(N_{\mathrm {conv}}\simeq N'_{\mathrm {conv}}\) holds in the numerical tests on the two types of matrices in the next section.

So far, we have described a single-vector Lanczos-type algorithm to introduce the TRLAN–JSYM. If the matrix A has a dense cluster of eigenvalues or multiple eigenvalues other than those associated with the J-symmetry, a block-type Lanczos iteration has to be incorporated into the algorithm for efficiency. We can extend the proposed algorithm to a blocked version in the same manner as was done for the standard thick-restart Lanczos algorithm [18, 22]. The block version will be addressed in future studies; here we present only the single-vector version to demonstrate the idea in its simplest form.

Algorithm 1 (TRLAN–JSYM) and Algorithm 2 (Lanczos step with the J-symmetry): pseudo-code listings

4 Numerical test

In this section, we present two numerical tests to explore the efficiency of the TRLAN–JSYM algorithm compared with the TRLAN algorithm for Hermitian J-symmetric matrices. The first test is conducted on randomly generated Hermitian J-symmetric matrices satisfying the structure (4). The second test is applied to a matrix from quantum field theory. We refer to these two test cases as Case A and Case B, respectively.

We implement both algorithms, TRLAN–JSYM and TRLAN, in Fortran 2003. The numerical tests were performed on a single node of subsystem A of the ITO supercomputer system of Kyushu University [16]. The code is parallelized using OpenMP and the Intel MKL library and executed with 36 threads.

4.1 Case A

4.1.1 Definition of the test matrix (case A)

Fig. 1 Large eigenvalue distribution of the ten random matrices

We generated ten sample matrices of size \(2000\times 2000\), randomly constructed as explained below. Although these matrices are dense, we employ them to explore the proposed Algorithm 1.

To randomly generate matrices A with the structure (4), we employ (5). The eigenvalues \(\varLambda\) are drawn from uniformly distributed random real numbers in \(\{x\in {\mathbb {R}}: 0<x<1\}\), and the elements of the matrices \(X_1\) and \(X_2\) are constructed from uniformly distributed random complex numbers in \(\{z \in {\mathbb {C}}: -1<\mathrm {Re}(z)< 1, -1< \mathrm {Im}(z) <1\}\). The constraints \(X_1^{{H}}X_1+X_2^{{H}}X_2=I\) and \(X_1^{{T}}X_2-X_2^{{T}}X_1=O\) are then imposed by a Gram–Schmidt procedure similar to that used in Algorithm 2. Figure 1 shows the large eigenvalues of the ten random matrices. We solve for the several largest eigenvalues with the TRLAN–JSYM and TRLAN algorithms and compare the convergence behavior.
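A sketch of this construction is given below in Python/NumPy (the size and random seed are illustrative; the actual generator used for the tests is not shown in the paper). The columns of \([X_1;X_2]\) are orthonormalized against both the previous columns and their J-partners, which enforces exactly the two constraints of (5).

import numpy as np

rng = np.random.default_rng(2)
half = 100                                              # illustrative; the test matrices use n/2 = 1000
n = 2 * half
Z0 = np.zeros((half, half))
J = np.block([[Z0, -np.eye(half)], [np.eye(half), Z0]])
lam = rng.uniform(0.0, 1.0, half)                       # eigenvalues Lambda in (0, 1)
Y = rng.uniform(-1, 1, (n, half)) + 1j * rng.uniform(-1, 1, (n, half))
Q = np.zeros((n, half), dtype=complex)
for j in range(half):                                   # Gram-Schmidt with J-pairing
    q = Y[:, j].copy()
    for i in range(j):
        for u in (Q[:, i], J @ Q[:, i].conj()):         # previous column and its J-partner
            q -= u * np.vdot(u, q)
    Q[:, j] = q / np.linalg.norm(q)
X1, X2 = Q[:half, :], Q[half:, :]
U = np.block([[X1, -X2.conj()], [X2, X1.conj()]])       # unitary U of (5)
A = U @ np.diag(np.concatenate([lam, lam])) @ U.conj().T
assert np.allclose(X1.conj().T @ X1 + X2.conj().T @ X2, np.eye(half))
assert np.allclose(X1.T @ X2 - X2.T @ X1, Z0)
assert np.allclose(J @ A @ J.T, A.T)                    # A is Hermitian J-symmetric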

Table 1 Algorithm parameters and statistics for convergence (case A)

4.1.2 Numerical results (case A)

Table 1 shows the algorithmic parameters used in this test. For the stopping condition of the algorithms, we employ a tolerance of \(\mathrm {tol}=10^{-13}\). We also tabulate the number of outer iterations \(N_{\mathrm {conv}}\) and of matrix–vector multiplications \(N_{\mathrm {MV}}\) required for convergence. The minimal, average, and maximal values over the ten samples are shown in square brackets.

Figures 2 and 3 show the convergence histories of the eigenvalues and the corresponding residuals for the first random matrix, respectively. The left panels show the result of the TRLAN–JSYM algorithm with \((\mathrm {nev},\mathrm {mwin},m)=(10,20,50)\), and the right panels show the result of the TRLAN algorithm with (20, 40, 100). We employ these doubled parameters for the TRLAN, as discussed in the previous section. The convergence of the TRLAN–JSYM is smooth, whereas the eigenvalues reorder several times for the TRLAN, even though the TRLAN algorithm successfully captures all the eigenvalues with multiplicity two. The reordering of the eigenvalues for the TRLAN is caused by the sorting of the eigenvalues combined with the behavior of the Lanczos iteration with respect to the J-symmetry. As mentioned in Sect. 2, the TRLAN tends to find one eigenvector of a doubly degenerate pair in the early stage of the iterations. Because of the finite precision arithmetic, it loses complete orthogonality to the other half of the degenerate eigenspace during the iterations, so that the eigenvalue paired with the converged one emerges at a later stage. Similar behavior is observed for the other random matrices with the same algorithmic parameters. For the cases with larger m, we do not observe eigenvalue reordering with the TRLAN algorithm, because the eigenvalues converge quickly. As m increases, \(N_{\mathrm {conv}}\) decreases rapidly, as shown in Table 1, whereas \(N_{\mathrm {MV}}\) remains almost constant or increases slightly.

Fig. 2 Convergence behavior of the large eigenvalues from the random matrix #1 (case A)

Fig. 3 Residual history for the large eigenvalues from the random matrix #1 (case A)

We compare the computational cost of the algorithms following the discussion in the previous section. The natural parameter choice for the TRLAN algorithm is to employ values twice as large as those of the TRLAN–JSYM algorithm. The values of \(N_{\mathrm {conv}}\) are comparable between algorithms paired with the doubled parameters, showing that \(N_{\mathrm {conv}}\simeq N'_{\mathrm {conv}}\) holds in (41) and (42). The number of matrix–vector multiplications of the TRLAN–JSYM algorithm is smaller by roughly a factor of two than that of the TRLAN algorithm, as seen in the table. Even with the same maximum Krylov dimension m for both algorithms, e.g. the TRLAN–JSYM with \((\mathrm {nev},\mathrm {mwin},m)=(5,10,100)\) and the TRLAN with (10, 20, 100), the TRLAN–JSYM algorithm still outperforms the TRLAN algorithm because m does not drastically change the number of matrix–vector multiplications.

4.2 Case B

4.2.1 Definition of the test matrix (case B)

We evaluate the proposed Algorithm 1 on a matrix that appears in a quantum field theory called the twisted Eguchi–Kawai (TEK) model with adjoint fermions [9,10,11]. The equation of motion for the adjoint fermions involves a matrix D called the Wilson–Dirac operator in the physics literature. The matrix \(D=(D_{i,j})\) is defined by

$$\begin{aligned} D_{i,j}&= \delta _{\alpha ,\beta } \delta _{a,b} - \kappa \sum _{\mu =1}^{4} \left[ \left( \delta _{\alpha ,\beta } - (\gamma _{\mu })_{\alpha ,\beta }\right) (V_{\mu })_{a,b} +\left( \delta _{\alpha ,\beta } + (\gamma _{\mu })_{\alpha ,\beta }\right) (V_{\mu })_{b,a} \right] , \end{aligned}$$
(43)

where i and j are the collective indices \(i=(a,\alpha )\) and \(j =(b,\beta )\), respectively. \(V_\mu\) are \((N^2-1)\times (N^2-1)\) matrices satisfying \(V_{\mu }^{{T}}=V_{\mu }^{{H}}\), i.e. real orthogonal matrices in the adjoint representation of the SU(N) group, and a, b denote the group indices running over \(1,\dots ,N^2-1\). \(\gamma _\mu\) denotes the \(4\times 4\) Hermitian matrices satisfying the anti-commutation relation \(\{\gamma _\mu ,\gamma _\nu \} = 2 \delta _{\mu ,\nu }I\), and \(\alpha ,\beta\) denote spin indices running from 1 to 4. An explicit form of \(\gamma _\mu\) can be found in [17]. The parameter \(\kappa\) implicitly determines the mass of the fermion. For more details on D, we refer to [11, 14].

The matrix D satisfies the following properties:

$$\begin{aligned} \gamma _5 D \gamma _5&= D^{{H}}, \end{aligned}$$
(44)
$$\begin{aligned} C D C^{{T}}&= D^{{T}}, \end{aligned}$$
(45)

where \(\gamma _5 = \gamma _4\gamma _1\gamma _2\gamma _3\) and \(C=\gamma _4 \gamma _2\). The matrices \(\gamma _5\) and C act only on the spin indices in this notation. We employ the definition of \(\gamma _\mu\) from [17] and give the explicit form in Appendix 1. In this case, \(\gamma _5\) is real and symmetric, and C is real and skew-symmetric, \(C^{{T}}=-C\). Monte Carlo methods are used to simulate quantum field theories; for the system considered here, the quantum field \(V_\mu\) corresponds to the stochastic variable of a Monte Carlo algorithm. The spectrum of D is therefore stochastic because it depends on \(V_\mu\).

We test the proposed algorithm for the matrix A defined by

$$\begin{aligned} A \equiv (D \gamma _5)^2 = D D^{{H}}. \end{aligned}$$
(46)

The matrix A is Hermitian and J-symmetric with \(J = C\gamma _5\). The distribution of the small eigenvalues of A is physically important because it carries information about the dynamics of the theory. The details of the algebraic properties of D and A are given in the Appendix.
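The algebra behind \(J=C\gamma _5\) depends only on the Clifford algebra of the \(\gamma _\mu\), so it can be checked independently of the gauge field \(V_\mu\). The following Python/NumPy sketch uses one common Hermitian (chiral) representation of \(\gamma _\mu\), which is an assumption on our part and may differ from the explicit form in the article's Appendix; it verifies the anti-commutation relation, that \(\gamma _5\) is real and symmetric, that C is real and skew-symmetric, and that the spin part of \(J=C\gamma _5\) is a real skew-symmetric matrix with \(J^{{T}}=J^{-1}\), as required by (1).

import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2, I2, I4 = np.zeros((2, 2), dtype=complex), np.eye(2, dtype=complex), np.eye(4, dtype=complex)
# Hermitian Euclidean gamma matrices in a chiral representation (an assumed choice).
gamma = [np.block([[Z2, -1j * s], [1j * s, Z2]]) for s in (s1, s2, s3)]
gamma.append(np.block([[Z2, I2], [I2, Z2]]))                    # gamma_4
for mu in range(4):
    assert np.allclose(gamma[mu], gamma[mu].conj().T)           # Hermitian
    for nu in range(4):
        acom = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(acom, 2.0 * (mu == nu) * I4)         # {gamma_mu, gamma_nu} = 2 delta I
g5 = gamma[3] @ gamma[0] @ gamma[1] @ gamma[2]                  # gamma_5 = gamma_4 gamma_1 gamma_2 gamma_3
C = gamma[3] @ gamma[1]                                         # C = gamma_4 gamma_2
Jspin = C @ g5                                                  # spin part of J = C gamma_5
assert np.allclose(g5.imag, 0) and np.allclose(g5, g5.T)        # gamma_5 real, symmetric
assert np.allclose(C.imag, 0) and np.allclose(C, -C.T)          # C real, skew-symmetric
assert np.allclose(Jspin.imag, 0) and np.allclose(Jspin, -Jspin.T) and np.allclose(Jspin.T @ Jspin, I4)

On the full space, J acts as this \(4\times 4\) matrix on the spin indices and as the identity on the group indices.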

4.2.2 Numerical results (case B)

We set \(N=289\) for the SU(N) group in this test. The dimension of A is \(4\times (289^2-1)=334{,}080\). The ensemble for \(V_\mu\) is generated with a Monte Carlo algorithm at a parameter set of the TEK model, and we employ a single Monte Carlo sample of \(V_\mu\) for the test.

Table 2 Algorithm parameters and statistics for convergence (Case B)

We compare the convergence behavior of the eigenvalues between the proposed algorithm (TRLAN–JSYM) and the standard (single-vector) thick-restart Lanczos algorithm (TRLAN). We use the normal and invert modes to solve for the large and small eigenvalues, respectively. The conjugate gradient (CG) algorithm is used in the invert mode. The algorithmic parameters, i.e. the number of desired eigenvalues \(\mathrm {nev}\), the restart window size \(\mathrm {mwin}\), and the maximum size of the search dimension m, are shown in Table 2. We also tabulate the outer iteration count, the number of matrix–vector multiplications, and the computational time to convergence. The timings are shown as reference values, indicating how strongly the matrix–vector multiplication dominates the computational time in actual applications. We note that the convergence behavior of the CG in the invert mode was almost identical between the two algorithms, as assumed in Sect. 3, justifying the cost comparison in terms of the number of matrix–vector multiplications \(A^{-1}v_j\) in the invert mode. Compared with the TRLAN–JSYM, we double the parameters of the TRLAN to find all doubly degenerate eigenvalues. We set the tolerance of the eigensolvers to \(10^{-13}\).

Figures 4 and 5 show the convergence behavior and the residual history of the large eigenvalues, respectively. The algorithmic parameters are \((\mathrm {nev},\mathrm {mwin},m)=(8,16,48)\) for the TRLAN–JSYM and (16, 32, 96) for the TRLAN. We observe convergence behavior similar to that of Case A, with several reorderings among the approximate eigenvalues during the TRLAN iterations. The same behavior is seen in Figs. 6 and 7 for the small eigenvalues.

Fig. 4 Convergence behavior of the large eigenvalues (case B)

Fig. 5 Residual history for the large eigenvalues (case B)

The computational costs are compared in Table 2. Consistent with the discussion in Sect. 3, most cases satisfy \(N_{\mathrm {conv}}\simeq N'_{\mathrm {conv}}\) in (41) and (42) for algorithms paired with doubled parameters, and \(N_{\mathrm {MV}}\) for the TRLAN–JSYM algorithm is approximately half of that for the TRLAN algorithm. Three cases, \((\mathrm {nev},\mathrm {mwin},m)=(16,32,48)\) for both modes and \((\mathrm {nev},\mathrm {mwin},m)=(8,16,24)\) for the invert mode, are the exceptions. In these cases, one or two eigenvalues, the largest for the invert mode or the smallest for the normal mode among the \(\mathrm {nev}\) eigenvalues, show slow convergence. Even in these cases, however, the TRLAN–JSYM shows smoother convergence behavior than the TRLAN. All the cases we have investigated show that the TRLAN–JSYM algorithm has a lower computational cost than the TRLAN in terms of \(N_{\mathrm {MV}}\). The computational timings are roughly proportional to \(N_{\mathrm {MV}}\), indicating that the matrix–vector multiplication dominates the timings. In the invert mode for the small eigenvalue problems, the timings are well proportional to the number of matrix–vector multiplications, because the CG algorithm is used for \(A^{-1}v\) and the costs of the Lanczos part and of computing the true residuals become negligible.

Fig. 6 Convergence behavior of the small eigenvalues (case B)

Fig. 7 Residual history for the small eigenvalues (case B)

5 Summary

In this study, we have shown the orthogonality and A-orthogonality between the two Krylov subspaces \({\mathscr {K}}_k(A,v)\) and \({\mathscr {K}}_k(A,Jv^*)\) generated by the Lanczos algorithm for a Hermitian J-symmetric matrix A. Employing this property, we proposed the thick-restart Lanczos algorithm for Hermitian J-symmetric matrices (TRLAN–JSYM), with which we can efficiently search for one half of the doubly degenerate eigenvectors in \({\mathscr {K}}_k(A,v)\) without explicitly constructing \({\mathscr {K}}_k(A,Jv^*)\). The other half of the degenerate eigenvectors is simply constructed from the converged eigenvectors using the J-symmetry.

We demonstrated the proposed TRLAN–JSYM algorithm on two test cases: the random matrices and the fermion matrix from the quantum field theory called the TEK model. The convergence of the TRLAN–JSYM algorithm was smoother than that of the TRLAN algorithm, as expected, and the TRLAN–JSYM algorithm performed better than the TRLAN algorithm in terms of matrix–vector multiplications. The TRLAN algorithm shows reordering among the degenerate eigenvalues, caused by the loss of orthogonality between the eigenvectors paired by the J-symmetry during the standard Lanczos iteration in finite precision arithmetic. We did not discuss the mathematical background of the loss of orthogonality in the Lanczos algorithm. If exact arithmetic were employed for both algorithms, the TRLAN algorithm would become identical to the TRLAN–JSYM algorithm, according to Corollary 1. However, our algorithm enforces the orthogonality so as to achieve smooth convergence behavior in finite precision arithmetic, resulting in better performance.