Introduction

Variational quantum algorithms1,2,3 are expected to play a key role on noisy intermediate-scale quantum (NISQ) devices4. Especially, variational quantum eigensolver (VQE)5,6,7,8,9,10 has attracted much attention for its application to quantum chemistry where quantum entanglement is essential11,12. The scope of VQE has extended from ground state simulations of molecular systems7,9,13 to condensed matters14,15,16 and excited states17,18,19,20,21,22.

Building on the concept of imaginary time evolution (ITE), which aims to drive an arbitrary initial state to the exact ground state, several variants of quantum algorithms have recently been developed. McArdle and co-workers proposed variational ITE (VITE)23, which employs a fixed ansatz and then determines the optimal parameters using McLachlan’s variational principle. Therefore, VITE can be regarded as an optimizer not only for variational algorithms such as VQE24, but also for algorithms that employ pre-processed non-Hermitian Hamiltonians such as transcorrelated methods25,26,27. Probabilistic ITE (PITE) exploits measurements to perform the non-unitary operation of ITE on quantum devices probabilistically28,29,30. This can be achieved by introducing one ancilla qubit and embedding the ITE operator acting on an N-qubit system in an (N + 1)-qubit unitary gate28. In principle, ITE can be exactly performed on the N-qubit system by accepting (or discarding) the resultant state if the ancilla qubit is measured to be \(\left\vert 0\right\rangle\) (or \(\left\vert 1\right\rangle\)). However, the exact PITE generally requires the singular-value-decomposition of the N-qubit Hamiltonian, which hinders its practical applications in chemistry. Furthermore, the success probability decays exponentially with the number of imaginary time steps and the system size N. As such, there have been several proposals to circumvent these problems29,30.

Yet another scheme, Quantum ITE (QITE)31, approximates the non-unitary short evolution of ITE by a unitary evolution that is determined by solving a set of linear equations. Therefore, it circumvents the high-dimensional noisy optimizations in variational algorithms, while driving a quantum state towards the ground state at each evolution step. The promise of QITE has been demonstrated experimentally31,32,33, and numerous authors have extended the algorithm34,35,36,37,38,39,40. In our own recent study, a modified equation for the unitary approximation was presented, which enables faster convergence of QITE, thereby reducing the overall quantum resources.

Although QITE is a powerful tool for determining the ground state, there have been fewer developments that aim for obtaining excited states, especially when compared to variational algorithms that have seen a wide variety of developments17,18,19,20,21,39,41,42,43,44,45,46. The reason for this is perhaps that quantum Lanczos diagonalization (QLanczos) is expected to find reasonable excited states by increasing the size of the Krylov subspace31,32. However, our recent study showed that the component of excited states encoded in the initial state vanishes with imaginary time β at an exponential rate in general, and is lost in the numerical noise that is caused by the strong linear dependence of the chosen Krylov subspace39. This is particularly true if the excited states are separated from the ground state by large energy gaps, i.e., higher energy eigenstates.

Historically, there have been broad interests in obtaining excited states from classical ITE47,48,49,50,51, and we can gain many insights from them. For instance, to retain the excited state signature throughout the QITE simulation, we followed the work of Booth and Chan48 and adopted the folded-spectrum propagator \({e}^{-{\beta }^{2}{(\hat{H}-\omega )}^{2}}\) in ref. 39, an approach coined FSQITE. It was shown that FSQITE can in principle yield the desired excited states, and its convergence rate can be drastically accelerated with QLanczos. Nevertheless, FSQITE requires to estimate the target energy ω in advance and to treat the Hamiltonian squared \({\hat{H}}^{2}\), which can be quite challenging in general.

In this work, we develop the model-space QITE (MSQITE) algorithm to deliver stable and accurate solutions for excited states. MSQITE evolves an orthogonal model space to the complete subspace by ITE, simulating multiple states simultaneously. It also improves the behavior and accuracy for the ground states of strongly correlated systems, by directly incorporating important configurations. Because the method has many similarities to QITE and FSQITE, it can be also easily combined with QLanczos.

Furthermore, we present a scheme to deal with spin contamination in MSQITE. The spin quantum number is an essential quantity that characterizes a non-relativistic electronic state. Preserving spin symmetry is important but is more challenging in quantum simulations52,53,54,55,56,57 than conserving other symmetries such as point-group symmetry that can be usually constrained by removing the qubits from the simulation58,59. It should be easily imagined the problem of spin contamination is exacerbated in excited state calculations because excited states often exhibit more complicated electronic structures than the ground state and thus are prone to spin contamination. In the following, we provide a way to circumvent this difficulty.

As will be seen, there are two different flavors of MSQITE; one uses the same unitary for all states in the model space, and the other employs different unitaries for different states. We investigate how such unitary approximations in the MSQITE algorithm can affect its representability and accuracy, and report difficulties with the former approach.

Since the quantum circuits of both QITE and MSQITE necessarily elongate with imaginary time, their applicability on NISQ computers might be severely limited due to quantum noise. However, it is expected that MSQITE could potentially outperform QITE in many respects, even on NISQ computers, providing not only excited states but also faster convergence to the ground state. We demonstrate the efficacy of MSQITE through noisy simulations using a simple error mitigation protocol.

Results

Model-space QITE

In MSQITE, one prepares an orthogonal subspace that consists of zeroth-order ground and excited states, \(\{\left\vert {\Phi }_{I}\right\rangle ;I=0,\cdots \,,{n}_{\rm{states}}-1\}\), and evolves the entire subspace by the propagator \({e}^{-\beta \hat{H}}\) (which is Trotterized by a short time step Δβ). It is important to note that the imaginary time evolution makes the basis states nonorthogonal, \(\langle {\Phi }_{I}| {e}^{-2\Delta \beta \hat{H}}| {\Phi }_{J}\rangle \,\ne\, 0\), and therefore the orthonormalization of the subspace is necessary. Hence, in MSQITE, we consider the following unitary approximation on the th step:

$$\left\vert {\Phi }_{I}^{(\ell +1)}\right\rangle =\mathop{\sum}\limits_{J}{d}_{IJ}{e}^{-\Delta \beta (\hat{H}-{E}_{J})}\left\vert {\Phi }_{J}^{(\ell )}\right\rangle \approx {e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\left\vert {\Phi }_{I}^{(\ell )}\right\rangle$$
(1)

where \(\vert {\Phi }_{I}^{(\ell )}\rangle\) are the Ith state at the th time step, and \(\hat{A}\) is a Hermitian operator parameterized by the coefficients a,

$$\hat{A}=\mathop{\sum}\limits_{\mu }{a}_{\mu }{\hat{\sigma }}_{\mu }$$
(2)

with Pauli strings \({\hat{\sigma }}_{\mu }\), which are appropriately chosen31,34,39. In Eq. (1), we have intentionally introduced the energy shift \({E}_{J}=\langle {\Phi }_{J}^{(\ell )}| \hat{H}| {\Phi }_{J}^{(\ell )}\rangle\) for convenience. The transformation matrix d is also introduced to ensure the orthonormality of the time-evolved model space \(\{\vert {\Phi }_{I}^{(\ell +1)}\rangle \}\).

This transformation matrix can be defined in infinitely different ways; however, we require d → I (identity matrix) as Δβ → 0, because this would allow us to correctly obtain \(\vert {\Phi }_{I}^{(\ell +1)}\rangle \equiv \vert {\Phi }_{I}^{(\ell )}\rangle\). We also wish the change in each state to be minimum at each time step, to be able to “follow” the Ith state between the time steps in order for Eq. (1) to be a meaningful approximation. To this end, we employ the Löwdin symmetric orthonormalization60,61. Remarkably, the so-obtained d is the one that minimizes the distance in the Hilbert space, \({{{\bf{d}}}}=\arg {\min }_{{{{\bf{d}}}}}{\sum }_{I}\parallel \vert {\Phi }_{I}^{(\ell +1)}\rangle -{e}^{-\Delta \beta (\hat{H}-{E}_{I})}\vert {\Phi }_{I}^{(\ell )}\rangle {\parallel }^{2}\)62,63. In other words, the property of \(\left\vert {\Phi }_{I}^{(\ell )}\right\rangle\) is maximally preserved in \(\left\vert {\Phi }_{I}^{(\ell +1)}\right\rangle\) on average, and therefore it is expected that different states do not mix strongly. In particular, when the energy shift EJ is introduced, d is diagonal dominant with all the diagonal elements being equal to one. In the Methods section, we have detailed the Löwdin symmetric orthonormalization procedure in MSQITE and discussed other possibilities for the definition of d.

In MSQITE, \({\lim }_{\ell \to \infty }\vert {\Phi }_{I}^{(\ell )}\rangle\) may not be the exact ground and excited states. Instead, we retain them as a model space basis and express the physical states \(\left\vert {\psi }_{I}\right\rangle\) as a linear combination of these states, \(\left\vert {\psi }_{I}\right\rangle ={\lim }_{\ell \to \infty }{\sum }_{K}{c}_{KI}\vert {\Phi }_{K}^{(\ell )}\rangle\). This corresponds to solving the eigenvalue problem

$${{{{\bf{H}}}}}^{(\ell )}{{{\bf{c}}}}={{{{\bf{S}}}}}^{(\ell )}{{{\bf{c}}}}{{{\bf{{{{\mathcal{E}}}}}}}}$$
(3)

where

$${H}_{IJ}^{(\ell )}=\langle {\Phi }_{I}^{(\ell )}| \hat{H}| {\Phi }_{J}^{(\ell )}\rangle$$
(4)
$${S}_{IJ}^{(\ell )}=\langle {\Phi }_{I}^{(\ell )}| {\Phi }_{J}^{(\ell )}\rangle$$
(5)

and \({{{\bf{{{{\mathcal{E}}}}}}}}\) contains the ground and excited state energies in the diagonal. The eigenvalues become the exact energies if the entire model space is propagated appropriately.

Now, we have two approaches to determine the unitary \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\). In the so-called state-specific approach, a is different for different \(\left\vert {\Phi }_{I}\right\rangle\) (therefore, we write aI and \({\hat{A}}^{I}\) to indicate the state dependence). Similarly to QITE31,39, we minimize the following function

$${F}^{I}({{{{\bf{a}}}}}^{I})={\left\Vert \mathop{\sum}\limits_{J}{d}_{IJ}{e}^{-\Delta \beta (\hat{H}-{E}_{J})}\left\vert {\Phi }_{J}^{(\ell )}\right\rangle -{e}^{-{{{\rm{i}}}}\Delta \beta {\hat{A}}^{I}}\left\vert {\Phi }_{I}^{(\ell )}\right\rangle \right\Vert }^{2}$$
(6)

to the second-order of Δβ for each I. This results in the linear equation

$${{{{\bf{M}}}}}^{I}{{{{\bf{a}}}}}^{I}+{{{{\bf{b}}}}}^{I}={{{\bf{0}}}}$$
(7)

with

$${M}_{\mu \nu }^{I}=2{{{\rm{Re}}}}\left\langle {\Phi }_{I}^{(\ell )}\right\vert {\hat{\sigma }}_{\mu }{\hat{\sigma }}_{\nu }\left\vert {\Phi }_{I}^{(\ell )}\right\rangle$$
(8)
$${b}_{\mu }^{I}={{{\rm{Im}}}}\left\langle {\Phi }_{I}^{(\ell )}\right\vert \left[\hat{H},{\hat{\sigma }}_{\mu }\right]\left\vert {\Phi }_{I}^{(\ell )}\right\rangle +\frac{2}{\Delta \beta }\mathop{\sum}\limits_{J}{d}_{JI}{{{\rm{Im}}}}\left\langle {\Phi }_{I}^{(\ell )}\right\vert {\hat{\sigma }}_{\mu }\left\vert {\Phi }_{J}^{(\ell )}\right\rangle$$
(9)

We have provided a detailed derivation in Supplementary Notes.

In contrast, the state-averaged approach uses the same a and \(\hat{A}\) for all the states considered. This can be accomplished by solving

$$\mathop{\sum}\limits_{I}{{{{\bf{M}}}}}^{I}{{{\bf{a}}}}+\mathop{\sum}\limits_{I}{{{{\bf{b}}}}}^{I}={{{\bf{0}}}}.$$
(10)

Several important considerations have to be made with respect to the above derivations. Eqs. (8) and (9) are essentially the same as those corresponding to QITE39, except that \({b}_{\mu }^{I}\) has the additional second term, which ensures the orthogonality of the model space. For the state-specific method, the model space is not exactly orthogonal, but it is almost so because of this term. Indeed, without the second term, Eq. (3) quickly becomes unsolvable because all elements of \({S}_{IJ}^{(\ell )}\) tend to become one (i.e., all states in the model space become the ground state). We note that \(\frac{{d}_{IJ}}{\Delta \beta }\to 0\) for I ≠ J as Δβ → 0 and the diagonal term will not contribute because σμ is Hermitian and thus the expectation value is real; so the second term is stable. The importance of the term is less pronounced in the state-averaged method.

The state-averaged method would be preferred to the state-specific method because the model space \(\{\vert {\Phi }_{I}^{(\ell )}\rangle \}\) is guaranteed to be orthogonal (i.e., \({S}_{IJ}^{(\ell )}={\delta }_{IJ}\)), and also because its quantum circuit is significantly simpler. However, despite the existence of a single unitary \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\) that correctly transforms all the states simultaneously to the desired states, it should be noted that the corresponding Hermitian \(\hat{A}\) has to be quite complicated. In practice, because it is desirable to employ a simple \(\hat{A}\) in Eq. (2), the representability of the unitary is considerably limited, and therefore the performance of the state-averaged MSQITE may not be promising. This is quite similar to an issue recently reported by ref. 64, that the multistate contracted VQE, which minimizes the averaged energy of orthogonal states generated by the same unitary19, experiences large errors for excited state calculations. Indeed, below, we will show that with the state-specific MSQITE a model space converges to almost the exact one using only single and double excitations in \(\hat{A}\), whereas the accuracy of the state-averaged MSQITE is generally quite unsatisfactory and its errors in energy can be substantial especially when the number of states increases.

Quantum Circuit for MSQITE

The algorithmic difference between QITE and MSQITE is that the latter requires the estimation of quantities like \({H}_{IJ}^{(\ell )}\) for each pair I, J. Whereas the state-averaged MSQITE has a simple quantum circuit because all the states are evolved by the same unitary \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\), one needs the controlled gate for \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}({{{{\bf{a}}}}}^{I})}\) for the state-specific approach. Figure 1 illustrates how we implement the latter using the Hadamard test. We prepare the state register and an ancilla qubit as \(\vert {\Phi }_{I}^{(0)}\rangle\) and \(\left\vert +\right\rangle\), which controls UJI, \({e}^{-{{{\rm{i}}}}\Delta \beta {\hat{A}}^{I}}\), and \({e}^{-{{{\rm{i}}}}\Delta \beta {\hat{A}}^{J}}\). Here, UJI comprises simple gates to generate \(\vert {\Phi }_{J}^{(0)}\rangle ={\hat{U}}_{JI}\vert {\Phi }_{I}^{(0)}\rangle\) initially. In practice, the unitary \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}({{{{\bf{a}}}}}^{I})}\) is Trotter-decomposed as

$${e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}({{{{\bf{a}}}}}^{I})}\approx \mathop{\prod}\limits_{\mu }{e}^{-{{{\rm{i}}}}{\theta }_{\mu }^{I}{\hat{\sigma }}_{\mu }}$$
(11)

with

$${\theta }_{\mu }^{I}=\Delta \beta {a}_{\mu }^{I}.$$
(12)

Since \({\hat{A}}^{I}\) and \({\hat{A}}^{J}\) only differ by the parameters aI and aJ and share the same gate structure, it is convenient to order the controlled gates in an alternating manner as shown in Fig. 1a, noting that the controlled-\({e}^{-{{{\rm{i}}}}{\theta }_{\mu }^{I}{\hat{\sigma }}_{\mu }}\) and controlled-\({e}^{-{{{\rm{i}}}}{\theta }_{\nu }^{J}{\hat{\sigma }}_{\nu }}\) always commute. Without the control qubit, each Pauli rotation is performed by using the standard procedure65,66,67 as shown in Fig. 1b, where (i) the qubits to be rotated are transformed to either of the X, Y, Z basis by the corresponding single-qubit unitary gates (denoted by R), (ii) their parities are passed to the last qubit (denoted by the CNOT gate with a dotted line), and (iii) the Rz gate is applied followed by the Hermitian conjugate of (ii) and (i). Since the two adjacent controlled Pauli rotations carry out these unitary operations, the operations (i) and (ii) between them cancel out, and we can simplify the entire gate as depicted in Fig. 1c.

Fig. 1: Quantum circuit for obtaining matrix elements of state-specific MSQITE using the Hadamard test with an ancilla \(\left\vert +\right\rangle\).
figure 1

a UJI transforms \(\vert {\Phi }_{I}^{(0)}\rangle\) to \(\vert {\Phi }_{J}^{(0)}\rangle\). The unitary gates \({e}^{-{{{\rm{i}}}}\Delta \beta {\hat{A}}^{I}}\) and \({e}^{-{{{\rm{i}}}}\Delta \beta {\hat{A}}^{J}}\) are each decomposed into Pauli rotations. b The uncontrolled version of Pauli rotation is typically implemented by using one-qubit unitary gates (R), and a sequence of CNOT gates, which is abbreviated by the CNOT gate with a dotted line. Together with Rz, they perform \({e}^{-{{{\rm{i}}}}{\theta }_{\mu }{\hat{\sigma }}_{\mu }}\). c Two different controlled Pauli rotations by \({\theta }_{\mu }^{I}{\hat{\sigma }}_{\mu }\) and \({\theta }_{\mu }^{J}{\hat{\sigma }}_{\mu }\) can be summarized to one controlled-Rz.

Therefore, the additional complexity in the quantum circuit of the state-specific MSQITE arises from the two CNOT operations and one additional Rz rotation. We consider this additional effort may not be a significant overhead cost compared with the circuit shown in Fig. 1b.

For the diagonal terms \({H}_{II}^{(\ell )}\), they represent energy expectation values of \(\vert {\Phi }_{I}^{(\ell )}\rangle\), and therefore their quantum circuits are identical to that of QITE, without any controlled-\({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\) operations. Hence, for MSQITE with a model space comprising nstates states, one needs to prepare nstates(nstates − 1)/2 circuits at each to measure \({H}_{IJ}^{(\ell )}\) (noting that H() is Hermitian), in addition to nstates circuits to perform the same measurements as QITE.

To investigate the relative complexity of the quantum circuits required for the state-specific MSQITE, in Fig. 2, we have depicted the number of CNOT gates for each noiseless simulation conducted in this study, as a function of imaginary time β. It is evident that the number of CNOT gates for the off-diagonal circuit is generally approximately 1.2 to 1.3 times greater than that for the diagonal circuit. This observation underscores the efficacy and effectiveness of our quantum circuit designed for the Hadamard test. Nevertheless, it should be mentioned that these quantum circuits are too deep if noise is of concern, and in practice, we need to introduce appropriate simplifications, which will be discussed later.

Fig. 2: Number of CNOT gates required for each of noiseless simulations.
figure 2

The number of CNOT gates required to estimate the energy expectation value (diagonal) \(\langle {\Phi }_{I}| \hat{H}| {\Phi }_{I}\rangle\) and coupling (off-diagonal) \(\langle {\Phi }_{I}| \hat{H}| {\Phi }_{J}\rangle\) are plotted as circles and crosses, respectively, as a function of imaginary time. The blue, orange, and green plots represent the results for simulations of H4, BeH2, and N2, respectively.

MS-QLanczos

Similar to many other Krylov methods46,68,69,70, QLanczos forms and diagonalizes the effective Hamiltonian to generate a wave function as a linear combination of time-evolved states. Here, we can generalize QLanczos to the model space formalism, which we call MS-QLanczos. Let us consider to expand the Krylov model subspace as

$$\left\{{e}^{-\ell \Delta \beta \hat{H}}\left\vert {\Phi }_{I}^{(0)}\right\rangle ;(\ell =0,\cdots \,,n);(I=0,\cdots \,,{n}_{{{{\rm{states}}}}}-1)\right\}$$
(13)

which comprises the basis for the effective Hamiltonian to be diagonalized. Here, we choose to use the normalized states \(\left\vert {\Phi }_{I}^{(\ell )}\right\rangle\) to ease the derivation:

$$\left\{\left\vert {\Phi }_{I}^{(\ell )}\right\rangle ;(\ell =0,\cdots \,,n);(I=0,\cdots \,,{n}_{{{{\rm{states}}}}}-1)\right\}$$
(14)

Note that it spans the same space as Eq. (13). At an arbitrary time step Δβ, the Ith quantum state is given by,

$$\begin{array}{rcl}\left\vert {\Phi }_{I}^{(\ell )}\right\rangle &=&\mathop{\sum}\limits_{J}{d}_{JI}^{(\ell -1)}{e}^{-\Delta \beta (\hat{H}-{E}_{J}^{(\ell -1)})}\left\vert {\Phi }_{J}^{(\ell -1)}\right\rangle \\ &=&\mathop{\sum}\limits_{J}{\tilde{d}}_{JI}^{(\ell -1)}{e}^{-\Delta \beta (\hat{H}-{E}_{0})}\left\vert {\Phi }_{J}^{(\ell -1)}\right\rangle \end{array}$$
(15)

where E0 is some reference energy that is fixed throughout the imaginary time evolution (e.g., the average energy of the initial model space), and

$$\Delta {E}_{I}^{(\ell )}={E}_{I}^{(\ell )}-{E}_{0}$$
(16)
$${\tilde{d}}_{JI}^{(\ell )}={d}_{JI}^{(\ell )}{e}^{\Delta \beta \Delta {E}_{J}^{(\ell )}}$$
(17)

The global energy shift E0 is introduced to ensure that the propagator is independent of both state and imaginary time, while avoiding the vanishing norm due to \({e}^{\Delta \beta {E}_{0}}\). Using the relation (15) recursively, we find

$$\left\vert {\Phi }_{I}^{(\ell )}\right\rangle =\mathop{\sum}\limits_{J}{\left({\tilde{{{{\bf{d}}}}}}^{({\ell }^{{\prime} })}\cdots {\tilde{{{{\bf{d}}}}}}^{(\ell -1)}\right)}_{JI}{e}^{-(\ell -{\ell }^{{\prime} })\Delta \beta (\hat{H}-{E}_{0})}\left\vert {\Phi }_{J}^{({\ell }^{{\prime} })}\right\rangle$$
(18)

for arbitrary \({\ell }^{{\prime} } < \ell\).

Then, one can write the overlap matrix among the model space (14) as

$$\begin{array}{rcl}{{{{\mathcal{S}}}}}_{I(\ell ),J({\ell }^{{\prime} })}&\equiv &\langle {\Phi }_{I}^{(\ell )}| {\Phi }_{J}^{({\ell }^{{\prime} })}\rangle \\ &=&{\left[{\left({{{{\bf{D}}}}}^{\left(\frac{\ell +{\ell }^{{\prime} }}{2}\to \ell -1\right)}\right)}^{\top }{{{{\bf{S}}}}}^{\left(\frac{\ell +{\ell }^{{\prime} }}{2}\right)}{\left({{{{\bf{D}}}}}^{({\ell }^{{\prime} }\to \frac{\ell +{\ell }^{{\prime} }}{2}-1)}\right)}^{-1}\right]}_{IJ}\end{array}$$
(19)

where

$${{{{\bf{D}}}}}^{({\ell }^{{\prime} }\to \ell )}\equiv {\tilde{{{{\bf{d}}}}}}^{({\ell }^{{\prime} })}{\tilde{{{{\bf{d}}}}}}^{({\ell }^{{\prime} }+1)}\cdots {\tilde{{{{\bf{d}}}}}}^{(\ell )}\quad (\ell > {\ell }^{{\prime} })$$
(20)

Since we expect nstates to be small, the computational cost of D is negligible. The MS-QLanczos Hamiltonian matrix elements are similarly derived as

$$\begin{array}{lll}{{{{\mathcal{H}}}}}_{I(\ell ),J({\ell }^{{\prime} })}&\equiv &\langle {\Phi }_{I}^{(\ell )}| \hat{H}| {\Phi }_{J}^{({\ell }^{{\prime} })}\rangle \\ &=&{\left[{\left({{{{\bf{D}}}}}^{\left(\frac{\ell +{\ell }^{{\prime} }}{2}\to \ell -1\right)}\right)}^{\top }{{{{\bf{H}}}}}^{\left(\frac{\ell +{\ell }^{{\prime} }}{2}\right)}{\left({{{{\bf{D}}}}}^{({\ell }^{{\prime} }\to \frac{\ell +{\ell }^{{\prime} }}{2}-1)}\right)}^{-1}\right]}_{IJ}.\end{array}$$
(21)

and one simply solves the generalized eigenvalue problem using \({\bf{{\mathcal{H}}}}\) and \({\bf{{\mathcal{S}}}}\). Note that the derivation reduces to that of the modified single-state QLanczos39 when nstates = 1.

Illustrative noiseless simulations

Here, we assess the performance of MSQITE and MS-QLanczos, using molecular systems without the impact of noise. For this reason, we use the unitary coupled-cluster generalized singles and doubles (UCCGSD) ansatz66,71, which was found to be suitable for \(\hat{A}\) in simulating molecules39. Since the UCCGSD ansatz has the same number of parameters as the given Hamiltonian, the gate complexity for each time step in UCCGSD-based (MS)QITE scales similarly to the real time evolution. Therefore, the cost of UCCGSD-based MSQITE is expected to be comparable to algorithms that exploit real time evolution68,70.

We first consider the BeH2 molecule at equilibrium (Re = 1.334 Å). As the initial model space for MSQITE, we choose the following three configurations: the HF configuration, and the configurations where two electrons are promoted from the highest occupied orbital to πu orbitals, as listed in Fig. 3. In the same figure, the performances of various methods for the ground and excited states are depicted. Because the ground state of the system is only weakly correlated, QITE and especially QLanczos quickly converge. Also shown in the figure is the results of FSQITE (using the exact target energy \({E}_{2{\Sigma }_{{{{\rm{g}}}}}^{+}}=15.2263\) Hartree) and its extension to QLanczos (FS-QLanczos). Although FSQITE and FS-QLanczos eventually converge to the exact states, their evolutions are rather slow. Moreover, it cannot determine the ground state because it is far from the target state.

Fig. 3: Summary for BeH2 simulation results.
figure 3

a Orbital diagrams. b Convergence behavior of energy using the QITE algorithm. c Convergence behavior of energy using FSQITE. d Convergence behavior of energy using MSQITE.

In contrast, clearly, MSQITE delivers remarkably fast convergence to the desired excited states when compared with FSQITE. Nevertheless, we note the convergence profile is state-dependent. For example, while the ground state \(X{\Sigma }_{{{{\rm{g}}}}}^{+}\) converged rapidly, the convergence of the second excited state 1Δg required approximately β = 5 a.u., followed by the convergence of the first excited state \(2{\Sigma }_{{{{\rm{g}}}}}^{+}\) approximately 5 ~ 6 a.u. later. This difference is attributed to the fact that these excited states are strongly correlated. To see this, we have tabulated the coefficients of the exact eigenstates in Supplementary Table 1. It is verified that the initial configurations (ii) \(\left\vert 000000110011\right\rangle\) and (iii) \(\left\vert 000011000011\right\rangle\) are the dominant ones for 1Δg, each with a coefficient of about 0.5, but it also contains other dominant configurations such as \(\left\vert 000000111100\right\rangle\) (see the Methods section for our qubit mapping: here, two electrons are promoted from 2σg to 1πu with respect to HF). Such additional configurations need to be generated by the (MS)QITE procedure, and the imaginary time evolution typically takes more steps if their coefficients are non-negligible. From Supplementary Table 1, it is seen that the \(2{\Sigma }_{{{{\rm{g}}}}}^{+}\) state is even more strongly correlated than 1Δg, resulting in slower convergence in MSQITE.

We can expect a better performance of MSQITE if these additional configurations are included in the initial model space; however, of course, such detailed information may not be accessible a priori. Instead, MS-QLanczos can automatically detect and extract these states much earlier than MSQITE, as shown in Fig. 3. In contrast to FS-QLanczos, MS-QLanczos was not able to obtain the \(3{\Sigma }_{{{{\rm{g}}}}}^{+}\) state. This is simply because we have truncated the Krylov vectors to avoid numerical instabilities. If such higher states are desired, one needs to add more states in the model space, and MSQITE (MS-QLanczos) can find the eigenstates in the energy order.

It should be noted that the MSQITE method should bring a certain advantage not only for excited states but also for strongly correlated ground states, because the model space by definition can naturally provide multi-configuration states. To observe this advantage, we take the square H4 molecule with a bond length of 1 Å as an example. As shown in Fig. 4, QITE and QLanczos take more than 10 and 4 a.u. in imaginary time, respectively, to reach the ground state within the 1 mHartree accuracy. The slow convergence of the former is ascribed to the strong correlation in H4, which is a two-determinant system with \(\left\vert 00001111\right\rangle\) and \(\left\vert 00110011\right\rangle\).

Fig. 4: Comparison between different algorithms for H4.
figure 4

a QITE. b MSQITE.

The energies of MSQITE with the model space comprised of \(\left\vert 00001111\right\rangle\) and \(\left\vert 00110011\right\rangle\) approach the (near) exact energies very rapidly, within less than a few a.u. in imaginary time; MS-QLanczos convergence is even faster. We emphasize, however, that each of the resulting MSQITE basis states \(\left\vert {\Phi }_{I}\right\rangle\) are not the exact eigenstates. They are rather states that have either \(\left\vert 00001111\right\rangle\) or \(\left\vert 00110011\right\rangle\) as the dominant configuration, but possess almost no component of the other configuration. Nevertheless, the model space is developed to the complete space during the MSQITE procedure, such that linear combinations of \(\{\left\vert {\Phi }_{I}\right\rangle \}\) are the exact states, as described in the preceding section.

Avoiding spin contamination with shifted propagator

For a non-relativistic molecular Hamiltonian, the exact wave function is an eigenstate of the number operator \(\hat{N}\) and the spin operators \({\hat{S}}^{2}\) and \({\hat{S}}_{{{{\rm{z}}}}}\). However, since each of the Pauli rotations applied in QITE does not necessarily commute with these symmetry operators, both the number of electrons and spin quantum numbers fluctuate during the evolution. Nevertheless, for the one-particle symmetry operators (\(\hat{N}\) and \({\hat{S}}_{{{{\rm{z}}}}}\)), such fluctuations are moderate and do not affect the result in our numerical experiments. It is also relatively easy to constrict the quantum state to the fixed quantum numbers by using fermionic operators instead of Pauli operators, i.e., one can employ the parametrization of Eq. (2) and treat linear combinations of Pauli operators57.

However, we found that the \({\hat{S}}^{2}\) symmetry is difficult to preserve, especially for excited states. Usually, the initial model space is prepared such that only the target spin states (e.g., singlets in the above cases) are included. However, due to the approximate nature of (MS)QITE, the model space often starts to leak into different spin symmetry spaces and finds higher spin states (e.g., triplets s = 1 and quintets s = 2) with lower energies than states with the desired spin, by virtue of imaginary time propagation. For variational simulations, one can use the projection operator55,56,57, but it is not straightforward to apply it in the framework of ITE.

Such spin-contamination, and the resulting “spin-collapse”, are practical yet significant issues in MSQITE, as demonstrated below. In essence, when spin-collapse occurs, it becomes necessary to increase the number of states in a model space, nstates, and rerun the calculation to obtain the true target states with low spins. However, it is important to emphasize that states with higher spin (s) than the target spin (i.e., singlet) can be obtained much more efficiently through a separate calculation, in which the initial states are prepared with s = ms = (Nα − Nβ)/2, where Nα and Nβ are the numbers of α and β electrons, respectively. Such simulations preclude the convergence of low-spin states (the fluctuation of \({\hat{S}}_{{{{\rm{z}}}}}\) and hence ms is negligible in MSQITE). In fact, this protocol is a de facto standard frequently employed in quantum chemical calculations to track different spin states.

As an example, here we consider N2 at a stretched bond distance of 1.6 Å. We employed three configurations (i), (ii), and (iii) given in Fig. 5 to make an initial model space, and aimed to obtain the lowest singlet states with the \({\Sigma }_{{{{\rm{g}}}}}^{+}\) symmetry including the ground state.

Fig. 5: Configurations used in MSQITE for N2.
figure 5

In the MSQITE simulation of N2, we have used up to six electronic configurations as the model space. Configuration (i) is the HF determinant. Two electrons are excited from πu to πg in configurations (ii), (iii), (iv), and (v), and four electrons are excited in configuration (vi). All configurations belong to \(\Sigma_{\rm {g}}^{+}\).

Figure 6a plots the changes in energy of MSQITE state along with the exact energies of different spin symmetries (shown in different colors). Whereas the lowest state of MSQITE converges to the singlet ground state quickly, the two excited states suffer from slow convergence. More importantly, both converge to the quintet \({1}^{5}{\Sigma }_{{{{\rm{g}}}}}^{+}\) state. It is worth reiterating that this lowest quintet state can be obtained simply by preparing an initial state with ms = 2 in QITE, rather than adopting the more intricate MSQITE with ms = 0. In Fig. 6b, we monitor the change in \(\langle {\hat{S}}^{2}\rangle\), and the third state is trapped in some unphysical spin state with \(\langle {\hat{S}}^{2}\rangle \approx 2\). The problem here is that, in general, the exact eigenstates are unknown and therefore, one may get confused as if the MSQITE state achieved a stationary triplet state, as bI ≈ 0 for 90 < β < 120. However, this is an artifact of spin contamination. In fact, this contaminated state was found to be a half-and-half mixture between \({3}^{1}{\Sigma }_{{{{\rm{g}}}}}^{+}\) and \({1}^{5}{\Sigma }_{{{{\rm{g}}}}}^{+}\): the exact energies are \({E}_{{3}^{1}{\Sigma }_{{{{\rm{g}}}}}^{+}}=-108.293193\) and \({E}_{{1}^{5}{\Sigma }_{{{{\rm{g}}}}}^{+}}=-108.463729\), and the energy expectation value of the state is trapped at about \(-108.378\approx ({E}_{{3}^{1}{\Sigma }_{{{{\rm{g}}}}}^{+}}+{E}_{{1}^{5}{\Sigma }_{{{{\rm{g}}}}}^{+}})/2\).

Fig. 6: Importance of spin-shift in MSQITE to remove spin-contamination (N2 at a bond length of 1.6 Å).
figure 6

a Energy convergence profile of MSQITE. b Spin expectation value of MSQITE. c Energy convergence profile of MSQITE with spin-shift. d Spin expecation value of MSQITE with spin-shift.

Moreover, note that there exist several spin states (triplets and quintets) that have lower energies than singlet states as is clear from the figure. Hence, the convergence of MSQITE to these wrong spin states is highly likely. Of course, one could add more configurations in the model space to obtain higher singlet states; however, one can easily imagine that this approach is inefficient and is best avoided. We should also mention that, as the representability and flexibility of \(\hat{A}\) increase, MSQITE would become even more prone to spin-contamination.

Hence, we introduce the spin-shift to the propagator,

$${e}^{-\beta \hat{H}}\to {e}^{-\beta \left(\hat{H}+\lambda ({\hat{S}}^{2}-s(s+1))\right)}$$
(22)

where λ is an arbitrary positive number and s is the designated spin quantum number. Because MSQITE is expected to transform the initial model space into the complete subspace, the use of the spin-shift should also be able to fix the spin at the same time. The trick here is that, while the target spin component with s in the model space remains unaffected by the shifted propagator, the spin contaminants with \({s}^{{\prime} } > s\) rapidly vanish. Note that we can always assume \({s}^{{\prime} }\ge s\) by appropriately constructing the initial model space (namely, we set ms = s). Thereby, the model space will be projected to spin s. We would like to emphasize that this effect shares similarities with the widely-used penalty function method in variational algorithms, which penalizes the energy associated with incorrect spin states7,55,72,73.

As λ becomes large, the spin-projection acts more strongly; however, it could spoil the convergence of MSQITE because of the large Trotter error. In principle, it suffices to use \(\lambda > {E}_{{s}^{{\prime} }}-{E}_{s}\) where \({E}_{{s}^{{\prime} }}\) is the energy of excited state with spin \({s}^{{\prime} }\).

In Fig. 6c, we show the results of MSQITE with the spin-shift using λ = 0.5. As expected, all the states nicely converge to the desired singlet states. Throughout all imaginary time, these states retain \(\langle {\hat{S}}^{2}\rangle =0\) approximately, and get rid of spin contamination appropriately (Fig. 6d). We notice that the third state of MSQITE initially approaches the 11Δg state instead of directly converging to the \({3}^{1}{\Sigma }_{{{{\rm{g}}}}}^{+}\) state, and then starts to find the latter state as the lower state. However, this is not the weakness of the method; it is rather an indication of the ability of MSQITE to find the lowest states.

State-specific and state-averaged MSQITE

In the preceding section, we have discussed the advantages that MSQITE has to offer, focusing on the state-specific algorithm. As the state-averaged scheme is more attractive in terms of circuit complexity, we also carried out the state-averaged MSQITE to evaluate its accuracy. To properly evaluate the potential of the state-averaged MSQITE, we have performed noiseless simulations.

Table 1 compares the final energies obtained at convergence of the state-specific and state-averaged MSQITE methods. In addition to H4 and BeH2, N2 at equilibrium (a bond distance of 1.098 Å) was tested with two configurations (i) and (ii) in Fig. 5, as an initial model space. Whereas the state-specific MSQITE yields quite accurate energies independent of systems, the state-averaged MSQITE results become significantly inaccurate for larger systems. Its accuracy is satisfactory for H4 but deteriorates for N2 with an error of 47 mHartree for the \(2{\Sigma }_{{{{\rm{g}}}}}^{+}\) state. In general, increasing the model space tends to result in larger errors in energy, as shown in Fig. 7.

Table 1 Exact energy and error of MSQITE for each system (in Hartree).
Fig. 7: Energy errors of the converged UCCGSD-based MSQITE states with the state-specific and state-averaged schemes for N2 at equilibrium.
figure 7

States used for each MSQITE simulation are chosen from Fig. 5 in serial order.

Another prominent example of the failure of the state-averaged MSQITE is the N2 molecule with two πu orbitals and two πg orbitals and four electrons (comprising an eight qubit system). With a model space comprising six configurations—HF and all five pair-excited configurations derived from it (those listed in Fig. 5)—the UCCGSD-based state-averaged MSQITE methods immediately converge at β = 0, because ∑IbI = 0 by symmetry. Note that this convergence does not indicate, of course, bI = 0 for each state; in fact, the state-specific MSQITE performs quite well, yielding very accurate energies. It is worth noting that ∑IbI is equivalent to the averaged energy derivative that appear in the VQE-based state-averaged UCCGSD method74; indeed, we applied the method to this system and found that it suffers from the same problem and no optimization of parameters was carried out. Overall, this strongly implies the limitation of other state-averaged methods for general systems19,64.

It should be clear that this ill-behavior of the state-averaged MSQITE does not necessarily imply a possible theoretical flaw in our derivation. The failure is rather ascribed to the limitation of the form of \(\hat{A}\) that we employed, i.e., single and double substitutions. In other words, it is unlikely the same UCCGSD amplitudes can evolve any arbitrary states to the desired ones all at once through \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\), even qualitatively. That being said, with triples (T) and quadruples (Q) included, we can rigorously obtain the exact eigenstates by definition: such UCCGSDTQ ansatz is complete for a four-electron system. For this particular case, the UCCGSDT-based state-averaged MSQITE already delivers almost the exact result (with less than 10−12 Hartree error). Overall, therefore, we are led to conclude that the state-averaged MSQITE does not seem practical because one needs way more Pauli operators from higher rank excitations than double excitations, to achieve a satisfactory accuracy, and this will become quickly infeasible with the increase in number of electrons.

Noise simulations

To investigate the effect of quantum noise in state-specific MSQITE, here we conducted noise simulations on squared H4 with each side being 2 Å. To simplify the quantum circuits, we have used several techniques (refer to the METHODS section for details). Figure 8 showcases both noiseless and noisy results for QITE and MSQITE, using different error rates for depolarizing noise. As is evident from the figure, both noisy QITE and noisy MSQITE experience significant error accumulation as imaginary time extends. For a relatively conservative error rate for each CNOT, p2 = 10−2, the energies for both noisy QITE and noisy MSQITE become exceed that of HF already at β = 0.2 a.u. (see Fig. 8a).

Fig. 8: Noisy QITE and MSQITE for H4 using depolarizing error.
figure 8

The error rate for one-qubit gates p1 is set to 0.1 times the error rate for two-qubit gates p2. (a) p2 = 10−2 (b) p2 = 5 × 10−3, and (c) p2 = 10−3.

Such large errors arise from the fact that the circuit depth increases linearly with the imaginary-time step in these methods. Figure 9 plots the number of CNOT gates required to estimate the diagonal and off-diagonal matrix elements of the effective Hamiltonian, H00 and H01, respectively, as a function of β. We will omit the results for H11 because they are almost the same as the results of H00. As discussed earlier, the quantum circuit for H01 is not significantly more complicated compared to that for H00. Nevertheless, the number of CNOT gates exceeds 400 at β = 1 a.u. (corresponding to five imaginary-time steps, as we used Δβ = 0.2 a.u. for these simulations), which introduces tremendous errors even with a small error rate of p2 = 10−3 for each CNOT gate (Fig. 8c). Especially, as the noise accumulates, the expectation values of off-diagonal matrix elements tend to zero, leaving almost no advantage in performing MSQITE over QITE.

Fig. 9: Number of CNOT gates required for the noisy simulations of H4.
figure 9

The numbers of CNOT gates required to estimate the energy expectation value (diagonal) \(\langle {\Phi }_{I}| \hat{H}| {\Phi }_{I}\rangle\) and coupling (off-diagonal) \(\langle {\Phi }_{I}| \hat{H}| {\Phi }_{J}\rangle\) in the noisy simulations of H4 are plotted as circles and crosses, respectively, as a function of imaginary time.

Therefore, it is essential to mitigate errors with NISQ computers75,76, and we have employed the simple protocol of zero-noise-extrapolation (ZNE)77,78,79. The error-mitigated QITE and MSQITE results are also presented in Fig. 8 as filled squares and circles, respectively. Clearly, ZNE significantly improves the noisy results, successfully approximating the noiseless energies at short β values. Interestingly, we observe that error-mitigated MSQITE is almost always as stable as mitigated QITE, despite the slightly longer circuit required for H01.

This result can be rationalized by recognizing that the off-diagonal elements generally have a much smaller absolute value than the diagonal ones, serving as a correction to the QITE energies. As a result, errors in the off-diagonal elements have a less impact on the accuracy of the final energies of MSQITE, compared to errors in the diagonal elements.

In order to examine this result further, we plotted the noisy values (p2 = 5 × 10−3) of H00 and H01 along side the extrapolated values in Fig. 10a. The “ideal” noiseless estimates, generated using the same quantum circuits and parameters a, are also shown. We observe that the noisy H01 converges to 0, as expected. In most instances, ZNE successfully approximates the noiseless values of H00 and H01 with notable accuracy. Figure 10b illustrates the differences between the ZNE and noiseless values, namely, \(\Delta {H}_{IJ}=\left\vert {H}_{IJ}({{{\rm{ZNE}}}})-{H}_{IJ}({{{\rm{Noiseless}}}})\right\vert\) for IJ (00, 01), which serve as a metric for the reliability of ZNE. Remarkably, both ΔH00 and ΔH01 exhibit similar precision, even though H01 is presumed to be more susceptible to noise due to its higher CNOT gate count. Therefore, the errors introduced in the effective Hamiltonian are relatively balanced between diagonal and off-diagonal elements, yielding an overall noise impact on the MSQITE energies similar to that on the QITE energies.

Fig. 10: Accuracy of effective Hamiltonian matrix elements in the presence of noise.
figure 10

a Noisy, noiseless, and extrapolated values of the diagonal and off-diagonal matrix elements H00 and H01 for p2 = 5 × 10−3. b Difference between extrapolated and noiseless values.

In conclusion, MSQITE continues to offer its advantages over QITE even on NISQ computers, as long as suitable error-mitigation techniques are utilized to minimize the effects of noise.

Discussion

In this work, we introduced a model space into QITE to enable excited state simulations. The orthogonality condition was retained using the Löwdin symmetric orthonormalization, which minimizes the state change during time steps and thus is suitable for the short-time unitary approximation of the non-unitary imaginary time propagation. MSQITE was shown to be a promising route to obtaining both ground and excited states, and its extension to QLanczos allowed for further acceleration in obtaining approximate eigenstates.

This study also proposed the spin-shift in the propagator. Because excited states frequently suffer from spin-contamination, it is necessary to remove the irrelevant spin configurations from the MSQITE simulation. We have shown that the proposed spin-shift approach achieves this feat by projecting out the desired spin symmetry through ITE.

Based on the results obtained in this work, we conclude that the state-averaged MSQITE is likely to necessitate substantially complicated \(\hat{A}\) to appropriately evolve all the states considered in a model space, compared to the state-specific scheme. Namely, the former method requires fermionic excitations beyond double excitations in \(\hat{A}\) for \({e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}\) to achieve reasonable accuracy in ideal, noiseless computations; this is deemed unappealing because of the increasing number of Pauli operators that need to be included. It should be pointed out that the recently proposed state-averaged orbital-optimized VQE74 shares the same difficulty because it uses the same unitary for multiple orthonormal states as in the state-averaged MSQITE; thus, the scalability of the method with the increase in number of electrons and states remains to be an open question. In contrast, the state-specific MSQITE is potentially more promising than the state-averaged one, requiring only single and double excitations (i.e., UCCGSD) to achieve high accuracy.

We have also demonstrated that the effectiveness of state-specific MSQITE compared to QITE remains unaffected by quantum noise when appropriate error mitigation techniques are applied. This resilience can be attributed to two main factors. First, the quantum circuit for state-specific MSQITE was designed to minimize the CNOT gate counts in the estimation of off-diagonal elements. Second, both the diagonal and off-diagonal elements display a comparative magnitude of errors. Overall, we found that the impact of quantum noise on MSQITE is similar to that on QITE.

Finally, many of the ideas developed in this work are versatile, and we expect that they can be applied to fields beyond quantum chemistry, such as nuclear physics. Moreover, MSQITE can be extended to combine with adaptive algorithms32,34 and variational algorithms23,35. This integration holds potential for new synergies that can reduce the circuit depth. We are currently working along these directions.

Methods

Orthonormalization of model space

In the Results section, the transformation matrix d was introduced in MSQITE to preserve the orthonormality of the model space after a short-time propagation. We chose the Löwdin symmetric orthonormalization for this purpose. First, we form the overlap matrix of the target imaginary time evolved model space,

$$\begin{array}{ll}{\tilde{S}}_{IJ}^{(\ell )}=\langle {\Phi }_{I}^{(\ell )}| {e}^{-\Delta \beta (\hat{H}-{E}_{I})}{e}^{-\Delta \beta (\hat{H}-{E}_{J})}| {\Phi }_{J}^{(\ell )}\rangle \\ \qquad=\,{S}_{IJ}^{(\ell )}-2\Delta \beta \left({H}_{IJ}^{(\ell )}-\frac{1}{2}({E}_{I}+{E}_{J}){S}_{IJ}^{(\ell )}\right)+O(\Delta {\beta }^{2})\end{array}$$
(23)

which is truncated after the first-order of Δβ to obtain the approximate overlap. Note that here, we assume the model space is orthonormal \({S}_{IJ}^{(\ell )}={\delta }_{IJ}\); however, even if this assumption is not satisfied, one can still find such a basis and the argument does not lose generality, see Supplementary Notes. Diagonalizing \(\tilde{{{{\bf{S}}}}}\) gives

$$\tilde{{{{\bf{S}}}}}{{{\bf{U}}}}={{{\bf{U}}}}\tilde{{{{\bf{s}}}}}$$
(24)

where s is the diagonal matrix with the eigenvalues and U the eigenvectors. The d matrix from the Löwdin symmetric orthonormalization is then uniquely obtained as

$${{{\bf{d}}}}={{{\bf{U}}}}{\tilde{{{{\bf{s}}}}}}^{-1/2}{{{{\bf{U}}}}}^{{\dagger} }.$$
(25)

We note that, instead of the above d, it would be also tempting to employ the transformation that diagonalizes \(\langle {\Phi }_{I}^{(\ell )}| {e}^{-\Delta \beta (\hat{H}-{E}_{I})}\hat{H}{e}^{-\Delta \beta (\hat{H}-{E}_{J})}| {\Phi }_{J}^{(\ell )}\rangle\), such that \({\lim }_{\ell \to \infty }\vert {\Phi }_{I}^{(\ell )}\rangle\) is the exact ground or excited state, \(\left\vert {\psi }_{I}\right\rangle\). However, it is easily seen that the unitary matrix obtained from the diagonalization of such an effective Hamiltonian matrix is inadequate because it can flip the signs and even the ordering of the states, and thus Eq. (1) cannot be a valid approximation.

Following Blunt et al.50, one may perform the Gram-Schmidt orthogonalization to define d. However, the Gram-Schmidt orthogonalization is not unique about the order of orthogonalization steps and also leads to a biased update of \(\{\left\vert {\Phi }_{I}\right\rangle \}\). Importantly, the propagation of the first state \(\left\vert {\Phi }_{0}\right\rangle\) will remain unaffected by the presence of other states \(\left\vert {\Phi }_{I}\right\rangle (I \,>\, 0)\). Therefore, it will naturally become the exact ground state at β → . It is highly desirable that \(\left\vert {\Phi }_{0}\right\rangle\) is initially chosen to be the closest to the ground state among all the states in the model space at β = 0. Otherwise, the model space would experience large reorganization, which the short-time unitary evolution of Eq. (1) would find difficult to express. This requirement may be easily satisfied for the ground state (i.e., HF may be the most reasonable starting point). However, for excited states, the appropriate ordering is generally unknown.

Simulation details

MSQITE and MS-QLanczos were implemented in our Python-based emulator package, Quket80, which compiles other useful libraries such as OpenFermion81, PySCF82, and Qulacs83, to perform quantum simulations. In all simulations, we used the STO-6G basis set and HF orbitals. The Jordan-Wigner transformation was employed to map the fermion operators to the qubit representation, such that α and β spin orbitals were aligned alternately with the rightmost qubit represents the lowest energy α spin orbital. Δβ was set to 0.1 a.u. for QITE and MSQITE, and 0.05 a.u. for FSQITE, and the UCCGSD ansatz39,66,71 was employed for the form of \(\hat{A}\), except for the noise simulations, for which the computational details are provided below. To ensure stable solutions to Eqs. (7) and (10), we applied a truncation scheme to the singular values of the M matrix. Specifically, we retained singular values that were greater than 10−7 times the largest singular value, as implemented in scipy.linalg.lstsq84. The Be 1s orbital and the N 1s and 2s orbitals were not considered in the simulations. For H4, the initial HF calculation was performed with the C2h symmetry instead of D4h, to relax the orbitals.

To perform noisy QITE simulations, we used Qiskit85, and the quantum noise was modeled by depolarizing error. The error rate for two-qubit gates was set to p2 = 10−2, 5 × 10−2, and 10−3, and that for one-qubit gates was set to p1 = p2 × 0.1. To mitigate the error, we used the zero-noise-extrapolation technique with exponential fitting79, only for two-qubit (CNOT) gates. For this simulation, the Hamiltonian of H4 is transformed to the reduced representation by using the tapering-off technique58,59. This resulted in the following four-qubit-Hamiltonian:

$$\begin{array}{ll}\hat{H}={h}_{0}+{h}_{1}{Z}_{2}{Z}_{3}+{h}_{2}{Z}_{0}{Z}_{2}+{h}_{2}{Z}_{1}{Z}_{2}+{h}_{3}{Z}_{0}{Z}_{1}{Z}_{2}{Z}_{3}\\ \qquad+\,{h}_{4}{Z}_{0}{Z}_{1}+{h}_{5}({Z}_{0}{Z}_{3}+{Z}_{1}{Z}_{3})+{h}_{6}({Z}_{0}{Z}_{1}{Z}_{2}+{Z}_{0}{Z}_{1}{Z}_{3})\\ \qquad+\,{h}_{7}(-{X}_{1}{Y}_{2}{Y}_{3}+{Y}_{1}{Y}_{2}{X}_{3}-{X}_{0}{Y}_{2}{Y}_{3}+{Y}_{0}{Y}_{2}{X}_{3})\\ \qquad+\,{h}_{8}\left(-{Y}_{0}{Z}_{1}{Y}_{2}{Z}_{3}+{Y}_{0}{Z}_{1}{Z}_{2}{Y}_{3}+{Y}_{0}{Y}_{2}+{X}_{0}{Z}_{1}{Z}_{2}{X}_{3}\right.\\ \qquad-\,\left.{Z}_{0}{X}_{1}{Z}_{2}{X}_{3}+{Z}_{0}{Y}_{1}{Y}_{2}{Z}_{3}-{Z}_{0}{Y}_{1}{Z}_{2}{Y}_{3}-{Y}_{1}{Y}_{2}\right)\\ \qquad+\,{h}_{9}(-{X}_{0}{Z}_{1}{Z}_{2}+{X}_{0}{Z}_{2}{Z}_{3}-{Z}_{0}{X}_{1}{Z}_{2}+{X}_{1}{Z}_{2}{Z}_{3})\\ \qquad+\,{h}_{10}({X}_{0}{X}_{1}{Z}_{2}{Z}_{3}+{Y}_{0}{Y}_{1}{Z}_{2}{Z}_{3})\\ \qquad+\,{h}_{11}({X}_{0}{Y}_{1}{Y}_{2}-{Y}_{0}{X}_{1}{Y}_{2}+{Z}_{0}{Z}_{2}{X}_{3}-{Z}_{1}{Z}_{2}{X}_{3})\\ \qquad+\,{h}_{12}({Z}_{0}{Z}_{1}{Y}_{2}{Y}_{3}-{Y}_{2}{Y}_{3})\\ \qquad+\,{h}_{13}({Z}_{0}+{Z}_{1}+{Z}_{0}{Z}_{2}{Z}_{3}+{Z}_{1}{Z}_{2}{Z}_{3})\\ \qquad+\,{h}_{14}(-{Z}_{2}-{Z}_{3})\\ \qquad+\,{h}_{15}(-{Y}_{0}{X}_{1}{Y}_{3}+{X}_{0}{Y}_{1}{Y}_{3}+{Z}_{0}{X}_{2}{Z}_{3}-{Z}_{1}{X}_{2}{Z}_{3})\end{array}$$
(26)

with

$$\begin{array}{rcl}{h}_{0}&=&-1.0613356242517709\\ {h}_{1}&=&0.3752318182963852\\ {h}_{2}&=&0.3736748877137335\\ {h}_{3}&=&0.3722054115496151\\ {h}_{4}&=&0.26033869205932514\\ {h}_{5}&=&0.22299864958557725\\ {h}_{6}&=&0.09825586237928423\\ {h}_{7}&=&0.07901175885991207\\ {h}_{8}&=&0.0746839486241239\\ {h}_{9}&=&0.07166447926823467\\ {h}_{10}&=&0.05690174793752179\\ {h}_{11}&=&0.05592385851947026\\ {h}_{12}&=&0.05496497155276822\\ {h}_{13}&=&0.03491578410706995\\ {h}_{14}&=&0.021711508654056723\\ {h}_{15}&=&0.018760090104653643\end{array}$$

whose first two lowest eigenvalues are −1.91552763 and −1.87493645. We also simplified the quantum circuit by using a reduced form of UCCGD (neglecting single substitutes) where only one of eight Pauli tensors arising from a double excitation is used. The resulting form of the ansatz is comprised of 14 unitaries,

$${e}^{-{{{\rm{i}}}}\Delta \beta \hat{A}}=\mathop{\prod }\limits_{i=0}^{13}{e}^{-{{{\rm{i}}}}\Delta \beta {a}_{i}{\hat{P}}_{i}}$$
(27)

with

$$\begin{array}{rcl}{\hat{P}}_{0}&=&{X}_{1}{X}_{2}{Y}_{3}\\ {\hat{P}}_{1}&=&{X}_{0}{X}_{2}{Y}_{3}\\ {\hat{P}}_{2}&=&{Z}_{1}{Z}_{2}{Y}_{3}\\ {\hat{P}}_{3}&=&{X}_{0}{Y}_{1}{X}_{2}\\ {\hat{P}}_{4}&=&{X}_{2}{Y}_{3}\\ {\hat{P}}_{5}&=&{X}_{0}{Y}_{3}\\ {\hat{P}}_{6}&=&{X}_{1}{Y}_{2}\\ {\hat{P}}_{7}&=&{Y}_{0}{Z}_{1}{X}_{2}{Z}_{3}\\ {\hat{P}}_{8}&=&{Z}_{0}{Y}_{1}{Z}_{2}{X}_{3}\\ {\hat{P}}_{9}&=&{Y}_{0}{X}_{1}{X}_{3}\\ {\hat{P}}_{10}&=&{Z}_{0}{Y}_{2}{Z}_{3}\\ {\hat{P}}_{11}&=&{X}_{0}{Y}_{1}\\ {\hat{P}}_{12}&=&{Y}_{1}\\ {\hat{P}}_{13}&=&{Y}_{0}\end{array}$$

where we have used Δβ = 0.2 a.u.. We used the stabilization procedure for QLanczos as presented in ref. 39 to describe excited states. However, for MS-QLanczos, the numerical instability arising from the linear dependence in the Krylov subspace becomes even more challenging compared with the standard QLanczos. Hence, we adopted the same procedure as ref. 31, i.e., we use Krylov vectors that satisfy \({{{{\mathcal{S}}}}}_{\ell {\ell }^{{\prime} }} < 0.99\) to alleviate the linear dependence. However, we have made the following modifications: the selection of Krylov vectors is performed backwards (i.e., starting from the current time instead of from the initial time) in order to ensure the latest states are always included in the basis, and the number of states included in the subspace is limited to 5. The selection is based on the assumption that excessively old time states do not play an important role but only cause numerical instabilities.