Introduction

One of the major goals of computational chemistry is the development of methods and algorithms for the calculation of the molecular electronic ground and excited state energies and corresponding wave functions from first-principles. Such eigenvalues and eigenvectors can be obtained from the solution of the time-independent electronic Schrödinger equation. However, since the invention of classical digital computers in the early 1940s, the exact numerical solution of this central quantum mechanical equation remains infeasible for systems roughly having more than 12 electrons distributed on 184 spin-orbitals1. Even though modern quantum many-body methods, such as density matrix renormalization group (DMRG)2, selected configuration interaction (sCI)3, and coupled-cluster (CC) theory4 methods can solve larger systems, their applications are still limited to dozens of electrons and they are subject to truncation or approximations. The reason is that the solution space (i.e., the Fock space) grows factorially with the system size (e.g., the number of electrons and basis functions). One of the most promising and immediate applications of quantum computers is solving classically intractable quantum chemistry problems5,6.

The direct solution to these problems on currently available quantum computer hardware is intractable. For example, the quantum phase estimation (QPE) algorithm7 represents the natural translation of the full configuration interaction (FCI) procedure to quantum computers8. However, QPE requires millions of qubits and quantum gates even for relatively small systems, making the algorithm unsuitable for practical applications on near-term noisy intermediate-scale quantum (NISQ) devices9. Full quantum eigensolver (FQE), in which the complexity of basic gate operators is polylogarithmic in the number of spin-orbitals, represents another fully quantum algorithm to simulate molecular systems10. However, FQE might not be suitable for advantageous quantum chemistry demonstrations due to the limitations of NISQ devices. Indeed, leading digital quantum computers based on superconducting qubit technology have limited coherence times (~100 μs) and gate error rates (~2 × 10−2 for a two-qubit gate), limiting the possible number of operations that can be executed to evolve the quantum state. Under these conditions, leveraging classical resources as much as possible through a hybrid quantum-classical approach seems to be the most promising route toward achieving quantum chemical advantage on NISQ devices11,12,13.

The variational quantum eigensolver (VQE) is one of the leading hybrid quantum-classical algorithms developed specifically to simulate molecular systems in their ground14,15,16,17 and ultimately excited states18,19,20,21,22,23. In a typical VQE setup, a variational parameterized ansatz is used to represent the trial wave function prepared with a quantum circuit and the expectation value of the qubit Hamiltonian is measured. Then, the parameters of the ansatz are iteratively optimized on a classical computer using the Rayleigh-Ritz variational principle. Although VQE simulations of small molecules have been performed on various quantum architectures such as photons14, superconducting qubits24,25 and trapped ions26, major efforts are needed to scale up this approach to larger molecular systems of chemical interest. Here, albeit not as large as QPE circuits, VQE would also suffer from the large size of the quantum circuits, which coupled with a classical optimization requiring many variational parameters, can render calculations intractable. Consequently, the construction of an efficacious ansatz and improved classical optimizer is an active area of research12,13.

There are two main approaches for ansatz design for electronic structure calculations. Hardware-efficient ansatzes24,25 are constructed from repeated, dense blocks of a limited set of parameterized gates that are easy to implement on a quantum device. While such an ansatz requires fewer resources on the quantum processor, because of its totally chemically agnostic nature, it might face difficulties in the parameter optimization due to the barren-plateau problem27,28,29,30. Consequently, simulation of large molecular systems with hardware-efficient ansatzes might become practically challenging and even impossible. Chemically-inspired ansatzes, such as the Unitary Coupled-Cluster Singles and Doubles (UCCSD)31,32,33,34,35, are designed by using domain knowledge from classical quantum chemistry. The standard version of UCCSD, however, has unfavorable scaling in the number of gates required for larger molecular systems36. Consequently, various approaches are being developed to improve on the UCCSD method to obtain shorter circuits or higher than UCCSD accuracy at comparable circuit depth17,37,38,39,40,41. Among them, “iterative VQE” algorithms42,43,44,45,46,47, which instead of using a fixed ansatz, construct problem-tailored ansatzes on-the-fly, have gained attention because of their controllable circuit-size.

Two notable iterative VQE algorithms to reduce quantum circuit depth are (fermionic and qubit) ADAPT-VQE42,43 and iterative Qubit Coupled Cluster (iQCC)44,45,46. Fermionic-ADAPT-VQE employs a predetermined pool of spin-adapted fermionic (generalized) singles and doubles excitation operators from which the ansatz is dynamically constructed42. The more measurement- and circuit-efficient (e.g. less CNOT gates in particular) qubit-ADAPT-VQE uses selected Pauli words (obtained from mapping fermionic operators into Pauli operators) to form the “qubit” pool, but the algorithm comes with a larger number of ADAPT iterations because of the larger number of variational parameters (more gradient calculations required)43. IQCC uses arbitrarily shallow quantum circuit depth at the expense of an iterative canonical Hamiltonian dressing technique employing the qubit coupled cluster (QCC) ansatz. The exponential growth of the “dressed” Hamiltonian with the size of the QCC transformation in an anti-commuting set could be mitigated using the involuntary linear combinations of Paulis technique45. But the algorithm is exponential with respect to the dressing steps.

Although circuit depth is one of the most critical metrics for NISQ devices, the number of physical qubits on the actual quantum chip (circuit width) is the necessary condition that limits the problem that can be solved. Several techniques based on symmetries presented in the Hamiltonian have been used to reduce the number of qubits48,49,50,51. But these techniques can only reduce a few qubits and have certain symmetry requirements. Instead, there are recent efforts to solve larger problems with smaller quantum computers by either using tensor network or divide-and-conquer techniques52,53,54,55. In addition, a dressing technique has been proposed to downfold the correlation effect in a large space into a smaller active space by using the second-order approximation56. However, these methods are approximate or only applied to model systems. In this work, we present an algorithm that reduces both quantum circuit depth and width in VQE at the cost of additional classical resources. Our algorithm, called ClusterVQE, uses the mutual information (a measure of the correlation between spin-orbitals57,58) to group the qubits into different clusters, which can be distributed to different quantum circuits. The correlation between different clusters is minimized by the mutual information optimization method and is taken into account by building a “dressed” Hamiltonian. Even shallower circuit depths can be achieved compared to the qubit-ADAPT-VQE. Consequently, the ClusterVQE algorithm is more robust to noise, making it particularly attractive for NISQ devices. Because the correlations between different clusters are minimized, the dressed Hamiltonian protocol only introduces minor additional classical resource requirements compared to the iQCC method. Most importantly, the ClusterVQE algorithm removes the entanglement between different clusters via the dressed Hamiltonian. Consequently, ClusterVQE can solve the original problem in a much smaller qubit space (clusters), reducing the number of qubits significantly for simulating large molecules on NISQ devices.

Results and discussion

Simulation details

To demonstrate the validity of the ClusterVQE algorithm, we have simulated several molecules on both a quantum simulator and IBM quantum devices. In order to perform the simulations, an in-house modified Qiskit code59 has been developed. The Jordan-Wigner operator transformation60 and the Limited-memory BFGS Bound (L-BFGS-B) optimizer61 were used in this work. The obtained results are compared with that of exact diagonalization. In order to make the comparison over different methods (VQE, qubit-ADAPT-VQE, iQCC) meaningful, the same ansatz, QUCCSD (details in Supplementary Section II), is used for all simulations below unless specified otherwise. The convergence criteria of ϵ = 10−4 and the minimal STO-3G basis set are used for all the molecules. The minimal basis set is chosen in the test because the NISQ devices suffer from noise due to the limitations of NISQ devices. Using a larger basis set directly will significantly increase the number of qubits and circuit depth, further amplifying the noise effect. To address the basis set problem, we recently proposed a trans-correlated Hamiltonian approach which could significantly improve the accuracy by using only the minimal basis set, and consequently, is more NISQ friendly62.

LiH molecule

We start with simulating the ground state of the LiH molecule at equilibrium (bond distance is 1.547 Å). LiH requires 10 qubits with the minimal STO-3G basis set and frozen core orbitals. The graph representation of the qubit clustering within the ClusterVQE is shown in Supplementary Fig. 2. Qubits [0, 1, 4, 5, 6, 9] and [2, 3, 7, 8] are in the same cluster, respectively. Figure 1(a) and (c) compares the energy convergence of the qubit-ADAPT-VQE, ClusterVQE, and iQCC approaches for different Li-H bond lengths (1.547 Å and 2.4 Å). Because iQCC only implements one entangler on the quantum circuit, the number of parameters to be optimized is also smaller compared to that of qubit-ADAPT-VQE and ClusterVQE. Consequently, a smaller number of optimization cycles is required per iteration. For the iQCC method, once the entanglers in the previous iteration are used to dress the Hamiltonian, these parameters are fixed. In contrast, the parameterized ansatz in the qubit-ADAPT-VQE and ClusterVQE approaches are re-optimized at each iteration, making them more flexible to achieve better convergence. As a result, the convergence of iQCC is slightly worse than that of qubit-ADAPT-VQE for a given number of iterations, as shown in Fig. 1(a) and (c). For ClusterVQE, because only a few operators are used to dress the Hamiltonian, the performance is almost the same as that of qubit-ADAPT-VQE. Because the Hamiltonian of the iQCC and ClusterVQE methods is dressed by the entanglers, the number of Pauli words in the dressed Hamiltonian is increased. However, compared to iQCC, only two operators are used to dress the Hamiltonian in ClusterVQE as shown in Fig. 1(b). The number of Pauli words of the dressed Hamiltonian does not increase significantly. In contrast, the number of Pauli words in iQCC increases exponentially with the number of iterations before saturation, see Fig. 1(b) and (d). But it should be noted that the performance of ClusterVQE depends on the quality of qubit clustering. If a less optimal qubit clustering is used, more entanglers will be used to dress the Hamiltonian. Consequently, the size of the dressed Hamiltonian increases significantly as shown in Supplementary Fig. 3 even though it is still smaller than that of iQCC.

Fig. 1: Simulation of LiH molecule.
figure 1

a, c GS Energy errors and (b, d) number of Pauli words per each iteration for the LiH molecule obtained from the qubit-ADAPT-VQE, iQCC, and ClusterVQE algorithms. The Li-H bond lengths are 1.547 Å (a, b) and 2.4 Å (c, d), respectively.

We then compared the performance of the qubit-ADAPT-VQE, iQCC, and ClusterVQE approaches on the ground-state potential energy surface (PES) of LiH as shown in Fig. 2. Overall, ClusterVQE can achieve the same energy error as that of qubit-ADAPT-VQE but with smaller quantum circuits. And also uses fewer Pauli words compared to iQCC, as shown in Fig. 2.

Fig. 2: Potential energy surface of LiH molecule.
figure 2

a GS energy, (b) energy error, and (c) number of Pauli words in the qubit Hamiltonian of LiH for different bond lengths obtained from different methods. For VQE, one layer of the QUCCSD ansatz is used. One layer of the QUCCSD ansatz is also used as the operator pool for the other methods.

We also compared the performance of the different clustering methods used (Supplementary Section I). For most geometries of the LiH molecule, these three methods produce the same clustering. However, at the bond length of 2.4 Å, the Metis graph partitioning method results in less optimal clustering. Consequently, more entanglers are needed to dress the Hamiltonian, resulting in 808 Pauli words. This comparison suggests that the QCD and MI-based clustering methods provide better and more stable performance in clustering and help improve the ClusterVQE performance.

Non-covalently-bound (van der Waals) complex (H2)2

Since ClusterVQE intends to minimize the intercluster correlations, it is intuitively expected that it is ideal for molecular aggregate systems where the intermolecular coupling is usually smaller than the intramolecular correlation. Here, we employ the non-covalently-bond (van der Waals) complex \({({{{{\rm{H}}}}}_{2})}_{2}\)41 to demonstrate the application of ClusterVQE on molecular aggregates. The spin orbitals of each molecule of the complex are grouped in a cluster. The intramolecular correlation of H2 is 0.04 au. By tuning the distance between the two H2 molecules, the intermolecular correlations can be changed. The intermolecular correlations of the \({({{{{\rm{H}}}}}_{2})}_{2}\) complex with distances 2 Å and 5 Å are 0.59 and 0.35 au, respectively. To demonstrate the complexity of the ClusterVQE method, the qubit-clustering is molecule-wise, i.e., the spin orbitals of the same molecule are grouped together. The ground state energies of the \({({{{{\rm{H}}}}}_{2})}_{2}\) complex with different intermolecular distances (2 Å and 5 Å, respectively) are calculated, which are shown in Fig. 3. For the shorter distance, the intermolecular correlations are much stronger. Consequently, more inter-cluster entanglers are used to dress the Hamiltonian, making the complexity of ClusterVQE almost the same as iQCC. In contrast, the intermolecular correlations become smaller when the intermolecular distance is 5 Å. In this case, only 2 entanglers are used to dress the Hamiltonian. Moreover, increasing the intermolecular distance further reduces the intermolecular correlations and results in less dressing. This comparison indicates that ClusterVQE is ideal for the molecular aggregates where the intermolecular correlation is weak and the complexity increases with increasing inter-cluster correlations.

Fig. 3: Simulation of the \({({{{{\rm{H}}}}}_{2})}_{2}\) complex molecule.
figure 3

a, c GS Energy errors and (b, d) number of Pauli words per each iteration for the \({({{{{\rm{H}}}}}_{2})}_{2}\) complex molecule obtained from the qubit-ADAPT-VQE (blue), iQCC (purple) and ClusterVQE (green) methods. The intermolecular distances are 2 Å (a, b) and 5 Å (c, d), respectively.

N2 molecule

We next demonstrated the validity of ClusterVQE in the strongly correlated regime. The N2 molecule is a prototype system for testing multi-reference methods in quantum chemistry due to its strong electronic correlations, particularly when the bond is stretched. Here N2 at equilibrium (1.09 Å) and stretched configurations (1.6 Å) are used as examples. The 12-qubit N2/STO-3G system is simulated after freezing the lowest two spin-orbitals.

Since the N2 molecule has a stronger correlation, the QUCCSD ansatz (i.e., choosing only one Pauli word from each UCCSD excitation) could result in slow convergence43. Instead, adding one more adjoint Pauli word for each excitation (known as the QUCCSD2 ansatz with more details found in the SI) could significantly improve the accuracy. The results obtained from the QUCCSD and the QUCCSD2 ansatz are shown in Supplementary Fig. 4 and Fig. 4, respectively. Comparison of these figures reveals that QUCCSD2 significantly improves the convergence but at the same time increases the circuit depth and the Hamiltonian size in the ClusterVQE and iQCC methods. The development of a more efficient ansatz that balances accuracy and efficiency for ClusterVQE could be a subject of separate investigations. As shown in Fig. 4(a) and (c), the energy convergence of ClusterVQE is almost the same as that of qubit-ADAPT-VQE and is better than that of iQCC. In contrast, the VQE optimization may lead to a small improvement after certain iterations due to the limited correlation that could be further introduced by one entangler. The convergence can be further improved by using more Pauli words for each UCCSD excitation43. Nevertheless, for a given ansatz, the size of the dressed Hamiltonian in ClusterVQE is much smaller than that of iQCC. It should be noted that the stretched N2 molecule has stronger correlations (0.274 au) compared to the equilibrium one (0.113 au). However, accompanied by a better separation between clusters. Consequently, less dressing is required for the stretched N2 to obtain an even larger amount of correlation and achieve better accuracy. Hence, ClusterVQE only slightly increases the size of the dressed Hamiltonian. In contrast, the size of the Hamiltonian exceeds 100,000 Pauli words in iQCC after 25 iterations. These results suggest that, even for strongly correlated systems, our ClusterVQE reduces circuit complexity without a significant increase in Hamiltonian size.

Fig. 4: N2 simulation.
figure 4

a, c GS Energy errors and (b, d) number of Pauli words per each iteration for the N2 molecule obtained from the qubit-ADAPT-VQE (blue), iQCC (purple) and ClusterVQE (green) methods. The N-N bond lengths are 1.09 Å (a, b) and 1.6 Å (c, d), respectively. The numbers of Pauli words in the qubit-ADAPT-VQE are 539 and 383 for 1.09 Å and 1.6 Å, respectively, which are not shown in (b) and (d).

Noisy simulation and clustering on-the-fly

We next examine the dissociation curve of the LiH molecule in the presence of noise to demonstrate ClusterVQE’s resilience to this common feature of all NISQ architectures. Figure 5 shows the dissociation curve of the LiH molecule calculated with different algorithms. The energy values are averaged over 5 runs. As shown by the curves, in the presence of noise, the dissociation curves obtained from the VQE, qubit-ADAPT-VQE, iQCC, and ClusterVQE methods are different from the exact solution. Since the VQE algorithm requires more parameters and the deepest circuit, the noise influence is the largest across the set, showing the strongest deviation from the reference. For example, one layer of the QUCCSD ansatz for the LiH (N2) molecule requires 15 (92) parameters/entanglers. The qubit-ADAPT-VQE method, in contrast, is able to grow a compact ansatz with fewer parameters (the number of parameters in qubit-ADAPT-VQE, ClusterVQE, and iQCC is equal to the iteration number) and lower circuit depth as shown in Table 1. Consequently, the influence of noise on qubit-ADAPT-VQE is suppressed compared to VQE. On the other hand, the iQCC algorithm uses a much shallower circuit depth. As a result, iQCC is most immune to noise, and the corresponding dissociation curve is the closest to the reference. Next is ClusterVQE which partially dresses the Hamiltonian and grows the ansatz. The circuit depth of ClusterVQE lies between that of iQCC and qubit-ADAPT-VQE as shown by the comparison in Table 1. Consequently, the energy error in the presence of noise is also in between that of the iQCC and qubit-ADAPT-VQE methods.

Fig. 5: PES of LiH molecule in the presence of noise.
figure 5

a Dissociation curves of LiH and (b) corresponding errors with exact diagonalization (black), VQE (red), qubit-ADAPT-VQE (blue), iQCC (purple), and ClusterVQE (green) methods on a noisy simulator. The custom noise in Qiskit59 was used, where the error rates of single-qubit and two-qubit gates are 0.001 and 0.005, respectively. The means and error bars are calculated from 5 independent simulations.

We also compare the performance of the four algorithms with different error rates. Supplementary Fig. 1(a) and (b) compare the energy errors with respect to different single- and two-qubit error rates, respectively. The single-qubit gates usually have much lower error rates on real quantum devices. For example, the single- and two-qubit gate error rates of the IBM-Yorktown63 device are around ~10−3 and ~10−2, respectively. Consequently, the error of single-qubit gates has a much smaller influence on the results. In contrast, the results are significantly affected by the error of two-qubit gates. In general, the error of ClusterVQE lies between that of the qubit-ADAPT-VQE and iQCC methods on the noisy simulator, as expected.

In the previous examples, the MI is calculated based on the exact wavefunction (WF) for proof-of-principle. But, the exact WF is not known before the VQE calculation. However, an approximate calculation of MI provides a sufficient estimate of the correlation between different qubits, as shown in the previous DMRG-type selection of active space64 and our recent development of the PermVQE algorithm41. The approximate MI can be estimated on classical computers by using either MP2 or DMRG (with small bond-dimension) methods. Here, we also demonstrate that ClusterVQE even works with MI calculated on-the-fly. At each iteration, the MI is updated and used for re-clustering the qubits. As shown by the red dots in Fig. 1(a) and (c), the performance of ClusterVQE on-the-fly is almost the same as that using the exact MI. The clustering is suboptimal only for the first three iterations and the dressed Hamiltonian has a slightly larger size as a result. The composition of each cluster continues to change until a stable clustering is found. These results illustrate that ClusterVQE also works when clustering on-the-fly.

Simulation on a quantum computer and error mitigation

Finally, we implement ClusterVQE on an actual quantum backend. The IBMQ-Bogota device with 5 qubits was chosen to simulate the LiH molecule described by 10 qubits (orbital with the lowest energy is assumed to be frozen). In order to run on IBMQ-Bogota, we set the size of each cluster to be 5. Figure 6 shows the energies as a function of ClusterVQE iteration without error mitigation. Due to the large CNOT error rate, the energies go up in the first few ClusterVQE iterations without error mitigation. The number of entanglers encoded on the quantum circuits increases with the number of iterations and introduces more errors.

Fig. 6: Simulation of the LiH molecule (10 qubits) on the IBMQ-Bogota quantum device (5 qubits) with (red) and without (blue) error mitigation and their comparison to the exact solution (black).
figure 6

We do not show mitigated results for the first three iterations as Clifford data regression (CDR) gives perfect correction for these circuits since they contain at most one non-Clifford gate. The error rate of the CNOT gate is between 1.06 × 10−2 ~ 2.02 × 10−2 and 1.62 × 10−2 on average at runtime.

The ClusterVQE performance on real quantum devices can be further improved with recently developed error mitigation techniques65,66. For proof-of-principle, we first perform ClusterVQE on the noiseless simulator to generate 8 circuits of the first 8 iterations and run the circuits on the IBM quantum device (IBMQ-Bogota). We perform the mitigation using Clifford data regression (CDR)67 on the IBMQ-Bogota device as well. The CDR training set is constructed using near-Clifford circuits with at most 1 non-Clifford gate in each of 2 clusters. The training set contains 3–4 near-Clifford circuits depending on the circuit of interest. The training circuits are chosen by replacing non-Clifford gates with the closest Clifford gates. The results are shown in Fig. 6 in which the blue and red lines plot the raw results obtained from the IBM-Bogota and mitigated results, respectively. We do not show mitigated results for the first three iterations as CDR gives a perfect correction for these circuits since they contain at most one non-Clifford gate. Figure 6 shows that error mitigation reduces the error by several orders of magnitude. Such simulations demonstrate that the ClusterVQE method, combined with error mitigation, can readily achieve “chemical accuracy” on real quantum devices. It should be noted that the term “chemical accuracy” used throughout this manuscript refers to the relative accuracy compared to the exact (FCI) solution of the given basis set, which is not the true chemical accuracy. The rigorous chemical accuracy can only be achieved by improving the theoretical complexity (HF, CCSD, CCSDT, … , FCI, etc) and basis set size. However, increasing the basis set size in quantum algorithms is not NISQ friendly, significantly increasing the needed quantum resources. Alternatively, by using a trans-correlated Hamiltonian, one can achieve the accuracy at the cc-pVTZ basis even with a minimal basis set62.

Complexity and scalability

Even though ClusterVQE reduces the circuit complexity (circuit depth and number of qubits), the computational complexity is increased due to the dressing. As shown in the previous section, ClusterVQE has almost the same performance as qubit-ADAPT-VQE in terms of energy errors. But, more Pauli words are measured due to the dressing. The computational scaling of ClusterVQE is dependent on the number of dressing operators. That being said, the scaling is determined by the amount of inter-cluster correlation. Even though the scaling of computing the dressed Hamiltonian is exponential against the number of dressing operators68, the scaling of ClusterVQE against iterations is not exponential in general since not every entangler is used to dress the Hamiltonian. For the best cases, only a limited number of entanglers is used to dress the Hamiltonian (LiH and \({({{{{\rm{H}}}}}_{2})}_{2}\) complex with a large intermolecular distance in section “The ClusterVQE algorithm” for example). Such a situation can be intuitively expected in molecular aggregate systems, where the intermolecular coupling is usually smaller than the intramolecular correlation. For the worse cases (usually in strongly correlated systems in which qubit clustering is not optimal), the scaling of ClusterVQE may be exponential if a larger number of entanglers are used to dress the Hamiltonian (\({({{{{\rm{H}}}}}_{2})}_{2}\) complex with large intermolecular distance 2 Å for example). However, even in strongly correlated molecules, the scaling of different geometries can be different. For example, ClusterVQE simulation of N2 at equilibrium with 12-qubits ends up with a large number of Pauli words (Fig. 4(b)). However, the simulation of N2 with a stretched bond has much better scaling (Fig. 4(d)) even though the stretched configuration has a stronger correlation. That being said, the bottleneck of our approach relies on whether the inter-cluster MI can be efficiently minimized. In summary, ClusterVQE does not introduce exponential scaling for general cases and the complexity of ClusterVQE increases with increasing inter-cluster correlations.

Discussion

We have developed a ClusterVQE algorithm to reduce quantum circuit complexity, including both the number of qubits and circuit depth. Our algorithm groups qubits into clusters based on the maximal correlation between them and further takes into account the residual entanglement between clusters in a “dressed” Hamiltonian. ClusterVQE naturally implements the benefits of the previously developed PermVQE algorithm, which significantly reduces the number of controlled-NOT gates to connect strongly correlated qubits41. The entanglers that couple different clusters are further used to dress the Hamiltonian and remove the entanglement between different clusters. Consequently, the Hamiltonian can be decomposed as a tensor product of each cluster. Each cluster can then be measured separately with a smaller number of qubits and shallower circuit depth. Therefore, ClusterVQE is a particularly promising algorithmic development for quantum chemistry applications, where it is able to reduce the quantum hardware requirements significantly. Moreover, our algorithm has the flexibility of reducing qubits to any number at the expense of additional computational costs on classical computers. Since fewer entanglers are encoded on quantum circuits, ClusterVQE is more robust to noise than qubit-ADAPT-VQE, being one of the most circuit-efficient VQE algorithms, which is verified by our numerical experiments on both the noisy simulator and real quantum platforms. Since the correlation between different clusters is minimized, only a few operators are used to dress the Hamiltonian. Consequently, the Hamiltonian size does not increase significantly compared to that of the iQCC method.

Finally, it should be noted that cluster VQE uses a newly proposed gradient measurement method for VQE without using an ancillary qubit. The analytical gradient can significantly improve the performance of ClusterVQE in noisy environments. Simulation of the LiH molecule on a noisy quantum simulator showed that the energy error is smaller than that of the qubit-ADAPT-VQE method due to the shallower circuit depth. A groundbreaking example of the exact error-corrected LiH molecule simulation described by 10 qubits on the 5-qubits IBMQ-Bogota quantum computer with ClusterVQE is demonstrated. Results further demonstrate the validity and advantage of the ClusterVQE algorithm. We believe the algorithm developed in this work can pave the way toward the adaptation of molecular quantum chemistry simulation on NISQ devices. It should be mentioned that since the size of the qubit Hamiltonian increases with the number of operators used to dress the Hamiltonian, it increases both computational costs on classical computers and measurements on quantum computers. Recently developed measurement reduction techniques69,70,71,72,73,74,75 are likely a path forward to further reduce the simulation cost on quantum devices.

Methods

Mutual information based clustering of qubits

The first essential step of ClusterVQE is to split the qubits into different clusters and minimize the correlations between different clusters. With this aim, an entanglement map based on the mutual information (MI) that reflects the electronic correlations among all pairs of qubits is calculated41. Previous results obtained for the orbital ordering problem in the Density Matrix Renormalization Group (DMRG) method in classical quantum chemistry calculations57 showed that the MI is a reliable parameter to quantify the correlation between two quantum particles. Our previous work also demonstrated that an approximate MI is sufficient for the qubit re-arrangement41 for the reduction of circuit depth. The MI has also been used to construct a compact entangler pool for ADAPT-VQE76. According to quantum information theory, the MI between qubits i and j is defined as follows57:

$${I}_{ij}=\frac{1}{2}({S}_{i}+{S}_{j}-{S}_{ij})(1-{\delta }_{ij}),$$
(1)

where Si and Sij are the single- and two-qubit Von Neumann entropies, respectively, and δij is the Kronecker delta. Si and Sij are obtained from the corresponding one- and two-body density matrix41.

Once the MI is obtained, it can be used to cluster the qubits into different groups by minimizing the MI between different clusters which is defined as the summation of MIs between inter-cluster qubits. The qubit clustering can be achieved by using a classical graph partitioning method, such as the graph partition function of the Metis library77 (Supplementary Section I(A)). Alternatively, this can be mapped into a quadratic unconstrained binary optimization (QUBO) problem and solved on a quantum annealer by either the MI-based clustering method (details are given in Supplementary Section I(B)) or the quantum community detection method (QCD)78 (details are given in Supplementary Section I(C)). Problems with up to 64 variables or nodes can be solved directly on the D-Wave 2000Q quantum annealer. Larger sizes require a quantum-classical approach using qbsolv79 or alternative solvers80 + D-Wave. We have chosen to use quantum graph partitioning/clustering methods since they produce comparable results to classical methods81,82. Classical partitioning/clustering methods could also be used as an alternative.

VQE, qubit-ADAPT-VQE, and iQCC algorithms

Within the VQE algorithm, a parameterized ansatz \(\left|{{\Phi }}\right\rangle =\hat{U}(\theta )\left|{{{\Phi }}}_{r}\right\rangle\) is suggested and encoded on the quantum circuit, where \(\left|{{{\Phi }}}_{r}\right\rangle\) is the reference state. In this work, the mean-field Hartree-Fock (HF) state is used as a reference. The unitary operator \(\hat{U}(\theta )\) introduces the correlations between qubits. Since current quantum gates can only operate on a few qubits at a time, \(\hat{U}(\theta )\) is decomposed as a tensor product of many unitary operators, denoted as entanglers, with each entangler acting on a few qubits, i.e., \(\hat{U}(\theta )=\mathop{\prod }\nolimits_{j}^{{{{\mathcal{D}}}}}{\hat{U}}_{j}\) where \({\hat{U}}_{j}={e}^{i{\theta }_{j}{\hat{A}}_{j}}\) are the entanglers and \({{{\mathcal{D}}}}\) is the number of entanglers in the ansatz. In the widely used UCCSD ansatz, the \(\{{\hat{A}}_{j}\}\) can be readily obtained by mapping the single/double Fermionic excitation operators into qubit-space as shown in Supplementary Section II. With the ansatz, the ground-state (GS) energy of a given Hamiltonian can be obtained by imposing the variational principle within the VQE algorithm,

$${E}_{{{{\rm{GS}}}}}=\mathop{\min }\limits_{\theta }\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}^{{\dagger} }(\theta )\hat{H}\hat{U}(\theta )\left|{{{\Phi }}}_{r}\right\rangle \equiv \mathop{\min }\limits_{\theta }f(\theta ),$$
(2)

where \(\hat{H}={\sum }_{k}{\alpha }_{k}{\hat{P}}_{k}\) qubit form of the electronic Hamiltonian, which is obtained by one of the fermion-to-qubit transformations. αk are the coefficients and \({\hat{P}}_{k}\) are the Pauli words. Hence, the expectation value (or cost function) in Eq. (2) is further decomposed as

$$f(\theta )=\mathop{\sum}\limits_{k}{\alpha }_{k}\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}^{{\dagger} }(\theta ){\hat{P}}_{K}\hat{U}(\theta )\left|{{{\Phi }}}_{r}\right\rangle \equiv \mathop{\sum}\limits_{k}{\alpha }_{k}{f}_{k}(\theta )$$
(3)

and each term (fk(θ)) is measured on a quantum circuit.

However, the contribution of each entangler to the GS energy is different. Thus, the UCCSD ansatz can be made even more compact by the adaptive construction42,43 in the qubit-ADAPT-VQE, at the cost of more VQE calculations. At the n-th iteration of qubit-ADAPT-VQE, the ADAPT ansatz is

$$\left|{{{\Phi }}}_{n}^{{{{\rm{ADAPT}}}}}(\theta )\right\rangle =\mathop{\prod }\limits_{j=1}^{n}{e}^{i{\theta }_{j}{\hat{P}}_{j}}\left|{{{\Phi }}}_{r}\right\rangle .$$
(4)

Where the operators (\({\hat{P}}_{j}\)) are selected from an operator pool \(\{{\hat{A}}_{j}\}\). Within the qubit-ADAPT-VQE algorithm, the sequence of operators is selected based on their contributions to the energy weighted by the gradients with respect to the corresponding parameters42,

$$\frac{\partial E}{\partial {\theta }_{j}}=\left\langle {{{\Phi }}}^{{{{\rm{ADAPT}}}}}(\theta )\right|\left[\hat{H},{\hat{P}}_{j}\right]\left|{{{\Phi }}}^{{{{\rm{ADAPT}}}}}(\theta )\right\rangle .$$
(5)

The gradients can be readily measured on quantum computers. Once the gradients of all operators are measured, the operator with the largest gradient (\(\frac{\partial E}{\partial {\theta }_{j}}\)) is selected into the ansatz and the ansatz keeps growing with each iteration until convergence.

Because \(\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}^{{\dagger} }(\theta )\hat{H}\hat{U}(\theta )\left|{{{\Phi }}}_{r}\right\rangle =\left\langle {{{\Phi }}}_{r}\right|{\hat{H}}_{d}\left|{{{\Phi }}}_{r}\right\rangle\) where \({\hat{H}}_{d}={\hat{U}}^{{\dagger} }(\theta )\hat{H}\hat{U}(\theta )\), the operator can be folded into a dressed Hamiltonian. Thus, instead of growing the ansatz, the iQCC algorithm grows the Hamiltonian68. After each iteration, the operators and corresponding parameters are used to dress the Hamiltonian,

$${\hat{H}}_{j}={e}^{-i{\theta }_{j}{\hat{P}}_{j}}{\hat{H}}_{j-1}{e}^{i{\theta }_{j}{\hat{P}}_{j}}.$$
(6)

where \({\hat{H}}_{j}\) is the dressed Hamiltonian of the j-th iteration.

The ClusterVQE algorithm

According to the qubit clustering introduced in Sec. 3.1, the entanglers used in the ansatz can be grouped into different sets of intracluster and intercluster entanglers,

$$\hat{U}(\theta )=\{{\hat{U}}_{{c}_{1}},\cdots \,,{\hat{U}}_{{c}_{M}},{\hat{U}}_{{c}_{12}},\cdots \,,{\hat{U}}_{{c}_{M-1,M}}\}$$
(7)

where M is the number of clusters. In this way, the ordering of entanglers in \(\hat{U}(\theta )\) is changed and the corresponding parameters can be different as well. The entangler set \(\{{\hat{U}}_{{c}_{i}}\}\) only acts on the corresponding cluster i, and \(\{{\hat{U}}_{{c}_{ij}}\}\) represents the entanglers that couple clusters i and j. Consequently, the cost function in Eq. (2) can be rewritten as

$$\begin{array}{l}f(\theta )=\left\langle {{{\Phi }}}_{r}\right|\mathop{\prod}\limits_{i}{\hat{U}}_{{c}_{i}}^{{\dagger} }\left[\mathop{\prod}\limits_{i\ne j}{\hat{U}}_{{c}_{ij}}^{{\dagger} }\hat{H}{\hat{U}}_{{c}_{ij}}\right]{\hat{U}}_{{c}_{i}}\left|{{{\Phi }}}_{r}\right\rangle \\\qquad =\left\langle {{{\Phi }}}_{r}\right|\mathop{\prod}\limits_{i}{\hat{U}}_{{c}_{i}}^{{\dagger} }{\hat{H}}_{d}{\hat{U}}_{{c}_{i}}\left|{{{\Phi }}}_{r}\right\rangle \end{array}$$
(8)

where \({\hat{H}}_{d}\equiv {\prod }_{i\ne j}{\hat{U}}_{{c}_{ij}}^{{\dagger} }\hat{H}{\hat{U}}_{{c}_{ij}}\) is the dressed Hamiltonian. Because the \({\hat{H}}_{d}\) can be represented in terms of Pauli words where each Pauli word can be trivially rewritten as the tensor product of different clusters, i.e., \({\hat{H}}_{d}={\sum }_{k}{\alpha }_{k}{\hat{P}}_{k}={\sum }_{k}{\alpha }_{k}{\prod }_{i}{\hat{P}}_{k}({c}_{i})\) and the initial state can be decomposed as ∏iΦr(ci). The energy in Eq. (8) can be further rewritten as

$${E}_{{{{\rm{GS}}}}}=\mathop{\min }\limits_{\theta }\mathop{\sum}\limits_{k}{\alpha }_{k}\left[\mathop{\prod}\limits_{i}\left\langle {{{\Phi }}}_{r}({c}_{i})\right|{\hat{U}}_{{c}_{i}}^{{\dagger} }{\hat{P}}_{k}({c}_{i}){\hat{U}}_{{c}_{i}}\left|{{{\Phi }}}_{r}({c}_{i})\right\rangle \right].$$
(9)

Here each expectation value \(\left\langle {{{\Phi }}}_{r}({c}_{i})\right|{\hat{U}}_{{c}_{i}}^{{\dagger} }{\hat{P}}_{k}({c}_{i}){\hat{U}}_{{c}_{i}}\left|{{{\Phi }}}_{r}({c}_{i})\right\rangle\) only involves the qubits in the cluster i. Hence, the energy of a larger problem can be measured on M smaller quantum circuits after Hamiltonian dressing, as shown by the schematic diagram in Fig. 7. In practice, like the qubit-ADAPT-VQE algorithm, ClusterVQE adaptively grows the ansatz in Eq. (8). The ansatz used in the jth iteration is denoted as \({\hat{{{{\mathcal{U}}}}}}_{j}\). However, in contrast to the growing ansatz in qubit-ADAPT-VQE42 and growing Hamiltonian in iQCC44, the ansatz in ClusterVQE is decomposed onto smaller quantum circuits, and only the entanglers between different clusters are used to dress the Hamiltonian. Consequently, the iterative construction breaks the original VQE algorithm with deeper and wider quantum circuits into a series of quantum circuits with a shallower depth and a smaller number of qubits, as illustrated by Fig. 7.

Fig. 7: Finding the lowest eigenvalue of a given qubit Hamiltonian \(\hat{H}={\sum }_{J}{\hat{P}}_{J}\) with VQE and ClusterVQE, where 2N is the number of qubits.
figure 7

VQE uses a quantum circuit defined on 2N qubits. In contrast, ClusterVQE splits the original 2N-qubit circuit into 2 N-qubit circuits by removing the entanglement between them. Here, shorter circuits are achieved at the expense of a dressed Hamiltonian \({\hat{H}}_{d}\).

The entangler \({\hat{U}}_{k}\) chosen in the ansatz \({\hat{{{{\mathcal{U}}}}}}_{j}\) for the jth iteration is determined by its contribution to the energy weighted by the gradient with respect to the parameter, i.e., the \({\hat{U}}_{k}\) terms with largest gradients \(\frac{\partial E}{\partial {\theta }_{k}}\) are selected42, where the gradient is defined as

$$\begin{array}{rcl}\frac{\partial E}{\partial {\theta }_{k}}&=&\frac{\partial }{\partial {\theta }_{k}}\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}_{k}^{{\dagger} }({\theta }_{k}){\hat{H}}_{d,j-1}{\hat{U}}_{k}({\theta }_{k})\left|{{{\Phi }}}_{r}\right\rangle {| }_{{\theta }_{k} = 0}\\ &=&i\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}_{k}^{{\dagger} }({\theta }_{k})[{\hat{H}}_{d,j-1},{\hat{P}}_{k}]{\hat{U}}_{k}({\theta }_{k})\left|{{{\Phi }}}_{r}\right\rangle .\end{array}$$
(10)

Where \({\hat{H}}_{d,j-1}\) is the dressed Hamiltonian of the j − 1 iteration. It should be noted that \({\hat{H}}_{d,j}\) may be the same as the \({\hat{H}}_{d,j-1}\) if the selected entangler of the jth iteration is not used to dress the Hamiltonian. Since the measurement of each gradient term is independent, all the gradients can be measured in parallel with several uncoupled quantum computers42.

The flowchart of the ClusterVQE scheme is presented in Fig. 8. In the first step, the qubit Hamiltonian and operator pool are prepared: (a) Provided classical mean-field calculation of the molecule, map the second quantized Hamiltonian into its qubit counterpart by using either Jordan-Wigner (JW)60, Bravyi-Kitaev (BK)83, or parity48,84 encoding. (b) Generate the UCCSD excitation operators and map them into Pauli words. Here only Pauli words with the different flip indices (Supplementary Section II) are chosen and added to the excitation pool \(\{{\hat{A}}_{i}\}\). In addition, clustering of qubits is performed if MI is provided. Otherwise, the qubits in the first iteration are simply grouped according to their spin index. If the largest gradient in the second step is smaller than the convergence criteria ϵ and the energy minimization criteria are satisfied, the algorithm exits from the iterative loop. The fifth step is only triggered if clustering on-the-fly is desired.

Fig. 8: Flowchart of the ClusterVQE algorithm.
figure 8

The fifth step in the red-dashed box is skipped if MI is provided.

Since some of the entanglers in \({\hat{{{{\mathcal{U}}}}}}_{j}\) are used to dress the Hamiltonian, the circuit depth is smaller than that of qubit-ADAPT-VQE as shown by Table 1. As a result, ClusterVQE is more robust to noise compared to qubit-ADAPT-VQE, as shown and discussed later, making the ClusterVQE algorithm more suitable for NISQ devices than qubit-ADAPT-VQE. Besides, because the clustering minimizes the entanglement between different clusters, the number of dressing entanglers is much smaller than that of iQCC. Consequently, ClusterVQE can balance the computational costs between the Quantum Processing Unit (QPU) and CPU. The exponential scaling in the growing of the Hamiltonian in iQCC is avoided. Most importantly, the circuit depth/width is further reduced compared to qubit-ADAPT-VQE by splitting the measurement into cluster-wise measurements (Eq. (8)). Thus, compared to previous approaches, ClusterVQE reduces the circuit complexity (depth and width) and can solve larger problems on smaller quantum circuits.

Table 1 Circuit depth of each method with the QUCCSD ansatz. The LiH and N2 molecules used in this table are in equilibrium configurations. The intermolecular distance of \({({{{{\rm{H}}}}}_{2})}_{2}\) is 2 Å. The circuit depths of VQE with 1 layer QUCCSD ansatz are 352, 1535, and 331 for LiH, N2, and \({({{{{\rm{H}}}}}_{2})}_{2}\), respectively.

It should be noted that even though the ClusterVQE method can reduce the number of qubits and ansatz complexity (depth and width) by distributing the measurements on smaller quantum circuits, the Hamiltonian size increases exponentially with the number of dressing entanglers. Consequently, the number of measurements is increased according to Eq. (3). Future work will consider measurement reduction techniques69,70,71,72,73,74,75 to reduce the number of measurements required. Among the many measurement reduction techniques, the classical shadow approach is a potential candidate for reducing the number of measurements for ClusterVQE72,85.

Analytical gradients for VQE optimization

In order to improve the algorithm performance in the presence of noise, an analytical gradient is essential for classical optimization. It should be noted that the gradient for VQE optimization is different from the gradients in Eq. (10) which are used to choose entanglers for each iteration. Ref. 86 suggests calculation of the analytical gradients by performing a measurement on an ancillary qubit35,87. An analytical approach has been developed in this work for the analytical gradient measurements but without introducing an ancillary qubit. The gradient of energy with respect to the variational parameter reads as follows.

$$\frac{\partial E}{\partial {\theta }_{i}}=2{{{\rm{Im}}}}\left\{\left\langle {{{\Phi }}}_{r}\right|{\hat{U}}^{{\dagger} }\hat{H}{\hat{P}}_{i}^{{{{\rm{eff}}}}}\hat{U}\left|{{{\Phi }}}_{r}\right\rangle \right\}$$
(11)

where \({\hat{P}}_{i}^{{{{\rm{eff}}}}}\) is the dressed Pauli word as shown in Supplementary Section III. Therefore, the energy gradient can be measured in the same way as the energy measurement by replacing \(\hat{H}\) with \(\hat{H}{\hat{P}}_{j}^{{{{\rm{eff}}}}}\) and no ancillary qubit is required. Moreover, a related approach without using an ancillary qubit was recently proposed as well within the unitary block optimization scheme88. On a noiseless simulator, VQE calculations with numerical gradients or analytical gradients give the same results. However, analytical gradients are much more stable on noisy simulators and quantum computers.