Introduction

The demonstration of scaling speedups1,2 using quantum hardware is the holy grail of quantum computing, and massive efforts are underway worldwide in pursuit of this goal. A daunting obstacle is the fact that all physical implementations of quantum computers suffer from analog control errors, in which the coefficients of the implemented Hamiltonian differ from those intended (e.g., ref. 3), a fact that threatens to spoil the results of computations through the accumulation of small errors. This problem was recognized early on in the gate model of quantum computing,4 and soon after dealt with theoretically by error discretization via quantum error correcting codes.5 Moreover, the accuracy threshold theorem guarantees that if the physical gates used to implement encoded, error-corrected quantum circuits have sufficiently high fidelity, then any real, noisy quantum computation can be made arbitrarily close to the intended, noiseless computation with modest overhead.6,7,8 In contrast, the ultimate impact of analog control errors in Hamiltonian quantum computing, in particular in adiabatic quantum optimization9 and quantum annealing (QA),10 is not as clear. While, unlike in the gate model, adiabatic quantum evolution is inherently robust to path variations caused by unitary control errors,11 there is as yet no equivalent mechanism of error discretization or an analogous accuracy threshold theorem in this paradigm. Yet, at the same time, QA offers a currently unparalleled opportunity to explore NISQ-era (noisy intermediate-scale quantum era)12 quantum optimization with thousands of qubits.13,14 It is thus of great importance to assess the role of analog control errors in QA, and to find ways to mitigate them. Here we do so in the context of spin glass problems, which are known to exhibit a type of control error-induced bond or disorder chaos already in a purely classical setting, causing chaotic changes in the ground or equilibrium state.15,16

The extent to which analog control errors present a challenge in using any physical realization of QA, such as the D-Wave processors,17,18,19 for optimization, cannot be overstated. Indeed, an earlier study of such errors in these processors found evidence of sub-classical performance and referred to the effect as \(J\)-chaos,20 a terminology we adopt here. More recently, it was shown that analog control noise causes a decrease in the probability that the implemented Hamiltonian shares a ground state with the intended Hamiltonian that scales exponentially in the size of the problem and the magnitude of the noise.21 This means that even if the annealer solves the implemented problem correctly, it has an exponentially shrinking probability of finding the intended ground state. In other words, subject to \(J\)-chaos an otherwise perfectly functioning quantum annealer will typically find the correct answer to the wrong problem.

To mitigate this “wrong Hamiltonian” problem and restore the prospects for a speedup in the use of QA for optimization, it is necessary to introduce techniques for error suppression and correction. This observation is not new,22,23,24 and repetition coding along with the use of energy penalties has been shown to significantly enhance the performance of quantum annealers.25,26,27,28,29,30 In contrast to previous work, here we directly address the impact of \(J\)-chaos on algorithmic scaling of optimization in experimental QA, while accessing a computational scale that is still far out of reach of current gate-model quantum computing devices. We employ a QA correction method to mitigate the problem, and demonstrate that while the scaling of uncorrected QA in solving random Ising spin glass problems is catastrophically affected by \(J\)-chaos—in that it is worse than even that of a deterministic (brute force) classical solver—hope for a quantum speedup31 is restored with error suppression and correction. This reassuring conclusion is reached here using the simplest possible error suppression and correction scheme,25 so that much room for improvement remains for more advanced methods. We expect our results to apply broadly, certainly beyond the D-Wave devices to other quantum32,33,34 and semiclassical annealing implementations,35,36 and to other forms of analog quantum computing.37

Results

Inspired by classical simulated annealing, in which thermal fluctuations are used to hop over barriers, quantum annealing uses quantum fluctuations to tunnel through barriers.10,38,39,40,41,42,43 The D-Wave processors are physical implementations of such devices.44 In the standard forward annealing protocol they apply a time-dependent combination of a transverse field \({H}_{X}={\sum }_{i}{\sigma }_{i}^{x}\) (\({\sigma }_{i}^{a}\) denotes the Pauli matrix of type \(a\in \{x,y,z\}\) acting on qubit \(i\in \{1,\ldots ,N\}\)) and an Ising Hamiltonian:

$$H(s)=A(s){H}_{X}+B(s){\tilde{H}}_{{\rm{Ising}}},$$
(1)

where \({\tilde H_{\text{Ising}}} = {H_{\text{Ising}}} + \delta {H_{\text{Ising}}}\) and \({H}_{{\rm{Ising}}}\) are, respectively, the implemented (perturbed) and intended (unperturbed) “problem” Hamiltonians, while \(\delta {H}_{{\rm{Ising}}}\) is an error term (the perturbation). Thus, \({\tilde{H}}_{{\rm{Ising}}}\) is the (wrong) Hamiltonian including analog control errors, while \({H}_{{\rm{Ising}}}\) is the Hamiltonian whose ground state we wish to find as the solution to the optimization problem specified by a set of local fields \(\{{h}_{i}\}\) and couplers \(\{{J}_{ij}\}\):

$${H}_{{\rm{Ising}}}=\mathop {\sum}\limits_{i\in {\mathcal{V}}}{h}_{i}{\sigma }_{i}^{z}+\mathop {\sum}\limits_{(i,j)\in {\mathcal{E}}}{J}_{ij}{\sigma }_{i}^{z}{\sigma }_{j}^{z},$$
(2a)
$$\delta {H}_{{\rm{Ising}}}=\mathop {\sum}\limits_{i\in {\mathcal{V}}}\delta {h}_{i}{\sigma }_{i}^{z}+\mathop {\sum}\limits_{(i,j)\in {\mathcal{E}}}\delta {J}_{ij}{\sigma }_{i}^{z}{\sigma }_{j}^{z},$$
(2b)

where \(\mathcal{V}\) and \(\mathcal{E}\) are the vertex and edge sets of the graph \(\mathcal{G}\), and \(N=| \mathcal{G}|\). We assume that the noise is Gaussian with zero mean and standard deviation \(\eta\):

$$\delta {h}_{i},\delta {J}_{ij} \sim {\cal{N}}(0,{\eta }^{2}).$$
(3)

We note that analog errors resulting in the replacement of \({H}_{X}\) by \({\sum }_{i}(1+{\epsilon }_{i}){\sigma }_{i}^{x}\) with random \({\epsilon }_{i}\) are expected as well, but we do not consider such transverse field errors here, since they are not directly programmable on the current generation of D-Wave annealers (unlike \(\delta {h}_{i},\delta {J}_{ij}\)), and also because they do not contribute to \(J\)-chaos. The normalized time, \(s=t/{t}_{{\rm{f}}}\), with \({t}_{{\rm{f}}}\) denoting the final time, increases from \(0\) to \(1\), with \(A(0)\gg B(0)\) and \(B(1)\gg A(1)\), and with \(A(s)\) [\(B(s)\)] decreasing (increasing) monotonically. As such, the transverse field initially drives strong quantum fluctuations that eventually give way to the implemented Ising Hamiltonian. In the absence of \(\delta {H}_{{\rm{Ising}}}\) and an environment, the adiabatic theorem guarantees that the ground state of \({H}_{{\rm{Ising}}}\) will be found if \({t}_{{\rm{f}}}\) is large compared to the inverse of the minimum gap of \(H(s)\) and the maximum time derivative(s) of \(H(s)\).45,46 In the presence of an environment, the adiabatic theorem instead guarantees evolution towards the steady state of the corresponding Liouvillian,47,48 which becomes the ground state only for sufficiently low temperature (compared to the gap of the Liouvillian). When \(\delta {H}_{{\text{Ising}}}\,\ne\,0\), the probability that the computation ends in the ground state of \({H}_{{\rm{Ising}}}\) decreases exponentially in both \(N\) and some power of \(\eta\).21 The reason is that as more noise is added, there is an increasing probability that the spectrum changes such that the ground state is swapped with an excited state. Thus, increasing noise and problem size lead to a rapidly growing probability of failure to find the correct ground state. In experimental quantum annealing both the environment and control errors inevitably play a role.
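For concreteness, the wrong-Hamiltonian effect can already be seen in a purely classical, brute-force calculation. The following minimal sketch (illustrative only, and on a small toy ring graph rather than the Chimera graph used in our experiments) estimates the probability that a Gaussian perturbation of the couplers, as in Eq. (3), leaves the ground state unchanged:

```python
# Illustrative sketch (not part of our data pipeline): estimate the probability
# that Gaussian coupler noise, as in Eq. (3), preserves the classical ground state.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 12                                         # small enough for brute force (2^N states)
edges = [(i, (i + 1) % N) for i in range(N)]   # toy ring graph, not the Chimera graph
J = {e: rng.choice([-1/2, -1/3, -1/6, 1/6, 1/3, 1/2]) for e in edges}

def ground_states(couplers):
    """Return the set of spin configurations minimizing the Ising energy (h_i = 0)."""
    best, argmin = np.inf, []
    for spins in itertools.product([-1, 1], repeat=N):
        e = sum(Jij * spins[i] * spins[j] for (i, j), Jij in couplers.items())
        if e < best - 1e-12:
            best, argmin = e, [spins]
        elif abs(e - best) < 1e-12:
            argmin.append(spins)
    return set(argmin)

eta, trials, shared = 0.10, 50, 0
intended = ground_states(J)
for _ in range(trials):
    perturbed = {e: Jij + rng.normal(0.0, eta) for e, Jij in J.items()}
    shared += bool(intended & ground_states(perturbed))   # do they share a ground state?
print(f"P(shared ground state) ~ {shared / trials:.2f} at eta = {eta}, N = {N}")
```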

Effect of control noise

In order to systematically test the effect of analog control errors, we studied the performance of two D-Wave devices on random Ising instances of varying size \(N\), to which we added artificially generated Gaussian control noise \(\eta\). This noise was added to the intrinsic analog device noise \({\eta }_{{\rm{int}}}\)49, so that the total control noise had variance \({\eta }_{{\rm{int}}}^{2}+{\eta }^{2}\). Adding noise in this manner allowed us to test its effect on algorithmic performance, and the efficacy of the quantum annealing correction (QAC) strategy described below.

Throughout this work we used Ising instances with local fields \({h}_{i}=0\) and couplers \({J}_{ij}\) selected uniformly at random from the set \(\pm\hskip-2pt \{1/6,1/3,1/2\}\), and chose \(\eta \in \{0,0.03,0.05,0.07,0.10,0.15\}\). Such instances have been studied before,26,31,50,51,52 but only with \(\eta =0\), that is, never subject to the systematic addition of control noise. We define “success” in all cases as finding a ground state of the unperturbed Hamiltonian \({H}_{{\rm{Ising}}}\). See Methods for a complete description of the instances and how we verified ground states.

The number of qubits \(N\) is proportional to \({L}^{2}\), the number of Chimera graph unit cells of the D-Wave devices we used (see Methods). Figure 1 displays a series of correlation plots between different levels of added noise, for different problem sizes parametrized by \(L\). At every size, addition of noise results in a lower success probability for all instances. Increased size also results in lower success probability, as expected. Thus, Fig. 1 gives a visual confirmation of the detrimental effect of control noise; we quantify this systematically below.

Fig. 1

Correlation plots of ground state (success) probability Pg with increasing added noise \(\eta\). Results for random Ising instances are shown after addition of coupler noise \(\delta {J}_{ij} \sim {\cal{N}}(0,{\eta }^{2})\). In each panel every data point is Pg for the same instance with different \(\eta\) values. Top left: success probability decreases as we go from the no-added noise case \(\eta =0\) to \(\eta =0.05\). Note that even for \(\eta =0\) there is an intrinsic control error \({\eta }_{{\rm{int}}}\). The same trend persists as we compare \(\eta =0.05\) to \(\eta =0.10\) (top right) and \(\eta =0.10\) to \(\eta =0.15\) (bottom left). The bottom right panel shows the extreme comparison of \(\eta =0\) to \(\eta =0.15\): the ground state is almost never found for \(\eta =0.15\). In all plots success probability generally decreases with increasing size \(L\). Instances with zero success probability are not shown, hence the number of data points drops with increasing \(\eta\). Unless otherwise noted, here and in later plots error bars always denote 95% confidence intervals (CI) obtained via a bootstrap. Details of our data analysis are given in Methods.

Next, we test whether control noise results in \(J\)-chaos. The latter manifests itself as large variations in the success probability of the programmed Hamiltonian across different runs. We quantify this in terms of the \(J\)-chaoticity measure \(\sigma /\mu\), where \(\sigma\) is the standard deviation of the success probability across repeated runs of a given instance, and \(\mu\) is the corresponding mean (see Methods for more details). In Fig. 2, we plot the correlation of the \(J\)-chaoticity measure with increasing noise. For most instances, this quantity becomes larger with increasing noise and size, which indicates that the instances are becoming more chaotic. Success probability is also strongly (negatively) correlated with increasing \(J\)-chaoticity, as shown in Fig. 3. This establishes that control noise-induced \(J\)-chaos is responsible for a strong decline in performance. Before we quantify this decline in terms of the time-to-solution metric, we first address how to mitigate this problem using error suppression and correction.

Fig. 2

Correlation plots of \(J\)-chaoticity measure \(\sigma /\mu\) with increasing added noise \(\eta\). Details are as in Fig. 1, but here we compare the \(J\)-chaoticity measure of instances with different added noise values. The measure increases as we go from the no-added-noise case \(\eta =0\) to \(\eta =0.05\) (top left). The same trend persists as we compare \(\eta =0.05\) to \(\eta =0.10\) (top right) and \(\eta =0.10\) to \(\eta =0.15\) (bottom left). The bottom right panel shows the extreme comparison of \(\eta =0\) to \(\eta =0.15\), where for all instances at all sizes the \(J\)-chaoticity is higher, well beyond the 95% CIs.

Fig. 3

Correlation between success probability and \(J\)-chaoticity. We plot the success probability and \(J\)-chaoticity measure with noise \(\eta =0.03\). There is evidently a strong correlation between the two, showing that the success probability is lowered due to \(J\)-chaos. The inset shows the Pearson’s correlation coefficient \(\rho\) between success probability and \(J\)-chaoticity, as a function of added noise, for all \(\eta\) values tested. As \(\eta\) grows, success probability and \(J\)-chaoticity become more negatively correlated.

Quantum annealing correction

The error suppression and correction scheme used in this work is the \({[3,1,3]}_{1}\) QAC code introduced in ref. 25 and further studied experimentally in refs. 26,27,28. We refer the reader to these references for details, and to Methods for a brief summary. The \({[3,1,3]}_{1}\) code is a three-qubit repetition code that corrects bit-flip errors, with the subscript denoting one extra penalty qubit. During the anneal, the penalty term energetically suppresses all errors that do not commute with \({\sigma }^{z}\).

Since the \({[3,1,3]}_{1}\) graph is a minor of the Chimera graph (see Methods Fig. 9), we can also implement these instances without QAC. However, to ensure a fair comparison we need to equalize the resources used with and without QAC. The \({[3,1,3]}_{1}\) code consumes four physical qubits to encode one logical qubit. Thus, we can match the resources of the encoded logical problem by running four unencoded copies of it in parallel, which we call the classical repetition strategy (C). To be clear, the difference between QAC and the C strategy is fourfold: QAC uses logical qubits, logical operators, and an energy penalty term, while C uses physical qubits, physical operators, and no penalty; moreover, the decoding strategy for QAC is a majority vote over the three data qubits of each logical qubit, while for C it is the best outcome among the four unencoded copies of the logical problem. A more powerful, nested QAC strategy is known,29 but it requires more physical qubits per logical qubit, and hence is less suitable for a scaling analysis of the type we perform here.

We now discuss the results after the application of QAC and compare them to the C strategy. We note that this problem has also been studied using simulated quantum annealing (SQA);28,29 SQA is a good model for the thermalization of a quantum annealer.40 Figure 4 illustrates that for relatively large problem sizes and strong added control noise, such as at \(L=14\) and \(\eta =0.07\), QAC is able to find nearly all ground states, while the C strategy only finds the ground state of a small fraction of instances.

Fig. 4

Sorted success probabilities for \(L=14\) and \(\eta =0.07\), all \(100\) instances. C is able to find the correct ground state for only \(11\) instances. Meanwhile, QAC is able to find the correct ground state for all but \(2\) instances and increases the mean success probability over C.

Fig. 5

Fraction of instances where QAC outperforms the C strategy. In a, we plot the fraction of instances where QAC found a higher success probability than the C strategy, after removing any instances where both failed completely (see Supplementary Information Section B.1). In b, we plot the fraction of instances where QAC lowered the \(J\)-chaoticity measure compared to the C strategy. QAC becomes better with increasing size. At large noise, QAC becomes the better strategy for all instances. We note that here and in the figures below, we have omitted the \(\eta =0\) results for \(L\ge 13\). The reason is a discontinuity between the DW2X and DW2000Q devices, which is discussed in detail in Supplementary Information Section C, and which is unrelated to the scaling analysis that forms the main focus of this work.

More systematically, we show in Fig. 5(a) the fraction of instances where using QAC improved the success probability when compared to the C strategy. If more than half of the instances exhibit better performance under the QAC strategy, then applying it is useful for the median instance. Evidently, QAC becomes the better strategy for large size and large noise, that is, as finding the ground state becomes harder. Similarly, we compare the \(J\)-chaoticity measure \(\sigma /\mu\) for QAC and C, as seen in Fig. 5(b). Just as in Fig. 5(a), we see a greater advantage in using QAC over C with increasing size and noise. These observations are consistent with earlier results,25,26,27,28 where the \({[3,1,3]}_{1}\) code performs better than the classical repetition strategy at large problem sizes. Here, we have shown that this is also true in the high control noise and \(J\)-chaos regime.

To understand the source of the improvement in the success probability, consider that application of repetition codes can decrease the effective noise on the encoded problem.22 In particular, the encoded operators of an \(n\)-qubit repetition code have an effective energy scale that scales extensively relative to the unencoded problem (see Methods), while the random control noise adds up incoherently and hence its energy contribution only scales up by a factor of \(\sqrt{n}\). Thus, the encoding reduces the effective noise of the problem by the factor \(1/\sqrt{n}\). This enhances the success probability of the encoded problem over the simple C strategy, essentially by leveraging just classical properties of the repetition code. However, there is a quantum mechanism at work as well: a mean-field analysis reveals that the penalty term reduces the tunneling barrier width and height in the QAC case.53,54 Indeed, we shall next see that our empirical results are inconsistent with the constant success probability enhancement that would be expected from a purely classical reduction of the effective noise by the factor \(1/\sqrt{3}\) (we use an \(n=3\) repetition code).
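The classical part of this argument amounts to a one-line calculation under the Gaussian noise model of Eq. (3); in the sketch below, \(\delta J_{ij}^{\mathrm{enc}}\) is our own shorthand for the total noise on an encoded coupler:

```latex
% Classical part of the suppression argument: an encoded coupler is a sum of n
% independently perturbed physical couplers, so its strength grows as n while
% its Gaussian noise grows only as sqrt(n).
\[
\sum_{l=1}^{n}\left(J_{ij}+\delta J_{i_l j_l}\right)
  = n\,J_{ij} + \delta J_{ij}^{\mathrm{enc}},
\qquad
\delta J_{ij}^{\mathrm{enc}} \sim \mathcal{N}\!\left(0,\; n\eta^{2}\right),
\qquad
\frac{\sqrt{n}\,\eta}{n\,|J_{ij}|}
  = \frac{1}{\sqrt{n}}\cdot\frac{\eta}{|J_{ij}|}.
\]
```

For \(n=3\) this is the \(1/\sqrt{3}\) reduction referred to above.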

Scaling of the time to solution

We now discuss the impact of the analog noise on the computational effort required to find a solution of the problem. This can be quantified by the time-to-solution (TTS) metric,31 the time required to obtain the correct ground state at least once with \(99 \%\) probability:

$${\rm{TTS}}={t}_{{\rm{f}}}\left\lceil \frac{\mathrm{ln}(1-0.99)}{{\mathrm{ln}}(1-{P}_{g})}\right\rceil \ ,$$
(4)

where \({t}_{{\rm{f}}}\) is the total anneal time per run and \({P}_{g}\) is the probability of finding the ground state in a given run (see Methods for details on how \({P}_{g}\) was computed). The TTS metric is often used for benchmarking quantum annealing against classical algorithms (e.g., refs. 55,56,57,58,59). The TTS metric gives accurate scaling with problem size only when \({t}_{{\rm{f}}}\) is optimized to minimize the TTS for each size.13,31 In our experiments, we used a fixed \({t}_{{\rm{f}}}=5\) μs, and hence these results only place a lower bound on the true scaling,55 but this is sufficient for our purposes. Since our anneal time was fixed, we actually report the number of runs \(R={\rm{TTS}}/{t}_{{\rm{f}}}\), which we still refer to as TTS. Note that \(R\ge 1\), as one needs to run the annealer at least once to find the correct ground state.
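In code, Eq. (4) with a fixed anneal time reduces to the following minimal sketch (illustrative; here \({P}_{g}\) stands for the bootstrapped mean success probability described in Methods):

```python
# Minimal sketch of Eq. (4): number of runs R needed to observe the intended
# ground state at least once with 99% probability, given per-run probability Pg.
import math

def runs_to_solution(Pg, target=0.99):
    if Pg <= 0.0:
        return math.inf            # the ground state was never observed
    if Pg >= target:
        return 1                   # R >= 1: at least one run is always needed
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - Pg))

print(runs_to_solution(0.05))      # e.g. Pg = 5% -> 90 runs
```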

In Fig. 6, we show the TTS required to find the ground state for the median instances at each size, for both C and QAC, sorted by different levels of added noise \(\eta\). As expected for spin glasses, the TTS scales at least exponentially in \(L\) for both cases, with the scaling becoming worse for larger \(\eta\). However, we note that QAC exhibits both milder scaling and lower absolute effort. The same conclusion holds when we sort instances by hardness (see Supplementary Information Section B.2). To more directly see the advantage of QAC over C, we plot the speedup ratio55 in Fig. 7. This plot clearly shows the scaling advantage of QAC over C for sufficiently large size \(L\) and added noise \(\eta\): for all added noise levels \(\eta\), the slope of the speedup ratio becomes positive beyond an initial transient at small sizes \(L\), and this happens sooner the larger \(\eta\) is.

Fig. 6

TTS for QAC and the C strategy, sorted by added noise. We show the TTS to find at least one ground state of the median instance of class \((L,\eta )\). In a, we show the results from the classical repetition strategy. For large instances \(L\ge 10\) and high noise \(\eta =0.15\), the C strategy fails to find any ground state for the median instance in our data set. In b, we show the results for QAC, which is always able to find the ground state of the median instance in our data set. The scaling in b is milder than in a. For all \(\eta \ >\ 0\), a missing data point indicates that the ground state was never found at that \(L\) value, e.g., for all \(L\ge 14\) in a.

Fig. 7

Median speedup of QAC over C. We show the ratio of the TTS of C to the TTS of QAC, which sits above the dashed line at speedup ratio = 1 when it is advantageous to use QAC. Furthermore, a positive slope indicates a potential scaling speedup of QAC over C, since this occurs when the TTS of C grows more quickly than that of QAC.

Data collapse and scaling: doom vs. hope

We have seen that QAC outperforms the C strategy. But what is the worst-case classical cost of solving the same Ising problem instances? For a generalized Chimera graph of \(L\times L\) unit cells of complete bipartite graphs \({K}_{r,r}\), the tree width is \(w=rL+1\); for the D-Wave devices used here \(r=4\). Dynamic programming takes time \(O({L}^{2}{2}^{w})\) to find the ground state of any Ising problem defined on such a graph.59,60 Here \({2}^{w}\) is the dimension of the exhaustive search space for each of the \({L}^{2}\) tree nodes of width \(w\). Thus, in the present case, any problem can be solved exactly, deterministically, in time TTS\({}_{{\rm{DP}}}=O({L}^{2}{2}^{4L})\). However, adding analog errors exponentially suppresses the probability of success. Specifically, if the errors are drawn from a Gaussian distribution with standard deviation \(\eta\) on an instance with \(N\) spins, then \({P}_{g}=O({e}^{-{\eta }^{\alpha }N})\), where \(\alpha \le 1\) depends on the problem class.21 Thus, to find the ground state of the intended Hamiltonian \({H}_{{\rm{Ising}}}\), running dynamic programming on the Ising instances with noise added is expected to take a time scaling as \({{\rm{TTS}}}_{{\rm{DP}}}\times \left\lceil \frac{{\mathrm{ln}}(1-0.99)}{{\mathrm{ln}}(1-{P}_{g})}\right\rceil\), which reduces, in the limit \({P}_{g}\ll 1\), to:

$${{\rm{TTS}}}_{{\rm{DP}}}/{P}_{g}=O({L}^{2}{2}^{4L}{e}^{{c}_{{\rm{DP}}}{L}^{2}})\ ,\ {c}_{{\rm{DP}}}=8{\eta }^{\alpha },$$
(5)

since the dynamic programming algorithm is presented with the intended Ising instance only once every \(1/{P}_{g}\) runs on average. Thus, the worst-case classical cost is asymptotically \(O({e}^{8{\eta }^{\alpha }{L}^{2}})\). Note that this scaling is determined not by the intrinsic performance of the DP algorithm, but by the probability \({P}_{g}\) that it is presented with the intended Hamiltonian, which is algorithm-independent (and is, of course, not a problem that an algorithm running on a classical digital computer would suffer from, since there the intended instance can be specified exactly). A random guess would find the intended ground state with the same asymptotic scaling: \({{\rm{TTS}}}_{{\rm{rand}}}/{P}_{g}={2}^{N}{e}^{8{\eta }^{\alpha }{L}^{2}}=O({e}^{{c}_{{\rm{rand}}}{L}^{2}})\), but with a larger exponent: \({c}_{{\rm{rand}}}={c}_{{\rm{DP}}}+8\,\mathrm{ln}\,2\).
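The combined classical cost of Eq. (5) can be tabulated directly; in the following illustrative sketch the exponent \(\alpha\) is left as a free parameter, since its value is problem-class dependent,21 so the numbers produced are indicative only:

```python
# Illustrative only: log10 of the worst-case classical cost in Eq. (5),
# combining the DP runtime O(L^2 2^{4L}) with the 1/Pg repetition factor
# induced by control noise (alpha is problem-class dependent; see ref. 21).
import numpy as np

def log10_tts_dp_noisy(L, eta, alpha=1.0):
    N = 8 * L**2                                        # physical qubits on L x L Chimera
    log10_dp = 2 * np.log10(L) + 4 * L * np.log10(2)    # DP cost per presented instance
    log10_repeats = eta**alpha * N * np.log10(np.e)     # 1/Pg = exp(eta^alpha * N)
    return log10_dp + log10_repeats

for L in (4, 8, 12, 16):
    print(L, f"{log10_tts_dp_noisy(L, eta=0.10):.1f}")  # grows as ~L^2 for fixed eta
```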

To compare the D-Wave device’s TTS scaling to this form, we attempted a data collapse of the results shown in Fig. 6, in order to include the \(\eta\) dependence in the scaling function. To achieve the data collapse, we ran a comprehensive search for functions \(f(L,\eta )\) that would collapse both the C and QAC data using as few fitting parameters as possible (see Methods for details of the procedure). A natural choice for such a function is a generalization of Eq. (5) with up to five free parameters, of the form \({\rm{TTS}}=a{L}^{2}\,{10}^{bL+c{({\eta }^{2}+{d}^{2})}^{e}{L}^{2}}\). However, we found it to perform poorly (see Supplementary Information Section B.3). Instead, we found that the four-parameter form

$$f(L,\eta )={10}^{a{({\eta }^{2}+{b}^{2})}^{c}{L}^{d}},$$
(6)

where the crucial difference is the replacement of \({L}^{2}\) by \({L}^{d}\) in the exponent, works very well for both the C and QAC data (using three or fewer parameters gives poor agreement). The data collapse and fit results are shown in Fig. 8, and the fit parameters along with their \(95 \%\) confidence intervals (CIs) are given in Table 1. The relatively tight error bounds are evidence of the quality of the data collapse.

Fig. 8

Data collapse and fit. Result of the collapse of the data after fitting the TTS results shown in Fig. 6 to \({10}^{a{({\eta }^{2}+{b}^{2})}^{c}{L}^{d}}\) [Eq. (6)]. The dashed line is the asymptotic scaling of the classical dynamic programming algorithm, for which \({\mathrm{log}}_{10}({\rm{TTS}}) \sim {L}^{2}\). The red/green region above/below this line is where the C/QAC data lie after the data collapse. These correspond, respectively, to a guaranteed slowdown and a possible (but not guaranteed) speedup. The blue and red solid lines are the fits derived from the data collapse, with parameters given in Table 1. The shaded regions around the fitted lines represent the \(95 \%\) CI fits, as described in Supplementary Information Section B.3.

Table 1 Fit parameters for Eq. (6) after data collapse of the TTS scaling data shown in Fig. 6.

Surprisingly, we find that \(d\ >\ 2\,\) for the C strategy, with high statistical confidence. This means that without error suppression, and even after taking the best outcome among four copies of the problem, the performance of the quantum annealer is worse than that of a deterministic worst-case classical algorithm, for which \(d=2\). Hence the “doom” advertised in the title of this work. Of course, a powerful heuristic solver such as SQA40 would outperform DP on these optimization problems, and hence the C strategy would perform even worse in comparison to SQA, which can be considered a classical algorithm that simulates the equilibration of the quantum annealer.

Fortunately, not all is lost: this disturbing finding is mitigated by QAC. As seen in Fig. 8 and Table 1, for QAC we obtain d < 2, again with high statistical confidence. This result restores the hope that a quantum annealer can eventually become competitive with classical optimization algorithms, but only after the incorporation of an error suppression and correction strategy such as QAC. We stress that we make no claim that QAC will outperform a sophisticated classical algorithm such as SQA, only that it gives quantum annealing a fighting chance.

Discussion

It should be remarked that our results on optimization have no direct bearing on other tasks that quantum annealers are potentially capable of speeding up, such as approximate optimization61,62 and sampling.63,64,65,66,67 Nor do our results address quantum annealing slowdowns due to small gaps,68,69,70,71 which may be addressed via other methods, such as non-stoquastic Hamiltonians,72,73 reverse annealing,74,75,76 or inhomogeneous transverse field driving.77,78 However, none of these methods is immune to the effects of \(J\)-chaos.

To conclude, we have shown that QAC can reduce the detrimental effects of \(J\)-chaos on the performance of quantum annealers. In the regime we tested, QAC becomes more effective the higher the noise and the larger the problem size. In terms of both scaling and absolute effort, the improvements are distinctly greater than without error suppression and correction, even after equalizing resources in terms of total qubit count. Moreover, QAC undoes a catastrophic loss to an exhaustive classical algorithm by improving the scaling of the annealer’s TTS to below the classical upper bound. Thus, we have demonstrated not only that QAC is an effective tool for improving current quantum annealing hardware, but that error suppression and correction are essential to ensure competitive performance against classical alternatives. Further improvements using more powerful error suppression and correction strategies than the simple one we explored here are certainly expected, and undoubtedly necessary, as ultimately only a fully fault-tolerant approach is expected to be effective in the asymptotic limit of large problem sizes.

Methods

Random Ising instances

Without noise

The instances used were generated randomly on the \({[3,1,3]}_{1}\) graph minors of the \(L\times L\) Chimera graph for \(L\in \{2,\ldots ,12\}\) on the D-Wave 2X (DW2X) and \(L\in \{13,\ldots ,16\}\) on the D-Wave 2000Q (DW2000Q). There were \(100\) instances at each graph size, with local fields \({h}_{i}=0\) and couplings \({J}_{ij}\) drawn uniformly at random from the set \(\pm\hskip -2pt \frac{1}{6}\times \{1,2,3\}\). We found the ground state energies of these logical instances via the Hamze–Freitas–Selby (HFS) algorithm60,79 and parallel tempering with iso-energetic cluster (Houdayer) moves (PTICM).80,81 Between the two methods, we consistently found ground state energies at least as low as those found by the D-Wave devices. In a few instances, the latter found lower energies than HFS (one instance at \(L=15\) and five at \(L=16\)); these were confirmed as correct using PTICM. As such, we are confident that the ground state energies found were in fact correct.

With noise

We generated random numbers \(\delta {J}_{ij} \sim {\cal{N}}(0,{\eta }^{2})\). If a modified coupler value \({\tilde{J}}_{ij}={J}_{ij}+\delta {J}_{ij}\) fell outside the experimentally allowed range \([-1,1]\), we truncated it to \(\pm 1\). Since the largest coupling magnitude in our set was \(0.5\) and the largest noise standard deviation was \(\eta =0.15\), such truncation was needed only a handful of times in our entire data collection.
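A minimal sketch of this noise-injection step (the function name and toy instance below are ours):

```python
# Sketch of the noise-injection step: add Gaussian coupler noise and clip the
# result to the programmable range [-1, 1] of the device.
import numpy as np

def add_coupler_noise(J, eta, rng):
    """J: dict mapping edges (i, j) to intended couplings J_ij."""
    return {e: float(np.clip(Jij + rng.normal(0.0, eta), -1.0, 1.0))
            for e, Jij in J.items()}

rng = np.random.default_rng(1234)
J = {(0, 4): 1/2, (1, 5): -1/3, (2, 6): 1/6}   # toy instance, not from our data set
print(add_coupler_noise(J, eta=0.15, rng=rng))
```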

The D-Wave devices used

In this study, we used the DW2X annealer installed at the USC Information Sciences Institute and the DW2000Q annealer installed at the NASA Quantum Artificial Intelligence Laboratory. The qubits of these annealers occupy the vertices of a Chimera graph of 12 × 12 unit cells for the DW2X and 16 × 16 for the DW2000Q (see Fig. 10 in Supplementary Information Section A). The DW2X has \(1098\) functional qubits, leading to a \({[3,1,3]}_{1}\) graph with \(236\) functional logical qubits. The DW2000Q has \(2031\) functional qubits, leading to a \({[3,1,3]}_{1}\) graph with \(504\) functional logical qubits, as shown in Fig. 9.

Fig. 9

The \({[3,1,3]}_{1}\) code and the resulting logical graphs of the DW2X and DW2000Q devices. a A logical qubit of the \({[3,1,3]}_{1}\) code is formed by connecting three qubits in one half of the unit cell to a single qubit in the other half of the unit cell. The three qubits are called the “data qubits” and the single qubit plays the role of a “penalty qubit.” The data qubits are tied to the penalty qubit by ferromagnetic couplings to ensure consistent evolution of the three data qubits. Each unit cell now contains two logical qubits. b, c Each logical qubit connects to its neighbor in the same cell via the intra-cell couplings, and to its horizontal or vertical neighbor via the inter-cell couplings. This construction gives rise to a graph of degree \(3\) that is a minor of the Chimera graph. b is the logical graph for the DW2X, c for the DW2000Q. In both graphs, green (red) circles denote operational (inactive) logical qubits. Orange circles denote logical qubits with intact data qubits but an inactive penalty qubit. The lines denote the active couplings between the operational logical qubits. For each problem size \(L\), we used a subgraph formed by taking an \(L\times L\) square starting from the top left corner.

The annealing time tf can be chosen in the range [5, 2000] μs on the DW2X and [1, 2000] μs on the DW2000Q. We used tf = 5 μs, since this is the fastest time that could be used across both devices.

There were some differences between the structure and performance of these two devices. A discussion of these differences can be found in Supplementary Information Section C.

Data analysis

We used a Bayesian bootstrap82 over the underlying data (collected as described above) to compute the mean \(\mu\) and the standard deviation \(\sigma\) of the success probabilities and their associated error bars. The ground state probability Pg used in Eq. (4) is the same as \(\mu\).

Consider \(g\) gauges, where for each gauge \(i\) we find \({s}_{i}\) successful readouts out of a total of \(M\) readouts. We used the Beta distribution \(\beta ({s}_{i},M-{s}_{i})\) as our posterior probability distribution of success, that is, as our best guess of the distribution of the success probability given the observation of \({s}_{i}\) successful hits in \(M\) attempts. To draw one sample of our bootstrap distribution, we did the following:

  • First, sample from each of the \(\beta\)-distributions. Let \({B}_{i}\) be a sample from the distribution \(\beta ({s}_{i},M-{s}_{i})\) and let \(\overrightarrow{B}=\{{B}_{1},{B}_{2},\ldots ,{B}_{g}\}\).

  • Then, sample a point from the \(g\)-dimensional uniform Dirichlet distribution. This is a \(g\)-dimensional vector \(\overrightarrow{D}\).

  • The estimate of the success probability for this bootstrap sample is given by

    $$\mu =\overrightarrow{D}\cdot \overrightarrow{B}=\sum _{i=1}^{g}{D}_{i}{B}_{i}$$
    (7)

    and the estimate of the standard deviation of this sample is the square root of the weighted variance of \(\overrightarrow{B}\) where the weights are given by \(\overrightarrow{D}\),

    $${\sigma }^{2}=\sum _{i=1}^{g}{D}_{i}{({B}_{i}-\mu )}^{2}.$$
    (8)

    In Eqs. (7) and (8), we have used the fact that the samples of the uniform Dirichlet distribution sum to \(1\): \({\sum }_{i}{D}_{i}=1\).

  • From these two quantities, we computed \(\sigma /\mu\). Other quantities of interest can be similarly derived from a combination of \(\overrightarrow{B}\) and \(\overrightarrow{D}\).

We repeated these steps a large number of times to obtain a bootstrap distribution over our quantity of interest (in this case, \(\mu\) and \(\sigma /\mu\)). Our best estimate of the quantity and its associated error bars are given by the mean and the spread of the bootstrap distribution, respectively.
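A compact sketch of this procedure (illustrative; the gauge counts below are hypothetical and the Beta parameters follow the definition above):

```python
# Sketch of the Bayesian bootstrap over gauges described above: each resample
# draws one Beta sample per gauge and Dirichlet weights, then forms mu [Eq. (7)]
# and sigma [Eq. (8)]; the spread over resamples yields the error bars.
import numpy as np

def bootstrap_mu_chaoticity(successes, M, n_boot=5000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    s = np.asarray(successes, dtype=float)
    mus, ratios = [], []
    for _ in range(n_boot):
        B = rng.beta(s, M - s)                          # one sample per gauge
        D = rng.dirichlet(np.ones(len(s)))              # uniform Dirichlet weights
        mu = D @ B                                      # Eq. (7)
        sigma = np.sqrt(np.sum(D * (B - mu) ** 2))      # Eq. (8)
        mus.append(mu)
        ratios.append(sigma / mu)
    return np.mean(mus), np.mean(ratios)

# Hypothetical counts: 5 gauges with 10,000 readouts each.
print(bootstrap_mu_chaoticity([120, 35, 310, 8, 150], M=10_000))
```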

Data for C was collected as follows. For each instance and each noise value, we ran the annealer with five random gauges83,84 with 10,000 readouts each. If \(p\) is the success probability of one unencoded copy and each copy is statistically independent, then the C strategy will have at least one successful copy with probability \(1-{(1-p)}^{4}\). Here, we only collected data for a single copy, and then used this combinatorial formula to get an estimate on the success probability of the classical repetition case, as in earlier work.26 In fact, due to crosstalk this provides an upper bound on the actual performance of the C strategy (see Supplementary Information Section C.2), so that our results favor QAC over C even more than our plots indicate.
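In code, the boost from a single copy to the best of four copies is simply (a sketch of the estimate used above):

```python
# Sketch of the C-strategy estimate: from the single-copy success probability p,
# at least one of four independent parallel copies succeeds with 1 - (1 - p)^4.
def c_strategy_success(p, copies=4):
    return 1.0 - (1.0 - p) ** copies

print(c_strategy_success(0.05))   # e.g. p = 5% -> ~18.5%
```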

Data for QAC (see below) was collected as follows. Every data collection run for problem instance \(i\) of size \(L\) can be labeled by two additional parameters: the strength of the artificially injected noise \(\eta\) and the penalty strength \(\gamma\). The penalty strength was chosen from the set \(\gamma \in \{0.1,0.2,\ldots ,0.5\}\). For each data collection run \((i,L,\eta ,\gamma )\), we ran the annealer with five random gauges with 10,000 readouts each, for a total of 50,000 readouts. From these data we estimated the mean success probability over the gauges, \({P}_{g}(i,L,\eta ,\gamma )\). The optimal penalty value

$${\gamma }_{{\rm{opt}}}=\mathop{\rm{arg}\,\rm{max}}\limits_{\gamma}\,{P}_{g}(i,L,\eta ,\gamma )$$
(9)

maximizes the success probability within the chosen range of penalty values. The results shown in the Results section were picked to be at this optimal value for each instance. Histograms of the optimal strengths for each problem size are shown in Supplementary Information Section D.
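In practice, Eq. (9) is a per-instance maximization over the tested grid of penalty values, as in the following sketch (names are ours):

```python
# Sketch of Eq. (9): pick, for each instance, the penalty strength gamma that
# maximizes the estimated success probability over the tested grid.
def optimal_penalty(Pg_by_gamma):
    """Pg_by_gamma: dict mapping gamma in {0.1, ..., 0.5} to the mean success probability."""
    return max(Pg_by_gamma, key=Pg_by_gamma.get)

print(optimal_penalty({0.1: 0.02, 0.2: 0.11, 0.3: 0.23, 0.4: 0.19, 0.5: 0.08}))  # -> 0.3
```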

Quantum annealing correction

In QAC we encode each logical Pauli-Z operator as a sum of \(n\) such physical operators, that is,

$$\overline{{\sigma }_{i}^{z}}=\sum _{l=1}^{n}{\sigma }_{{i}_{l}}^{z},\qquad \overline{{\sigma }_{i}^{z}{\sigma }_{j}^{z}}=\sum _{l=1}^{n}{\sigma }_{{i}_{l}}^{z}{\sigma }_{{j}_{l}}^{z}.$$
(10)

Furthermore, we add a term to ferromagnetically couple the physical copies through an auxiliary qubit, that is,

$${H}_{\mathrm{P}}=-\sum _{i=1}^{N}{\sigma }_{{i}_{p}}^{z}\overline{{\sigma }_{i}^{z}},$$
(11)

where \(N\) is the number of logical variables in the original optimization problem. We refer to the physical copies, \({\sigma }_{{i}_{l}}^{z}\), as data qubits and the auxiliary qubit, \({\sigma }_{{i}_{\mathrm{p}}}^{z}\), as a penalty qubit. Thus, we arrive at the following encoding of our logical problem:

$$\overline{H}(s)=A(s){H}_{X}+B(s)(\alpha {\overline{H}}_{{\rm{Ising}}}+\gamma {H}_{\mathrm{P}}),$$
(12)

where \({\overline{H}}_{{\rm{Ising}}}\) is the encoded version of \({H}_{{\rm{Ising}}}\) in Eq. (2a), that is, \({\sigma }_{i}^{z}\,\mapsto\, \overline{{\sigma }_{i}^{z}}\) and \({\sigma }_{i}^{z}{\sigma }_{j}^{z}\,\mapsto\, \overline{{\sigma }_{i}^{z}{\sigma }_{j}^{z}}\), and \(\alpha\) is an overall energy scale for the problem Hamiltonian (not used in this work, but complementary to adding control noise29). When we add control noise to the QAC Hamiltonian, we replace \({h}_{i}\) by \({\tilde{h}}_{i}={h}_{i}+\delta {h}_{i}\) and \({J}_{ij}\) by \({\tilde{J}}_{ij}={J}_{ij}+\delta {J}_{ij}\). The perturbations \(\delta {h}_{i}\) and \(\delta {J}_{ij}\) are chosen independently for each qubit \(i\) and coupler \((i,j)\) from the distribution specified in Eq. (3).
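To make the encoding concrete, the following minimal sketch maps a logical instance onto physical fields and couplers according to Eqs. (10)–(12); the physical-qubit indexing is our own convention, not the device layout of Fig. 9:

```python
# Sketch of the [3,1,3]_1 encoding of Eqs. (10)-(12): each logical coupler is
# copied onto the three data-qubit pairs, and each logical qubit's data qubits
# are tied to its penalty qubit with ferromagnetic strength gamma.
def encode_qac(h, J, n_logical, gamma, alpha=1.0, n=3):
    data = lambda i, l: i * (n + 1) + l       # data qubit l of logical qubit i (our indexing)
    penalty = lambda i: i * (n + 1) + n       # penalty qubit of logical qubit i
    h_phys, J_phys = {}, {}
    for i in range(n_logical):
        for l in range(n):
            h_phys[data(i, l)] = alpha * h.get(i, 0.0)     # encoded local fields, Eq. (10)
            J_phys[(data(i, l), penalty(i))] = -gamma      # penalty term, Eq. (11)
    for (i, j), Jij in J.items():                          # encoded couplers, Eq. (10)
        for l in range(n):
            J_phys[(data(i, l), data(j, l))] = alpha * Jij
    return h_phys, J_phys

# Toy logical instance with three logical qubits and two logical couplers.
h_phys, J_phys = encode_qac(h={}, J={(0, 1): 1/2, (1, 2): -1/3}, n_logical=3, gamma=0.3)
```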

The current generations of D-Wave devices allow a direct implementation of this code in the Ising Hamiltonian for \(n=3\), as shown in Fig. 9, but are unable to encode the driver Hamiltonian \({H}_{X}\), as this requires many-body terms of the form \({({\sigma }^{x})}^{\otimes n}\). Thus, increasing the penalty strength, \(\gamma\), begins to diminish the effect of the quantum fluctuations that drive quantum annealing. On the other hand, larger \(\gamma\) values are more able to suppress bit-flip errors. Thus, there exists an optimal value of \(\gamma\), which depends on the spectrum of the problem instance.25,26,27,28

After annealing, each physical qubit is measured in the computational basis, yielding a vector of physical spin values. From this, we obtain a vector of logical spin values via a variety of decoding strategies.28 In this work, we exclusively used the strategy in which each logical qubit is decoded by a majority vote of its constituent data qubits.
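A sketch of this decoding step, with readouts grouped per logical qubit as three data qubits followed by the penalty qubit (the grouping convention is ours):

```python
# Sketch of majority-vote decoding: each logical spin is the sign of the sum of
# its three data-qubit readouts; the penalty qubit is ignored at decoding time.
import numpy as np

def decode_majority(readout, n=3):
    """readout: array of shape (n_logical, n + 1); columns 0..n-1 are the data qubits."""
    return np.sign(readout[:, :n].sum(axis=1)).astype(int)   # n odd, so no ties occur

print(decode_majority(np.array([[+1, -1, +1, +1],
                                [-1, -1, +1, -1]])))          # -> [ 1 -1]
```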

Data collapse

Here we explain our procedure for identifying the optimal fit and data collapse function, and for extracting CIs and error bars. We considered trial TTS functions of the form \(f(L,\eta )={10}^{{g}_{i}(L,\eta )}\), with:

$${g}_{1}=a{({\eta }^{2}+{b}^{2})}^{c}{L}^{d},{g}_{2}={g}_{1}+{\mathrm{log}}_{10}(e),$$
(13a)
$${g}_{3}=aL+c{({\eta }^{2}+{b}^{2})}^{{d}_{1}}{L}^{{d}_{2}}+{\mathrm{log}}_{10}(e{L}^{2}).$$
(13b)

For \({g}_{3}\) we focused on the three cases \(\{{d}_{1}=1/2,{d}_{2}=2\}\), \(\{{d}_{1}=d,{d}_{2}=2\}\), and \(\{{d}_{1}=1/2,{d}_{2}=d\}\). Thus, our trial functions had either four (\(\{a,b,c,d\}\)) or five (\(\{a,b,c,d,e\}\)) free fitting parameters. For each trial function we computed non-linear least-squares fits to the median TTS data for C on \(L\in \{2,\ldots ,12\}\) and QAC on \(L\in \{2,\ldots ,16\}\). The fitting parameters were initially allowed to take any values. However, we only accepted fits with \(a\ge 0\) in \({g}_{3}\), since \(a\ <\ 0\) (scaling that decreases with \(L\)) would have to reflect overfitting. Thus, we also computed fits where we squared all the fitting parameters [i.e., replaced \(a\) by \({a}^{2}\) in Eqs. (13a) and (13b), etc.] in order to enforce positivity. Furthermore, we tested if the discrepancy between the ideal and actual number of Chimera graph couplers made a difference by fitting with an effective \(L\); see Supplementary Information Section C for details. Thus, for each trial function there were four different methods: unconstrained/squared fitting parameters with \(L\)/effective \(L\). Lastly, all fits were attempted with each of the optimization methods possible in Mathematica: SimulatedAnnealing, RandomSearch, NelderMead, and DifferentialEvolution.

Across the different methods and optimization algorithms used, \({g}_{1}\) was consistently the best of the \(4\)-parameter fits and was always very close to \({g}_{2}\), its \(5\)-parameter generalization. The \({g}_{3}\) functions always resulted either in \(a\ <\ 0\) or in a very poor fit. Parameter squaring also improved the fit quality, and of the optimization methods only NelderMead tended to give inferior results.
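For reference, an equivalent \({g}_{1}\) fit with squared parameters can be set up in a few lines; the sketch below uses Python rather than Mathematica, and the data arrays are synthetic placeholders rather than the measured median TTS values:

```python
# Sketch of the g1 fit of Eq. (13a) by non-linear least squares on log10(TTS),
# with all parameters squared to enforce positivity, as described above.
# The data below are synthetic placeholders, not the measured median TTS values.
import numpy as np
from scipy.optimize import curve_fit

def log10_g1(X, a, b, c, d):
    L, eta = X
    return (a**2) * (eta**2 + b**2) ** (c**2) * L ** (d**2)

L_data = np.array([2, 4, 6, 8, 10, 12, 4, 8, 12], dtype=float)
eta_data = np.array([0, 0, 0, 0, 0, 0, 0.10, 0.10, 0.10])
truth = log10_g1((L_data, eta_data), 1.0, 0.3, 0.7, 1.1)        # synthetic "measurements"
log10_tts = truth + 0.05 * np.random.default_rng(0).normal(size=truth.size)

popt, _ = curve_fit(log10_g1, (L_data, eta_data), log10_tts,
                    p0=[1.0, 0.2, 0.8, 1.0], maxfev=20000)
print("fitted exponent d =", popt[3] ** 2)
```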

After determining the three parameters \(\{a,b,c\}\) for the median TTS data for \({g}_{1}\), we found least-squares fits to the upper and lower bounds determined by the \(95 \%\) CIs for the median TTS, by using the same set \(\{a,b,c\}\) and letting only \(d\) be a free parameter. In this manner we found \({d}_{-}\) and \({d}_{+}\), the exponents that provide respective lower and upper bounds on \(d\) for the median TTS data. In turn, \({d}_{-}\) and \({d}_{+}\) have associated \(95 \%\) CIs, denoted \(\Delta {d}_{-}\) and \(\Delta {d}_{+}\). The reported range of \(d\) in Table 1 is then \([{d}_{-}-\Delta {d}_{-},{d}_{+}+\Delta {d}_{+}]\). The resulting fits for each \(\eta\) are shown in Supplementary Information Section B.3.