Introduction

Recent advances in qubit numbers1,2,3,4, as well as operational5,6,7,8,9,10,11,12,13, and measurement14,15,16 fidelities have enabled leading quantum computing platforms, such as superconducting and trapped-ion processors, to target demonstrations of quantum error correction (QEC)17,18,19,20,21,22,23 and quantum advantage2,24,25,26. In particular, two-dimensional stabilizer codes, such as the surface code, are a promising approach23,27 towards achieving quantum fault tolerance and, ultimately, large-scale quantum computation28. One of the central assumptions of textbook QEC is that any error can be decomposed into a set of Pauli errors that act within the computational space of the qubit. In practice, many qubits such as weakly-anharmonic transmons, as well as hyperfine-level trapped ions, are many-level systems which function as qubits by restricting the interactions with the other excited states. Due to imprecise control12,29,30 or the explicit use of non-computational states for operations5,6,9,11,31,32,33,34,35, there exists a finite probability for information to leak from the computational subspace. Thus, leakage constitutes an error that falls outside of the domain of the qubit stabilizer formalism. Furthermore, leakage can last over many QEC cycles, despite having a finite duration set by the relaxation time36. Hence, leakage represents a menacing error source in the context of quantum error correction17,36,37,38,39,40,41,42,43, despite leakage probabilities per operation being smaller than readout, control or decoherence error probabilities6,8,9,44.

The presence of leakage errors has motivated investigations of its effect on the code performance and of strategies to mitigate it. A number of previous studies have focused on a stochastic depolarizing model of leakage38,40,41,42,43, allowing to explore large-distance surface codes and the reduction of the code threshold using simulations. These models, however, do not capture the full details of leakage, even though a specific adaptation has been used in the case of trapped-ion qubits41,42,43. Complementary studies have considered a physically realistic leakage model for transmons36,39, which was only applied to a small parity-check unit due to the computational cost of many-qutrit density-matrix simulations. In either case, leakage was found to have a strong impact on the performance of the code, resulting in the propagation of errors, in the increase of the logical error rate and in a reduction of the effective code distance. In order to mitigate these effects, there have been proposals for the introduction of leakage reduction units (LRUs)37,39,40,45 beyond the natural relaxation channel, for modifications to the decoding algorithms17,38,40, as well as for the use of different codes altogether42. Many of these approaches rely on the detection of leakage or introduce an overhead in the execution of the code. Recently, the indirect detection of leakage in a 3-qubit parity-check experiment20 was realized via a Hidden Markov Model (HMM), allowing for subsequent mitigation via post-selection. Given that current experimental platforms are within reach of quantum-memory demonstrations, detailed simulations employing realistic leakage models are vital for a comprehensive understanding of the effect of leakage on the code performance, as well as for the development of a strategy to detect leakage without additional overhead.

In this work we demonstrate the use of computationally efficient HMMs to detect leakage in a transmon implementation of the distance-3 surface code (Surface-17)46. Using full-density-matrix simulations27 (The quantumsim package can be found at https://quantumsim.gitlab.io/) we first show that repeated stabilizer measurements sharply project data qubits into the leakage subspace, justifying the use of classical HMMs with only two hidden states (computational or leaked) for leakage detection. We observe a considerable increase in the surface-code defect probability of neighboring stabilizers while a data or ancilla qubit is leaked, a clear signal that may be detected by the HMMs. For ancilla qubits, we further consider the information available in the analog measurement outcomes, even when the leaked state \(\left|2\right\rangle\) can be discriminated from the computational states \(\left|0\right\rangle\) and \(\left|1\right\rangle\) with limited fidelity. We demonstrate that a set of two-state HMMs, one HMM for each qubit, can accurately detect both the time and the location of a leakage event in the surface code. By post-selecting on the detected leakage, we restore the logical performance of Surface-17 below the memory break-even point, while discarding less than half of the data for the given error-model parameters. Finally, we outline a minimal set of conditions for our leakage-detection scheme to apply to other quantum-computing platforms. Although post-selection is not scalable due to an exponential overhead in the number of required experiments, these results open the prospect for near-term demonstrations of fault tolerance even in the presence of leakage. Furthermore, HMM-based leakage detection enables the possibility of scalable leakage-aware decoding17,40 and real-time targeted application of LRUs37,39,40.

Results

Leakage error model

We develop an error model for leakage in superconducting transmons, for which two-qubit gates constitute the dominant source of leakage5,6,9,11,12,29,30,31,32,33,34, while single-qubit gates have negligible leakage probabilities8,44. We thus focus on the former, while the latter is assumed to induce no leakage at all. We assume that single-qubit gates act on a leaked state as the identity. Measurement-induced leakage is also assumed to be negligible.

We use full-trajectory simulations to characterize leakage in the Net-Zero implementation9 of the controlled-phase gate (CZ), considered as the native two-qubit gate in a transmon-based Surface-17, with experimentally targeted parameters (see Table 1 and Supplementary Table 1). This gate uses a flux pulse such that the higher frequency qubit (Qflux) is fluxed down from its sweetspot frequency \({\omega }_{\max }\) to the vicinity of the interaction frequency ωint = ωstat − α, where ωstat is the frequency of the other qubit (Qstat), which remains static, and α is the transmon anharmonicity. The inset in Fig. 1a shows a schematic diagram of the frequency excursion taken by Qflux. A (bipolar) 30 ns pulse tunes twice the qubit on resonance with the \(\left|11\right\rangle \leftrightarrow \left|02\right\rangle\) avoided crossing, corresponding to the interaction frequency ωint. This pulse is followed by a pair of 10 ns single-qubit phase-correction pulses. The relevant crossings around ωint are shown in Fig. 1a and are all taken into account in the full-trajectory simulations. The two-qubit interactions give rise to population exchanges towards and within the leakage subspace and to the phases acquired during gates with leaked qubits, which we model as follows.

Table 1 The parameters for the qubit decoherence times and for the gate, measurement and QEC-cycle durations used in the density-matrix simulations.
Fig. 1: CZ error model for two transmon qubits.
figure 1

Schematic of the relevant interactions and the CZ error model for two transmons, a higher frequency one Qflux and a lower frequency one Qstat. The inset of a shows the frequency excursion taken by Qflux from its sweetspot frequency \({\omega }_{\max }\) to the interaction frequency ωint, corresponding to the \(\left|11\right\rangle \leftrightarrow \left|02\right\rangle\) avoided crossing, followed by weaker single-qubit phase-correction pulses. During this excursion, the frequency ωstat of Qstat remains static at \({\omega }_{{\rm{stat}}}=\,{\omega }_{{\rm{int}}}-\left|\alpha \right|\), where α is the anharmonicity. a Sketch of all the considered avoided crossings, with the two-qubit system energy E on the vertical axis versus the frequency ωflux of Qflux on the horizontal axis. b The parametrized CZ error model. An ideal CZ is constructed with the two-qubit phase ϕ11 and the single-qubit phases ϕ01 and ϕ10. It is followed by single-qubit rotations with phases \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\), conditioned on the other transmon being leaked, and by the SWAP-like exchanges with leakage probability L1 and leakage-mobility probability Lm (see section “Leakage error model” for precise definitions). Relaxation and decoherence, indicated by the orange arrows, are taken into account as well.

The model in Fig. 1b considers a general CZ rotation, characterized by the two-qubit phase ϕ11 for state \(\left|11\right\rangle\) and ϕ = 0 for the other three computational states. The single-qubit relative phases ϕ01 and ϕ10 result from imperfections in the phase corrections. The conditional phase is defined as ϕCZ = ϕ11 − ϕ01 − ϕ10 + ϕ00, which for an ideal CZ is ϕCZ = π. In this work, we assume ϕ00 = ϕ01 = ϕ10 = 0 and ϕCZ = ϕ11 = π. We set ϕ02 = − ϕ11 in the rotating frame of the qutrit, as it holds for flux-based gates35.

Interactions between leaked and non-leaked qubits lead to extra phases, which we call leakage conditional phases. We consider first the interaction between a leaked Qflux and a non-leaked Qstat. In this case the gate restricted to the \(\left\{\left|02\right\rangle ,\left|12\right\rangle \right\}\) subspace has the effect \({\mathrm{diag}}\left({e}^{{\rm{i}}{\phi }_{{\rm{02}}}},{e}^{{\rm{i}}{\phi }_{{\rm{12}}}}\right)\), which up to a global phase corresponds to a Z rotation on Qstat with an angle given by the leakage conditional phase \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}:= {\phi }_{{\rm{02}}}-{\phi }_{{\rm{12}}}\). Similarly, if Qstat is leaked, then Qflux acquires a leakage conditional phase \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}:= {\phi }_{{\rm{20}}}-{\phi }_{{\rm{21}}}\). These rotations are generally non-trivial, i.e., \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\,\ne \,\pi\) and \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\,\ne \,0\), due to the interactions in the 3-excitation manifold which cause a shift in the energy of \(\left|12\right\rangle\) and \(\left|21\right\rangle\) (see section “Second-order leakage effects” of Supplementary Methods). If the only interaction leading to non-trivial \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\), \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) is the interaction between \(\left|12\right\rangle\) and \(\left|21\right\rangle\), then it can be expected that ϕ12 = −ϕ21 in the rotating frame of the qutrit, leading to \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}=\pi -{\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\).

Leakage is modeled as an exchange between \(\left|11\right\rangle\) and \(\left|02\right\rangle\), i.e., \(\left|11\right\rangle\, \mapsto \,\sqrt{1-4{L}_{1}}\left|11\right\rangle +{e}^{{\rm{i}}\phi }\sqrt{4{L}_{1}}\left|02\right\rangle\) and \(\left|02\right\rangle\, \mapsto \,-\!{e}^{-{\rm{i}}\phi }\sqrt{4{L}_{1}}\left|11\right\rangle +\sqrt{1-4{L}_{1}}\left|02\right\rangle\), with L1 the leakage probability47. We observe that the phase ϕ and the off-diagonal elements \(\left|11\right\rangle \left\langle 02\right|\) and \(\left|02\right\rangle \left\langle 11\right|\) do not affect the results presented in this work, so we set them to 0 for computational efficiency (see section “Error model and parameters”). The SWAP-like exchange between \(\left|12\right\rangle\) and \(\left|21\right\rangle\) with probability Lm, which we call leakage mobility, as well as the possibility of further leaking to \(\left|3\right\rangle\), are analyzed in Supplementary Fig. 1 and in section “Second-order leakage effects” of Supplementary Methods.

The described operations are implemented as instantaneous in the quantumsim package (introduced in “The quantumsim package can be found at https://quantumsim.gitlab.io/”), while the amplitude and phase damping experienced by the transmon during the application of the gate are symmetrically introduced around them, indicated by light-orange arrows in Fig. 1b. The dark-orange arrows indicate the increased dephasing rate of Qflux far away from \({\omega }_{\max }\) during the Net-Zero pulse. The error parameters considered in this work are summarized in section “Error model and parameters”. In particular, unless otherwise stated, L1 is set to 0.125% and \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) are randomized for each qubit pair across different batches consisting of 2 × 104 or 4 × 104 runs of 20 or 50 QEC cycles, respectively. This choice is motivated by our expectation that these phases are determined by the frequencies and anharmonicities of the two transmons, as well as by the parameterization of the flux pulse implementing each CZ between the pair, which is fixed when tuning the gate experimentally. Since \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) have not been characterized in experiment, we instead choose to randomized them in order to capture an average behavior.

Some potential errors are found to be small from the full-trajectory simulations of the CZ gate and thus are not included in the parametrized error model. The population exchange between \(\left|01\right\rangle \leftrightarrow \left|10\right\rangle\), with coupling J1, is suppressed (<0.5%) since this avoided crossing is off-resonant by one anharmonicity α with respect to ωint. While \(\left|12\right\rangle \leftrightarrow \left|21\right\rangle\) is also off-resonant by α, the coupling between these two levels is stronger by a factor of 2, hence potentially leading to a larger population exchange (see section “Second-order leakage effects” of Supplementary Methods). The \(\left|11\right\rangle \leftrightarrow \left|20\right\rangle\) crossing is 2α away from ωint and it thus does not give any substantial phase accumulation or population exchange (<0.1%). We have compared the average gate fidelity of CZ gates simulated with the two methods and found differences below ±0.1%, demonstrating the accuracy of the parametrized model.

Effect of leakage on the code performance

We implement density-matrix simulations (The quantumsim package can be found at https://quantumsim.gitlab.io/) to study the effect of leakage in Surface-17 (Fig. 2). We follow the frequency arrangement and operation scheduling proposed in ref. 46, which employs three qubit frequencies for the surface-code lattice, arranged as shown in Fig. 2a. The CZ gates are performed between the high-mid and mid-low qubit pairs, with the higher frequency qubit of the pair taking the role of Qflux (see Fig. 1). Based on the leakage model in section “Leakage error model”, only the high and mid frequency qubits are prone to leakage (assuming no leakage mobility). Thus, in the simulation those qubits are included as three-level systems, while the low-frequency ones are kept as qubits. Ancilla-qubit measurements are modeled as projective in the \(\left\{\left|0\right\rangle ,\left|1\right\rangle ,\left|2\right\rangle \right\}\) basis and ancilla qubits are not reset between QEC cycles. As a consequence, given the ancilla-qubit measurement \(m\left[n\right]\) at QEC cycle n, the syndrome is given by \(m\left[n\right]\oplus m\left[n-1\right]\) and the surface-code defect \(d\left[n\right]\) by \(d\left[n\right]=m\left[n\right]\oplus m\left[n-2\right]\). For the computation of the syndrome and defect bits we assume that a measurement outcome \(m\left[n\right]=2\) is declared as \(m\left[n\right]=1\). The occurrence of an error is signaled by \(d\left[n\right]=1\). To pair defects we use a minimum-weight perfect-matching (MWPM) decoder, whose weights are trained on simulated data without leakage27,48 and we benchmark its logical performance in the presence of leakage errors. The logical qubit is initialized in \({\left|0\right\rangle }_{{\rm{L}}}\) and the logical fidelity is calculated at each QEC cycle, from which the logical error rate εL can be extracted27.

Fig. 2: The effect of leakage on the performance of Surface-17.
figure 2

a Schematic overview of the Surface-17 layout46. Pink (resp. red) circles with D labels represent low-frequency (high-) frequency data qubits, while blue (resp. green) circles with X (Z) labels represent ancilla qubits of intermediate frequency, performing X-type (Z-type) parity checks. b Dependence of the logical error rate εL on the leakage probability L1 for a MWPM decoder (green) and for the decoding upper bound (red). The black solid line shows the physical error rate of a single transmon qubit. The dashed line corresponds to the recently achieved L1 in experiment9. Logical error rate εL for MWPM (c) and upper bound (d) as a function of the leakage conditional phases \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) (for L1 = 0.5%). Here, these phases are not randomized but fixed to the given values across all runs. The logical error rates are extracted from an exponential fit of the logical fidelity over 20 QEC cycles and averaged over 5 batches of 2 × 104 runs for b and one batch of 2 × 104 runs for c, d Error bars correspond to 2 standard deviations estimated by bootstrapping (not included in b due the error bars being smaller than the symbol size).

Figure 2b shows that the logical error rate εL is sharply pushed above the memory break-even point by leakage. We compare the MWPM decoder to the decoding upper bound (UB), which uses the complete density-matrix information to infer a logical error. A strong increase in εL is observed for this decoder as well. Furthermore, the logical error rate has a dependence on the leakage conditional phases for both decoders, as shown in Fig. 2c, d. While not included in these simulations, we do not expect the inclusion of leakage mobility or the possibility of further leaking to \(\left|3\right\rangle\) to have a considerable effect on the logical performance (see section “Effects of leakage mobility and superleakage on leakage detection and code performance” of Supplementary Methods).

Projection and signatures of leakage

We now characterize leakage in Surface-17 and the effect that a leaked qubit has on its neighboring qubits. From the density matrix (DM), we extract the probability \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)={\mathbb{P}}(Q\in {\mathcal{L}})=\left\langle 2| {\rho }_{Q}| 2\right\rangle\) of a qubit Q being in the leakage subspace \({\mathcal{L}}\) at the end of a QEC cycle, after the ancilla-qubit measurements, where ρQ is the reduced density matrix of Q.

In the case of data-qubit leakage, \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) sharply rises to values near unity, where it remains for a finite number of QEC cycles (on average 16 QEC cycles for the considered parameters, given in Table 1). We refer to this sharp increase of \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) as projection of leakage. An example showing this projective behavior (in the case of qubit D4), as observed from the density-matrix simulations, is reported in Fig. 3a. This is the typical behavior of leakage, as shown in Fig. 3b by the bi-modal density distribution of the probabilities \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) for all the high-frequency data qubits Q. As data-qubit leakage is associated with defects on the neighboring ancilla qubits (due to the use of the \(\left|02\right\rangle \leftrightarrow \left|11\right\rangle\) crossing by the CZ gates) and with the further propagation of defects in the following QEC cycles (as shown below), we attribute the observed projection to a back-action effect of the repetitive stabilizer measurements (see Supplementary Fig. 2 and section “Projection of data-qubit leakage by stabilizer-measurement back-action” of Supplementary Methods). Given this projective behavior, we identify individual events by introducing a threshold \({p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\), above which a qubit is considered as leaked. Here we focus on leakage on D4, sketched in Fig. 3c. Based on a threshold \({p}_{{\rm{th}}}^{{\mathcal{L}}}\left({D}_{{\rm{4}}}\right)=0.5\), we select leakage events and extract the average dynamics shown in Fig. 3d. Leakage grows over roughly 3 QEC cycles following a logistic function, reaching a maximum probability of approximately 0.8. We observe this behavior for all three high-frequency data qubits D3, D4, D5. Each of the high-frequency data qubits equilibrates towards a steady-state population (extracted by averaging \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) over all runs without selecting individual events) after many QEC cycles (see Supplementary Fig. 3 and section “Leakage steady state in the surface code” of Supplementary Methods).

Fig. 3: Projection and signatures of qubit leakage.
figure 3

ab Projection of data-qubit leakage. a Example realization of a data-qubit leakage event, extracted from the density-matrix simulations. b Density histogram of all data-qubit leakage probabilities over 20 bins, extracted over 4 × 104 runs of 50 QEC cycles each. c–e Signatures of data-qubit leakage. c Sketch of how leakage on a data qubit, e.g., D4, alters the interactions with neighboring stabilizers, leading to their anti-commutation (see section “Leakage-induced anti-commutation” of Supplementary Methods). d The average projection of the leakage probability \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\) of D4 over all events, where this probability is first below and then above a threshold of \({p}_{{\rm{th}}}^{{\mathcal{L}}}=0.5\) for at least 5 and 8 QEC cycles, respectively. e The average number of defects on the neighboring stabilizers of D4 over the selected rounds, showing a jump when leakage rises above \({p}_{{\rm{th}}}^{{\mathcal{L}}}\). fh Signatures of ancilla-qubit leakage. f Sketch of how leakage on an ancilla qubit, e.g., Z1, effectively disables the stabilizer check and probabilistically introduces errors on the neighboring data qubits. g We select realizations where Z1 was in the computational subspace for at least 5 QEC cycles, after which it was projected into \(\left|2\right\rangle\) by the readout and remained in that state for at least 5 QEC cycles. h The corresponding defect rate on neighboring stabilizers during the period of leakage. The error bars, which were estimated by bootstrapping, are smaller than the symbol sizes.

We observe an increase in the defect probability of the neighboring ancilla qubits during data-qubit leakage. We extract the probability pd of observing a defect d = 1 on the neighboring stabilizers during the selected data-qubit leakage events, as shown in Fig. 3e. As \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left({D}_{{\rm{4}}}\right)\) reaches its maximum, pd goes to a constant value of approximately 0.5. This can be explained by data-qubit leakage reducing the stabilizer checks it is involved in to effective weight-3 anti-commuting checks, illustrated in Fig. 3c and as observed in ref. 20. This anti-commutation leads to some of the increase in εL for the MWPM and UB decoders in Fig. 2b. Furthermore, we attribute the observed sharp projection of leakage (see Fig. 3d) to a back-action effect of the measurements of the neighboring stabilizers, whose outcomes are nearly randomized when the qubit is leaked (see sections “Leakage-induced anti-commutation” and “Projection of data-qubit leakage by stabilizer-measurement back-action” of Supplementary Methods). The weight-3 checks can also be interpreted as gauge operators, whose pairwise product results in weight-6 stabilizer checks, which can be used for decoding49,50,51,52, effectively reducing the code distance from 3 to 2.

We also find a local increase in the defect probability during ancilla-qubit leakage. For ancilla qubits, \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\) is defined as the leakage probability after the ancilla projection during measurement. Since in the simulations ancilla qubits are fully projected, \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)=0,1\) for an ancilla qubit Q, allowing to directly obtain the individual leakage events, as shown in Fig. 3g. We note that an ancilla qubit remains leaked for 17 QEC cycles on average for the considered parameters (given in Table 1). We extract pd during the selected events, as shown in Fig. 3h. In the QEC cycle after the ancilla qubit leaks, pd abruptly rises to a high constant value. We attribute this to the Z rotations acquired by the neighboring data qubits during interactions with the leaked ancilla qubit, as sketched in Fig. 3f and described in section “Leakage error model”. The angle of rotation is determined by \({\phi }_{{\rm{flus}}}^{{\mathcal{L}}}\) or \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\), depending on whether the leaked ancilla qubit takes the roles of Qstat or Qflux, respectively (see section “Simulation protocol” for the scheduling of operations). In the case of Z-type parity checks, these phase errors are detected by the X-type stabilizers. In the case of X-type checks, the phase errors on data qubits are converted to bit-flip errors by the Hadamard gates applied on the data qubits, making them detectable by the Z-type stabilizers. Furthermore, while the ancilla qubit is leaked, the corresponding stabilizer measurement does not detect any errors on the neighboring data qubits, effectively disabling the stabilizer, as sketched in Fig. 3f. This, combined with the spread of errors, defines the signature of ancilla-qubit leakage and explains part of the observed increase in εL for the MWPM and UB decoders in Fig. 2b.

For both data and ancilla qubits, a leakage event is correlated with a local increase in the defect rate, albeit due to different mechanisms. However, interpreting the spread of defects as signatures of leakage suggests the possible inversion of the problem, allowing for effective leakage detection.

Hidden Markov models

We use a set of HMMs, one HMM for each leakage-prone qubit, to detect leakage. This approach is similar to what was recently demonstrated in a 3-qubit parity-check experiment20, but we use simpler HMMs to make them computationally efficient. In general, an HMM (see Fig. 4 and section “HMM formalism”) models the time evolution of a discrete set of hidden states, the transitions between which are assumed to be Markovian. At each time step a set of observable bits is generated with state-dependent emission probabilities. Depending on the observed outcomes, the HMM performs a Bayesian update of the predicted probability distribution over the hidden states.

Fig. 4: Schematic representation of an HMM for leakage detection.
figure 4

For both ancilla and data qubits only two hidden states are considered, corresponding to the qubit being either in the computational (teal) or leakage subspace (orange). Transitions between these states occur each QEC cycle, depending on the leakage and seepage probabilities. The state-dependent observables are the defects \(d\left(Q\right)\) on the neighboring stabilizers. For ancilla qubits, the in-phase component Im of the analog measurement is also used as an observable.

We apply the concept of HMMs to leakage inference and outline their applicability for an accurate, scalable and run-time executable leakage-detection scheme. This is made possible by two observations. The first is that both data-qubit and ancilla-qubit leakage are sharply projected (see section “Projection and signatures of leakage”) to high \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\). This justifies the use of classical HMMs with only two hidden states, corresponding to the qubit being in the computational or leakage subspace.

The second observation is the sharp local increase in pd associated with leakage (see section “Projection and signatures of leakage”), which we identify as the signature of leakage. This allows us to consider only the defects on the neighboring stabilizers as relevant observables and to neglect correlations between pairs of defects associated with qubit errors. In the case of ancilla-qubit leakage, in addition to the defects, we consider the state information obtained from the analog measurement as input to the HMMs. Each transmon is dispersively coupled to a dedicated readout resonator. The state-dependent shift in the single-shot readout produces an output voltage signal, with in-phase and quadrature components (see section “Transmon measurements in experiment” of Supplementary Methods).

The transition probabilities between the two hidden states are determined by the leakage and seepage probabilities per QEC cycle, which are, to lowest order, a function only of the leakage probability L1 per CZ gate and of the relaxation time T1 (see section “HMM formalism”). We extract the state-dependent emission probabilities from simulation. When a qubit is not leaked, the probability of observing a defect on each of the neighboring stabilizers is determined by regular errors. When a data qubit is leaked, the defect probability is fixed to a nearly constant value by the stabilizer anti-commutation, while when an ancilla qubit is leaked, this probability depends on \({\phi }_{{\rm{flus}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\). Furthermore, the analog measurement outcome can be used to extract a probability of the transmon being in \(\left|0\right\rangle ,\,\left|1\right\rangle\), or \(\left|2\right\rangle\) using a calibrated measurement (see sections “Ancilla-qubit leakage detection” and “Transmon measurements in experiment” of Supplementary Methods).

Data-qubit leakage detection

We assess the ability of the data-qubit HMMs to accurately detect both the time and the location of a leakage event. We recall that these HMMs take the defects on neighboring stabilizers as input. The average temporal response \({p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q\right)\) of the HMMs to an event is shown in Fig. 5 and compared to the leakage probabilities \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) extracted from the density-matrix simulation. Events are selected when crossing a threshold \({p}_{{\rm{th}}}^{{\mathcal{L}}}\), as described in section “Projection and signatures of leakage”, and the response is averaged over these events. For the data-qubit HMMs, the response \({p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q\right)\) closely follows the probability \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) from the density matrix, reaching the same maximum leakage probability but with a smaller logistic growth rate. This slightly slower response is expected to translate to an average delay of about 1 QEC cycles in the detection of leakage.

Fig. 5: Data-qubit leakage detection.
figure 5

a Average response in time of the HMMs (diamonds) to leakage, in comparison to the actual leakage probability extracted from the density-matrix simulations (dashed lines). The average is computed by selecting single realizations where \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\) was below a threshold \({p}_{{\rm{th}}}^{{\mathcal{L}}}=0.5\) for at least 5 QEC cycles and then above it for 5 or more rounds. Error bars, estimated by bootstrapping, are smaller than the symbol sizes. b Precision-recall curves for the data qubits over 4 × 104 runs of 50 QEC cycles each using the HMM predictions (solid) and the leakage probability from the density matrix (dashed). The dotted line corresponds to a random guess classifier for which \({\mathcal{P}}\) is equal to the fraction of leakage events (occurring with probability given by the density matrix) over all QEC cycles and runs.

We now explore the leakage-detection capability of the HMMs. The precision \({\mathcal{P}}\) of an HMM tracking leakage on a qubit Q is defined as

$${\mathcal{P}}_{{\rm{HMM}}}\left({Q}\right)={\mathbb{P}}\left({Q}\in {\mathcal{L}}| {p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left({Q}\right)\,>\,{p}_{{\rm{th}}}^{{\mathcal{L}}}\left({Q}\right)\right)$$
(1)

and can be computed as

$${{\mathcal{P}}}_{{\rm{HMM}}}\left(Q\right)=\frac{{\sum }_{i}{p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q,i\right)\theta \left[{p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q,i\right)-{p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\right]}{{\sum }_{i}\theta \left[{p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q,i\right)-{p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\right]},$$
(2)

where i runs over all runs and QEC cycles and θ is the Heaviside step function. The precision is then the fraction of correctly identified leakage events (occurring with probability given by the density matrix), over all of the HMM detections of leakage. The recall \({\mathcal{R}}\) of an HMM for a qubit Q is defined as

$${{\mathcal{R}}}_{{\rm{HMM}}}\left({Q}\right)={\mathbb{P}}\left({p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left({Q}\right)\,>\,{p}_{{\rm{th}}}^{{\mathcal{L}}}\left({Q}\right)| {Q}\in {\mathcal{L}}\right),$$
(3)

and can be computed as

$${{\mathcal{R}}}_{{\rm{HMM}}}\left(Q\right)=\frac{{\sum }_{i}{p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q,i\right)\theta \left[{p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q,i\right)-{p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\right]}{{\sum }_{i}{p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q,i\right)}.$$
(4)

The recall is the fraction of detected leakage by the HMM over all leakage events (occurring with probability given by the density matrix). The precision-recall (PR) of an HMM (see Fig. 5b) is a parametric curve obtained by sweeping \({p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\) and plotting the value of \({\mathcal{P}}\) and \({\mathcal{R}}\). Since the PR curve is constructed from \({p}_{{\rm{HMM}}}^{{\mathcal{L}}}\left(Q\right)\) over all QEC cycles and runs, it quantifies the detection ability in both time and space. The detection ability of an HMM manifests itself as a shift of the PR curve towards higher values of \({\mathcal{P}}\) and \({\mathcal{R}}\) simultaneously. We define the optimality \({\mathcal{O}}\left(Q\right)\) of the HMM corresponding to qubit Q as

$${\mathcal{O}}\left(Q\right)={{\rm{AUC}}}_{{\rm{HMM}}}\left(Q\right)/{{\rm{AUC}}}_{{\rm{DM}}}\left(Q\right),$$
(5)

where \({{\rm{AUC}}}_{{\rm{HMM}}}\left(Q\right)\) is the area under the PR curve of the HMM and \({{\rm{AUC}}}_{{\rm{DM}}}\left(Q\right)\) is the area for the optimal model that predicts leakage with probability \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left(Q\right)\), achieving the best possible \({{\mathcal{P}}}_{{\rm{DM}}}\) and \({{\mathcal{R}}}_{{\rm{DM}}}\). An average optimality of \({\mathcal{O}}\left(Q\right)\approx 67.0 \%\) is extracted for the data-qubit HMMs. Given the few QEC-cycle delay in the data-qubit HMM response to leakage, the main limitation to the observed HMM optimality \({\mathcal{O}}\left(Q\right)\) is false detection when a neighboring qubit is leaked (see Supplementary Fig. 4 and section “HMM error budget” of Supplementary Methods).

Ancilla-qubit leakage detection

We now assess the ability of the ancilla-qubit HMMs to accurately detect both the time and the location of a leakage event. These HMMs take as observables the defects on the neighboring stabilizers at each QEC cycle, as well as the analog measurement outcome of the ancilla qubit itself.

We first consider the case when the HMMs rely only on the increase in the defect probability pd and show their PR curves in Fig. 6a, b. Given that projective measurements are used in the simulations, \({{\rm{AUC}}}_{{\rm{DM}}}\left(Q\right)=1\) for ancilla qubits. Bulk ancilla qubits have a moderate \({\mathcal{O}}\left(Q\right)\approx 47 \%\), while boundary ancilla qubits possess nearly no ability to detect leakage. We attribute this to the boundary ancilla qubits having only a single neighboring stabilizer, compared to bulk ancilla qubits having 3 in Surface-17. The HMMs corresponding to pairs of same-type (X or Z) bulk ancilla qubits exhibit visibly different PR curves (see Fig. 6a, b), despite the apparent symmetry of Surface-17. This symmetry is broken by the use of high-frequency and low-frequency transmons as data qubits, leading to differences in the order in which an ancilla qubit interacts with its neighboring data qubits (see ref. 46 and Fig. 8), together with the fact that CZs with L1 ≠ 0 do not commute in general. In addition to a low \({\mathcal{O}}\left(Q\right)\), the errors propagated by the leaked ancilla qubits (and hence the signatures of ancilla-qubit leakage) depend on \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\) (randomized in the simulations). The values of these phases generally lead to different pd than the ones parameterizing the HMM. The latter is extracted based on the average pd observed over the runs (see section “HMM formalism”). In the worst-case (for leakage detection), these phases can lead to no errors being propagated onto the neighboring data qubits, resulting in the undetectability of leakage. The mismatch between the fluctuating pd (over \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}\)) and the average value hinders the ability of the ancilla-qubit HMMs to detect leakage. Even if these phases were individually controllable, tuning them to maximize the detection capability of the HMMs would also lead to an undesirable increase in εL of a (leakage-unaware) decoder (see Fig. 2).

Fig. 6: Ancilla-qubit leakage detection.
figure 6

ad Precision-recall curves for the ancilla-qubit HMMs over 4 × 104 runs of 50 QEC cycles each. In a, b the HMMs rely only on the observed defects on the neighboring stabilizers. In cf the HMMs further get the in-phase component Im of the analog readout as input, from which \({p}_{m}^{{\mathcal{L}}}\) is extracted. The dotted line corresponds to a random guess classifier for which \({\mathcal{P}}\) is equal to the fraction of leakage events over all QEC cycles and runs. As ancilla-qubit leakage is directly measured, \({{\mathcal{P}}}_{{\rm{DM}}}=1\) for all values of \({\mathcal{R}}\) (not shown). Insets in c, d: the HMM optimality \({\mathcal{O}}\) as a function of the discrimination fidelity \({F}^{{\mathcal{L}}}\) between \(\left|1\right\rangle\) and \(\left|2\right\rangle\). The corresponding error bars (extracted over 2 × 104 runs of 20 QEC cycles each) are smaller than the symbol size. The vertical dashed line corresponds to the experimentally measured \({F}^{{\mathcal{L}}}=88.4 \%\). e, f Average response in time of the ancilla-qubit HMMs (diamonds) to leakage, in comparison to the actual leakage probability extracted directly from the readout (dashed), extracted over 4 × 104 runs of 50 QEC cycles each. The average is computed by selecting single realizations where the qubit was in the computational subspace for at least 3 QEC cycles and then in the leakage subspace for 5 or more.

To alleviate these issues, we consider the state-dependent information obtained from the analog measurement outcome. The discrimination fidelity between \(\left|1\right\rangle\) and \(\left|2\right\rangle\) is defined as

$${F}^{{\mathcal{L}}}=1-\frac{{\mathbb{P}}\left({{1}| {2}}\right)+{\mathbb{P}}\left({{2}| {1}}\right)}{2},$$
(6)

where \({\mathbb{P}}\left({{i}| {j}}\right)\) is the conditional probability of declaring the measurement outcome i given that the qubit has been prepared in state \(\left|j\right\rangle\), assuming that no excitation or relaxation occur during the measurement (accounted for in post-processing). Here, we assume that \({\mathbb{P}}\left({{0}| {2}}\right)={\mathbb{P}}\left({{2}| {0}}\right)={0}\), as observed in experiment (see Supplementary Fig. 5). We focus on the discrimination fidelity as in our simulations relaxation is already accounted for in the measurement outcomes (see section “Error model and parameters”). We extract \({F}^{{\mathcal{L}}}\) from recent experimental data20, where the readout pulse was only optimized to discriminate between the computational states. By taking the in-phase component of the analog measurement, for each state \(\left|j\right\rangle\) a Gaussian distribution \({{\mathcal{N}}}_{j}\) is obtained, from which we get \({F}^{{\mathcal{L}}}=88.4 \%\) (see section “Transmon measurements in experiment” of Supplementary Methods).

In order to emulate the analog measurement in simulation, given an ancilla-qubit measurement outcome \(m\in \left\{0,1,2\right\}\), we sample the in-phase response Im from the corresponding distribution \({{\mathcal{N}}}_{m}\). The probability of the ancilla qubit being leaked given Im is computed as

$${p}_{m}^{{\mathcal{L}}}=\frac{{{\mathcal{N}}}_{2}\left({I}_{m}\right)}{{\sum }_{j\in \{0,1,2\}}{{\mathcal{N}}}_{j}\left({I}_{m}\right)}.$$
(7)

The ancilla-qubit HMMs use the sampled responses Im, in combination with the observed defects, to detect leakage.

The PR curves of the HMMs using the analog readout are shown in Fig. 6c, d, from which an average \({\mathcal{O}}\left(Q\right)\approx 97 \%\) can be extracted for the ancilla-qubit HMMs. The temporal responses of the HMMs to leakage are compared to the leakage probabilities extracted from measurement in Fig. 6e, f, showing a relatively sharp response to a leakage event, with an expected delay in the detection of at most 2 QEC cycles. While \({F}^{{\mathcal{L}}}=88.4 \%\) might suggest an even sharper response, this is not the case as the HMM update depends on both the prior \({p}_{{\rm{HMM}}}^{{\mathcal{L}}}\) (which is low when the qubit is not leaked) and on the likelihood of the sampled Im together with the observed defects on the neighboring ancilla qubits (see section “HMM formalism”). While the initial response is not immediately high, given a (not too) low threshold, corresponding to a high \({\mathcal{R}}\), the HMMs still achieve a high \({\mathcal{P}}\), leading to the high \({\mathcal{O}}\) observed (see Fig. 6c, d). A further benefit of the inclusion of the analog-measurement information is that the detection capability of the HMMs is now largely insensitive to the fluctuations in \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{flus}}}^{{\mathcal{L}}}\).

We explore \({\mathcal{O}}\left(Q\right)\) as a function of \({F}^{{\mathcal{L}}}\), as shown in the inset of Fig. 6c, d. To do this, we model \({{\mathcal{N}}}_{j}\) for each state as symmetric and having the same standard deviation, in which case \({F}^{{\mathcal{L}}}\) is a function of their signal-to-noise ratio only (see section “Transmon measurements in experiment” of Supplementary Methods). At low \({F}^{{\mathcal{L}}}\)\(\left(\lesssim 60 \% \right)\) the detection of leakage is possible but limited, especially for the boundary ancilla qubits. On the other hand, even at moderate values of \({F}^{{\mathcal{L}}}\)\(\left(\approx 80 \% \right)\), corresponding to experimentally achievable values, ancilla-qubit leakage can be accurately identified for both bulk and boundary ancilla qubits. Furthermore, relying solely on the analog measurements would allow for the potential minimization of the error spread associated with ancilla-qubit leakage, given controllability over \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{flus}}}^{{\mathcal{L}}}\), without compromising the capability of the HMMs to detect leakage. In section “An alternative scheme for enhancing ancilla-qubit leakage detection” of Supplementary Methods we explore an alternative scheme for increasing the performance of the ancilla-qubit HMMs without using the analog measurements, which comes at the cost of a lower optimality for data-qubit HMMs.

Improving code performance via post-selection

We use the detection of leakage to reduce the logical error rate εL via post-selection on leakage detection, with the selection criterion defined as

$$\mathop{\max }\limits_{Q,n}{p}^{{\mathcal{L}}}\left(Q,n\right)\ge {p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right).$$
(8)

We thus post-select any run for which the leakage probability of any qubit exceeds the defined threshold in any of the QEC cycles. We note that post-selection is not scalable for larger-scale QEC, due to an exponential overhead in the number of required experiments, however, it can be useful for a relatively small code such as Surface-17. Furthermore, note that, while the criterion above is insensitive to overestimation of the leakage probability due to a leaked neighboring qubit (see section “HMM error budget” of Supplementary Methods), it is sensitive to the correct detection of leakage in the first place and to the HMM response in time (especially for short-lived leakage events).

We perform the multi-objective optimization

$$\begin{array}{rcl}\mathop{\min }\limits_{{p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)}&&\left(f,{\varepsilon }_{{\rm{L}}}\right),\\ \,{\text{subject}}\, {\text{to}}\,&&0.02\le {p}_{{\rm{th}}}^{{\mathcal{L}}}\left(Q\right)\le 1,\end{array}$$

where f is the fraction of discarded data. The inequality constraint on the feasible space is helpful for the fitting procedure, required to estimate εL. This optimization uses an evolutionary algorithm (NGSA-II), suitable for conflicting objectives, for which the outcome is the set of lowest possible εL for a given f. This set is known as the Pareto front and is shown in Fig. 7 for both the MWPM and UB decoders. In Fig. 7 we also compare post-selection based on the HMMs against post-selection based on the density-matrix simulation. Here we use the predictions of the HMMs which include the analog measurement outcome with the experimentally extracted \({F}^{{\mathcal{L}}}\) (see section “Ancilla-qubit leakage detection”). The observed agreement between the two post-selection methods proves that the performance gain is due to discarding runs with leakage instead of runs with only regular errors. The performance of the MWPM decoder is restored below the quantum memory break-even point by discarding f ≈ 28%. The logical error rates extracted from simulations without leakage are achieved by post-selection of f ≈ 44% of the data for both the MWPM and UB decoders, when leakage is included.

Fig. 7: Restoring code performance by post-selecting on leakage detection.
figure 7

Improvement in the logical error rate εL via post-selecting on the detection of leakage for a MWPM decoder (green) and the decoder upper bound (red). The post-selection is based on the probabilities predicted by the HMMs (solid) or on those extracted from the density-matrix simulation (dashed), for 2 × 104 runs of 20 QEC cycles each. The physical error rate of a single transmon qubit under decoherence is also given (solid black). Detection of leakage allows for the restoration of the performance of the MWPM decoder, reaching the memory break-even point by discarding about ≈28% of the data. The logical error rates obtained from simulations without leakage (and without post-selection) are indicated by diamonds.

Discussion

We have investigated the effects of leakage and its detectability using density-matrix simulations of a transmon-based implementation of Surface-17. Data and ancilla qubits tend to be sharply projected onto the leakage subspace, either indirectly by a back-action effect of stabilizer measurements for data qubits or by the measurement itself for ancilla qubits. During leakage, a large, but local, increase in the defect rate of neighboring qubits is observed. For data qubits we attribute this to the anti-commutation of the involved stabilizer checks, while for ancilla qubits we find that it is due to an interaction-dependent spread of errors to the neighboring qubits. We have developed a low-cost and scalable approach based on HMMs, which use the observed signatures together with the analog measurements of the ancilla qubits to accurately detect the time and location of leakage events. The HMM predictions are used to post-select out leakage, allowing for the restoration of the performance of the logical qubit below the memory break-even point by discarding less than half of the data (for such a relatively small code and for the given noise parameters), opening the prospect of near-term QEC demonstrations even in the absence of a dedicated leakage-reduction mechanism.

A few noise sources have not been included in the simulations. First, we have not included readout-declaration errors, corresponding to the declared measurement outcome being different from the state in which the ancilla qubit is projected by the measurement itself. These errors are expected to have an effect on the performance of the MWPM decoder, as well as a small effect on the observed optimality of the HMMs. We have also ignored any crosstalk effects, such as residual couplings, cross-driving or dephasing induced by measurements on other qubits. While the presence of these crosstalk mechanisms is expected to increase the error rate of the code, it is not expected to affect the HMM leakage-detection capability. We have assumed measurements to be perfectly projective. However, for small deviations, we do not expect a significant effect on the projection of leakage and on the observation of the characteristic signatures.

We now discuss the applicability of HMMs to other quantum-computing platforms subject to leakage and determine a set of conditions under which leakage can be efficiently detected. First, we assume single-qubit and two-qubit gates to have low leakage probabilities, otherwise QEC would not be possible in general. In this way, single-qubit and two-qubit leakage probabilities can be treated as perturbations to block-diagonal gates, with one block for the computational subspace \({\mathcal{C}}\) and one for the leakage subspace \({\mathcal{L}}\). We focus on the gates used in the surface code, i.e., CZ and Hadamard H (or RY(π/2) rotations or equivalent gates). We consider data-qubit leakage first. We have observed that it is made detectable by the leakage-induced anti-commutation of neighboring stabilizers. The only condition ensuring this anti-commutation is that H acts as the identity in \({\mathcal{L}}\) or that it commutes with the action of CZ within the leakage block (see section “Leakage-induced anti-commutation” of Supplementary Methods), regardless of the specifics of such action. Thus, data-qubit leakage is detectable via HMMs if this condition is satisfied. In particular, it is automatically satisfied if \({\mathcal{L}}\) is 1-dimensional. We now consider ancilla-qubit leakage. Clearly, ancilla-qubit leakage detection is possible if the readout discriminates computational and leakage states perfectly or with high fidelity. If this is not the case, the required condition is that leaked ancilla qubits spread errors according to non-trivial leakage conditional phases, constituting signatures that can be used by an HMM. If even a limited-fidelity readout is available, it can still be used to strengthen this signal, as demonstrated in section “Ancilla-qubit leakage detection”. An issue is the possibility of the readout to project onto a superposition of computational and leakage subspaces. In that case, the significance of ancilla-qubit leakage is even unclear. However, for non-trivial leakage conditional phases, we expect a projection effect to the leakage subspace by a back-action of the stabilizer measurements, due to leakage-induced errors being detected onto other qubits, similarly to what observed for data qubits.

The capability to detect the time and location of a leakage event demonstrated by the HMMs could be used in conjunction with leakage-reductions units (LRUs)37. These are of fundamental importance for fault tolerance in the presence of leakage, since in ref. 40 a threshold for the surface code was not found if dedicated LRUs are not used to reduce the leakage lifetime beyond the one set by the relaxation time. While the latter constitutes a natural LRU by itself, we do not expect it to ensure a threshold since, together with a reduction in the leakage lifetime, it leads to an increase in the regular errors due to relaxation. A few options for LRUs in superconducting qubits are the swap scheme introduced in ref. 36, or the use of the readout resonator to reset a leaked data-qubit into the computational subspace, similarly to refs. 53,54. An alternative is to use the \(\left|02\right\rangle \leftrightarrow \left|11\right\rangle\) crossing to realize a “leakage-reversal” gate that exchanges the leakage population in \(\left|02\right\rangle\) to \(\left|11\right\rangle\). An even simpler gate would be a single-qubit π pulse targeting the \(\left|1\right\rangle \leftrightarrow \left|2\right\rangle\) transition. All these schemes introduce a considerable overhead either in hardware (swap, readout resonator), or time (swap, readout resonator, leakage-reversal gate), or they produce leakage when they are applied in the absence of it (leakage-reversal gate, π pulse). Thus, all these schemes would benefit from the accurate identification of leakage, allowing for their targeted application, reducing the average circuit depth and minimizing the probability of inadvertently inducing leakage. We also note that the swap scheme, in conjunction with a good discrimination fidelity for \(\left|2\right\rangle\), could be used for detecting leakage not only on ancilla qubits but also on data qubits by alternatively measuring them. Still, this scheme would require 5 extra qubits for Surface-17 and would make the QEC-cycle time at least ~ 50% longer, together with more gate and idling errors, thus requiring much better physical error rates to achieve the same logical error rate in near-term experiments.

We discuss how decoders might benefit from the detection of leakage. Modifications to MWPM decoders have been developed for the case when ancilla-qubit leakage is directly measured17,40, and when data-qubit leakage is measured in the LRU circuits40. Further decoder modifications might be developed to achieve a lower logical error rate relative to a leakage-unaware decoder, by taking into account the detected leakage and the probability of leakage-induced errors, as well as the stabilizer information that can still be extracted from the superchecks (see section “Leakage-induced anti-commutation” of Supplementary Methods). In the latter case, a decoder could switch back and forth from standard surface-code decoding to e.g., the partial subsystem-code decoding in refs. 49,50,51. Given control of the leakage conditional phases, the performance of this decoder can be optimized by setting \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}=\pi\) and \({\phi }_{{\rm{flux}}}^{{\mathcal{L}}}=0\), minimizing the spread of phase errors on the neighboring data qubits by a leaked ancilla qubit, as well as the noise on the weight-6 stabilizer extraction in the case of a leaked data qubit (see Supplementary Fig. 6 and section “Leakage-induced anti-commutation” of Supplementary Methods). Given a moderate discrimination fidelity of the leaked state, this is not expected to compromise the detectability of leakage, as discussed in section “Ancilla-qubit leakage detection”. At the same time, for such a decoder we expect the improvement in the logical error rate to be limited in the case of low-distance codes such as Surface-17, as single-qubit errors can result in a logical error. This is because leakage effectively reduces the code distance, either because a leaked data qubit is effectively removed from the code, or because of the fact that a leaked ancilla qubit is effectively disabled and in addition spreads errors onto neighboring data qubits. Large codes, for which leakage could be well tolerated (depending on the distribution of leakage events), cannot be studied with density-matrix simulations, as done in this work for Surface-17. However, the observed sharp projection of leakage and the probabilistic spread of errors justify the stochastic treatment of this error40. Under the assumption that amplitude and phase damping can be modeled stochastically as well, we expect that the performance of decoders and LRUs in large surface codes can be well approximated in the presence of leakage.

Methods

Simulation protocol

For the Surface-17 simulations we use the open-source density-matrix simulation package quantumsim27, available at “The quantumsim package can be found at https://quantumsim.gitlab.io/”. For decoding we use a MWPM decoder27, for which the weights of the possible error pairings are extracted from Surface-17 simulations via adaptive estimation48 without leakage (L1 = 0) and an otherwise identical error model (described in section “Error model and parameters”).

The logical performance of the surface code as a quantum memory is the ability to maintain a logical state over a number of QEC cycles. We focus on the Z-basis logical \({\left|0\right\rangle }_{{\rm{L}}}\), but we have observed nearly identical performance for \({\left|1\right\rangle }_{{\rm{L}}}\). We have not performed simulations for the X-basis logical states \({\left|\pm \right\rangle }_{{\rm{L}}}=\frac{1}{\sqrt{2}}\left({\left|0\right\rangle }_{{\rm{L}}}\pm {\left|1\right\rangle }_{{\rm{L}}}\right)\), as previous studies did not observe a significant difference between the two bases27. The state \({\left|0\right\rangle }_{{\rm{L}}}\) is prepared by initializing all data qubits in \(\left|0\right\rangle\), after which it is maintained for a fixed number of QEC cycles (maximum 20 or 50 in this work), with the quantum circuit given in Fig. 8. The first QEC cycle projects the logical qubit into a simultaneous eigenstate of the X-type and Z-type stabilizers28, with the Z measurement outcomes being +1 in the absence of errors, while the X outcomes are random. The information about the occurred errors is provided by the stabilizer measurement outcomes from each QEC cycle, as well as by a Z-type stabilizer measurements obtained by measuring the data qubits in the computational basis at the end of the run. This information is provided to the MWPM decoder, which estimates the logical state at the end of the experiment by tracking the Pauli frame. For decoding, we assume that the \(\left|2\right\rangle\) state is measured as a \(\left|1\right\rangle\), as in most current experiments. In section “Ancilla-qubit leakage detection” we considered the discrimination of \(\left|2\right\rangle\) in readout, which can be used for leakage detection. While this information can be also useful for decoding, we do not consider a leakage-aware decoder in this work.

Fig. 8: The quantum circuit for a single QEC cycle employed in simulation.
figure 8

The unit-cell scheduling is defined in ref. 46. The qubit labels and frequencies correspond to the lattice arrangement shown in Fig. 2. Gray elements correspond to operations belonging to the previous or the following QEC cycle. The X-type parity checks are performed at the start of the cycle, while the Z-type parity checks are executed immediately after the Z-type stabilizer measurements from the previous cycle are completed. The duration of each operation is given in Table 1. The arrow at the bottom indicates the repetition of QEC cycles.

The logical fidelity \({F}_{{\rm{L}}}\left(n\right)\) at a final QEC cycle n is defined as the probability that the decoder guess for the final logical state matches the initially prepared one. The logical error rate εL is extracted by fitting the decay as

$${F}_{{\rm{L}}}\left(n\right)=\frac{1}{2}\left[1+{\left(1-2{\varepsilon }_{{\rm{L}}}\right)}^{n-{n}_{0}}\right],$$
(10)

where n0 is a fitting parameter (usually close to 0)27.

Error model and parameters

In the simulations we include qubit decoherence via amplitude-damping and phase-damping channels. The time evolution of a single qubit is given by the Lindblad equation

$$\frac{d\rho }{dt}=-i\left[H,\rho \right]+\mathop {\sum}\limits_{i}{L}_{{\rm{i}}}\rho {L}_{{\rm{i}}}^{\dagger }-\frac{1}{2}\left\{{L}_{{\rm{i}}}^{\dagger }{L}_{{\rm{i}}},\rho \right\},$$
(11)

where H is the transmon Hamiltonian

$$H=\omega {a}^{\dagger }a+\frac{\alpha }{2}{\left({a}^{\dagger }\right)}^{2}{a}^{2},$$
(12)

with a the annihilation operator, ω and α the qubit frequency and anharmonicity, respectively, and Li the Lindblad operators. Assuming weak anharmonicity, we model amplitude damping for a qutrit by

$${L}_{{\rm{amp}}}=\sqrt{\frac{1}{{T}_{1}}}a.$$
(13)

The \(\left|2\right\rangle\) lifetime is then characterized by a relaxation time T1/2. Dephasing is described by

$${L}_{{\rm{deph,1}}}=\sqrt{\frac{8}{9{T}_{\phi }}}\left(\begin{array}{ccc}1&0&0\\ 0&0&0\\ 0&0&-1\end{array}\right),$$
(14)
$${L}_{{\rm{deph,2}}}=\sqrt{\frac{2}{9{T}_{\phi }}}\left(\begin{array}{ccc}1&0&0\\ 0&-1&0\\ 0&0&0\end{array}\right),$$
(15)
$${L}_{{\rm{deph,3}}}=\sqrt{\frac{2}{9{T}_{\phi }}}\left(\begin{array}{ccc}0&0&0\\ 0&1&0\\ 0&0&-1\end{array}\right),$$
(16)

leading to a dephasing time Tϕ between \(\left|0\right\rangle\) (resp. \(\left|1\right\rangle\)) and \(\left|1\right\rangle\) (\(\left|2\right\rangle\)), and to a dephasing time Tϕ/2 between \(\left|0\right\rangle\) and \(\left|2\right\rangle\)9. The Lindblad equation is integrated for a time t to obtain an amplitude-damping and phase-damping superoperator R,t, expressed in the Pauli Transfer Matrix representation. For a gate Rgate of duration tgate, decoherence is accounted by applying \({R}_{\downarrow ,{t}_{{\rm{gate}}}/2}{R}_{{\rm{gate}}}{R}_{\downarrow ,{t}_{{\rm{gate}}}/2}\). For idling periods of duration tidle, \({R}_{\downarrow ,{t}_{{\rm{idle}}}}\) is applied.

For single-qubit gates we only include the amplitude and phase damping experienced over the duration tsingle of the gate. These gates are assumed to not induce any leakage, motivated by the low leakage probabilities achieved8,44, and to act trivially in the leakage subspace. For two-qubit gates, namely the CZ, we further consider the increased dephasing rate experienced by qubits when fluxed away from their sweetspot. In superconducting qubits, flux noise shows a typical power spectral density Sf = A/f, where f is the frequency and \(\sqrt{A}\) is a constant. In this work we consider \(\sqrt{A}=4\,\mu {\Phi }_{{\rm{0}}}\), where Φ0 is the flux quantum. Both low-frequency and high-frequency components are contained in Sf, which we define relative to the CZ gate duration tCZ. Away from the sweetspot frequency \({\omega }_{\max }\), a flux-tunable transmon has first-order flux-noise sensitivity \({D}_{\phi }=\frac{1}{2\pi }\left|\frac{\partial \omega }{\partial \Phi }\right|\). The high-frequency components are included as an increase in the dephasing rate Γϕ = 1/Tϕ (compared to the sweetspot), given by \({\Gamma }_{\phi }=2\pi \sqrt{\mathrm{ln}\,2A}{D}_{\phi }\)55, while the low-frequency components are not included due to the built-in echo effect of Net-Zero pulses9. High-frequency and mid-frequency qubits are fluxed away to different frequencies, with dephasing rates computed with the given formula. Furthermore, during a few gates low-frequency qubits are fluxed away to a “parking” frequency in order to avoid unwanted interactions46. The computed dephasing times at the interaction point are given in Table 1. For the CZ gates, we include this increased dephasing during the time tint spent at the interaction point, while for the phase-correction pulses of duration tcor we consider the same dephasing time as at the sweetspot. We do not include deviations in the ideal single-qubit phases of the CZ gate ϕ01 = 0 and ϕ10 = 0 and the two-qubit phase ϕ11 = π, under the assumption that gates are well tuned and that the low-frequency components of the flux noise are echoed out9.

We now consider the coherence of leakage in the CZ gates, which in the rotating frame of the qutrit is modeled as the exchanges

$$\left|11\right\rangle\, \mapsto \,\sqrt{1-4{L}_{1}}\left|11\right\rangle +{e}^{{\rm{i}}\phi }\sqrt{4{L}_{1}}\left|02\right\rangle ,$$
(17)
$$\left|02\right\rangle\, \mapsto \,\sqrt{1-4{L}_{1}}\left|02\right\rangle -{e}^{-{\rm{i}}\phi }\sqrt{4{L}_{1}}\left|11\right\rangle ,$$
(18)

with L1 the leakage probability47. The phase ϕ can lead to an interference effect between consecutive applications of the CZ gate across pairs of data and ancilla qubits. In terms of the full density matrix, the dynamics of Eqs. (17) and (18) leads to a coherent superposition of computational and leaked states

$$\rho =\left(\begin{array}{*{20}{l}}{\rho }^{{\mathcal{C}}}&{\rho }^{{\rm{coh}}}\\ &\\ {\rho }^{{\rm{coh}}}&{\rho }^{{\mathcal{L}}}\end{array}\right),$$
(19)

where \({\rho }^{{\mathcal{C}}}\) (resp. \({\rho }^{{\mathcal{L}}}\)) is the density matrix restricted to the computational (leakage) subspace, while ρcoh are the off-diagonal elements between these subspaces. We observe that varying the phase ϕ does not have an effect on the dynamics of leakage or on the logical error rate. We attribute this to the fact that each ancilla qubit interacts with a given data qubit only once during a QEC cycle and it is measured at the end of it (and as such it is dephased). Thus, the ancilla-qubit measurement between consecutive CZ gates between the same pair prevents any interference effect. Furthermore, setting ρcoh = 0, does not affect the projection and signatures of leakage nor the logical error rate (at least for the logical state prepared in the Z basis), leading to an incoherent leakage model. We attribute this to the projection of leakage itself, which leaves the qubit into a mostly incoherent mixture between the computational and leakage subspaces. In the harmonic rotating frame, \(\left|2\right\rangle\) is expected to acquire an additional phase during periods of idling, proportional to the anharmonicity. However, following the reasoning presented above, we also believe that this phase is irrelevant.

An incoherent leakage model offers significant computational advantage for density-matrix simulations. For the case where ρcoh ≠ 0, the size of the stored density matrix at any time is 46 × 94 (6 low-frequency data qubits, 3 high-frequency data qutrits plus 1 ancilla qutrit currently performing the parity check). Setting ρcoh = 0 reduces the size of the density matrix to 46 × 54, since for each qutrit only the \(\left|2\right\rangle \left\langle 2\right|\) matrix element is stored in addition to the computational subspace. Thus, for the simulations in this work we rely on an incoherent model of leakage.

Measurements of duration tm are modeled by applying \({R}_{\downarrow ,{t}_{{\rm{m}}}/2}{R}_{{\rm{proj}}}{R}_{\downarrow ,{t}_{{\rm{m}}}/2}\), where \({R}_{\downarrow ,{t}_{{\rm{m}}}/2}\) are periods of amplitude and phase damping and Rproj is a projection operator. This projector is chosen according to the Born rule and leaves the ancilla qubit in either \(\left|0\right\rangle\), \(\left|1\right\rangle\), or \(\left|2\right\rangle\). We do not include any declaration errors, which are defined as the measurement outcome being different from the state of the ancilla qubit immediately after the projection. Furthermore, we do not include any measurement-induced leakage, any decrease in the relaxation time via the Purcell effect or any measurement-induced dephasing via broadband sources. We do not consider non-ideal projective measurements (leaving the ancilla in a superposition of the computational states) due to the increased size of the stored density matrix that this would lead to.

HMM formalism

An HMM describes the time evolution of a set \(S=\left\{s\right\}\) of not directly observable states s (i.e., “hidden”), over a sequence of independent observables \(o=\left\{{o}_{i}\right\}\). At each time step n the states undergo a Markovian transition, such that the probability \({p}^{s}\left[n\right]\) of the system being in the state s is determined by the previous distribution \({p}^{s}\left[n-1\right]\) over all sS. These transitions can be expressed via the transition matrix A, whose elements are the conditional probabilities \({A}_{s,s^{\prime} }:= {\mathbb{P}}{({s}[{n}]={s}| {s}[{n}-{1}]={s}^{\prime} )}\). A set of observables is then generated with state-dependent probabilities \({{B}_{{o}_{i}[{n}],{s}}}:= {\mathbb{P}}{({o}_{i}[{n}]={o}_{i}| {s}[{n}]={s})}\). Inverting this problem, the inference of the posterior state probabilities \({p}^{s}\left[n\right]\) from the realized observables is possible via

$${p}^{s}[n]={\mathbb{P}}{({s}[{n}]| {o}[{n}],{o}[{n}-{1}],\ldots ,{o}[{1}])}$$
(20)
$$=\frac{{\mathbb{P}}{{({o}[{n}]| {s}[{n}]){p}_{\rm{prior}}^{s}[{n}}]}}{{\mathbb{P}}({{o}[{n}}])}$$
(21)
$$=\frac{{\prod }_{i}{\mathbb{P}}({{{o}_{i}}[{n}]| {s}[{n}]){p}_{\rm{prior}}^{s}[{n}]}}{{{\prod }_{i}}{\mathbb{P}}({{o}_{i}}[{n}])}$$
(22)
$$=\frac{{\prod }_{i}{B}_{{o}_{i}\left[n\right],s}{p}_{{\rm{prior}}}^{s}\left[n\right]}{{\sum }_{s^{\prime} }{\prod }_{i}{B}_{{o}_{i}\left[n\right],s^{\prime} }{p}_{{\rm{prior}}}^{s^{\prime} }\left[n\right]},$$
(23)

where \({p}_{{\rm{prior}}}^{s}\left[n\right]\) is the prior probability

$${p}_{{\rm{prior}}}^{s}\left[n\right]=\mathop {\sum}\limits_{s^{\prime} }{A}_{s,s^{\prime} }{p}^{s^{\prime} }\left[n-1\right].$$
(24)

We define \({B}_{o\left[n\right],s}={\prod }_{i}{B}_{{o}_{i}\left[n\right],s}\), which for discrete oi constitute the entries of the emission matrix B. In addition to the transition and emission probabilities, the initial state probabilities \({p}^{s}\left[n=0\right]\) are needed for the computation of the evolution.

In the context of leakage detection, we consider only two hidden states, \(S=\left\{{\mathcal{C}},{\mathcal{L}}\right\}\), namely whether the qubit is in the computational (\({\mathcal{C}}\)) or the leakage subspace (\({\mathcal{L}}\)). The transition matrix is parameterized in terms of the leakage and seepage probabilities per QEC cycle. The leakage probability is estimated as \({\Gamma }_{{\mathcal{C}}\to {\mathcal{L}}}\approx {N}_{{\rm{flux}}}{L}_{1}\) (for low L1), where Nflux is in how many CZ gates the qubit is fluxed during a QEC cycle and L1 is the leakage probability per CZ gate. The seepage probability is estimated by \({\Gamma }_{{\mathcal{L}}\to {\mathcal{C}}}\approx {N}_{{\rm{flux}}}{L}_{2}+\left(1-{e}^{\frac{{t}_{{\rm{c}}}}{{T}_{1}/2}}\right)\), where tc is the QEC cycle duration and T1 the relaxation time (see Table 1), while L2 is the seepage contribution from the gate, where L2 = 2L1 due to the dimensionality ratio between \({\mathcal{C}}\) and \({\mathcal{L}}\) for a qubit-qutrit pair47. The transition matrix A is then given by

$$A=\left(\begin{array}{*{20}{c}}1-{\Gamma }_{{\mathcal{C}}\to{\mathcal{L}}}&{\Gamma }_{{\mathcal{L}}\to{\mathcal{C}}}\\{\Gamma }_{{\mathcal{C}}\to{\mathcal{L}}}&1-{\Gamma}_{{\mathcal{L}}\to{\mathcal{C}}}\end{array}\right)$$
(25)

We assume that all qubits are initialized in \({\mathcal{C}}\), which defines the initial state distribution \({p}^{{\mathcal{C}}}\left[n=0\right]=1\) used by the HMMs.

The HMMs consider the defects \(d\left({Q}_{i}\right)\equiv {d}_{i}\) on the neighboring ancilla qubits Qi at each QEC cycle, occurring with probability \({p}^{{d}_{i}}\), as the observables for leakage detection. Explicitly, the emission probabilities are parameterized in terms of the conditional probabilities \({{B}_{{d}_{i}[{n}],{s}}}={\mathbb{P}}{({d}_{i}[{n}]| {s})}\) of observing a defect when the modeled qubit is in \(s={\mathcal{C}}\) or \(s={\mathcal{L}}\). We extract \({B}_{{d}_{i}\left[n\right],{\mathcal{C}}}\) directly from simulation, by averaging over all runs and all QEC cycles, motivated by the possible extraction of this probability in experiment. While this includes runs when the modeled qubit was leaked, we observe no significant differences in the HMM performance when we instead post-select out these periods of leakage, which we attribute to the low L1 per CZ gate. We extract \({B}_{{d}_{i}\left[n\right],{\mathcal{L}}}\) from simulation over the QEC cycles when the leakage probability \({p}_{{\rm{DM}}}^{{\mathcal{L}}}\left({Q}_{i}\right)\) as observed from the density matrix is above a threshold of \({p}_{{\rm{th}}}^{{\mathcal{L}}}=0.5\). In the case of ancilla-qubit leakage, \({B}_{{d}_{i}\left[n\right],{\mathcal{L}}}\) depends on the values of the leakage conditional phases \({\phi }_{{\rm{stat}}}^{{\mathcal{L}}}\) and \({\phi }_{{\rm{flus}}}^{{\mathcal{L}}}\). Thus, in the case of randomized leakage conditional phases, the HMMs are parameterized by the average \({B}_{{d}_{i}\left[n\right],{\mathcal{L}}}\). In the case of data-qubit leakage, the extracted \({B}_{{d}_{i}\left[n\right],{\mathcal{L}}}\) is ≈0.5 regardless of the leakage conditional phases, as expected from the anti-commuting stabilizers (see section "Projection and signatures of leakage").

For ancilla-qubit leakage detection, the analog measurement outcome Im can be additionally considered as an observable, in which case \(o=\left\{{d}_{i},{I}_{m}\right\}\). In this case, the state-dependent probability is further parametrized by \({B}_{{I}_{m}}\left[n\right],{\mathcal{C}}={\mathbb{P}}\left({I}_{m}\left[{n}\right]| {\mathcal{C}}\right)={{\mathcal{N}}_{0}}\left({I}_{m}\left[{n}\right]\right)+{{\mathcal{N}}}_{1}\left({I}_{m}\left[{n}\right]\right)\) and by \({B}_{{I}_{m}}\left[n\right],{\mathcal{L}}={\mathbb{P}}\left({I}_{m}\left[{n}\right]| {\mathcal{L}}\right)={\mathcal{N}}_{2}\left({I}_{m}\left[{n}\right]\right)\), where \({\mathcal{N}}_{i}\) are the Gaussian distributions of the analog responses in the IQ plane, projected along a rotated in-phase axis I, following the same treatment as in section “Transmon measurements in experiment” of Supplementary Methods.