Introduction

Electron spins in semiconductor quantum dots (QDs) are a promising candidate for quantum bits (qubits) in fault-tolerant quantum computing, owing to their potential scalability1,2. Spin qubits in QDs are measured using single-shot readouts via a spin-to-charge conversion technique in which spin states are distinguished by the presence of a charge transition event in the sensor signals3. Such sensors are usually located close to the QDs and are capacitively coupled to them. For scalable universal quantum computation, a high fidelity (≥99%) is required for all processes, including preparation, control, and measurement of single qubits, to implement the surface code error correction protocol4,5. In a scaled-up system, qubits must be detected using a limited number of dedicated sensors, but a small capacitance limits the sensor sensitivity. The single spin readout fidelity can be optimized by adjusting experimental parameters such as readout bandwidth, spin relaxation rates, filter rates, data sampling rates, and electron tunnel rates6.

Noise sources harm the signal detection accuracy and limit the detection bandwidth and filtering rates, thereby affecting the aforementioned parameters. The signal-to-noise ratio (SNR) can be increased through both the hardware and the parameter tuning of classification algorithms, enabling faster detection of small signals that are limited by untunable physical parameters7. Low-dimensional quantum devices experience several types of noise, such as amplifier noise, shot noise, and Johnson-Nyquist noise. These appear in the sensor traces as white Gaussian noise (Gaussian noise)8, which has equal intensity at different frequencies. The 1/f noise caused by charge fluctuations in the vicinity of the sensor and qubit appears as voltage drift with a lower frequency than the data sampling rates (drift noise)9. Other potential noise sources are spike-like noise due to instability in voltage sources and interference from the alternating current (AC) power supply. For such noise sources combined with a small SNR, it is hard to find an appropriate physical or numerical filter that provides high-fidelity single-shot readouts with a large bandwidth. Therefore, precise post-processing of such noisy single-shot traces is vital for fast, high-fidelity single-shot readout in any large-scale dense QD array.

Machine learning (ML) offers a promising approach to find an appropriate algorithm for classifying experimental signals. Recent works have shown that ML can directly assign experimentally obtained raw data to the correct states, such as the separability of quantum states10,11, underlying transition rates in superconducting devices12, and phases in quantum simulations13,14. A striking advantage of ML is that it easily captures useful features in raw data of experimental observables without any prior knowledge. Although several post-processing techniques have been proposed to reduce the spin readout error15,16,17, no studies have applied ML, specifically the deep neural network (DNN)18,19, to classify single-shot electron spin readout traces that are obscured by noise in QD devices. We note that in the field of QDs, a few studies have used DNNs for the recognition of two-dimensional charge stability diagrams of multiple QDs20,21,22.

In this study, to realize high-fidelity spin measurements in QDs with a device-limited small SNR under harsh noise environments, we present a new classification method based on the DNN to classify traces of single-shot electron spin readout (readout traces) measured by a charge sensor in gate-defined QDs. We focus on the energy selective readout scheme for spin-to-charge conversion3 and aim to measure the initial spin state in the readout trace using the DNN classifier. In this conversion method, only an electron in a spin excited state can tunnel off the dot, after which the dot is refilled in the spin ground state by an electron tunneling from the reservoir within the readout time. These consecutive tunneling events appear as a blip signal in the response of a nearby charge sensor. Electron tunneling from the spin ground state is forbidden. Therefore, the classification accuracy of the charge transition events is crucial for high-fidelity spin state readout. We first evaluate the proposed DNN classification with simulated charge transition events in measured traces with various noise environments. Next, we show the training procedure to apply the DNN classification to real QD devices and evaluate the classification accuracy of charge transition events occurring in GaAs-based lateral QDs. Finally, we demonstrate precise energy selective electron spin readouts using the trained DNN classifier in a noisy environment.

Results

Neural network architecture

We first describe the architecture of our DNN classifier. The single-shot classification procedure of our DNN is shown in Fig. 1a. The DNN consists of a convolutional neural network (CNN)23,24 for feature extraction and a long short-term memory (LSTM)25 for classification. The dimension of the input is 480. The output is one of two class indices, indicating whether or not the trace includes charge transitions. The filter size of the CNNs is (25,1), each with a stride of 1. The numbers of parameters in the CNNs and the LSTM are 78 and 32, respectively. The original data is a time trace of the voltage output VRF of radio-frequency (RF) reflectometry, which comes from the charge sensor. Such a time trace of a fixed acquisition length represents a single spin detection by observing whether a charge transition is present (labeled as ‘spin-down’) or not (labeled as ‘spin-up’). We choose a 480 μs-long data trace of the sensor response, which stochastically includes the features of the charge transition event, as the input for the neural networks. Before the DNN classifying step, the mean value of the traces measured at a condition without charge transition events is subtracted from the raw data to standardize the baseline of the input traces to zero. Each data point of a pre-processed trace is fed into one input neuron. The input traces go through three CNN layers, in which each neuron outputs spatially correlated information for the data points inside a given filter size to extract voltage shift features associated with charge transition events. In the next step, the traces are processed by the LSTM layer, which learns from the correlations among different time series. This increases the robustness to low frequency noise compared with using only CNNs. The input vector size of the LSTM is 1, and the LSTM outputs the probabilities that the single-shot trace does or does not contain a charge transition through the softmax activation function. The decision on whether a transition signal is detected classifies the trace as either ‘spin-up’ or ‘spin-down’. Although we changed the number of input neurons according to the trace length of the single-shot data, the total number of parameters in the DNN classifier is constant and the DNN classifier maintained high classification accuracy. Deep learning can flexibly tune the trainable parameters through a simple procedure, depending on the noise and the sensor sensitivity for event detection. This is a remarkable advantage of the DNN classification; it automatically builds an appropriate algorithm for each spin qubit, which has a small SNR. We define SNR as 20log10(Asignal/Anoise) (dB), where Asignal and Anoise are root-mean-square values of the signal and noise, respectively.
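As a small worked illustration of the SNR definition above, the following Python sketch converts between SNR in dB and the ratio Anoise/Asignal. The function names are hypothetical and chosen only for this example; the SNR values in the loop are those quoted in the caption of Table 1.

```python
import math

def snr_db(a_signal: float, a_noise: float) -> float:
    """SNR in dB from RMS signal and noise amplitudes: 20*log10(A_signal/A_noise)."""
    return 20.0 * math.log10(a_signal / a_noise)

def noise_level_from_snr(snr: float) -> float:
    """Invert the definition: the ratio A_noise/A_signal for a given SNR in dB."""
    return 10.0 ** (-snr / 20.0)

# SNRs quoted for Dot1, Dot3 and Dot4, converted to A_noise/A_signal ratios.
for name, snr in [("Dot1", 7.41), ("Dot3", 13.5), ("Dot4", 11.5)]:
    ratio = noise_level_from_snr(snr)
    print(f"{name}: SNR = {snr} dB -> A_noise/A_signal = {ratio:.3f}")
```

With this definition, an SNR of 0 dB corresponds to equal RMS amplitudes of signal and noise, and a negative SNR means that the noise exceeds the signal.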

Fig. 1: The DNN architecture and the measurement setup.

a Scheme of the proposed DNN classification method, where (n) indicates the number of dimensions. The red and blue traces are examples of readout traces with (red) or without (blue) a charge transition event. The input for the DNN classifier is a 480 μs-long data trace of the sensor response. The CNN extracts the features of charge transition events from the measured single-shot readout traces. The LSTM classifies the features extracted by the CNN (X1 ~ X408) according to whether a charge transition is present or not. b The SEM image of a gate-defined GaAs multiple QD device that is nominally identical to the one used for measurements (see Methods for details). The solid circles indicate QDs. The white squares marked with a cross show ohmic contacts. The RF reflectometry circuit for charge sensing was connected to one of the ohmic contacts. In all measurements, the RF signals are filtered by a 1 MHz low-pass filter.
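The layer sizes quoted for the classifier can be cross-checked with a short Python sketch. It rests on assumptions that are not stated in the text: each CNN layer is taken to be a single-channel one-dimensional convolution without padding, the LSTM is taken to have a hidden size of 2 (one unit per class), and each LSTM gate is taken to carry a single bias vector. Under these assumptions the sketch reproduces the 408 features (X1 ~ X408) and the parameter counts of 78 and 32.

```python
def conv_output_length(n_in: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1D convolution without padding."""
    return (n_in - kernel) // stride + 1

def conv_params(kernel: int, in_channels: int = 1, out_channels: int = 1) -> int:
    """Filter weights plus one bias per output channel."""
    return (kernel * in_channels + 1) * out_channels

def lstm_params(input_size: int, hidden_size: int) -> int:
    """Four gates, each with input weights, recurrent weights and one bias vector."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

length = 480  # input dimension quoted in the text
for layer in range(1, 4):
    length = conv_output_length(length, kernel=25, stride=1)
    print(f"after CNN layer {layer}: {length} points")

# Assumed single-channel layers and an assumed LSTM hidden size of 2.
print("CNN parameters: ", 3 * conv_params(kernel=25))
print("LSTM parameters:", lstm_params(input_size=1, hidden_size=2))
```

Because none of these counts depends on the input length, the total number of parameters stays constant when the trace length changes, in line with the statement in the text.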

Noise-robustness evaluation

We evaluated the robustness of the DNN classifier against Gaussian noise and drift noise in simulations. We added simulated Gaussian noise and drift noise to simulated traces to imitate noisy experimental environments. We define the noise level as the noise amplitude relative to that of the charge transition signal (Anoise/Asignal). We compared the results of the DNN classification with two standard classification methods often used in gate-defined QDs for identifying charge transition events3,15: threshold-based classification (thresholding), in which an event is detected when the signal crosses a threshold, and wavelet transformation classification (the wavelet method), in which an event is detected when the wavelet transform coefficients of a trace cross a threshold. Figure 2a shows the classification accuracies of the DNN classification, the wavelet method, and thresholding as a function of the Gaussian noise level. The classification accuracy is defined as the fraction of times when the output label matches the preassigned label of the simulated traces. We trained the DNN classifier for each noise level to maximize the classification accuracy. We also optimized the threshold value in thresholding, and the size of the scale vector and the threshold value in the wavelet method, for each noise level. As the noise level increased above 0.5, the accuracies for the wavelet method and thresholding gradually decreased. The accuracy for the DNN classification, however, did not drop significantly even when the noise level reached 2.0. This indicates that the DNN classification is more robust than the conventional methods in the presence of Gaussian noise. Figure 2b shows the classification accuracies for the three methods as a function of the drift noise level. With increasing drift noise level, the accuracy decreased for thresholding and the wavelet method, whereas the DNN classification correctly classified all 1200 traces in the test dataset. These results indicate that the DNN classifier builds a numerical filter appropriate for different types of noise with different amplitudes by tuning the trainable parameters.
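The thresholding baseline and the accuracy metric can be illustrated with a toy Python sketch. This is not the simulation used for Fig. 2: the traces contain a single rectangular blip of unit height (a hypothetical shape and position), the noise level is taken as the Gaussian standard deviation relative to the blip height, and the threshold is optimized on the same traces it is scored on. All function names are made up for this example.

```python
import numpy as np

def threshold_classify(traces, threshold):
    """Label each trace 1 if any sample exceeds the threshold, else 0."""
    peaks = np.max(traces, axis=1)
    return (peaks > threshold).astype(int)

def accuracy(predicted, labels):
    """Fraction of traces whose predicted label matches the preassigned label."""
    return float(np.mean(np.asarray(predicted) == np.asarray(labels)))

def make_toy_traces(n_per_class, noise_level, rng, n_samples=480):
    """Flat traces (label 0) and traces with one unit-height blip (label 1)."""
    clean = np.zeros((2 * n_per_class, n_samples))
    clean[n_per_class:, 200:260] = 1.0  # hypothetical 60-sample blip
    labels = np.repeat([0, 1], n_per_class)
    noisy = clean + rng.normal(0.0, noise_level, size=clean.shape)
    return noisy, labels

rng = np.random.default_rng(0)
thresholds = np.linspace(0.0, 8.0, 161)
for noise_level in (0.1, 0.5, 2.0):
    traces, labels = make_toy_traces(500, noise_level, rng)
    # Scan the threshold and keep the best accuracy, mimicking an optimized threshold.
    best = max(accuracy(threshold_classify(traces, th), labels) for th in thresholds)
    print(f"noise level {noise_level}: best thresholding accuracy = {best:.3f}")
```

A single-sample threshold of this kind has no way to average over the duration of a blip, which is one reason its accuracy degrades once the noise amplitude becomes comparable to the signal step.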

Fig. 2: Comparisons of classification accuracies.

Comparisons of classification accuracies among the DNN classification, the wavelet method, and thresholding as a function of (a) Gaussian noise level and (b) drift noise level added to the artificial charge transition signals that are generated by the Markov chain model. (c) Simulated real-time sensor response including charge transition signals with Gaussian noise level of 0.1 (top), Gaussian noise level of 2.0 (middle), and drift noise level of 2.0 (also added Gaussian noise level of 0.1 to imitate the sensor response in experiments) (bottom).

Examples of the simulated traces are shown in Fig. 2c. The top panel shows an almost ideal trace including charge transition signals, which is accurately identified by all three classification methods. The middle panel shows a trace with Gaussian noise (noise level of 2.0). The bottom panel shows a trace with drift noise (noise level of 2.0). Although the charge transition signals were buried in Gaussian noise with a noise level of 2.0 (the middle panel of Fig. 2c), the DNN classifier identified the transition signals with relatively high accuracy in this harsh noisy situation.

Training procedure and accuracy evaluation on QD devices

Next, we demonstrated the advantage of the DNN classification using the experimental dataset, which was obtained from the sensor QD adjacent to a GaAs quadruple QD. The procedure to learn actual charge transition events is quite simple. As depicted in Fig. 3a, it consists of three steps: (1) dataset collection, (2) pre-processing of the dataset (standardization), and (3) training the DNN classifier.

Fig. 3: The training procedure in experiments.

a Schematic of the procedure for collecting and pre-processing the dataset for training. Each single-shot trace is pre-processed by subtracting the mean value of the traces measured at a condition without charge transition events to remove the offset amplitude of charge transition signals. Then, 2800 pre-processed traces, which have the standardized voltage shift features associated with charge transition events and noise, are used to train the DNN classifier. Next, 1200 traces are used to evaluate the classification accuracy. The stability diagrams are used to collect the traces in Dot1 (b), Dot3 (c), and Dot4 (d). The red squares show the position for collecting the traces with charge transition events.
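The pre-processing and splitting steps of this procedure can be sketched in Python as follows. The arrays below are placeholder data standing in for measured VRF traces (the offset, noise amplitude, and blip position are invented for illustration), and the helper names are hypothetical; only the counts of 2000 traces per class, 2800 training traces, and 1200 evaluation traces are taken from the text.

```python
import numpy as np

def subtract_baseline(traces, reference_traces):
    """Subtract the mean of reference traces (no charge transitions) from every trace."""
    baseline = float(np.mean(reference_traces))
    return traces - baseline

def split_dataset(traces, labels, n_train, rng):
    """Shuffle traces and labels together, then split into training and test sets."""
    order = rng.permutation(len(traces))
    train, test = order[:n_train], order[n_train:]
    return (traces[train], labels[train]), (traces[test], labels[test])

rng = np.random.default_rng(0)

# Placeholder data (hypothetical values), 480 points per trace.
offset = 0.3
without_events = offset + rng.normal(0.0, 0.05, size=(2000, 480))
with_events = offset + rng.normal(0.0, 0.05, size=(2000, 480))
with_events[:, 100:150] += 1.0  # invented blip position and height

traces = np.concatenate([without_events, with_events])
labels = np.concatenate([np.zeros(2000, dtype=int), np.ones(2000, dtype=int)])

# Standardize the baseline to zero, then split 2800 / 1200 at random.
traces = subtract_baseline(traces, without_events)
(train_x, train_y), (test_x, test_y) = split_dataset(traces, labels, 2800, rng)
print(train_x.shape, test_x.shape)  # (2800, 480) (1200, 480)
```

Shuffling before the split keeps both classes represented in the training and evaluation sets, although the class ratio in each set is only approximately balanced.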

In step (1), we collected 2000 traces on the charge transition line for the electron number N changing between 0 and 1 as down-spin data and 2000 traces off the charge transition line as up-spin data. We assumed that a trace on the transition line contains at least one charge transition event if the readout time is longer than approximately five times the tunnel time to the lead. This assumption was validated by the Markov chain model simulation. We were able to recognize the charge transition lines of each QD in the stability diagrams, as shown in Fig. 3b–d. The dataset for training and analysis was collected at a transition line. In step (2), we subtracted the mean value of VRF in a trace measured at a condition without charge transition events from each single-shot trace. This removed the offset amplitude of the charge transition signal. Finally, in step (3), we trained the networks with the dataset that was prepared in steps (1) and (2). The data were sorted randomly into two datasets: 2800 traces to train the networks and 1200 traces to evaluate the classification accuracy.
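The labeling assumption in step (1) can be made plausible with a simple estimate. The Python sketch below assumes, as a simplification not taken from the text, that tunneling is a memoryless process with a single mean waiting time equal to the tunnel time; the function name is hypothetical.

```python
import math

def prob_at_least_one_event(readout_time: float, tunnel_time: float) -> float:
    """P(at least one tunneling event) for a memoryless process with mean waiting time tunnel_time."""
    return 1.0 - math.exp(-readout_time / tunnel_time)

# Readout time expressed in units of the tunnel time.
for ratio in (1, 3, 5, 10):
    p = prob_at_least_one_event(ratio, 1.0)
    print(f"readout time = {ratio} x tunnel time -> P(event) = {p:.4f}")
```

Under this assumption, a readout time of five tunnel times leaves a probability of exp(-5), below 1%, that a trace on the transition line contains no event, which bounds the fraction of mislabeled traces from this source.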

Using the trained DNN classifier, we evaluated the classification accuracy of charge tunneling events for QDs with different SNRs under various noise environments in a QD array. Table 1 shows the accuracies in each QD for the DNN classification and thresholding. Similar to the previous analysis, the classification accuracy was defined as the fraction of times when the output state corresponds to the preassigned label in step (1). We achieved over 95% accuracy in all the QDs using the DNN classification, regardless of the distance between the target QD and the sensor and, thus, the difference in the local noise environments. The accuracy of thresholding in Dot1, located far from the charge sensor, decreased to 81.5%, which was lower than in the other QDs and worse than the accuracy expected from the SNR. The lower accuracy in Dot1 was mainly due to the presence of spike-like noise caused by the instability of the measurement equipment. This noise does not appear in the SNR because it occurs irregularly as short pulses. Since the accuracy in Dot1 was drastically restored by using the DNN classification, this indicates that deep learning readily finds an appropriate numerical filter for such spike-like noise as well, without prior knowledge of the noise type. We note that the accuracy of the DNN classification was mainly limited by wrong labeling of the training dataset in the experiment. A few traces obtained in the Coulomb blockade region accidentally contained charge transition events caused by thermal excitation, and a few traces obtained on the charge transition line under certain conditions of tunnel rates, detection bandwidth, and readout rate did not contain any charge transition events. These errors can be readily resolved with a lower electron temperature and appropriate device tuning.

Table 1 Classification accuracy for DNN classification and thresholding for the experimental charge transition events. The threshold value was optimized to minimize the error. The SNR in Dot1, Dot3, and Dot4 were 7.41 dB, 13.5 dB, and 11.5 dB, respectively.

Spin state estimation

Finally, to validate the DNN classification for extracting physical quantities that do not depend on the classification method, we performed energy selective readout and measured the spin relaxation time (T1) using the DNN classification and thresholding in Dot4. Since the spin excited state is a spin-down state for electrons in GaAs QDs, we determine the probability that the initial spin state is down (down-spin probability) by counting the number of traces with a charge transition in the repeated single-shot measurements. Figure 4a shows the down-spin probability estimated using the DNN classification and thresholding as a function of the wait time tw, which is the duration for injecting an electron spin and waiting for spin relaxation in the QD. We used the network trained with the dataset obtained from Dot4. Although the training dataset includes a few blip signals, the DNN classifier was successfully generalized to classify traces into those with 0 blips and those with 1 or more blips, owing to the proposed DNN architecture. The red and black circles show the results of the DNN classification and thresholding, respectively. The relaxation times T1 were obtained by fitting the exponential curve A × exp(-tw/T1) + B, where A and B are the fitting parameters. The relaxation times estimated by the DNN classification were within the error range of those obtained by thresholding. Thus, the advantage of the DNN classification was limited when the SNR was high enough (11.5 dB) for thresholding. The DNN classification is most powerful under noisy environments.
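A fit of the form A × exp(-tw/T1) + B can be sketched in Python as below. This is one possible approach and not necessarily the fitting routine used here: for each trial value of T1 the model is linear in A and B, so they are obtained by linear least squares, and T1 is chosen by a grid scan. The data are synthetic and noise-free, with invented parameter values in arbitrary time units; they do not represent the measured probabilities.

```python
import numpy as np

def fit_relaxation(t_wait, prob, t1_grid):
    """Fit prob = A*exp(-t_wait/T1) + B by scanning T1 and solving A, B linearly."""
    best = None
    for t1 in t1_grid:
        design = np.column_stack([np.exp(-t_wait / t1), np.ones_like(t_wait)])
        coeffs, *_ = np.linalg.lstsq(design, prob, rcond=None)
        residual = float(np.sum((design @ coeffs - prob) ** 2))
        if best is None or residual < best[0]:
            best = (residual, float(t1), float(coeffs[0]), float(coeffs[1]))
    _, t1, a, b = best
    return t1, a, b

# Synthetic illustration only: these numbers are not from the measurement.
t_wait = np.linspace(0.0, 50.0, 26)  # wait times, arbitrary units
true_t1, true_a, true_b = 10.0, 0.4, 0.05
prob = true_a * np.exp(-t_wait / true_t1) + true_b

t1, a, b = fit_relaxation(t_wait, prob, np.linspace(1.0, 30.0, 291))
print(f"T1 = {t1:.2f}, A = {a:.3f}, B = {b:.3f}")
```

In such a fit, A reflects the visibility of the decay and B the offset, so a classifier that misses events under noise would show up as a reduced A, while false detections would raise B.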

Fig. 4: Spin relaxation measurements using the DNN classifier.

Spin relaxation measurement results using the DNN classification and thresholding in Dot4. In the DNN classification, we trained the network in Dot4 following the procedure shown in Fig. 3a. a Down-spin probability versus wait time tw, out of a total of 2000 traces taken for each wait time, for both the DNN classification and thresholding. The solid lines are fits to an exponential decay curve (see details in the main text). The threshold value is optimized to maximize the visibility of the decrease in the down-spin count. b The down-spin probability versus wait time tw in the presence of artificially added Gaussian noise.

Therefore, to further investigate robustness to noise, we artificially added Gaussian noise to the single-shot data to create a noisy environment, which reduced the SNR to −2.51 dB (total noise level of 1.34). Figure 4b shows the estimated down-spin probability with artificially added Gaussian noise. A large decrease in the down-spin probability amplitude was observed for thresholding in the noisy environment, whereas the amplitude was almost unchanged for the DNN classification. The decrease in the probability amplitude leads to a large error in the estimated spin relaxation time, as shown in Table 2. The error of T1 significantly increased only for thresholding and not the DNN classification. Thus, the DNN classification provides more precise estimations of the spin relaxation times than thresholding, indicating the robustness of the DNN classification in a noisy environment for spin state identification. We note that the offset B, which implies the probability of the wrong outcome, up-spin state for down-spin state readout, increased for thresholding in the noisy environment, which also validates the robustness of the DNN classification for the spin state.
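The relation between the original SNR, the added noise, and the total noise level can be estimated with a short Python sketch. It assumes, as is standard for uncorrelated noise but not stated explicitly in the text, that the RMS amplitudes of the intrinsic and the added noise combine in quadrature. The function names are hypothetical.

```python
import math

def noise_level_from_snr(snr_db: float) -> float:
    """Ratio A_noise/A_signal for an SNR given in dB (SNR = 20*log10(A_signal/A_noise))."""
    return 10.0 ** (-snr_db / 20.0)

def added_noise_level(initial_level: float, target_level: float) -> float:
    """RMS level of independent noise that raises the total from initial_level to target_level."""
    if target_level < initial_level:
        raise ValueError("target level must not be smaller than the initial level")
    return math.sqrt(target_level ** 2 - initial_level ** 2)

initial = noise_level_from_snr(11.5)       # SNR of Dot4 before adding noise
extra = added_noise_level(initial, 1.34)   # total noise level quoted in the text
print(f"intrinsic noise level: {initial:.3f}")
print(f"added Gaussian noise level (estimate): {extra:.3f}")
```

Under this assumption the total noise level of 1.34 is dominated by the artificially added component, so the comparison in Fig. 4b effectively probes robustness to white Gaussian noise.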

Table 2 T1 times and fitting parameters extracted from spin relaxation measurements using DNN classification or thresholding.

Discussion

We proposed and demonstrated a new method based on a DNN to classify noisy single-shot electron charge and spin readout traces. This DNN classification method flexibly builds numerical filters regardless of the noise type and outperformed two conventional numerical classifications. The performance of the charge tunneling event identification in QDs was experimentally improved using DNNs trained on measured datasets. To verify its applicability to energy selective spin readouts, we showed that the DNN classification estimated the spin relaxation time more precisely than thresholding, especially in a noisy environment. This suggests that the DNN offers robust classification of readout traces measured under noisy environments, allowing accurate extraction of various physical quantities in QDs. Although the actual gain in readout speed depends on experimental conditions such as the existing noise types and the fluctuation of the SNR induced by the sensor’s instability, we expect that the DNN classifier would increase the readout bandwidth to at least 4 times that of the thresholding method at an assigned fidelity higher than 95%, assuming white Gaussian noise6,26. The DNN classifier should be even more advantageous for filtering low frequency noise in experimental environments. The speed-up of spin readouts using the DNN classifier remains to be elucidated in future work. Moreover, with an appropriate training dataset, our method is applicable to other spin-to-charge conversion schemes in which the spin state can be distinguished by the presence of a charge tunneling event in a trace, such as singlet-triplet tunnel rate selective readout27, singlet-triplet readout by Pauli-spin blockade (PSB)28, or more complicated multi-state readout29. The accuracy of the DNN classifier could be further increased with a more sophisticated neural network architecture and training procedure. This work validates the use of ML in building an appropriate classification algorithm to measure quantum devices and opens a route for future applications such as fast spin readout30, high temperature qubit operations31,32, and dispersive gate readout33,34,35, where the SNR is generally small.

Methods

Device and measurements

The quadruple quantum dot array is formed electrostatically in a two-dimensional electron gas located 90 nm below the surface of a GaAs/AlGaAs heterostructure (see Fig. 1b). To operate the QDs, the center gate (C) separates the QD array and the sensor QD, the barrier gates (B1, B2, B3, B4, B5) tune the tunnel rates to the reservoir or between the dots, and the plunger gates (P1, P2, P3, P4) tune the chemical potential of individual QDs. The sensor dot gates (SB1, SB2, SP1) form a conventional sensitive QD sensor. The charge sensor conductivity, which is measured using RF reflectometry operating at a bandwidth of up to 15 MHz, reflects the electron number in the QD array. All measurements were performed under an in-plane magnetic field of 3.2 T, which enables clear distinction of spin-up and spin-down states. The device was cooled to a base temperature of 10 mK in a dilution refrigerator.

Simulated trace creation

The simulated signals used to evaluate the robustness of the DNN classification are created by the Markov chain model. We create 2000 traces containing charge transition events with a tunneling time of 33 μs (charge transitions dataset) and 2000 traces without the transition events (no charge transition dataset). All the traces have a total length of 480 μs. The data are then sorted randomly into two datasets, with 2800 traces used to train the DNN and 1200 traces used for classifier evaluation. We choose a 1 kHz sinusoidal wave, whose period (1 ms) is longer than the measurement time (480 μs), as the simulated drift noise.
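A minimal Python sketch of such a trace generator is given below. Several details are assumptions made for illustration and are not specified in the text: a sampling interval of 1 μs (so that 480 points span 480 μs), equal tunnel-in and tunnel-out times, a unit signal step, a drift level defined as the sinusoid amplitude, and a random drift phase. The function names are hypothetical.

```python
import numpy as np

def simulate_trace(n_samples, dt_us, tunnel_time_us, rng, with_transitions=True):
    """Two-level telegraph trace from a two-state Markov chain (levels 0 and 1)."""
    trace = np.zeros(n_samples)
    if not with_transitions:
        return trace
    # Probability of switching state within one sampling interval.
    p_switch = 1.0 - np.exp(-dt_us / tunnel_time_us)
    state = 0
    for i in range(n_samples):
        if rng.random() < p_switch:
            state = 1 - state
        trace[i] = state
    return trace

def drift(n_samples, dt_us, level, freq_hz=1e3, phase=0.0):
    """Sinusoidal drift; level is its amplitude relative to the unit signal step."""
    t = np.arange(n_samples) * dt_us * 1e-6  # time in seconds
    return level * np.sin(2.0 * np.pi * freq_hz * t + phase)

rng = np.random.default_rng(0)
n_samples, dt_us = 480, 1.0  # assumed 1 us sampling -> 480 us trace

clean = simulate_trace(n_samples, dt_us, 33.0, rng)
flat = simulate_trace(n_samples, dt_us, 33.0, rng, with_transitions=False)

# Trace with drift noise (level 2.0) plus a small Gaussian noise (level 0.1).
phase = rng.uniform(0.0, 2.0 * np.pi)
noisy = clean + drift(n_samples, dt_us, level=2.0, phase=phase) + rng.normal(0.0, 0.1, n_samples)
print(noisy.shape, int(np.sum(np.abs(np.diff(clean)))), "transitions in the clean trace")
```

Because the 1 kHz drift completes less than half a period within one 480 μs trace, it appears as a slowly varying baseline rather than as an oscillation, which is what makes a fixed threshold unreliable.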

Experimental training data acquisition

To use the DNN classification in real measurements of gate-defined QDs, we propose and demonstrate a practical procedure to collect the enormous amount of labeled data required to train the DNN classifier for each sensor configuration by setting the state on a charge transition line where sequential charge tunneling independent of the spin occurs. These real-time traces with charge transition events mimic the discriminator traces used to identify spin states. We perform acquisition of the datasets for training and single-shot spin readout measurements in a gate-defined QD array formed at a GaAs/AlGaAs heterointerface.