Unsupervised identification of topological phase transitions using predictive models

Eliska Greplova; Agnes Valenti; Gregor Boschung; Frank Schäfer; Niels Lörch; Sebastian D Huber

doi:10.1088/1367-2630/ab7771

1. Introduction

Identifying phase transitions is one of the key questions in theoretical and experimental condensed matter physics alike. For the experimental characterization of thermodynamic phase transitions, there exists an excessive amount of possible tools, ranging from system specific, like the study of the conductivity in an electronic system, to very generic, like the specific heat. The latter is particularly appealing as it does not assume any prior knowledge: for example, structural transitions, the onset of magnetism, or the transition to superconductivity, all show up in this generic probe. The study of the specific heat is also a standard tool for the theoretician, especially given its generic power.

For quantum phase transitions [1], an equally generic tool as the specific heat for thermal transitions is the fidelity susceptibility. One investigates the derivative of the overlap ${\partial }_{\beta }\langle \psi (\beta +\epsilon )| \psi (\beta )\rangle$ [2] of two infinitesimally separated ground states $| \psi (\beta )\rangle$ as a function of some tuning parameter β. While this probe is in principle very powerful [3–6], it is typically hard to evaluate as one has rarely access to the full wave-function. At least not for most of the approximate numerical techniques and especially not in experimental studies. This raises the question if one can replace the fidelity susceptibility with a tool that is equally unbiased, generic, and accessible to typical numerical and experimental techniques.

In a recent publication some of the present authors introduced such an algorithmic method for classical systems with an order-parameter signaling an (arbitrary) symmetry breaking [7]. Here we demonstrate that one can successfully generalize this method to problems without a local order parameter, i.e. systems with a topological character. Moreover, we show that one can straightforwardly extend [7] to the quantum realm.

The method is based on the analysis of the accuracy of a predictive model. The central idea is to distill a predictive model that relates input data from numerical or experimental studies to the output in the form of a known tuning parameter such as the temperature or a parameter in the Hamiltonian β. Typically, one infers this predictive model via machine-learning techniques in the form of neural nets. The basic idea, however, is independent of the specific inference technique. In a next step, the accuracy of the predictive model is analyzed via the comparison of the predicted to the known value of the tuning parameter β. In particular, we show the derivative of the prediction accuracy with respect to the tuning parameter to be an equally sensitive indicator of a phase transition as the fidelity susceptibility.

To illustrate our generalization of the methods of [7], we investigate two generic models hosting interesting thermodynamic phases without a local order parameter. First, we investigate the finite-temperature cross-over in Wegner's Ising gauge theory (IGT) [8–10] to show that we can analyze an interesting classical problem without a local order parameter. Second, we broaden the scope by taking the step from the IGT to a generalized toric code problem [11, 12] showcasing the applicability of the method to quantum problems.

2. The Ising gauge theory

Wegner's Ising gauge theory (IGT) is a spin model defined on a N × N square lattice with spins placed on the lattice bonds [8–10, 13]. It is described by the Hamiltonian

$\begin{eqnarray}&&{H}_{\mathrm{IGT}}=-J\sum _{p}\prod _{i\in p}{\sigma }_{i}^{z},\end{eqnarray} \tag{ 1 }$

where J is a coupling constant, p refers to plaquettes on the lattice (see figure 1), and ${\sigma }_{i}^{z}$ is the Pauli matrix describing a single spin-1/2. Periodic boundary conditions are imposed. The ground state of this Hamiltonian is a highly degenerate manifold, an arbitrary superposition of all states that meet the condition that the product of spins along each plaquette is equal to 1. At a finite temperature T > 0 the local constraints ${\prod }_{i\in p}{\sigma }_{i}^{z}=1$ are violated (see figure 1). The IGT does not have a finite temperature phase transition. However, for finite system sizes one can find a crossover temperature, T* = 1/(kβ*) defined by the appearance of one plaquette with ${\prod }_{i\in p}{\sigma }_{i}^{z}=-1$ , resulting in the scaling ${T}^{* }\sim 1/\mathrm{ln}(2{N}^{2})$ [12, 14]. Matters are further complicated by the fact that the ground-state manifold cannot be characterized by a local order parameter [15, 10] owing to a local gauge degree of freedom. We come back to this point below.

**Figure 1.** Local constraint of Ising gauge theory: upper left panel shows an example of a T = 0 state where the gauge condition ${\prod }_{i}{\sigma }_{i}^{z}=1$ is met for all plaquettes, lower left panel shows the corresponding dual map, where the spins are mapped on Wilson loops that are uninterrupted in the case of zero-temperature states. Upper right panel shows an example of a $T=\infty$ state, where the gauge condition is violated, the lower panel shows the corresponding Wilson loops with breakages at the places the plaquette condition is not met.
Download figure:
Standard image High-resolution image

To check whether a given spin state is in the IGT ground-state manifold, one has to verify that the condition ${\prod }_{i\in p}{\sigma }_{i}^{z}=1$ is met for all plaquettes in the lattice. Equivalently, one can use the duality map to analyze the phase transition: we connect the edges of the lattice that contain spins with the same orientation and form loops. The IGT constrained phase then has the property that all these loops are closed. Whenever the constraint is violated it results in an open loop [10, 16, 13], see figure 1.

Distinguishing high and low temperature states of the model (1) is a well studied test case for machine learning recognition of phases of matter [14]. As one can see from figure 1, the IGT constitutes an interesting example where the phases are hard to distinguish visually without being a priori familiar with a local restrictions or the dual map. While an supervised approach is immediately successful at distinguishing the high and low temperature phases [14], unsupervised approaches did not succeed without an explicit recipe what type of restriction to look at. There has been significant progress in this direction, but a fully general approach is yet to be found [17–24]. While methods like principal component analysis, clustering and variational auto-encoders have proven to be successful to determine the phase transitions in spin models possessing an order parameter [25], systems without order parameters still represent a challenge.

Here we show how the method introduced by Schäfer et al [7] can be generalized to systems without a local order parameter. One first pre-trains a neural network to relate a spin configuration ${\left\{S\right\}}_{{\beta }_{\mathrm{label}}}$ to the (inverse) temperature β_label, at which the configuration was sampled. After this initial training, the performance of the estimator is assessed with respect to the true value. The derivative

$\begin{eqnarray}&&{ \mathcal D }({\beta }_{\mathrm{label}})=\displaystyle \frac{\partial }{\partial {\beta }_{\mathrm{label}}}{\beta }_{\mathrm{pred}}({\left\{S\right\}}_{{\beta }_{\mathrm{label}}})\end{eqnarray} \tag{ 2 }$

is maximal where the estimator performs worst. In other words, a local maximum in ${ \mathcal D }({\beta }_{\mathrm{label}})$ indicates a phase transition or cross-over temperature ${\beta }_{\mathrm{label}}^{* }$ . While this method does not in principle rely on a local order parameter, it has been shown that the network picks up on the magnetization pattern [7]. It was therefore unclear if one can generalize this strategy to the current problem. Here we show that this approach is valid even for phases of matter that do not contain an order parameter, or a finite temperature phase transition.

Our approach differs from prototypical unsupervised machine learning techniques, such as, e.g. principal component analysis, t-distributed stochastic neighbor embedding (t-SNE), or k-means clustering, since a fully supervised subroutine, namely a regression on the labeled system parameters, is employed. However, we intentionally refer to the approach as an unsupervised learning scheme, as the method aims ultimately to infer the phase diagram of the physical system and not its parameters and the algorithm thereby requires no prior knowledge of the phase labels, the number of different phases or character of the phase transition. In fact, the derivative (2) has generically a stronger signal when the parameters in the supervised part of the protocol are not learned up to high precision.

We create sample configurations of the IGT model and label them with β = 1/(kT). We train a convolutional neural network to predict β given an IGT configuration as an input. Our neural network consists of 2 convolutional and 2 dense layers and was trained on 2 × 10⁵ configurations for 100 different values of β (for details see appendix A).

In figure 2 we show how the difference between the true and predicted inverse temperatures β_pred − β_label behaves as a function of the true β_label for seven different system sizes N = 4, 8, 12, 16, 20, 24, 28 (the total number of spins is 2N²). We see that the behavior of the prediction is not uniform for all inputs and, in fact, we observe that for all systems sizes there exists a different finite $\bar{\beta }$ above which the network has difficulties to identify the correct β_label. In figure 3 we show ${ \mathcal D }({\beta }_{\mathrm{label}})$ which we evaluated as

$\begin{eqnarray*}&&{ \mathcal D }({\beta }_{\mathrm{label}})\approx \displaystyle \frac{{\beta }_{\mathrm{pred}}\left({\left\{S\right\}}_{{\beta }_{\mathrm{label}}^{i+1}}\right)-{\beta }_{\mathrm{pred}}\left({\left\{S\right\}}_{{\beta }_{\mathrm{label}}^{i-1}}\right)}{{\beta }_{\mathrm{label}}^{i+1}-{\beta }_{\mathrm{label}}^{i-1}},\end{eqnarray*}$

where sampled at discrete ${\beta }_{\mathrm{label}}^{i}$ . For all system sizes we observe a presence of a peak that indicates the position of the largest change in the difference between true and predicted β. The peak becomes less recognizable with increasing system size, which is consistent with the fact that the critical β* keeps increasing with growing system size and in the infinite system limit the crossover behavior completely disappears.

**Figure 2.** We show the difference of the network prediction of β_pred and assigned label β_label, β_pred − β_label as a function of β_label for system sizes N = 4 to N = 28. The dashed lines denote the position of the crossover inverse temperature β* as determined by the density of states method.
Download figure:
Standard image High-resolution image

**Figure 3.** Derivative of the output of the predictive model, ${ \mathcal D }({\beta }_{\mathrm{label}})$ , as a function of assigned labels β_label for system sizes N = 4 to N = 28. The dashed lines denote the position of the crossover inverse temperature β* as determined by the density of states method.
Download figure:
Standard image High-resolution image

**Figure 3.** Derivative of the output of the predictive model, ${ \mathcal D }({\beta }_{\mathrm{label}})$ , as a function of assigned labels β_label for system sizes N = 4 to N = 28. The dashed lines denote the position of the crossover inverse temperature β* as determined by the density of states method.
Download figure:
Standard image High-resolution image

The neural network predicts a continuous parameter (inverse temperature) for our model and we observe a change of behavior at some critical value. We show in figure 4 the determined crossover temperature β* as a function of system size. For the system sizes we were able to test numerically we recover logarithmic scaling as expected for the crossover temperature [14, 26].

**Figure 4.** Positions of critical β* as a function of a system size N. We show the scaling obtained from the unsupervised learning method and the scaling obtained from density of states in blue and orange respectively. The shaded areas represent the error bars. Error bars correspond to standard deviation from the mean ${\tilde{\beta }}^{* }$ evaluated by averaging over β* predicted by five separately trained neural nets.
Download figure:
Standard image High-resolution image

**Figure 4.** Positions of critical β* as a function of a system size N. We show the scaling obtained from the unsupervised learning method and the scaling obtained from density of states in blue and orange respectively. The shaded areas represent the error bars. Error bars correspond to standard deviation from the mean ${\tilde{\beta }}^{* }$ evaluated by averaging over β* predicted by five separately trained neural nets.
Download figure:
Standard image High-resolution image

To independently confirm the neural network predictions, we can analyze whether we can identify the physics of what the network is learning and reproduce its predictions by another physical model. From the training set, we can construct a density of states distribution, . In particular, the density of states can be written as a function of energy, E, and inverse temperature, β

$\begin{eqnarray}&&\epsilon (\beta ,E)=\displaystyle \frac{{\sum }_{n=1}^{M}{\delta }_{E,{E}_{n}}{\delta }_{\beta ,{\beta }_{n}}}{{\sum }_{n=1}^{M}{\delta }_{\beta ,{\beta }_{n}}}.\end{eqnarray} \tag{ 3 }$

Here, δ_a,b is the Kronecker-delta symbol (δ_{a, b} = 1 for a = b and δ_a,b = 0 for $a\ne b$ ), E_n (β_n) is energy (label) of the nth configuration in the training set and M is the number of configurations in the training set. We show the distribution obtained for the system size N = 8 (128 spins) in figure 5.

**Figure 5.** Density of states distribution (β, E) of the training set as a function of inverse temperature β and energy E. The plot above has been generated for system size N = 8.
Download figure:
Standard image High-resolution image

epsilon — **Figure 5.** Density of states distribution (β, E) of the training set as a function of inverse temperature β and energy E. The plot above has been generated for system size N = 8.
Download figure:
Standard image High-resolution image

We use the distribution (3) to calculate the most likely $\beta =: {\beta }_{\mathrm{pred}}$ for each configuration at a given energy, which immediately allows us to evaluate the relation between the assigned β and β_pred. Using the density of states we are able to reproduce the behavior in figure 3 (see appendix A). We show the detailed calculation and the dependencies of the predicted β_pred and its derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ as a function of the true β_label in appendix A. This gives us a numerical evidence that the network is learning the density of states distribution shown in figure 5. We identify the logarithmic scaling (with system size) of the critical β* predicted from the density of states (shown in blue in figure 4) analogously to the predictions obtained from the neural net model.

Another unsupervised approach that has proven to be successful for both classical and quantum systems is the confusion scheme introduced in [18]. We compare both approaches in appendix C and show that the confusion scheme is not suitable for the example of IGT studied here.

3. The toric code and its generalizations

So far we have analyzed the performance of our method on the cross-over of a classical spin-1/2 model. When going to quantum models, two complications arise, related to the input and output of our predictive model. For classical systems, simple spin configurations are the natural input. For quantum systems, generically entanglement in the form of non-classically correlated configurations plays a key role. Consequently, the choice of training data needs to either reflect some prior knowledge of the system, or one has to sample over various classical projections of the entangled wave function. On the output side, one can either target a finite-temperature transition, or investigate a quantum phase transition at zero temperature. In the former, the output of the predictive model stays the same: β_pred, the inverse temperature. For zero temperature transitions, one can still investigate a single-parameter family of Hamiltonians H(β). The obvious prediction task is then to reproduce the tuning parameter β, rather then the temperature.

We now turn to a concrete model of a quantum phase transition in a system without a local order parameter. The obvious generalization of (1) is the application of a transverse field [9, 13, 27]

$\begin{eqnarray}&&{H}_{\mathrm{TR}}=-\sum _{p}\prod _{i\in p}{\sigma }_{i}^{z}-g\sum _{l}{\sigma }_{l}^{x}.\end{eqnarray} \tag{ 4 }$

The model above is very well studied, has a confinement-deconfinement transition at a critical g*, and is a working horse for the study of ${{\mathbb{Z}}}_{2}$ spin-liquids. Instead of directly working with this simple model we go beyond (4) in two ways: (i) We restrict ourselves to a subset of gauge-invariant ground states by moving to the toric code [11]. (ii) We generalize the transverse field to allow for an exact solution. We detail both steps in the following.

The IGT of equation (4) has a local ${{\mathbb{Z}}}_{2}$ gauge degree of freedom. The generators of this gauge transformation are the vertex operators

$\begin{eqnarray}&&{A}_{s}=\prod _{i\in s}{\sigma }_{i}^{x},\end{eqnarray} \tag{ 5 }$

that consist of a product of σ_x operators along a vertex, s, of the lattice. The geometry of the vertex operator is illustrated in figure 6. The operators A_s commute with the Hamiltonian, i.e. [H_TR, A_s] = 0 for all vertices s. In other words, one can obtain an eigenstate by changing the sign of the classical σ_z-variables of another eigenstate, as long as one does so for all spins connected to one vertex. The toric code Hamiltonian

$\begin{eqnarray}&&{H}_{\mathrm{TC}}=-\sum _{s}{A}_{s}+{H}_{\mathrm{IGT}}=-\sum _{s}{A}_{s}-\sum _{p}{B}_{p},\end{eqnarray} \tag{ 6 }$

elevates the generators of the gauge transformation to a term in the Hamiltonian. As a consequence, the ground states of the toric code correspond to the gauge-invariant ground states of H_TR [27]. For our numerical purposes below, we largely benefit from the exact solution of the above Hamiltonian: we can write one of the four (un-normalized) ground states as [28]

$\begin{eqnarray}&&| \mathrm{TC}\rangle =\displaystyle \frac{1}{2}\prod _{s}(1+{B}_{p})| {0}_{x}\rangle ,\end{eqnarray} \tag{ 7 }$

where $| {0}_{x}\rangle$ is a reference state with all spins up in the σ^x basis. Then, applying products of Pauli z-matrices along the two non-contractible loops yields the other three orthogonal ground states. We can easily see that the ground states are indeed gauge invariant by applying gauge transformations, obtaining ${A}_{s}| \mathrm{TC}\rangle =| \mathrm{TC}\rangle$ .

**Figure 6.** Illustration of plaquette, B_p, and vertex, A_s, operators in the lattice.
Download figure:
Standard image High-resolution image

Applying a transverse field a spin-model typically excludes an exact solution. The present case is no difference. However, in a recent publication, Chamon and Castelnovo introduced the following generalization of the toric code [12, 26, 29, 30]

$\begin{eqnarray}&&H={H}_{\mathrm{TC}}+\sum _{s}{{\rm{e}}}^{-\beta \sum _{i\in p}{\lambda }_{i}{\sigma }_{i}^{x}}=-\sum _{s}{A}_{s}+\sum _{p}\left(-{B}_{p}+\sum _{s}{{\rm{e}}}^{-\beta \sum _{i\in p}{\lambda }_{i}{\sigma }_{i}^{x}}\right),\end{eqnarray} \tag{ 8 }$

where ${\lambda }_{i}\in [-1,1]$ describes the particular configuration of added background fields and β > 0 characterizes their amplitude. A transition to a topologically trivial phase occurs at a critical value of the field strength β_c. The field configuration λ_i influences the critical value β_c. A detailed analysis of this phase transition has been provided in [30].

To finish our discussion of these exactly solvable models we write the ground state of (8)

$\begin{eqnarray}&&| {\rm{\Psi }}\rangle =\displaystyle \frac{1}{\sqrt{Z}}{{\rm{e}}}^{\tfrac{\beta }{2}{\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}}| \mathrm{TC}\rangle =\displaystyle \frac{1}{\sqrt{{\rm{Z}}}}\sum _{{h}\in {H}}{{\rm{e}}}^{\tfrac{\beta }{2}{\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}({h})}{h}| {0}_{x}\rangle .\end{eqnarray} \tag{ 9 }$

This ground state is four-fold degenerate when periodic boundary conditions are considered [28]. We denote with H the abelian group whose elements h are all possible operations defined by the action of products of plaquette operators on an initial (reference) spin-configuration $| {0}_{x}\rangle$ . By ${\sigma }_{i}^{x}(h)$ we denote the eigenvalue of the operator ${\sigma }_{i}^{x}$ on the eigenstate $h| {0}_{x}\rangle$ . As a consequence, the term ${\sigma }_{i}^{x}(h)$ can take the values ±1. The normalization factor, Z corresponds to the partition function for this ground state and is given by

$\begin{eqnarray*}&&Z:= \sum _{{h}\in {H}}{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}({h})}.\end{eqnarray*}$

With these considerations we are now in the position to show that the analysis of the predictive model can point out the topological phase transition of this quantum model as well. Unlike in the IGT, discussed in the previous section, the highly entangled ground states of the modified toric code model (8) are not fully characterized by a spin configuration alone. On the other hand, equation (9) provides a closed analytical form for the ground states of the family of the Hamiltonians (8). In addition to that, these ground states are only four-fold degenerate in the topological phase. We take advantage of the knowledge of the modified toric code ground states and show this to be sufficient for identification of the phase transition from the predictive model.

3.1. Projection onto spin configurations

We consider a projection of the ground states of the Hamiltonian (8) onto the σ_x and σ_z bases. These two types of projections correspond to experimentally accessible measurements and we show that both allow us to detect the topological phase transition of the full quantum model. As for the IGT cross-over analyzed previously, we are yet again in the situation where we are able to input a configuration into the predictive model and ask it to predict a continuous parameter.

There are two crucial differences here: first, we are considering a zero temperature topological phase transition that is driven by the applied field strength β. The second difference lies in the behavior of the projected spin configurations in the two phases. In particular, we are able to draw parallels to phase transitions of classical spin models. As we elaborate below, choosing a basis to project on corresponds to mapping the phases of the quantum model to phases of a specific classical spin model.

3.1.1. The σ_x–projection

Let us first consider the projection onto the σ_x basis. We notice that the ground state (9) represents a superposition of x-spin-configurations $| {S}_{h}\rangle := h| 0{\rangle }_{x}$ for all elements of the group H. All states $| {S}_{h}\rangle$ fulfill the so-called closed loop condition

$\begin{eqnarray}&&{A}_{s}| {S}_{h}\rangle =| {S}_{h}\rangle \end{eqnarray} \tag{ 10 }$

for all values of β. In connection to the IGT, this corresponds to the condition of gauge invariance. More concretely, local constraints are imposed, that the product of σ_x eigenvalues around a vertex is equal to one. The value of the field strength, β, influences the weight of a given spin configuration (see equation (9)). Therefore, the probability to obtain a particular configuration $| {S}_{h}\rangle$ after projection onto σ_x-basis is given by

$\begin{eqnarray}&&p({S}_{h})=| \langle {S}_{h}| {\rm{\Psi }}(\beta )\rangle {| }^{2}=\displaystyle \frac{{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}(h)}}{{\sum }_{\tilde{h}\in H}{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}(\tilde{h})}}.\end{eqnarray} \tag{ 11 }$

We can understand the physics of the σ_x-projected ground state by first considering limiting cases of the field strength β. When β → 0, the ground state (9) corresponds to the ground state of the pure toric code Hamiltonian (6). Therefore, when projected onto the σ_x basis, all possible $| {S}_{h}\rangle$ are equally likely (since all $| {S}_{h}\rangle$ are weighted equally in the full eigenstate). When $\beta \to \infty$ , on the other hand, all configurations but $| S\rangle =| 0\rangle$ are exponentially suppressed and hence, the projected spin configurations are always ordered.

Thus, what used to be a topological phase transition of the full quantum state is now a transition from disordered spin-configurations (β small) to an ordered spin-configuration (β large, all spins up). We observe that, provided there is a finite β at which the transition between ordered and disordered configurations manifests itself, we obtained a phase transition that shows resemblance to the phase transition of the 2D Ising model. We show that indeed the 2D Ising model and its phase transition can be recovered by a simple change of variables, see appendix C.

Let us now explore the topological phase transition in the toric code model using the unsupervised learning method we introduced above. We train a neural network on the projected σ_x configurations labeled with the field strength, β. We used a network consisting of two convolutional (100 filters, kernel size 3 and 2), one dense layer with 100 neurons and one dropout layer with dropout rate 0.15. We train the neural network on 59950 configurations containing 100 different values of β between 0 and 1. All the simulations were performed for the system size N = 20 (800 spins). Once the model is trained we apply it on 2000 new configurations for 30 different values of β and evaluate the derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ (2) of the outcome, see figure 7. We show ${ \mathcal D }({\beta }_{\mathrm{label}})$ for six example field configurations. The field configurations correspond to different distributions {λ_i} and are detailed in appendix C. As shown in [30] the position and the existence of the phase transition is strongly dependent on the distribution {λ_i} of added fields.

**Figure 7.** Phase transition of the toric code model extracted from unsupervised learning on σ_x-projected configurations of the ground states. The derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ of the prediction of the neural network model designed to predict the field strength β is shown for six different field configurations. The configurations are exemplary illustrated in the bottom panel of the figure. Black circles denote fields in positive directions (shades of grey denote strength), red circles denote fields in the negative direction and empty circles denote no fields present. The dashed vertical lines denote the position of the phase transition for a given field configuration evaluated using the fidelity susceptibility.
Download figure:
Standard image High-resolution image

It was shown in [26] that the topological phase transition of the generalized toric code model can be determined from the behavior of the fidelity between two ground states with slightly varied field strengths (δβ → 0)

$\begin{eqnarray}&&{F}_{\beta }=\langle {\rm{\Psi }}(\beta )| {\rm{\Psi }}(\beta +\delta \beta )\rangle .\end{eqnarray} \tag{ 12 }$

In other words, we calculate the overlap of two ground state wave functions with applied fields whose magnitudes are very close to each other. We can indeed observe a change in the behavior of the overlap in the neighborhood of the phase transition. The rate of this change is better analyzed by studying the derivative of the quantity in equation (12), the so-called fidelity susceptibility

$\begin{eqnarray}&&{\chi }_{F}=-{\left.\displaystyle \frac{{\partial }^{2}\mathrm{ln}F}{{\partial }^{2}(\delta \beta )}\right|}_{\delta \beta =0}.\end{eqnarray} \tag{ 13 }$

We observe in figure 7 that the dashed lines determined from fidelity susceptibility calculation are in good agreement with the maximum of the peaks of the derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ of the predictive model. We show details of the fidelity susceptibility calculation in appendix C.

3.1.2. The σ_z–projection

We can ask whether a particular projection is necessary to determine the topological phase transition from the spin configurations alone. Let us consider measuring the ground state in the σ_z basis instead of σ_x. In order to simplify mathematical expressions let us without loss of generality choose a different state from the ground state manifold

$\begin{eqnarray}&&| {{\rm{\Psi }}}_{z}\rangle =\displaystyle \frac{1}{\sqrt{{Z}_{z}}}{{\rm{e}}}^{\tfrac{\beta }{2}{\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}}\sum _{g\in G}g| {0}_{z}\rangle .\end{eqnarray} \tag{ 14 }$

Here, analogously to equation (9), G is the abelian group of possible products of vertex operators and $| {0}_{z}\rangle$ is the reference state. Note that we chose a different reference state. As opposed to equation (9), all spins of the reference state are aligned in an eigenstate of σ_z instead of σ_x. The normalization is denoted with Z_z and not elaborated on further here.

Let us again examine the limiting behavior of β if σ_z was measured on every spin of the state (14). If β → 0 we obtain the exact toric code ground state. Projective measurement of σ_z on this ground state then results in the configuration ${S}_{g}=g| {0}_{z}\rangle$ , hence the closed loop (plaquette) conditions ${B}_{p}| {S}_{g}\rangle =| {S}_{g}\rangle$ are fulfilled. Every configuration $g| {0}_{z}\rangle$ fulfilling these constraints is obtained with equal probability. We note here, that the local plaquette constraints are in exact correspondence to the IGT local constraints fulfilled in the zero temperature phase.

Applying the same logic as in the case of the σ_x projection, we can conclude that in the case $\beta \to \infty$ we arrive at a completely polarized state, where all spins are aligned in the x-direction. If we now project onto a σ_z eigenstate, the plaquette constraints will not stay preserved. In fact, any configuration in σ_z basis will be obtained with equal probability. Hence, we find that in the σ_z projection the phase transition arises from a quite different process than we observed before: for small β the system would be in the state where loop conditions are preserved, while for large β they are violated. While in the case of σ_x the phase transition simply changes the weight for some states from the set preserving loop condition, in the case of σ_z projection we transition from the state where all the states preserving loop condition are weighted equally to the phase where the loop constraints are completely violated. We can therefore draw parallels to the previously examined IGT transition at finite temperature. In particular, in both cases we observe phases that can be distinguished by checking for a violation of the local closed loop constraints. However, there is a crucial difference between these two transitions. IGT exhibits a finite temperature cross-over and the violation of local constraints is a result of thermal excitations. Here, we consider a quantum phase transition at zero temperature, where the local constraints are violated due to the interplay with added perturbations. In particular, for IGT in the thermodynamic limit there is only a transition at infinite inverse temperature, $\beta$ , whereas the quantum phase transition we consider here occurs at a finite field strength β in the thermodynamic limit as well.

We employ the unsupervised learning technique on the σ_z projection of the modified toric code ground state (14) with the strength of the background field β as a label for the supervised part of the protocol. This time our neural net model consists of two convolutional layers (with 128 filters and kernel size 2) and three dense layers (with 100, 100 and 50 neurons, respectively).

We show in figure 8 the results for N = 4 (32 spins) and two different field configurations of the 6 field configurations defined in appendix C and previously studied on a larger lattice for the x-projections. The reduction of the system size and number of field configurations presented here are a consequence of constructing projections onto the σ_z-basis from the ground state containing σ_x fields: mixed σ_zσ_x terms make Monte Carlo update computationally significantly more expensive (for details see appendix C).

We note here, that for both σ_x and σ_z projections we limit ourselves to a single topological sector with the choice of the ground state in equation (9). Since the other topological sectors exhibit qualitatively the same phase transition at the same transition point the discussion above can be extended to any ground state within the topological sector.

3.2. Phase transition determination from the stabilizer expectation values

Finally, we discuss on how to obtain the topological phase transition in the toric code model by extracting necessary information by measurements that can be readily performed on the quantum state at hand and do not require projections onto the spin configurations. It was shown in [30] that the behavior of the expectation values of the stabilizer operators are intimately related to the position of the topological phase transition in the toric code model. We use our predictive model to evaluate the position of the phase transition from the expectation value of the stabilizer operators to offer an alternative method to determine the position of the phase transition.

As in the previous sections, we train a neural network to predict the value of the field strength amplitude, β. This time we use as an input the expectation value of the plaquette operator, $\langle {B}_{p}\rangle$ (with β as a label). Then we use the network to predict the field strength β for the expectation values of B_p evaluated with respect to the new set of quantum states. We use a neural network with two dense layers (with 20 neurons each). The derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ of the predictive model is shown as a function of field strength for six distinctive field configurations in figure 9. We again compare to the position of the phase transition obtained by fidelity susceptibility method (dashed lines) and observe an excellent agreement. As in the case of configurational data, we used the topological sector defined by the ground state in equation (9). Our result is again independent of the state in the ground state manifold, as all ground states are locally indistinguishable in the topological phase. As a consequence, the local expectation value $\langle {B}_{p}\rangle$ does not depend on the topological sector examined.

**Figure 9.** Phase transition of the toric code model extracted from unsupervised learning on expectation values of a stabilizer operator, $\langle {B}_{p}\rangle$ , in the ground state. The derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ of the prediction of the neural network model designed to predict the field strength β is shown for six example field configurations. Black circles denote fields in positive direction (shades of grey denote a weaker field strength), red circles denote fields in negative direction and empty circles denote no field present.
Download figure:
Standard image High-resolution image

$\langle {B}_{p}\rangle $ — **Figure 9.** Phase transition of the toric code model extracted from unsupervised learning on expectation values of a stabilizer operator, $\langle {B}_{p}\rangle$ , in the ground state. The derivative ${ \mathcal D }({\beta }_{\mathrm{label}})$ of the prediction of the neural network model designed to predict the field strength β is shown for six example field configurations. Black circles denote fields in positive direction (shades of grey denote a weaker field strength), red circles denote fields in negative direction and empty circles denote no field present.
Download figure:
Standard image High-resolution image

While the connection between expectation values of stabilizer operators and the position of the phase transition have not been shown analytically, another numerical evidence was provided in [31]. The authors examine direct detection of anyons, a process that can be mapped onto the expectation values which we investigated here. The presence of anyons is then immediately tied to the existence of topological order. We elaborate on the connection to the present work in appendix C.

4. Discussion

Unsupervised machine learning techniques for phase classification in condensed matter physics are potentially powerful tools for the discovery of new quantum phases. Due to the lack of local order parameters, phases exhibiting topological order present a challenging task for unsupervised methods. In this work, we have shown that a novel unsupervised method, namely the analysis of predictive neural network models [7], can reliably detect the violations of topological order, or a topological phase transition should it exist.

Topologically ordered states have been particularly challenging for unsupervised learning techniques, because the quantity characterizing topological order is inherently non-local and hard to identify from raw data. In the method presented here, we trained the network on an arbitrary continuous parameter associated to the state and then analyzed the errors in the network predictions. We presented numerical evidence that these prediction errors are signatures of a phase transition. We showed that this conclusion was independent of the particular type of phase transition present in the system and the type of the input data. To determine which type of phase transition is present, applying our method in conjunction with principal component analysis [17], variational auto-encoders [25], or confusion schemes [18, 21] that all succeed in determination of phase transition governed by local order parameter can be used as a guideline.

Providing the resolution to the problem of finding the cross-over temperature in the IGT and its generalizations in an unsupervised manner is the first step towards developing reliable techniques that can be applied to study the models whose phase diagrams are not yet fully understood.

Acknowledgments

We thank Juan Carrasquilla for fruitful discussions. We acknowledge Mark H Fischer for contributions to early versions of the code. We are grateful for financial support from the Swiss National Science Foundation and the NCCR QSIT. This work has received funding from the European Research Council under grant agreement no. 771503.

Appendix A.: IGT: predictive model

We created the samples used for training (like those shown in figure 1) of our model using Monte Carlo simulations. We created data for system sizes N × N × 2 with N ∈ {4, 8, 12, 16, 20, 24, 28}. For each system size we created 100 different values of β ∈ [0, 5]. We generated 20 000 configurations for each pair [β, N]. The neural net we used consists of 2 convolutional (128 filters, kernel size 3) and 2 dense layers (300 and 100 neurons, respectively). We observed that our method is flexible with respect to the hyper parameters of the neural network. However, too shallow networks that predict the same average β for all states inside and outside of the topological sector should be avoided.

We trained the network by minimizing the mean-squared-error loss function

$\begin{eqnarray}&&{L}^{\mathrm{mse}}({\beta }_{\mathrm{pred}}-{\beta }_{\mathrm{label}})=\displaystyle \frac{1}{n}\sum _{n}{\left({\beta }_{\mathrm{pred}}-{\beta }_{\mathrm{label}}\right)}^{2},\end{eqnarray} \tag{ A1 }$

where β_pred is the β determined by the network and β_label is the label of the given input sample, n is the batch size. The predictions of β by the network and their divergences are shown in figures 2 and 3.

In order to evaluate the error bars of the neural net predictions, we repeated the training procedure outlined above for 5 separate models (identical construction, separately generated training sets). Then we evaluated standard deviation of the critical β*.

We can replicate the predictions achieved by a neural network using a density of states based model as explained in the main text. Let us consider lattice configurations (training samples) X_n with their assigned inverse temperature labels β_n = β_label(X_n). We can evaluate an energy, E_n of each of these configurations using the formula

$\begin{eqnarray}&&E=-J\sum _{p}\prod _{i\in p}{\sigma }_{i}^{z},\end{eqnarray} \tag{ A2 }$

where the first summation is over all plaquettes, p, whereas the second summation is over spins within each plaquette. For convenience we choose J = 1. Then we can construct the density of states distribution of the training set

$\begin{eqnarray}&&\epsilon (\beta ,E)=\displaystyle \frac{{\sum }_{n=1}^{N}{\delta }_{E,{E}_{n}}{\delta }_{\beta ,{\beta }_{n}}}{{\sum }_{n=1}^{N}{\delta }_{\beta ,{\beta }_{n}}}.\end{eqnarray} \tag{ A3 }$

Here, δ_a,b is the Kronecker delta symbol ( ${\delta }_{a,b}=1$ for a = b and δ_a,b = 0 for $a\ne b$ ), E_n is energy for the configuration X_n evaluated using formula A2 and N is number of configurations X_n in the training set. We can write the energy distribution in the form above because the energy of the lattice configuration, E is discrete by construction and β is discretized in steps as explained above. An example of this distribution is shown in figure 5 for the system size N = 8.

Having access to the energy E_n of a given configuration X_n, we can then evaluate the average β of all states with energy E, which we denote by β^av for a configuration X_n in the training set

$\begin{eqnarray}&&{\beta }^{{\rm{av}}}(E)=\displaystyle \frac{{\sum }_{n=1}^{N}{\delta }_{E,{E}_{n}}{\beta }_{n}}{{\sum }_{n=1}^{N}{\delta }_{E,{E}_{n}}}.\end{eqnarray} \tag{ A4 }$

The function above predicts the value of β which is most likely for a given energy, E, given the energy distribution of the training set. We can use the function (A4) to determine the relation between assigned labels, β_n and values of β predicted by our model

$\begin{eqnarray}&&{\beta }^{{\rm{est}}}(\beta )=\displaystyle \frac{{\sum }_{m=1}^{M}{\beta }^{{\rm{av}}}({E}_{m}){\delta }_{\beta ,{\beta }_{m}}}{{\sum }_{m=1}^{M}{\delta }_{\beta ,{\beta }_{m}}},\end{eqnarray} \tag{ A5 }$

where M is the number of configurations X_m in an arbitrarily chosen test set. Using equation (A5) we can predict the estimated β for a range of true labels. In figure A1 we show the difference between true and predicted β as a function of true β. In figure A2 we show the derivative of the estimated β as a function of true β. Comparing with figure 3 we see that our model based on the density of states in the training set is reproducing well the actions of the neural net model we introduced in the main text. We have used maxima determined by the density of states as a dashed-line reference for the position of the transition in the main text.

**Figure A1.** Density of states based prediction of β. We plot the difference between true and predicted β, β_pred − β_label, as a function of β_label.
Download figure:
Standard image High-resolution image

**Figure A2.** We show the derivative of the density of states based prediction, ${ \mathcal D }({\beta }_{\mathrm{label}})$ , as a function of β_label.
Download figure:
Standard image High-resolution image

Appendix B.: Toric code: predictive model

B.1. Mapping to Ising model

The projection of the modified toric code ground state on the σ_x basis can be understood by mapping to a classical Ising model. Let us examine the ground state of the toric code with fields (9)

$\begin{eqnarray}&&| {\rm{\Psi }}\rangle =\displaystyle \frac{1}{\sqrt{Z}}\sum _{h\in H}{{\rm{e}}}^{\tfrac{\beta }{2}{\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}(h)}h| {0}_{x}\rangle .\end{eqnarray} \tag{ B1 }$

We perform a projection of $| {\rm{\Psi }}\rangle$ onto the σ_x basis. As stated in the main text, the outcome of the projection are the configurations $| {S}_{h}\rangle$ fulfilling the closed-loop condition ${A}_{s}| {S}_{h}\rangle =| {S}_{h}\rangle$ . In addition, the probability to obtain $| {S}_{h}\rangle$ after projection is given by

$\begin{eqnarray}&&p({S}_{h})=| \langle {S}_{h}| {\rm{\Psi }}\rangle {| }^{2}=\displaystyle \frac{{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}(h)}}{{\sum }_{\tilde{h}\in H}{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}(\tilde{h})}}.\end{eqnarray} \tag{ B2 }$

The system can be mapped to the classical Ising model as follows [26]. First we notice, that every group element h uniquely determines the configuration $| {S}_{h}\rangle$ by applying h to a reference spin configuration, which we choose to be $| {0}_{x}\rangle$ (all spins up in x-basis). Then, $| {S}_{h}\rangle =h| {0}_{x}\rangle$ . In addition, h corresponds to the product of a set I_h of plaquette operators $h={\prod }_{p\in {I}_{h}}{B}_{p}$ . Every such set of plaquette operators corresponding to a spin configuration $| {S}_{h}\rangle$ can be mapped to the following pseudo-spin configuration: artificial degrees of freedom (pseudo-spins) θ_p ∈ {−1, 1} are introduced on every plaquette. The value of the pseudo-spin θ_p is determined by I_h: if B_p ∈ I_h (plaquette flipped) it is equal to −1, else it is equal to one. As a consequence, the original spin-configuration $\{{\sigma }_{i}^{x}(h)\}$ (corresponding to $| {S}_{h}\rangle$ ) can be deduced from the pseudo-spin configuration θ_p by applying the rule ${\sigma }_{i}^{x}(h)={\theta }_{p}{\theta }_{p^{\prime} }$ . Here, p and p' are the two adjacent plaquettes to spin i. The geometry of the mapping is illustrated in figure B1.

**Figure B1.** Illustration of the mapping to a classical Ising model. The introduced pseudo-spins on the vertices are drawn in blue, the original spins in grey. The relation ${\sigma }_{i}^{x}(h)={\theta }_{p}{\theta }_{p^{\prime} }$ holds.
Download figure:
Standard image High-resolution image

Let us translate the mapping into the calculation of the probability p(S_h). Inserting the rule ${\sigma }_{i}^{x}(h)={\theta }_{p}{\theta }_{p^{\prime} }$ to equation (B2) yields

$\begin{eqnarray}&&p({S}_{h})=p(\{{\theta }^{h}\})=\displaystyle \frac{{{\rm{e}}}^{\beta {\sum }_{\langle p,p^{\prime} \rangle }{J}_{p,p^{\prime} }{\theta }_{p}^{h}{\theta }_{p^{\prime} }^{h}}}{{\sum }_{\{\theta \}}{{\rm{e}}}^{\beta {\sum }_{\langle p,p^{\prime} \rangle }{J}_{p,p^{\prime} }{\theta }_{p}{\theta }_{p^{\prime} }}},\end{eqnarray} \tag{ B3 }$

with ${J}_{p,p^{\prime} }={\lambda }_{i}$ for the plaquettes $p,p^{\prime}$ adjacent to edge i and summing over nearest-neighbors $\langle p,p^{\prime} \rangle$ . The pseudo-spin configuration obtained from the group element h by applying the explained mapping is denoted by the parameters {θ^h}. In contrast, the sum over {θ} represents a sum over all possible pseudo-spin configurations. We recognize the expression (B3) as Boltzmann weight for an Ising model with bond strengths ${J}_{p,p^{\prime} }$ at temperature $T=1/({k}_{B}\beta )$ . The topological phase transition undergone by the studied perturbed toric code model hence shows the behavior of an Ising phase transition from disordered pseudo-spin configurations to ordered spin configurations after projecting onto the σ_x basis.

B.2. Calculation of the fidelity susceptibility

We compare the position of the phase transition found by the neural network to the transition indicated by the fidelity susceptibility [2]. The fidelity susceptibility is defined as

$\begin{eqnarray}&&{\left.{\chi }_{F}=-\displaystyle \frac{{\partial }^{2}\mathrm{ln}\langle {\rm{\Psi }}(\beta )| {\rm{\Psi }}(\beta +{\rm{\Delta }}\beta )\rangle }{\partial {\left({\rm{\Delta }}\beta \right)}^{2}}\right|}_{{\rm{\Delta }}\beta =0},\end{eqnarray} \tag{ B4 }$

where the state $| {\rm{\Psi }}(\beta )\rangle$ is a ground state of a given Hamiltonian with respect to the parameter β. For our particular model, $| {\rm{\Psi }}(\beta )\rangle$ is given in equation (9). It has been shown, that a divergence or maximum of the fidelity susceptibility χ_F indicates a second-order symmetry-breaking quantum phase transition [3–5]. Numerical evidence suggests that topological phase transitions are indicated in the same way [6]. We can calculate the fidelity susceptibility for the introduced disordered toric model as

$\begin{eqnarray}&&{\chi }_{F}=\displaystyle \frac{1}{4}\displaystyle \frac{{\sum }_{h\in H}{\left({\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}\left(h\right)\right)}^{2}{{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}\left(h\right)}\cdot Z}{{Z}^{2}},\end{eqnarray} \tag{ B5 }$

$\begin{eqnarray}&&-\displaystyle \frac{1}{4}\displaystyle \frac{{\left({\sum }_{h\in H}\left({\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}\left(h\right)\right){{\rm{e}}}^{\beta {\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}\left(h\right)}\right)}^{2}}{{Z}^{2}}.\end{eqnarray} \tag{ B6 }$

We numerically evaluate the expression via Monte Carlo sampling for the different field configurations examined throughout this work and compare the position of the maximum with the position of the phase transition found by the neural network. In particular, we calculate the fidelity susceptibility for the following 6 different field configurations:

$\begin{eqnarray}\begin{array}{rcl} \mbox{\unicode{x00060}} \mathrm{Config}\ 1\mbox{'}:{\lambda }_{i} & = & \left\{\begin{array}{ll}0.5 & \mathrm{if}\ \ i\ \mathrm{mod}\ 2=0,\\ 1 & \mathrm{else},\end{array}\right.\\ \mbox{\unicode{x00060}} \mathrm{Config}\ 2\mbox{'}:{\lambda }_{i} & = & \left\{\begin{array}{ll}0.5 & \mathrm{if}\ i\ \ \mathrm{mod}\ 3=1,\\ 0.25 & \mathrm{if}\ i\ \ \mathrm{mod}\ 3=2,\\ 1 & \mathrm{else},\end{array}\right.\\ \mbox{\unicode{x00060}} \mathrm{Config}\ 3\mbox{'}:{\lambda }_{i} & = & \left\{\begin{array}{ll}-0.5 & \mathrm{if}\ i\ \ \mathrm{mod}\ 10=9,\\ 1 & \mathrm{else},\end{array}\right.\\ \mbox{\unicode{x00060}} \mathrm{Config}\ 4\mbox{'}:{\lambda }_{i} & = & \left\{\begin{array}{ll}-0.5 & \mathrm{if}\ i\ \ \mathrm{mod}\ 20=19,\\ 0.25 & \mathrm{if}\ i\ \ \mathrm{mod}\ 5=4,\\ 1 & \mathrm{else},\end{array}\right.\\ \mbox{\unicode{x00060}} \mathrm{Config}\ 5\mbox{'}:{\lambda }_{i} & = & \left\{\begin{array}{ll}0 & \mathrm{if}\ i\ \ \mathrm{mod}\ 5=0,\\ 1 & \mathrm{else},\end{array}\right.\\ \mbox{\unicode{x00060}} \mathrm{Config}\ 6\mbox{'}:{\lambda }_{i} & = & 1\ \forall \ i.\end{array}\end{eqnarray} \tag{ B7 }$

The chosen configurations constitute representative examples and incorporate different distributions of positive, negative and zero fields. The numerical simulations of the fidelity susceptibility for the 6 different field configurationson on lattices of lengths examined throughout this work (N = 20 and N = 4) are shown in figures B2 and B3.

**Figure B2.** Fidelity susceptibility numerically calculated for a lattice of length N = 20. The different field configurations are illustrated in the exemplary configuration plots and detailed in equation (B7).
Download figure:
Standard image High-resolution image

**Figure B3.** Fidelity susceptibility numerically calculated for a lattice of length N = 4. The different field configurations are illustrated in the exemplary configuration plots.
Download figure:
Standard image High-resolution image

The fidelity susceptibility can be connected to the heat capacity of the classical Ising model explained in the previous subsection, as elaborated in [6] and [30].

B.3. Numerical simulation of projections

The projection of the ground state of the perturbed toric model onto the σ_x or σ_z basis is in both cases simulated via Monte Carlo sampling. To project on the σ_x basis, we aim to obtain a configuration S_h sampled from the probability distribution p(S_h) (B3). Such a configuration is reached via a Markov chain. More concretely, we start with a lattice with all spins up in x basis and construct the Markov chain as follows: in each step (with given spin configuration ${S}_{{h}_{i}}$ ), a random plaquette p is picked. The decision, whether the plaquette should be flipped and ${S}_{{h}_{i}}\to {S}_{{h}_{i+1}}$ is made via a Metropolis-Hastings test. The four spins around the chosen plaquette are flipped with probability

$\begin{eqnarray}&&\displaystyle \frac{p({S}_{{h}_{i+1}})}{p({S}_{{h}_{i}})}={{\rm{e}}}^{\beta {\sum }_{i\in p}{\lambda }_{i}({\sigma }_{i}^{x}({h}_{i+1})-{\sigma }_{i}^{x}({h}_{i}))}.\end{eqnarray} \tag{ B8 }$

After thermalization time, the spin configuration S_h is obtained with probability p(S_h) and a projection is simulated.

Projecting on the σ_z basis follows the same principle with the caveat that the spin-flip probability is computationally expensive to calculate. We start from a state in the ground state manifold

$\begin{eqnarray}&&| {{\rm{\Psi }}}_{z}\rangle =\displaystyle \frac{1}{\sqrt{{Z}_{z}}}{{\rm{e}}}^{\tfrac{\beta }{2}{\sum }_{i}{\lambda }_{i}{\sigma }_{i}^{x}}\sum _{g\in G}g| {0}_{z}\rangle \end{eqnarray} \tag{ B9 }$

and project on the σ_z basis. In particular, let us examine the probability to obtain the spin configuration

$\begin{eqnarray}&&| {z}_{M}\rangle =\prod _{i\in M}{\sigma }_{i}^{x}| {0}_{z}\rangle ,\end{eqnarray} \tag{ B10 }$

where M is a set of spins that are flipped in the configuration $| {z}_{M}\rangle$ with respect to the initial state $| {0}_{z}\rangle$ . Then, the probability to project on $| {z}_{M}\rangle$ is given by

$\begin{eqnarray}&&p({z}_{M})=| \langle {z}_{M}| {{\rm{\Psi }}}_{z}\rangle {| }^{2}={\left(\displaystyle \frac{{\sum }_{C}\prod _{j\notin {C}_{M}}\cosh (\tfrac{\beta }{2}{\lambda }_{j}){\prod }_{i\in {C}_{M}}\sinh (\tfrac{\beta }{2}{\lambda }_{j})}{{\sum }_{C}{\prod }_{j\notin C}\cosh (\beta {\lambda }_{j}){\prod }_{i\in C}\sinh (\beta {\lambda }_{j})}\right)}^{2},\end{eqnarray} \tag{ B11 }$

where the closed loops C correspond to the set of spins that are flipped when applying a product of vertex operators to the intial state $\prod {A}_{s}| {0}_{z}\rangle ={\prod }_{i\in C}{\sigma }_{i}^{x}| {0}_{z}\rangle$ . The sum is over all possible closed loops, hence over all possible products of vertex operators. Similarly, the set C_M can be constructed from the closed loop C by flipping the spins in M

$\begin{eqnarray}&&\prod _{i\in M}{\sigma }_{i}^{x}\prod _{i\in C}{\sigma }_{i}^{x}| {0}_{z}\rangle =\prod _{i\in {C}_{M}}{\sigma }_{i}^{x}| {0}_{z}\rangle .\end{eqnarray} \tag{ B12 }$

In order to obtain a spin configuration sampled from the distribution p defined in equation (B11), we construct a Markov chain by starting with a spin configuration $| {0}_{z}\rangle$ in the σ_z basis. In each step, a random spin is chosen and flipped (updating the spin configuration $| {z}_{i}\rangle$ to $| {z}_{i+1}\rangle$ ) with probability p(z_i + 1)/p(z_i). As the computation of the spin flip probability is expensive, we simulate z-projections only for 2 × 4 × 4 = 32 spins.

The neural network for the predictive model on the lattice with length N = 4 (outcomes shown in figure 8) consists of two convolutional layers with 128 neurons each and three dense layers with 100, 100 and 50 neurons. Training was conducted on a set of 157960 examples in total, 144 values of β between 0 and 1. For evaluation of the trained model to predict the field parameter, the values for β were chosen to be 72 discrete steps between 0 and 1. A total of 100800 evaluation examples was generated, data augmentation (rotations, translations, mirror) led to an additional factor of 100.

B.4. Detection of quasiparticles

We elaborated in the main text, that measuring a stabilizer expectation value contains sufficient information to indicate the position of the topological phase transition. This behavior can be related to a detection of the topological phase transition by measuring quasiparticles. More concretely, numerical evidence has been presented in [31], that a topological phase transition can be indicated by a detection of quasiparticles. For the toric model, the toric Hamiltonian can be modified such that the ground state contains a pair of quasiparticles

$\begin{eqnarray}&&{H}_{m}=-\sum _{p\ne p1,p2}{B}_{p}-\sum _{s}{A}_{s}+\sum _{p=p1,p2}{B}_{p}.\end{eqnarray} \tag{ B13 }$

At the plaquettes p₁ and p₂, the expectation value $\langle {B}_{p}{\rangle }_{m}=-1$ measured on the ground state shows the existence of a quasiparticle. Here, the subscript m denotes that the expectation value is taken with respect to the ground state of the modified toric code. When adding a phase-transition driving perturbation paramatrized by a field β, the position of the phase transition is indicated by a divergence in the derivative ${\partial }_{\beta }\langle {B}_{p}{\rangle }_{m}$ with p ∈ {p₁, p₂}. If the added perturbation is of the form

$\begin{eqnarray}&&{H}_{m}\to {H}_{m}+\sum _{p}{{\rm{e}}}^{-\beta \sum _{i\in p}{\lambda }_{i}{\sigma }_{i}^{x}},\end{eqnarray} \tag{ B14 }$

the following relation holds

$\begin{eqnarray}&&\langle {B}_{p}\rangle =-\langle {B}_{p1}{\rangle }_{m}.\end{eqnarray} \tag{ B15 }$

Here, the expectation value $\langle {B}_{p}\rangle$ is evaluated on the ground state of the model without quasiparticles (9) examined throughout this work. We conclude that the divergence in the derivative of $\langle {B}_{p}\rangle$ indicates the position of the phase transition. We therefore understand, that the predictive model is able to reconstruct this behavior as the accuracy of the predicted field strength depends on the slope of the expectation value $\langle {B}_{p}\rangle$ .

Appendix C.: Comparison with the confusion scheme

We compare the introduced unsupervised approach with the similar confusion scheme developed in [18]. We give here a quick summary of the scheme, for further details we refer to the work of the original authors. As starting point, one is given (uniformly sampled) data in the range (β_a, β_b) and an unknown critical value β_c separating two phases, with β_a < β_c < β_b. The critical point can be estimated by systematically 'guessing'. In particular, one chooses a value ${\beta }_{c}^{{\prime} }$ as guess for the critical point and separates the dataset in two parts: all values smaller than ${\beta }_{c}^{{\prime} }$ are labeled with 0, all values larger than ${\beta }_{c}^{{\prime} }$ are labeled with 1. A neural network is trained to reproduce the labels. This procedure is repeated for all values ${\beta }_{c}^{{\prime} }$ in the interval (β_a, β_b), and the performance $P({\beta }_{c}^{{\prime} })$ of the trained networks is plotted and analyzed. The main idea is, that the network performs best when ${\beta }_{c}^{{\prime} }$ is chosen to be the critical point. More concretely, the performance has a W-shape, if a phase transition is recognized (see figure C1). As a consequence, the main difference between our method and the confusion scheme is that we use a single network and analyze its prediction, while the method of van Nieuwenburg et al requires separate networks for each point of the phase diagram they wish to check (which might make it less suitable for high dimensional parameter spaces).

**Figure C1.** Accuracy $P({\beta }_{c}^{{\prime} })$ of the confusion scheme on the IGT problem with N = 12 (blue dots). In grey, the ideal W-shape indicating a phase transition is shown as comparison. The crossover from ground state to non ground state is at around ${\beta }_{c^{\prime} }={\beta }_{c}=2.5$ . The error bars are obtained by averaging over ten different and independent Monte Carlo runs for obtaining the data.
Download figure:
Standard image High-resolution image

**Figure C1.** Accuracy $P({\beta }_{c}^{{\prime} })$ of the confusion scheme on the IGT problem with N = 12 (blue dots). In grey, the ideal W-shape indicating a phase transition is shown as comparison. The crossover from ground state to non ground state is at around ${\beta }_{c^{\prime} }={\beta }_{c}=2.5$ . The error bars are obtained by averaging over ten different and independent Monte Carlo runs for obtaining the data.
Download figure:
Standard image High-resolution image

The confusion scheme has proven to perform well on a variety of phase transitions in classical systems and quantum systems. We employ the confusion scheme on the IGT crossover analyzed in section 2. The network performance $P({\beta }_{c}^{{\prime} })$ is shown in figure C1 for the system size N = 12. As the typical W-shape is not reproduced, the position of the phase transition is not recognized. Instead, we obtain a shifted V-shape. We can understand the result in the following way. The confusion method can easily distinguish between all states with β < β_c on one side of the transition, but it cannot distinguish at the states in the ordered sector. More concretely, all samples in the ordered sector (β > β_c) are in the ground state (no local constraints are violated) and thus indistinguishable to the networks. In the disordered section, the configurations at different ${\beta }_{c}^{{\prime} }$ are distinguishable by different numbers of local plaquette constraint violations. Smaller values of β lead to a larger number of frustrated plaquettes. As a consequence, the network is able to distinguish all the states in the disordered phase.

For this example, a relatively small network architecture consisting of a convolutional layer (12 × 12 = 144 neurons, 5 filters and kernel size 3) and a dense layer with one output neuron was chosen. Some testing with larger networks showed a similar accuracy curve. Tests on smaller networks showed a different V-shape. Specifically, the obtained V-shape was not shifted to the disordered phase. Henceforth, the networks were not able to distinguish the configurations in none of the phases and had to guess randomly. We conclude, that the application of the confusion scheme to IGT is not straightforward.

Unsupervised identification of topological phase transitions using predictive models

Article metrics

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. The Ising gauge theory

3. The toric code and its generalizations

3.1. Projection onto spin configurations

3.1.1. The σ_x–projection

3.1.2. The σ_z–projection

3.2. Phase transition determination from the stabilizer expectation values

4. Discussion

Acknowledgments

Appendix A.: IGT: predictive model

Appendix B.: Toric code: predictive model

B.1. Mapping to Ising model

B.2. Calculation of the fidelity susceptibility

B.3. Numerical simulation of projections

B.4. Detection of quasiparticles

Appendix C.: Comparison with the confusion scheme

Unsupervised identification of topological phase transitions using predictive models

Article metrics

Share this article

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. The Ising gauge theory

3. The toric code and its generalizations

3.1. Projection onto spin configurations

3.1.1. The σx–projection

3.1.2. The σz–projection

3.2. Phase transition determination from the stabilizer expectation values

4. Discussion

Acknowledgments

Appendix A.: IGT: predictive model

Appendix B.: Toric code: predictive model

B.1. Mapping to Ising model

B.2. Calculation of the fidelity susceptibility

B.3. Numerical simulation of projections

B.4. Detection of quasiparticles

Appendix C.: Comparison with the confusion scheme

3.1.1. The σ_x–projection

3.1.2. The σ_z–projection