Abstract
In this work, we explore the generation of statistical-system configurations with generative models from Deep Learning, going beyond conventional Monte Carlo methods. Specifically, we devise a conditional generative adversarial network (cGAN) for Ising spin-configuration generation, and we demonstrate its operation outside the training range of temperatures for the ensemble of configurations. Unlike the original GAN design, we add a further recognizer network that constrains the conditional parameter (in our case, temperature) and also improves the diversity of the generative model. We show that the proposed cGAN can learn the distribution of the Ising model at different temperatures and can efficiently generate spin configurations with correct (within a probability distribution) temperature estimates for the microscopic configurations. Moreover, even though no information about criticality is provided in the training data set, the developed cGAN can generate Ising spin configurations around the phase-transition point whose order parameter (mean magnetization) matches conventional MCMC simulation reasonably well, while offering the advantage of parallel sampling. We also compared typical spin configurations generated by the cGAN, with the conditional temperature set to the critical temperature, against samples simulated by MCMC; they are visually indistinguishable. This could help avoid the critical slowing-down encountered in traditional Monte Carlo methods.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Recently, applications of machine learning techniques in scientific research have boomed, owing both to increased computational power and to the strong pattern-recognition ability of these methods. Deep Learning (DL) is a branch of machine learning that aims at extracting and understanding high-level representations of data with deeply structured artificial neural networks [1]. After successful development and applications in image/video processing and speech recognition, DL has also shown great success in physics [2, 3], biology and engineering. It has been shown that DL can effectively handle complex nonlinear systems with strong correlations that are beyond the reach of traditional analysis. Tremendous progress has been made in applying deep learning techniques to condensed-matter systems such as classical or quantum spin models, especially in the study of phase-transition identification [4, 5], compressed quantum-state representation [6], and the acceleration of Monte Carlo simulations [7, 8]. The collective behavior of interacting degrees of freedom is at the core of much research on physical systems. Usually, the enormous number of free parameters that define a near-infinite configuration space makes it extremely difficult to effectively model many-body systems. One thus introduces novel machine learning methods to build better approximations of the system and to help extract physical insight.
Besides these pattern-recognition applications (i.e. classification and regression), there has recently been increasing interest in utilizing generative methods for physical systems [9–11]. In general, generative models aim to learn the joint probability distribution of the data for further density estimation or new-sample generation. Some promising generative methods are, for example, the restricted Boltzmann machine (RBM), the variational autoencoder (VAE), the Generative Adversarial Network (GAN), and flow-based generative models.
The representational ability and limitations of the above-mentioned generative models in physical systems are actually not yet well studied. In this paper we explore Ising spin-configuration generation with a conditional GAN outside the training range of temperatures, to better understand to what extent GAN methods can be applied in many-body statistical physics to assist, or even replace, traditional methods such as MCMC or variational mean-field approaches. Besides exploring the generative model's ability to capture the underlying physics of the spin system, our work also helps compress the physics of the system into a trained network as an efficient representation, which can largely reduce storage costs in practice.
The rest of the paper is structured as follows. After outlining the Ising spin model, we introduce the generative adversarial network (GAN) method. We then develop a conditional generative adversarial network (cGAN) for Ising spin-configuration generation at a specified temperature, present our numerical experiments on training and testing, and discuss to what degree the cGAN captures the physics. Finally, we summarize the main findings and conclude.
2. Ising model
We consider the 2-D Ising model on the square lattice (N × N grid), with each lattice site containing one spin that points either up or down. The Ising model is a well-known model of magnetism, e.g. of a ferromagnetic system of particles, in which the interaction of spins s_i ∈ {−1, +1} is described by the following Hamiltonian

H(s) = −J Σ_{〈i,j〉} s_i s_j − h Σ_i s_i,   (1)
where the summation is taken over all nearest-neighbor spin pairs (〈i, j〉) and h is an external magnetic field, which is set to zero in our study. The coupling J (also called the exchange energy) sets the scale of the interaction strength for the system. We consider J = 1, which corresponds to the ferromagnetic case; the negative sign in the Hamiltonian indicates that spins prefer to align with their neighbors to lower the internal energy of the system.
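As a concrete illustration, the energy of a configuration under this Hamiltonian can be computed as follows (a minimal NumPy sketch; the function name and the assumption of periodic boundary conditions are ours, not stated in the text):

```python
import numpy as np

def ising_energy(s, J=1.0, h=0.0):
    """Energy of a 2-D Ising configuration s (entries +/-1).

    Periodic boundary conditions are assumed; np.roll pairs each spin
    with its right and lower neighbor, so every bond is counted once.
    """
    interaction = -J * np.sum(s * (np.roll(s, 1, axis=0) + np.roll(s, 1, axis=1)))
    field = -h * np.sum(s)
    return interaction + field
```

For a fully aligned N × N lattice with h = 0 this gives −2JN², the ground-state energy.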
We consider the equilibrium Ising system here. As dictated by ensemble theory, the probability density of a microscopic spin configuration s at equilibrium at a given temperature T is given by the Boltzmann distribution

P(s∣T) = e^{−H(s)/(kT)} / Z,   (2)
where the Boltzmann constant k is set to unity in our calculation and the normalization factor Z is the partition function, defined as Z = Σ_s e^{−H(s)/(kT)} with the summation running over all configurations. Clearly, such a Boltzmann distribution prefers ordered states when the Ising system is at low temperature, while at high temperature this preference weakens and the system becomes balanced or dominated by disordered states. From statistical physics, the thermal expectation of any physical quantity O (e.g. energy or magnetization per degree of freedom) that depends on the configuration s can be estimated as

〈O〉_T = Σ_s O(s) P(s∣T).   (3)
Numerically, one usually adopts Markov Chain Monte Carlo (MCMC) sampling to construct a set of N configurations s_i distributed according to P(s∣T), e.g. via the Metropolis-Hastings algorithm or the Wolff algorithm, under which the estimation becomes simply 〈O〉_T ≈ (1/N) Σ_{i=1}^{N} O(s_i).
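A single-spin-flip Metropolis-Hastings update can be sketched as follows (illustrative only; the function name, periodic boundaries, and the sweep convention are our assumptions, not a description of the production code):

```python
import numpy as np

def metropolis_sweep(s, T, J=1.0, rng=None):
    """One Metropolis-Hastings sweep (N*N proposed single-spin flips)
    over a 2-D Ising configuration with periodic boundaries."""
    rng = np.random.default_rng() if rng is None else rng
    n = s.shape[0]
    for _ in range(n * n):
        i, j = rng.integers(0, n, size=2)
        # sum over the four nearest neighbors, with periodic wrap-around
        nb = s[(i + 1) % n, j] + s[(i - 1) % n, j] + s[i, (j + 1) % n] + s[i, (j - 1) % n]
        dE = 2.0 * J * s[i, j] * nb  # energy change if s[i, j] were flipped
        # accept the flip with the Metropolis probability min(1, exp(-dE/T))
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] = -s[i, j]
    return s
```

In practice, many such updates are taken between recorded configurations (1000 MCMC steps in this work) to reduce autocorrelations.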
With lattices of size 60 × 60, we collected Ising spin configurations sampled by Monte Carlo simulations via the Metropolis-Hastings algorithm. We prepared configurations at 40 different temperature values from T = 1 to T = 3.475 with equal spacing (only 32 of these temperature ensembles are taken as the training set, as will be explained later); at each temperature value, 2500 configurations were generated after a sufficient equilibration (burn-in) period. To reduce correlations among the spin configurations, 1000 MCMC steps were taken between two successively recorded configurations on one Markov chain.
3. Generative adversarial networks
Aimed at learning the data distribution, a Generative Adversarial Network (GAN) contains two differentiable functions modeled by deep neural networks: one is the generator G(z), which maps a prior noise vector z from latent space (z ∼ p(z)) to the target data space (x̃ = G(z)); the other is the discriminator D(x), which tries to distinguish real data x from generated data x̃. The two networks compete with each other during training, through which the generator distribution p_G(x) is pulled toward the underlying data distribution p_true(x).
The vanilla generative adversarial network, GAN, involves two loss functions, L_D and L_G, for the discriminator and the generator, respectively. By mimicking a zero-sum game with L_G = −L_D, GAN optimizes the respective networks' parameters θ_G and θ_D so that the game converges to

(θ_G*, θ_D*) = arg min_{θ_G} max_{θ_D} [−L_D(θ_D, θ_G)].   (4)
In the original GAN, the binary cross-entropy loss function was used,

L_D = −E_{x∼p_true(x)}[log D(x)] − E_{z∼p(z)}[log(1 − D(G(z)))],   (5)
where

E_{x∼p(x)}[f(x)] = ∫ f(x) p(x) dx
represents the expected value over the normalized probability distribution p(x), and one used

L_G = −L_D = E_{x∼p_true(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))].
Note that for the generator, the first term of (5) has no influence on G during training, as it only depends on θ_D. The expectation values in the above loss functions are evaluated during training from the mean over the training samples. θ_D and θ_G are updated via backpropagation with the gradients of the loss functions,

∇_{θ_D} L_D = −(1/m) Σ_{i=1}^{m} ∇_{θ_D} [log D(x_i) + log(1 − D(G(z_i)))],
∇_{θ_G} L_G = (1/m) Σ_{i=1}^{m} ∇_{θ_G} log(1 − D(G(z_i))),
with x_i a sample from the training set and z_i latent noise drawn from the prior distribution p_prior. With the above binary cross-entropy loss function, the optimal discriminator for a given fixed generator G can be derived to be

D*(x) = p_true(x) / (p_true(x) + p_G(x)).
From the information-theory point of view, the above training objective for the discriminator (and thus the training criterion of the generator) is just the Jensen-Shannon (JS) divergence, a measure of similarity between two probability distributions,

max_D V(D, G) = 2 JS(p_true ∥ p_G) − 2 log 2,
where the JS divergence is formulated by symmetrizing the Kullback-Leibler (KL) divergence,

JS(p ∥ q) = (1/2) KL(p ∥ (p + q)/2) + (1/2) KL(q ∥ (p + q)/2),
with the KL divergence defined as

KL(p ∥ q) = ∫ p(x) log[p(x)/q(x)] dx.
Thus the optimum for the generator is reached if and only if p_G(x) = p_true(x), resulting in D* = 1/2 at the global optimum of the zero-sum (minimax) game, also called the Nash equilibrium.
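These relations can be checked numerically for discrete toy distributions (a sketch with hypothetical helper names, not code from this work):

```python
import numpy as np

def kl(p, q):
    """Discrete Kullback-Leibler divergence; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def js(p, q):
    """Jensen-Shannon divergence via the symmetrized KL form."""
    mix = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def optimal_discriminator(p_true, p_g):
    """Pointwise optimal discriminator for a fixed generator."""
    p_true, p_g = np.asarray(p_true, float), np.asarray(p_g, float)
    return p_true / (p_true + p_g)
```

When p_G = p_true, the JS divergence vanishes and the optimal discriminator outputs 1/2 everywhere, as stated above.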
Practically, it is very hard to train with this default GAN setup, especially in high-dimensional cases. With only the above JS divergence measure, the discriminator D(x) may not provide sufficient information to measure the distance between the generated distribution p_G and the real data distribution p_true when the two distributions have no overlap. Mathematically, when the supports of p_G and p_true both rest on low-dimensional manifolds of the data space, the two distributions have zero-measure overlap, which results in a vanishing gradient for the generator. This leads to a weak training signal for updating G and to general instability. Mode collapse can also easily occur, where the generator learns to produce only a single element (mode) of the state space that maximally confuses the discriminator. To avoid such training failures, a multitude of techniques have been developed recently, such as ACGAN [12], WGAN [13] and improved WGAN, which help to stabilize and improve GAN training. We use the improved WGAN with gradient penalty [14] in this work. The most important difference of WGAN compared to the original GAN lies in the loss function, where the Wasserstein distance (also called Earth Mover distance) provides an efficient measure of the distance between the two distributions (p_true and p_G) even when they do not overlap anywhere. The loss functions are now

L_D = E_{z∼p(z)}[D(G(z))] − E_{x∼p_true}[D(x)] + λ E_{x̂}[(‖∇_{x̂} D(x̂)‖₂ − 1)²]
and

L_G = −E_{z∼p(z)}[D(G(z))],
where the gradient penalty term with strength λ is computed in a linearly interpolated sample space,

x̂ = ε x + (1 − ε) G(z),
with ε uniformly sampled from (0, 1].
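The interpolation and penalty term can be illustrated with a toy linear critic D(x) = w · x, whose input gradient is exactly w, so no automatic differentiation is needed (the function name and the linear critic are illustrative assumptions, not the network used in this work):

```python
import numpy as np

def gradient_penalty(w, x_real, x_fake, lam=10.0, rng=None):
    """Gradient-penalty term for a toy linear critic D(x) = w @ x,
    whose input gradient is w everywhere.  x_hat is drawn on straight
    lines between real and generated samples, as in WGAN-gp."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # linear interpolation
    grad = np.broadcast_to(w, x_hat.shape)       # grad_x (w @ x) = w
    norms = np.linalg.norm(grad, axis=1)
    return lam * float(np.mean((norms - 1.0) ** 2))
```

The penalty vanishes exactly when the critic's gradient has unit norm, which is the 1-Lipschitz condition the term softly enforces.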
4. Ising configuration generation with conditional GAN
With the Gradient Penalty improved Wasserstein GAN (WGAN-gp) as the basic adversarial training framework, here we build a conditional GAN (cGAN) for generating Ising spin configurations at conditionally specified temperature values.
4.1. cGAN architecture
Figure 1 depicts the cGAN structure developed in the present study. It consists of three models: a generator G(z, T) for spin-configuration generation conditioned on temperature T with prior source z ∼ p(z); a discriminator D(x, T) for distinguishing spin configurations by giving a Wasserstein-distance measure; and a recognizer R(x) for estimating the temperature associated with a spin configuration.
As explored in [15], the condition of temperature can be added to the input of the discriminator as a second channel of the spin configurations. Inside the generator, we propose to add the condition of temperature as a scale factor for the latent code z ∼ p(z), which works better than other embedding methods in our experiments. As a further push for diversity of the generated distribution, we additionally consider maximizing the entropy of the generator's distribution, which turns out to be a reconstruction error for an additional recognizer network for the condition of temperature,

L_R = −E_{z∼p(z), T}[log p_R(T ∣ G(z, T))];
assuming the distribution modeled by the recognizer R to be Gaussian, one immediately obtains the usual Euclidean distance between the model prediction and the ground truth.
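The reduction from a fixed-variance Gaussian likelihood to a Euclidean (mean-squared) distance can be made explicit (a small sketch; the helper names are ours): for σ = 1 the negative log-likelihood equals half the MSE plus a constant, so minimizing one minimizes the other.

```python
import numpy as np

def gaussian_nll(pred, target, sigma=1.0):
    """Negative log-likelihood of target under N(pred, sigma^2)."""
    return (0.5 * np.mean((pred - target) ** 2) / sigma**2
            + 0.5 * np.log(2.0 * np.pi * sigma**2))

def mse(pred, target):
    """Mean-squared (Euclidean) distance."""
    return np.mean((pred - target) ** 2)
```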
4.2. Training and testing results
As our main focus, we explore the generalizability of the trained cGAN in generating spin configurations outside the training-set temperature range. Specifically, we excluded from the training set the spin-configuration ensembles with temperatures in the vicinity of the phase transition. This critical region is the most time-consuming part of a conventional MCMC simulation, since the autocorrelation time diverges as the critical point is approached (so-called critical slowing-down). The training set of configuration ensembles covers the temperature interval [1.0, 2.0] ∪ [2.5, 3.5] from MCMC simulations.
After training, we purposely specify conditional temperature values in the critical range [2.0, 2.5] for cGAN generation, and further test the recognizer's prediction (i.e. temperature evaluation) on the cGAN-generated configurations and on the MCMC-generated training configurations. The results are shown in figure 2.
We see that the recognizer R(x) has successfully learned temperature estimation for each individual spin configuration. Note that, from statistical theory, at a fixed temperature every spin configuration appears with a probability given by equation (2), so the temperature associated with a randomly chosen spin configuration should itself follow a distribution rather than take a deterministic value. This is why in figure 2 the network-predicted temperatures spread around the ground-truth value, even though during training each MC-generated spin configuration was labeled with the specific temperature used in its MCMC generation. For the cGAN-generated configurations in the unseen critical temperature region, the recognizer also consistently predicts the correct temperature (see the range [2.0, 2.5] in figure 2). We stress that only the temperature range [1.0, 2.0] ∪ [2.5, 3.5] was provided for training, yet the agreement between the desired conditional temperature values and the supervised prediction of temperature on the generated configurations holds over a much broader range of temperatures.
We take a closer look at the microscopic configurations generated from the trained cGAN with the conditional temperature set to the critical temperature T = Tc, to check whether the trained generative model can capture the physics of criticality based on training configurations that contain no criticality information. Physically speaking, it is also quite interesting to know how much criticality information is contained in configuration ensembles away from the critical region. In principle, only when the ensemble size approaches infinity (large enough to satisfy the ergodic hypothesis) can all the physics of the system be represented, which is the physics contained in the partition function of the system. Here, taking the approximate point of view that the MC-generated configuration ensembles provide a numerical construction of the partition function for further observable estimation, we can imagine that the cGAN first learns an interpolation of the partition function in temperature and then maps it back to the microscopic configuration space. One typical configuration generated by the cGAN at the critical temperature (which is not included in the training data set), with magnetization around the mean value, is shown in figure 3. Because of the continuous numerical output of the designed cGAN network, some bright 'star' points appear; these can easily be removed with a rounding operation and do not visibly change the evaluation of thermodynamic observables.
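The rounding operation mentioned above can be as simple as projecting the continuous generator output back onto ±1 (a sketch; the function name and the threshold at zero are our assumptions):

```python
import numpy as np

def binarize(x):
    """Project continuous generator output back onto {-1, +1};
    exact zeros (rare) are mapped to +1."""
    s = np.sign(x)
    s[s == 0] = 1
    return s
```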
We then evaluated the mean magnetization per lattice site for the Ising configuration ensembles at different temperatures with the trained cGAN and compared it with the MCMC results in figure 4, where 100 configurations are generated by the trained cGAN to evaluate the mean magnetization at each temperature value. To account approximately for model uncertainty, we saved three different trained cGAN versions by stopping the training at different epochs. From figure 4 we see that the general temperature dependence of the magnetization is represented well by the generator in the cGAN. Since we use the network to generalize beyond the distribution it was trained on, the critical region not included in the training is of most interest. Clearly, the network is able to perform this high-dimensional interpolation for configuration generation at conditional temperature values outside the training range, capturing the average order-parameter observable, the mean magnetization, reasonably well.
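The ensemble estimate of the order parameter used here can be sketched as follows (the function name is ours; configurations are assumed to be ±1-valued after rounding):

```python
import numpy as np

def mean_abs_magnetization(configs):
    """configs: array of shape (n_samples, N, N) with entries +/-1.
    Returns the ensemble average of |m|, where m is the mean spin per site."""
    m = configs.mean(axis=(1, 2))
    return float(np.mean(np.abs(m)))
```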
5. Conclusions
In this paper, we developed a conditional GAN for Ising spin-configuration generation at specified temperature values and successfully trained it to generate configurations with conditional temperature values outside of the training set. The Ising spin configurations generated by our generator replicate the underlying physical distributions of the training set well, and the generator in the cGAN captures the high-dimensional correlations between the microscopic spin configurations and temperature in an ensemble-average sense. Most interestingly, the criticality information, which is not provided in the training set, can be partly revealed by the well-trained cGAN, for which we checked that the order-parameter estimation in the critical region agrees reasonably well with the Monte Carlo simulation. This can be helpful in handling critical slowing-down near criticality in MCMC simulations, e.g. by serving as an uncorrelated proposal in the Markov chain. It can also be useful in compressing information about the studied physical system, in our case the Ising system. In the future, we will test more physics, such as comparisons of spin-spin correlations. We will also take the boundary conditions and symmetries of the system explicitly into account in the generator.