1 Introduction

Langevin dynamics is a popular tool for molecular simulation. It requires the choice of a damping coefficient, which is the reciprocal of a diffusion coefficient. (More generally this might be a diffusion tensor.) The special case of a constant scalar diffusion coefficient is the topic of this article. The motivation for this study is a suspicion that proposed novel MCMC propagators based on Langevin dynamics (in particular, stochastic gradient methods for machine learning [4, 9]) might be obtaining their advantage at the expense of reduced sampling efficiency, as measured, say, by effective sample size.

For simulations intended to model the dynamics, the appropriate choice of \(\gamma \) is based on physics. Generally, the dissipation and fluctuation terms are there to account for omitted degrees of freedom. In their common usage as thermostats, they model the effect of forces due to atoms just outside the set of explicitly represented atoms. These are essentially boundary effects, which disappear in the thermodynamic limit \(N_\mathrm {atoms}\rightarrow \infty \), where \(N_\mathrm {atoms}\) is the number of explicitly represented atoms. Since the ratio of the number of boundary atoms to interior atoms is of order \(N_\mathrm {atoms}^{-1/3}\), it might be expected that \(\gamma \) is chosen to be proportional to \(N_\mathrm {atoms}^{-1/3}\).

There is a second possible role for the addition of fluctuation-dissipation terms in a dynamics simulation: with a small damping coefficient, these terms can also play a role in stabilizing a numerical integrator [21], which might be justified if the added terms are small enough to have an effect no greater than that of the discretization error.

The bulk of molecular simulations, however, are “simply” for the purpose of drawing random samples from a prescribed distribution, and this is the application under consideration here. The appropriate choice of \(\gamma \) optimizes the efficiency of sampling. A measure of this is the effective sample size \(N/\tau \), where N is the number of samples and \(\tau \) is the integrated autocorrelation time. The latter is, however, defined in terms of an observable. An observable is an expectation of a specified function of the configuration, which, for lack of a better term, is referred to here as a preobservable. As an added complication, the accuracy of an estimate of an integrated autocorrelation time (IAcT) depends on sampling thoroughness [13, Sec. 3], so a conservative approach is indicated. Ref. [13, Sec. 3.1] advocates the use of the maximum possible IAcT and shows how it might be a surrogate for sampling thoroughness. The maximum possible IAcT is about the same (except for a factor of 2) as the decorrelation time of Ref. [30], defined to be “the minimum time that must elapse between configurations for them to become fully decorrelated (i.e., with respect to any quantity)”.

Therefore, for sampling, it is suggested that \(\gamma \) be chosen to achieve a high level of sampling thoroughness, as measured by the maximum possible IAcT. An initial study of this question is reported in Ref. [38, Sec. 5], and the purpose of the present article is to clarify and extend these results.

To begin with, we analyse an underdamped Langevin equation with a quadratic potential energy function. (See Eq. (12) below.) The main purpose of analyzing this model problem is, of course, to obtain insight and heuristics that can be applied to general potential energy functions. Choosing the optimal \(\gamma \) requires a substitute for the lowest frequency. For the model problem, this can be obtained from the covariance matrix for the position coordinates, which is not difficult to compute for a general potential. And for estimating \(\tau _{\mathrm {q},\max }\), the analysis suggests using the set of all quadratic polynomials, which can be achieved using the algorithm of reference [13, Sec. 3.5].

For molecular simulation, the suggestion is that one might choose linear combinations of functions of the form \(|{\varvec{r}}_j -{\varvec{r}}_i|^2\) and \(({\varvec{r}}_j -{\varvec{r}}_i)\cdot ({\varvec{r}}_k -{\varvec{r}}_i)\) where each \({\varvec{r}}_i\) is an atomic position or center of mass of a group of atoms. Such functions share with the potential energy function the property of being invariant under a rigid body movement.

1.1 Results and discussion

Section 5 analyzes integrated autocorrelation times for the standard model problem of a quadratic potential energy function. An expression is derived for the IAcT for any preobservable; this is applied in Sect. 5.2 to check the accuracy of a method for estimating the IAcT. In Sect. 5, we also determine the maximum IAcT, denoted by \(\tau _{\mathrm {q},\max }\), over all preobservables defined on configurations, as well as the damping coefficient \(\gamma ^*\) that minimizes \(\tau _{\mathrm {q},\max }\). It is shown that polynomials of degree \(\le 2\) produce the largest value of \(\tau _{\mathrm {q},\max }\), and that choosing \(\gamma \) equal to the lowest frequency, which is half of the optimal value of \(\gamma \) for that frequency, minimizes \(\tau _{\mathrm {q},\max }\). These results extend those of Ref. [38, Sec. 5], which obtains a (less relevant) result for preobservables defined on phase space rather than configuration space.

Sections 6 and 7 test the heuristics derived from the quadratic potential energy on some simple potential energy functions giving rise to multimodal distributions. Results suggest that the heuristics for choosing the maximizing preobservable and the optimal \(\gamma \) are effective.

One of the test problems was constructed by Ref. [23] to demonstrate the superiority of BAOAB over other Langevin integrators. Experiments for this problem in Sect. 6 are consistent with this claim of superiority.

In defining “quasi-reliability” and the notion of thorough sampling, Ref. [13] makes an unmotivated leap from maximizing over preobservables that are indicator functions to maximizing over arbitrary preobservables. The test problem of Sect. 7 provides a cursory look at this question, though the matter may warrant further study.

Obtaining reliable estimates of the IAcT without generating huge sets of samples very much hinders this investigation. To this end, Sect. 4.1 explores an intriguing way of calculating an estimate for the phase space \(\tau _{\max }\), which avoids the difficult calculation of IAcTs. For the model problem, it gives more accurate results for \(\tau _{\max }\) than estimating IAcTs, due to the difficulty of finding a set of functions that play the same role as quadratic polynomials when maximizing IAcTs. The literature offers interesting suggestions that might help in the development of better schemes for estimating IAcTs, and it may be fruitful to recast some of these ideas using the formalisms employed in this article. In particular, Ref. [30] offers a novel approach based on determining whether using every \(\tau \)th sample creates a set of independent samples. Additionally, there are several conditions on covariances [16, Theorem 3.1] that can be checked or enforced.

1.2 Related work

While the major part of the literature on Markov chain Monte Carlo (MCMC) methods with stochastic differential equations focuses on the overdamped Langevin equation (e.g. [3, 35] and the references given there), there have been significant advances, both from an algorithmic and a theoretical point of view, in understanding the underdamped Langevin dynamics [34]. For example, in Refs. [7, 39] Langevin dynamics has been studied from the perspective of thermostatting and enhancement of specific vibrational modes or correlations, while in Refs. [8, 17, 25] it has been used to tackle problems in machine learning and stochastic optimisation. From a theoretical point of view, the Langevin equation is more difficult to analyse than its overdamped counterpart, since the noise term is degenerate and the associated propagator is non-symmetric; recent work on optimising the friction coefficient for sampling is due to [4, 11, 36], and theoretical analyses using both probabilistic and functional analytical methods have been conducted in [5, 10, 12]; see also [27, Secs. 2.3–2.4] and the references therein.

Relevant in this regard are Refs. [20, 26, 33], in which non-reversible perturbations of the overdamped Langevin equation are proposed, with the aim of increasing the spectral gap of the propagator or reducing the asymptotic variance of the sampler. Related results on decorrelation times for the overdamped Langevin using properties of the dominant spectrum of the infinitesimal generator of the associated Markov process have been proved in [22, Sec. 4].

A key point of this article is that quantities like spectral gaps or asymptotic variances are not easily accessible numerically, therefore computing goal-oriented autocorrelation times (i.e. for specific observables that are of interest) that can be computed from simulation data is a sensible approach. With that being said, it would be a serious omission not to mention the work of Ref. [30], which proposes the use of indicator functions for subsets of configuration space to estimate asymptotic variance and effective sample size from autocorrelation times using trajectory data.

Finally, we should also mention that many stochastic optimisation methods that are nowadays popular in the machine learning community, like ADAM or RMSProp, adaptively control the damping coefficient, though in an ad hoc way, so as to improve the convergence to a local minimum. They share many features with adaptive versions of Langevin thermostats that are used in molecular dynamics [24], and, therefore, it comes as no surprise that the Langevin model is the basis for the stochastic modified equation approach that can be used to analyse state-of-the-art momentum-based stochastic optimisation algorithms like ADAM [1, 28].

2 Preliminaries

The computational task is to sample from a probability density \(\rho _\mathrm {q}(\mathbf {q})\) proportional to \(\exp (-\beta V(\mathbf {q}))\), where \(V(\mathbf {q})\) is a potential energy function and \(\beta \) is inverse temperature. In principle, these samples are used to compute an observable \(\mathbb {E}[u(\mathbf {Q})]\), where \(\mathbf {Q}\) is a random variable from the prescribed distribution and \(u(\mathbf {q})\) is a preobservable (possibly an indicator function). The standard estimate is

$$\begin{aligned} \mathbb {E}[u(\mathbf {Q})]\approx \widehat{U}_N =\frac{1}{N}\sum _{n=0}^{N-1} u(\mathbf {Q}_n), \end{aligned}$$

where the samples \(\mathbf {Q}_n\) are from a Markov chain, for which \(\rho _\mathrm {q}(\mathbf {q})\) (or a close approximation thereof) is the stationary density. Assume the chain has been equilibrated, meaning that \(\mathbf {Q}_0\) is drawn from a distribution with density \(\rho _\mathrm {q}(\mathbf {q})\). An efficient and popular way to generate such a Markov chain is based on Langevin dynamics, whose equations are

$$\begin{aligned} \begin{aligned} \mathrm {d}\mathbf {Q}_t&= M^{-1}\mathbf {P}_t\,\mathrm {d}t,\\ \mathrm {d}\mathbf {P}_t&= F(\mathbf {Q}_t)\,\mathrm {d}t -\gamma \mathbf {P}_t\,\mathrm {d}t +\textstyle \sqrt{\frac{2\gamma }{\beta }}M_\mathrm {h}\,\mathrm {d}\mathbf {W}_t, \end{aligned} \end{aligned}$$
(1)

where \(F(\mathbf {q}) = -\nabla V(\mathbf {q})\), M is a matrix chosen to compress the range of vibrational frequencies, \(M_\mathrm {h}M_\mathrm {h}^\mathsf {T}= M\), and \(\mathbf {W}_t\) is a vector of independent standard Wiener processes. The invariant phase space probability density \(\rho (\mathbf {q},\mathbf {p})\) is given by

$$\begin{aligned} \rho (\mathbf {q},\mathbf {p}) = \frac{1}{Z}\exp (-\beta (V(\mathbf {q}) +\frac{1}{2}\mathbf {p}^\mathsf {T}M^{-1}\mathbf {p})), \end{aligned}$$

where \(Z>0\) is a normalisation constant that guarantees that \(\rho \) integrates to 1. We call \(\rho _\mathrm {q}(\mathbf {q})\) its marginal density for \(\mathbf {q}\). We suppose \(\rho >0\).

It is common practice in molecular dynamics to use a numerical integrator, which introduces a modest bias that depends on the step size \(\varDelta t\). As an illustration, consider the BAOAB integrator [23]. Each step of the integrator consists of the following substeps:

  1. B:

    \(\mathbf {P}_{n+1/4} =\mathbf {P}_n +\frac{1}{2}\varDelta t F(\mathbf {Q}_n)\),

  2. A:

    \(\mathbf {Q}_{n+1/2} =\mathbf {Q}_n +\frac{1}{2}\varDelta t M^{-1}\mathbf {P}_{n+1/4}\),

  3. O:

    \(\mathbf {P}_{n+3/4} =\exp (-\gamma \varDelta t)\mathbf {P}_{n+1/4} +\mathbf {R}_{n+1/2}\),

  4. A:

    \(\mathbf {Q}_{n+1} =\mathbf {Q}_{n+1/2} +\frac{1}{2}\varDelta t M^{-1}\mathbf {P}_{n+3/4}\),

  5. B:

    \(\mathbf {P}_{n+1} =\mathbf {P}_{n+3/4} +\frac{1}{2}\varDelta t F(\mathbf {Q}_{n+1})\),

where \(\mathbf {R}_{n+1/2}\) is a vector of independent Gaussian random variables with mean \(\mathbf {0}\) and covariance matrix \((1 -\exp (-2\gamma \varDelta t))\) \(\beta ^{-1}M\).
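The five substeps above can be sketched for a single degree of freedom as follows. This is a minimal illustration, not production code; the function and variable names are ours, and the O-substep draws its Gaussian noise with the covariance \((1 -\exp (-2\gamma \varDelta t))\beta ^{-1}M\) stated above.

```python
import numpy as np

def baoab_step(q, p, force, dt, gamma, beta, M, rng):
    """One BAOAB step for a scalar coordinate; O is the exact OU update."""
    p = p + 0.5 * dt * force(q)          # B: half kick with F(Q_n)
    q = q + 0.5 * dt * p / M             # A: half drift
    c = np.exp(-gamma * dt)              # O: damp momentum, inject fresh noise
    p = c * p + np.sqrt((1.0 - c * c) * M / beta) * rng.standard_normal()
    q = q + 0.5 * dt * p / M             # A: half drift
    p = p + 0.5 * dt * force(q)          # B: half kick, force at the new position
    return q, p
```

Note that the final force evaluation can be reused as the first one of the next step, so one step costs a single force evaluation.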

In the following, we use the shorthand \(\mathbf {Z}=(\mathbf {Q},\mathbf {P})\) to denote a phase space vector. It is known [16, Sec. 2] that the variance of the estimate \(\widehat{U}_N\) for \(\mathbb {E}[u(\mathbf {Z})]\) is

$$\begin{aligned} \mathrm {Var}[\widehat{U}_N]\approx \frac{\tau }{N}\mathrm {Var}[u(\mathbf {Z})], \end{aligned}$$
(2)

which is exact relative to 1/N in the limit \(N\rightarrow \infty \). Here \(\tau \) is the integrated autocorrelation time (IAcT)

$$\begin{aligned} \tau = 1 + 2\sum _{k=1}^{+\infty }\frac{C(k)}{C(0)} \end{aligned}$$
(3)

and C(k) is the autocovariance at lag k defined by

$$\begin{aligned} C(k) =\mathbb {E}[(u(\mathbf {Z}_0) -\mu )(u(\mathbf {Z}_k) -\mu )] \end{aligned}$$
(4)

with \(\mu =\mathbb {E}[u(\mathbf {Z}_0)] =\mathbb {E}[u(\mathbf {Z}_k)]\). Here and in what follows the expectation \(\mathbb {E}[\cdot ]\) is understood over all realisations of the (discretized) Langevin dynamics, with initial conditions \(\mathbf {Z}_0\) drawn from the equilibrium probability density function \(\rho \).
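Equations (3)–(4) can be illustrated with a naive truncated estimator; the truncation lag below is an arbitrary illustrative choice, whereas practical estimators (cf. Sect. 2.1) choose the window adaptively.

```python
import numpy as np

def iact(x, max_lag=None):
    """Naive truncated estimate of tau = 1 + 2*sum_k C(k)/C(0), cf. Eq. (3)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # centre the series, Eq. (4)
    n = len(x)
    if max_lag is None:
        max_lag = n // 10
    c0 = np.dot(x, x) / n                 # C(0)
    tau = 1.0
    for k in range(1, max_lag + 1):
        ck = np.dot(x[:-k], x[k:]) / n    # C(k)
        tau += 2.0 * ck / c0
    return tau
```

For an AR(1) series with one-step correlation \(a\), the exact value is \(\tau = (1+a)/(1-a)\), which gives a quick way to validate the estimator.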

2.1 Estimating integrated autocorrelation time

Estimates of the IAcT based on estimating covariances C(k) suffer from inaccuracy in estimates of C(k) due to a decreasing number of samples as k increases. To get reliable estimates, it is necessary to underweight or omit estimates of C(k) for larger values of k. Many ways to do this have been proposed. Most attractive are those [16, Sec. 3.3] that take advantage of the fact that the time series is a Markov chain.

One that is used in this study is a short computer program called acor [18] that implements a method described in Ref. [31]. It recursively reduces the series to one half its length by summing successive pairs of terms until the estimate of \(\tau \) based on the reduced series is deemed reliable. The definition of “reliable” depends on heuristically chosen parameters. A greater number of reductions, denoted \( reducs \) in this paper, employs more covariances, but at the risk of introducing more noise.
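The pairwise reduction just described can be sketched as follows; this is our illustration of a single reduction step, not the actual acor code.

```python
import numpy as np

def reduce_series(x):
    """One 'reduc': halve a centred series by summing successive pairs."""
    n = len(x) - len(x) % 2    # drop a trailing unpaired element, if any
    return x[:n:2] + x[1:n:2]
```

Each reduction doubles the effective lag, so short-lag covariances of the reduced series stand in for long-lag covariances of the original series.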

2.2 Helpful formalisms for analyzing MCMC convergence

It is helpful to introduce the linear operator \(\mathcal {T}\) defined by

$$\begin{aligned} \mathcal {T}u(\mathbf {z}) =\int \rho (\mathbf {z}'|\mathbf {z})u(\mathbf {z}')\mathrm {d}\mathbf {z}' \end{aligned}$$

where \(\rho (\mathbf {z}'|\mathbf {z})\) is the transition probability density for the Markov chain. Then one can express an expectation of the form \(\mathbb {E}[v(\mathbf {Z}_0)u(\mathbf {Z}_1)]\), arising from a covariance, as

$$\begin{aligned} \mathbb {E}[v(\mathbf {Z}_0)u(\mathbf {Z}_1)] =\langle v,\mathcal {T}u\rangle \end{aligned}$$

where the inner product \(\langle \cdot ,\cdot \rangle \) is defined by

$$\begin{aligned} \langle v, u\rangle = \int v(\mathbf {z})u(\mathbf {z})\rho (\mathbf {z})\,\mathrm {d}\mathbf {z}. \end{aligned}$$
(5)

The adjoint operator

$$\begin{aligned} \mathcal {T}^\dagger v(\mathbf {z}) =\frac{1}{\rho (\mathbf {z})} \int \rho (\mathbf {z}|\mathbf {z}')v(\mathbf {z}')\rho (\mathbf {z}')\mathrm {d}\mathbf {z}' \end{aligned}$$

is what Ref. [37] calls the forward transfer operator, because it propagates relative probability densities forward in time. On the other hand, Ref. [29] calls \(\mathcal {T}^\dagger \) the backward operator and calls \(\mathcal {T}\) itself the forward operator. To avoid confusion, we use the term transfer operator for \(\mathcal {T}\). The earlier work [13, 38] is in terms of the operator \(\mathcal {T}^\dagger \). To get an expression for \(\mathbb {E}[v(\mathbf {Z}_0)u(\mathbf {Z}_k)]\), write

$$\begin{aligned} \mathbb {E}[v(\mathbf {Z}_0)u(\mathbf {Z}_k)] =\int \!\!\int v(z)u(z')\rho _k(z'|z)\rho (z)\,\mathrm {d}z\mathrm {d}z' \end{aligned}$$

where \(\rho _k(z'|z)\) is the iterated transition probability density function defined recursively by \(\rho _1(z'|z) = \rho (z'|z)\) and

$$\begin{aligned} \rho _k(z'|z) =\int \rho (z'|z'')\rho _{k-1}(z''|z)\mathrm {d}z'', \quad k = 2, 3,\ldots . \end{aligned}$$

By induction on k

$$\begin{aligned} \mathcal {T}^k u(z) = \mathcal {T}\mathcal {T}^{k-1}u(z) =\int \rho _k(z'|z) u(z')\mathrm {d}z', \end{aligned}$$

whence,

$$\begin{aligned} \mathbb {E}[v(\mathbf {Z}_0)u(\mathbf {Z}_k)] =\langle v, \mathcal {T}^k u\rangle . \end{aligned}$$

2.2.1 Properties of the transfer operator and IAcT

It is useful to establish some properties of \(\mathcal {T}\) and the IAcT that will be used throughout the article. In particular, we shall provide a formula for \(\tau (u)\) in terms of the transfer operator that will be the starting point for systematic improvements and that will later on allow us to estimate \(\tau \) by solving a generalised eigenvalue problem.

Clearly, \(\mathcal {T}\,1 = 1\), and 1 is an eigenvalue of \(\mathcal {T}\). Here, where the context requires a function, the symbol 1 denotes the constant function that is identically 1. Where the context requires an operator, it denotes the identity operator. To remove the eigenspace corresponding to the eigenvalue \(\lambda =1\) from \(\mathcal {T}\), define the orthogonal projection operator

$$\begin{aligned} \mathcal {E}u =\langle 1, u\rangle \,1 \end{aligned}$$

and consider instead the operator

$$\begin{aligned} \mathcal {T}_0 =\mathcal {T}-\mathcal {E}. \end{aligned}$$

It is assumed that the eigenvalues \(\lambda \) of \(\mathcal {T}_0\) satisfy \(|\lambda | < 1\), in other words, we assume that the underlying Markov chain is ergodic. Stationarity of the target density \(\rho (\mathbf {z})\) w.r.t. \(\rho (\mathbf {z}|\mathbf {z}')\) implies that \(\mathcal {T}^\dagger \,1 = 1\) and that \(\mathcal {T}^\dagger \mathcal {T}\,1 = 1\). Therefore, \(\mathcal {T}^\dagger \mathcal {T}\) is a stochastic kernel. This implies that the spectral radius of \(\mathcal {T}^\dagger \mathcal {T}\) is 1, and, since it is a symmetric operator, one has that

$$\begin{aligned} \langle \mathcal {T}u,\mathcal {T}u\rangle =\langle u,\mathcal {T}^\dagger \mathcal {T}u\rangle \le \langle u, u\rangle . \end{aligned}$$
(6)

The IAcT, given by Eq. (3), requires autocovariances, which one can express in terms of \(\mathcal {T}_0\) as follows:

$$\begin{aligned} \begin{aligned} C(k) =&\, \langle (1 -\mathcal {E}) u, (1 -\mathcal {E})\mathcal {T}^k u\rangle \\ =&\, \langle (1 -\mathcal {E}) u, (1 -\mathcal {E})\mathcal {T}_0^k u\rangle \\ =&\, \langle (1 -\mathcal {E}) u,\mathcal {T}_0^k u\rangle , \end{aligned} \end{aligned}$$
(7)

which follows because \(\mathcal {E}\) and \(1 -\mathcal {E}\) are symmetric. Substituting Eqs. (7) into (3) gives

$$\begin{aligned} \tau (u) = \frac{\langle (1 -\mathcal {E})u,\mathcal {D}u\rangle }{\langle (1 -\mathcal {E})u, u\rangle }, \quad \text{ where } \mathcal {D}= 2\left( 1 - \mathcal {T}_0\right) ^{-1} - 1. \end{aligned}$$
(8)

It can be readily seen that \(\tau \) is indeed nonnegative. With \(v = (1 -\mathcal {T}_0)^{-1}u\), the numerator in Eq. (8) satisfies

$$\begin{aligned} \langle (1 -\mathcal {E})u,\mathcal {D}u\rangle= & {} \langle (1 -\mathcal {E})(1 -\mathcal {T}_0)v, (1 +\mathcal {T}_0)v\rangle \\= & {} \langle v, v\rangle -\langle \mathcal {T}v,\mathcal {T}v\rangle \\\ge & {} 0. \end{aligned}$$

Therefore, \(\tau (u)\ge 0\) if \((1-\mathcal {E})u\ne 0\), where the latter is equivalent to \(u\) not being constant, i.e. \(u\ne \mathbb {E}[u]\).
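Equation (8) can be checked exactly on a finite-state chain, where \(\mathcal {T}\) is a transition matrix and the inner product (5) becomes a \(\pi \)-weighted sum. The two-state chain below is purely illustrative.

```python
import numpy as np

# Transition matrix of a two-state chain; row i holds the probabilities z -> z'
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])     # stationary distribution, pi P = pi

E = np.outer(np.ones(2), pi)              # projection onto constants: E u = <1,u> 1
T0 = P - E
Dop = 2.0 * np.linalg.inv(np.eye(2) - T0) - np.eye(2)   # D = 2(1 - T0)^{-1} - 1

ip = lambda v, w: float(np.sum(pi * v * w))   # inner product of Eq. (5)
u = np.array([1.0, -1.0])
uc = (np.eye(2) - E) @ u                      # (1 - E) u
tau = ip(uc, Dop @ u) / ip(uc, u)             # Eq. (8)
```

For this chain the slow eigenvalue of \(\mathcal {T}_0\) is \(\lambda = 0.7\), so Eq. (8) must reproduce \(\tau = (1+\lambda )/(1-\lambda )\), and it must agree with the series (3).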

3 Sampling thoroughness and efficiency

Less than “thorough” sampling can degrade estimates of an IAcT. Reference [13, Sec. 1] proposes a notion of “quasi-reliability” to mean the absence of evidence in existing samples that would suggest a lack of sampling thoroughness. A notion of sampling thoroughness begins by considering subsets A of configuration space. The probability that \(\mathbf {Q}\in A\) can be expressed as the expectation \(\mathbb {E}[1_A]\) where \(1_A\) is the indicator function for A. A criterion for thoroughness might be that

$$\begin{aligned} |\widehat{1_A} -\Pr (\mathbf {Q}\in A)|\le \text {tol}\quad \text{ where } \widehat{1_A} =\frac{1}{N}\sum _{n=1}^N 1_A(\mathbf {Q}_n). \end{aligned}$$
(9)

This is not overly stringent, since it does not require that there are any samples in sets A of probability \(\le \text {tol}\).

The next step in the development of this notion is to replace the requirement \(|\widehat{1_A} -\Pr (\mathbf {Q}\in A)|\le \text {tol}\) by something more forgiving of the random error in \(\widehat{1_A}\). For example, we could require instead that

$$\begin{aligned} \left( \mathrm {Var}\left[ \widehat{1_A}\right] \right) ^{1/2}\le 0.5 \,\textit{tol}, \end{aligned}$$

which would satisfy Eq. (9) with 95% confidence, supposing an approximate normal distribution for the estimate. (If we are not willing to accept the Gaussian assumption, Chebychev’s inequality tells us that we reach 95% confidence level if we replace the right hand side by \(0.05\,\text {tol}\).)

Now let \(\tau _A\) be the integrated autocorrelation time for \(1_A\). Because

$$\begin{aligned} \mathrm {Var}\left[ \widehat{1_A}\right]\approx & {} \tau _A\frac{1}{N}\mathrm {Var}\left[ 1_A(\mathbf {Z})\right] \\= & {} \tau _A\frac{1}{N}\Pr (\mathbf {Z}\in A)(1 -\Pr (\mathbf {Z}\in A)) \le \frac{1}{4N}\tau _A, \end{aligned}$$

it is enough to have \((1/4N)\tau _A\le (1/4)\text {tol}^2\) for all sets of configurations A to ensure thorough sampling (assuming again Gaussianity). The definition of good coverage might then be expressed in terms of the maximum \(\tau (1_A)\) over all A. Note that the sample variance may not be a good criterion if all the candidate sets A have small probability \(\Pr (\mathbf {Z}\in A)\), in which case it is rather advisable to consider the relative error [6].
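The bound above translates into a concrete sample-size requirement: \((1/4N)\tau _A\le (1/4)\text {tol}^2\) is equivalent to \(N\ge \tau _A/\text {tol}^2\). A one-line helper (our illustrative wrapper, not from the source):

```python
import math

def min_samples(tau_A, tol):
    """Smallest N with sqrt(tau_A/(4N)) <= tol/2, i.e. N >= tau_A/tol**2."""
    return math.ceil(tau_A / tol ** 2)
```

For example, a tolerance of 0.05 with \(\tau _A = 10\) already requires 4000 samples.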

Reference [13, Sec 3.1] then makes a leap, for the sake of simplicity, from considering just indicator functions to arbitrary functions. This leads to defining \(\tau _{\mathrm {q},\max } =\sup _{\mathrm {Var}[u(\mathbf {Q})] > 0}\tau (u)\). The condition \(\mathrm {Var}[u(\mathbf {Q})] > 0\) is equivalent to \((1 -\mathcal {E})u\ne 0\).

A few remarks on the efficient choice of preobservables are in order.

Remark 1

Generally, if there are symmetries present in both the distribution and the preobservables of interest, this may reduce the amount of sampling needed. Such symmetries can be expressed as bijections \(\psi _\mathrm {q}\) for which \(u(\psi _\mathrm {q}(\mathbf {q})) = u(\mathbf {q})\) and \(\rho _\mathrm {q}(\psi _\mathrm {q}(\mathbf {q})) =\rho _\mathrm {q}(\mathbf {q})\). Examples include translational and rotational invariance, as well as interchangeability of atoms and groups of atoms. Let \(\varPsi _\mathrm {q}\) denote the set of all such symmetries. The definition of good coverage then need only include sets A, which are invariant under all symmetries \(\psi _\mathrm {q}\in \varPsi _\mathrm {q}\). The extension from indicator sets \(1_A\) to general functions leads to considering \(W_\mathrm {q}=\{u(\mathbf {q})~|~ u(\psi _\mathrm {q}(\mathbf {q})) = u(\mathbf {q})\ \text {for all }\psi _\mathrm {q}\in \varPsi _\mathrm {q}\}\) and defining

$$\begin{aligned} \tau _{\mathrm {q},\max } =\sup _{u\in W^0_{\mathrm {q}}}\tau (u) \end{aligned}$$

where \(W_{\mathrm {q}}^0 = \{u\in W_\mathrm {q}~|~\mathrm {Var}[u(\mathbf {Q})] > 0\}\).

Remark 2

Another consideration that might dramatically reduce the set of relevant preobservables is the attractiveness of using collective variables \(\zeta =\xi (q)\) to characterize structure and dynamics of molecular systems. This suggests considering only functions defined on collective variable space, hence, functions of the form \(\bar{u}(\xi (q))\).

4 Computing the maximum IAcT

The difficulty of getting reliable estimates for \(\tau (u)\) to compute the maximum IAcT makes it interesting to consider an alternative formulation.

4.1 A transfer operator-based formulation

Although there is little interest in sampling functions of auxiliary variables like momenta, it may be useful to consider phase space sampling efficiency. Specifically, a maximum over phase space is an upper bound and it might be easier to estimate. Putting aside exploitation of symmetries, the suggestion is to use \(\tau _{\max } =\sup _{\mathrm {Var}[u(\mathbf {Z})] > 0}\tau (u)\). One has, with a change of variables, that

$$\begin{aligned} \tau \left( \left( 1 - \mathcal {T}_0\right) v\right) =\tau _2(v) \end{aligned}$$

where

$$\begin{aligned} \tau _2(v) =\frac{\langle (1 -\mathcal {T})v, (1 +\mathcal {T})v\rangle }{\langle (1 -\mathcal {T})v, (1 -\mathcal {T})v\rangle }. \end{aligned}$$

This follows from \(\langle (1 -\mathcal {E})(1 -\mathcal {T}_0)v, (1\pm \mathcal {T}_0)v\rangle = \langle (1 -\mathcal {T})v, (1\pm \mathcal {T})v\mp \mathcal {E}v\rangle =\langle (1 -\mathcal {T})v, (1\pm \mathcal {T})v\rangle \). Therefore,

$$\begin{aligned} \tau _{\max }= & {} \sup _{\mathrm {Var}\left[ \left( 1 -\mathcal {T}_0\right) v(\mathbf {Z})\right]> 0}\tau \left( \left( 1 -\mathcal {T}_0\right) v\right) \\= & {} \sup _{\mathrm {Var}\left[ \left( 1 -\mathcal {T}_0\right) v(\mathbf {Z})\right]> 0}\tau _2(v) =\sup _{\mathrm {Var}[v(\mathbf {Z})] > 0}\tau _2(v). \end{aligned}$$

The last step follows because \((1 -\mathcal {T}_0)\) is nonsingular.

Needed for an estimate of \(\tau _2(v)\) is \(\langle \mathcal {T}v,\mathcal {T}v\rangle \). To evaluate \(\langle \mathcal {T}v,\mathcal {T}v\rangle \), proceed as follows: Let \(\mathbf {Z}'_{n+1}\) be an independent realization of \(\mathbf {Z}_{n+1}\) from \(\mathbf {Z}_n\). In particular, repeat the step, but with an independent stochastic process having the same distribution. Then

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ v\left( \mathbf {Z}_1\right) v\left( \mathbf {Z}'_1\right) \right]&= \int \!\int v(z)v(z') \\&~\quad \times \int \rho (z|z'')\rho (z'|z'')\rho (z'')\mathrm {d}z''\,\mathrm {d}z\mathrm {d}z' \\&= \langle \mathcal {T}v,\mathcal {T}v\rangle . \end{aligned} \end{aligned}$$
(10)

For certain simple preobservables and propagators having the simple form of BAOAB, the samples \(v(\mathbf {Z}_n) v(\mathbf {Z}'_n)\) might be obtained at almost no extra cost, and their accuracy improved and their cost reduced by computing conditional expectations analytically.
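Equation (10) can be tested on a chain with a known transfer operator. For the AR(1) chain below, our illustrative stand-in for the repeated step, \(\mathcal {T}v = a\,v\) for \(v(z) = z\), so \(\langle \mathcal {T}v,\mathcal {T}v\rangle = a^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
a = 0.7                          # one-step contraction; Tv = a*v for v(z) = z
N = 200000

z0 = rng.standard_normal(N)      # equilibrium draws, rho = N(0, 1)
s = np.sqrt(1.0 - a * a)
z1  = a * z0 + s * rng.standard_normal(N)   # one step of the chain
z1p = a * z0 + s * rng.standard_normal(N)   # same step replayed with fresh noise

est = float(np.mean(z1 * z1p))   # Monte Carlo estimate of <Tv, Tv>, Eq. (10)
```

The two realisations share \(\mathbf {Z}_0\) but use independent noise, exactly as prescribed above Eq. (10).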

This approach has been tested on the model problem of Sect. 5, a Gaussian process, and found to be significantly better than the use of acor. Unfortunately, this observation is not generalisable: For example, for a double well potential, it is difficult to find preobservables \(v(\mathbf {z})\), giving a computable estimate of \(\tau _{\max }\) which comes close to an estimate from using acor with \(u(\mathbf {z}) = z_1\).

Another drawback is that the estimates, though computationally inexpensive, require accessing intermediate values in the calculation of a time step, which are not normally an output option of an MD program. Therefore, we will discuss alternatives in the next two subsections.

4.2 A generalised eigenvalue problem

Let \(\mathbf {u}(\mathbf {z})\) be a row vector of arbitrary basis functions \(u_i(\mathbf {z})\), \(i = 1,2,\ldots ,\text {imax}\), that span a closed subspace of the Hilbert space associated with the inner product \(\langle \cdot ,\cdot \rangle \) defined by (5), and consider the linear combination \(u(\mathbf {z}) =\mathbf {u}(\mathbf {z})^\mathsf {T}\mathbf {x}\). One has

$$\begin{aligned} \tau (u) =\frac{\langle (1 -\mathcal {E})u,\mathcal {D}u\rangle }{\langle (1 -\mathcal {E})u, u\rangle } =\frac{\mathbf {x}^\mathsf {T}\mathbf {D}\mathbf {x}}{\mathbf {x}^\mathsf {T}\mathbf {C}_0\mathbf {x}} \end{aligned}$$

where

$$\begin{aligned} \mathbf {D}= \langle (1 -\mathcal {E})\mathbf {u},\mathcal {D}\mathbf {u}^\mathsf {T}\rangle \quad \text{ and }\quad \mathbf {C}_0 =\langle (1 -\mathcal {E})\mathbf {u},\mathbf {u}^\mathsf {T}\rangle . \end{aligned}$$

If the span of the basis is sufficiently extensive to include preobservables having the greatest IAcTs (e.g. polynomials, radial basis functions, spherical harmonics, etc.), the calculation of \(\tau _{\max }\) reduces to that of maximizing \(\mathbf {x}^\mathsf {T}\mathbf {D}\mathbf {x}/(\mathbf {x}^\mathsf {T}\mathbf {C}_0\mathbf {x})\) over all \(\mathbf {x}\), which is equivalent to solving the symmetric generalized eigenvalue problem

$$\begin{aligned} \frac{1}{2}(\mathbf {D}+\mathbf {D}^\mathsf {T})\mathbf {x}= \lambda \mathbf {C}_0\mathbf {x}. \end{aligned}$$
(11)

It should be noted that the maximum over all linear combinations of the elements of \(\mathbf {u}(\mathbf {z})\) can be arbitrarily greater than that attained by any of the basis functions individually. Moreover, in practice, the coefficients in (11) will be random in that they have to be estimated from simulation data, which warrants special numerical techniques. These techniques, including classical variance reduction methods, Markov State Models or specialised basis functions, are not the main focus of this article and we therefore refer to the articles [19, 32], and the references given there.
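Solving Eq. (11) can be sketched by a Cholesky reduction to an ordinary symmetric eigenproblem. This is a minimal sketch under the assumption that \(\mathbf {C}_0\) is positive definite, which holds when the basis functions are linearly independent and non-constant.

```python
import numpy as np

def max_iact_gev(D, C0):
    """Maximise x^T D x / (x^T C0 x): the symmetrised GEV of Eq. (11)."""
    S = 0.5 * (D + D.T)                        # symmetric part of D
    L = np.linalg.cholesky(C0)                 # C0 = L L^T (assumed pos. def.)
    Linv = np.linalg.inv(L)
    w, V = np.linalg.eigh(Linv @ S @ Linv.T)   # ordinary symmetric eigenproblem
    x = np.linalg.solve(L.T, V[:, -1])         # back-transform top eigenvector
    return w[-1], x                            # largest lambda and its x
```

The returned eigenvalue is the maximum of the Rayleigh quotient, i.e. the largest IAcT attainable within the span of the basis.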

Remark 3

Appendix B records different notions of reversibility of the transfer operator that entail specific restrictions on the admissible basis functions that guarantee that the covariance matrices, and thus \(\mathbf {C}_0\), remain symmetric.

4.3 The use of acor

It is not obvious how to use an IAcT estimator to construct matrix off-diagonal elements \(D_{ij} =\langle (1 -\mathcal {E})u_i,\mathcal {D}u_j^\mathsf {T}\rangle \), \(j\ne i\), from the time series \(\{\mathbf {u}(\mathbf {Z}_m)\}\). Nevertheless, it makes sense to use acor as a preprocessing or predictor step to generate an initial guess for an IAcT. The acor estimate for a scalar preobservable \(u(\mathbf {z})\) has the form

$$\begin{aligned} \widehat{\tau } = \widehat{D}/\widehat{C}_0, \end{aligned}$$

where

$$\begin{aligned} \widehat{C}_0 =\widehat{C}_0\left( \{u\left( \mathbf {Z}_n\right) -\hat{U}\},\{u\left( \mathbf {Z}_n\right) -\hat{U}\}\right) , \end{aligned}$$

and

$$\begin{aligned} \widehat{D} =\widehat{D}\left( \{u\left( \mathbf {Z}_n\right) -\hat{U}\},\left\{ u\left( \mathbf {Z}_n\right) -\hat{U}\right\} \right) , \end{aligned}$$

are bilinear functions of their arguments that depend on the number of reductions \( reducs \) where \(\hat{U}\) denotes the empirical mean of \(\{\mathbf {u}(\mathbf {Z}_m)\}\).

The tests reported in Sects. 5–7 then use the following algorithm. (In what follows we assume that \(\{\mathbf {u}(\mathbf {Z}_m)\}\) has been centred by subtracting the empirical mean.)

(Algorithm: figure b)

Ref. [13, Sec. 3.5] uses a slightly different algorithm that proceeds as follows:

(Algorithm: figure c)

In the experiments reported here, the original algorithm sometimes does one reduction fewer than the new algorithm.

Remark 4

Theoretically, the matrix \(\mathbf {D}+\mathbf {D}^\mathsf {T}\) is positive definite. If it is not, that suggests that the value of \( reducs \) is not sufficiently conservative, in which case \( reducs \) needs to be reduced. A negative eigenvalue might also arise if the Markov chain does not converge due to a stepsize \(\varDelta t\) that is too large. This can be confirmed by seeing whether the negative eigenvalue persists for a larger number of samples.

5 Analytical result for the model problem

The question of optimal choice for the damping coefficient is addressed in Ref. [38, Sec. 5] for the standard model problem \(F(\mathbf {q}) = - K\mathbf {q}\), where K is symmetric positive definite, for which the Langevin equation is

$$\begin{aligned} \begin{aligned} \mathrm {d}\mathbf {Q}_t&= M^{-1}\mathbf {P}_t\,\mathrm {d}t,\\ \mathrm {d}\mathbf {P}_t&= -K\mathbf {Q}_t\,\mathrm {d}t -\gamma \mathbf {P}_t\,\mathrm {d}t +\textstyle \sqrt{\frac{2\gamma }{\beta }}M_\mathrm {h}\,\mathrm {d}\mathbf {W}_t. \end{aligned} \end{aligned}$$
(12)

Changing variables \(\mathbf {Q}' =M_\mathrm {h}^\mathsf {T}\mathbf {Q}\) and \(\mathbf {P}' =M_\mathrm {h}^{-1}\mathbf {P}\) and dropping the primes gives \(\mathrm {d}\mathbf {Q}_t = \mathbf {P}_t\,\mathrm {d}t\),

$$\begin{aligned} \mathrm {d}\mathbf {P}_t = -M_\mathrm {h}^{-1}KM_\mathrm {h}^{-\mathsf {T}}\mathbf {Q}_t\,\mathrm {d}t -\gamma \mathbf {P}_t\,\mathrm {d}t +\sqrt{2\gamma /\beta }\,\mathrm {d}\mathbf {W}_t. \end{aligned}$$

With an orthogonal change of variables, this decouples into scalar equations, each of which has the form

$$\begin{aligned} \mathrm {d}Q_t = P_t\,\mathrm {d}t,\quad \mathrm {d}P_t = -\omega ^2 Q_t\,\mathrm {d}t -\gamma P_t\,\mathrm {d}t +\sqrt{2\gamma /\beta }\,\mathrm {d}W_t \end{aligned}$$

where \(\omega ^2\) is an eigenvalue of \(M_\mathrm {h}^{-1}KM_\mathrm {h}^{-\mathsf {T}}\), or, equivalently, an eigenvalue of \(M^{-1}K\). Changing to dimensionless variables \(t' =\omega t\), \(\gamma ' =\gamma /\omega \), \(Q' = (\beta m)^{1/2}\omega Q\), \(P' = (\beta /m)^{1/2}P\), and dropping the primes gives

$$\begin{aligned} \mathrm {d}Q_t = P_t\,\mathrm {d}t,\quad \mathrm {d}P_t = - Q_t\,\mathrm {d}t -\gamma P_t\,\mathrm {d}t +\sqrt{2\gamma }\,\mathrm {d}W_t. \end{aligned}$$
(13)

For an MCMC propagator, assume exact integration with step size \(\varDelta t\).

From Ref. [38, Sec. 5.1], one has \(\mathcal {T}= (\mathrm {e}^{\varDelta t\mathcal {L}})^\dagger =\exp (\varDelta t\,\mathcal {L}^\dagger )\), where

$$\begin{aligned} \mathcal {L}^\dagger f = p\frac{\partial }{\partial q}f - q\frac{\partial }{\partial p}f -\gamma p\frac{\partial }{\partial p}f +\gamma \frac{\partial ^2}{\partial p^2}f. \end{aligned}$$

The Hilbert space defined by the inner product from Eq. (5) has, in this case, a decomposition into linear subspaces \(\mathbb {P}_k =\mathrm {span}\{ He _m(q) He _n(p)~|~m + n = k\}\) (denoted by \(\mathbb {P}'_k\) in Ref. [38, Sec. 5.3]). Let

$$\begin{aligned} \mathbf {u}_k^\mathsf {T}= \left[ He _k(q) He _0(p),\, He _{k-1}(q) He _1(p),\,\ldots ,\, He _0(q) He _k(p)\right] , \end{aligned}$$

and, in particular,

$$\begin{aligned} \mathbf {u}_1^\mathsf {T}= & {} [ q,\,p],\\ \mathbf {u}_2^\mathsf {T}= & {} \left[ q^2 - 1,\,qp,\,p^2 - 1\right] ,\\ \mathbf {u}_3^\mathsf {T}= & {} \left[ q^3 - 3q,\,\left( q^2 - 1\right) p,\,q\left( p^2 - 1\right) ,\,p^3 - 3p\right] ,\\ \mathbf {u}_4^\mathsf {T}= & {} \left[ q^4 - 6q^2 + 3, \,\left( q^3 - 3q\right) p,\,\left( q^2 - 1\right) \left( p^2 - 1\right) ,\right. \\&\left. q\left( p^3 - 3p\right) ,\,p^4 - 6p^2 + 3\right] . \end{aligned}$$

With a change of notation from Ref. [38, Sec. 5.3], \(\mathcal {L}\mathbf {u}_k^\mathsf {T}= \mathbf {u}_k^\mathsf {T}\mathbf {A}_k\), with \(\mathbf {A}_k\) given by

$$\begin{aligned} \mathbf {A}_k = \left[ \begin{array}{cccc} 0 &{} 1 &{} &{} \\ -k &{} -\gamma &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} k \\ &{} &{} -1 &{} -k\gamma \\ \end{array}\right] . \end{aligned}$$
(14)

One can show, using arguments similar to those in [38, Sec. 5.3], that \(\mathbb {P}_k\) is closed under application of \(\mathcal {L}^\dagger \). Therefore, \(\mathcal {L}^\dagger \mathbf {u}_k^\mathsf {T}=\mathbf {u}_k^\mathsf {T}\mathbf {B}_k\) for some \((k+1)\times (k+1)\) matrix \(\mathbf {B}_k\). Forming the inner product of \(\mathbf {u}_k\) with each side of this equation gives \(\mathbf {B}_k =\mathbf {C}_{k,0}^{-1}\langle \mathbf {u}_k,\mathcal {L}^\dagger \mathbf {u}_k^\mathsf {T}\rangle \) where \(\mathbf {C}_{k,0}=\langle \mathbf {u}_k,\mathbf {u}_k^\mathsf {T}\rangle \). It follows that

$$\begin{aligned} \mathbf {B}_k =\mathbf {C}_{k,0}^{-1}\langle \mathbf {u}_k,\mathcal {L}^\dagger \mathbf {u}_k^\mathsf {T}\rangle =\mathbf {C}_{k,0}^{-1}\langle \mathcal {L}\mathbf {u}_k,\mathbf {u}_k^\mathsf {T}\rangle \end{aligned}$$

and

$$\begin{aligned} \mathcal {L}^\dagger \mathbf {u}_k^\mathsf {T}=\mathbf {u}_k^\mathsf {T}\mathbf {C}_{k,0}^{-1}\mathbf {A}_k^\mathsf {T}\mathbf {C}_{k,0}. \end{aligned}$$

The Hermite polynomials \(\mathbf {u}_k\) are orthogonal and

$$\begin{aligned} \mathbf {C}_{k,0} =\mathrm {diag}\left( k!0!,\,(k-1)!1!,\,\ldots ,\,0!k!\right) . \end{aligned}$$

Also, \(\mathcal {E}\mathbf {u}_k^\mathsf {T}=\mathbf {0}^\mathsf {T}\). Accordingly,

$$\begin{aligned} \mathcal {T}_0\mathbf {u}_k^\mathsf {T}=\mathcal {T}\mathbf {u}_k^\mathsf {T}=\mathbf {u}_k^\mathsf {T}\mathbf {C}_{k,0}^{-1}\exp \left( \varDelta t\mathbf {A}_k^\mathsf {T}\right) \mathbf {C}_{k,0}, \end{aligned}$$

and

$$\begin{aligned} \mathcal {D}\mathbf {u}_k^\mathsf {T}=\mathbf {u}_k^\mathsf {T}\mathbf {C}_{k,0}^{-1}\mathbf {D}_k, \end{aligned}$$
(15)

where

$$\begin{aligned} \mathbf {D}_k= & {} \mathbf {C}_{k,0} \left( 2\left( I -\mathbf {C}_{k,0}^{-1}\exp \left( \varDelta t\mathbf {A}_k^\mathsf {T}\right) \mathbf {C}_{k,0}\right) ^{-1} - I\right) \\= & {} -\coth \left( \frac{\varDelta t}{2}\mathbf {A}_k^\mathsf {T}\right) \mathbf {C}_{k,0}. \end{aligned}$$
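
The equivalence of the two expressions for \(\mathbf {D}_k\) can be checked numerically, building \(\mathbf {A}_k\) from Eq. (14) and \(\mathbf {C}_{k,0}\) from its diagonal factorial form, with the matrix \(\coth \) evaluated via matrix exponentials. The particular values of \(k\), \(\gamma \), and \(\varDelta t\) below are arbitrary choices for this sketch.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def A_k(k, gamma):
    """Matrix A_k of Eq. (14): diagonal 0, -gamma, ..., -k*gamma,
    superdiagonal 1, ..., k, subdiagonal -k, ..., -1."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k + 1):
        A[i, i] = -i * gamma
        if i < k:
            A[i, i + 1] = i + 1
            A[i + 1, i] = -(k - i)
    return A

def C_k0(k):
    """C_{k,0} = diag(k!0!, (k-1)!1!, ..., 0!k!)."""
    return np.diag([factorial(k - i) * factorial(i) for i in range(k + 1)])

k, gamma, dt = 2, 1.5, 0.5
A, C = A_k(k, gamma), C_k0(k)
I = np.eye(k + 1)

# First expression: C (2 (I - C^{-1} e^{dt A^T} C)^{-1} - I)
M = np.linalg.inv(C) @ expm(dt * A.T) @ C
D1 = C @ (2.0 * np.linalg.inv(I - M) - I)

# Second expression: -coth((dt/2) A^T) C, with the matrix coth
X = 0.5 * dt * A.T
coth = (expm(X) + expm(-X)) @ np.linalg.inv(expm(X) - expm(-X))
D2 = -coth @ C
```

The two matrices agree to rounding error, reflecting the scalar identity \(2(1-\mathrm {e}^{a})^{-1} - 1 = -\coth (a/2)\) applied to the matrix argument \(\varDelta t\,\mathbf {A}_k^\mathsf {T}\).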

A formula for \(\tau (u)\) is possible if u(q) can be expanded in Hermite polynomials as \(u =\sum _{k=1}^\infty c_k He _k\). Then, from Eq. (15), \(\mathcal {D} He _k\in \mathbb {P}_k\), not to mention \( He _k\in \mathbb {P}_k\). Using these facts and the mutual orthogonality of the subspaces \(\mathbb {P}_k\), it can be shown that

$$\begin{aligned} \tau (u) =\frac{\sum _{k=1}^\infty k!c_k^2\tau ( He _k)}{\sum _{k=1}^\infty k!c_k^2}. \end{aligned}$$
(16)

From this, it follows that \(\max _u\tau (u) =\max _k\tau ( He _k)\).

Since \( He _k =\mathbf {u}_k^\mathsf {T}\mathbf {x}\) with \(\mathbf {x}=[1, 0,\ldots ,0]^\mathsf {T}\), one has

$$\begin{aligned} \tau \left( He _k\right) =\left( \mathbf {D}_k\right) _{11}/\left( \mathbf {C}_{k,0}\right) _{11} = \left( \coth \left( -\frac{\varDelta t}{2}\mathbf {A}_k\right) \right) _{11}. \end{aligned}$$
(17)

In the limit \(\varDelta t\rightarrow 0\), one has asymptotically \(\tau ( He _k) = -(2/\varDelta t)(\mathbf {A}_k^{-1})_{11}\). In particular,

$$\begin{aligned} \mathbf {A}_1^{-1}= \left[ \begin{array}{rr} -\gamma &{}\quad -1 \\ 1 &{}\quad 0 \end{array}\right] \end{aligned}$$
(18)

and

$$\begin{aligned} \mathbf {A}_2^{-1}= -\frac{1}{2\gamma }\left[ \begin{array}{c@{\quad }c@{\quad }c} \gamma ^2 + 1 &{} \gamma &{} 1 \\ -2\gamma &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 \end{array}\right] . \end{aligned}$$
(19)

Writing \(\tau ( He _k)\) as an expansion in powers of \(\varDelta t\),

$$\begin{aligned} \tau ( He _k) = T_k(\gamma )/\varDelta t +\mathcal {O}(\varDelta t), \end{aligned}$$

one has \(T_1(\gamma ) = 2\gamma \) and \(T_2(\gamma ) =\gamma +1/\gamma \). Fig. 1 plots \(T_k(\gamma )\), \(k = 1, 2, 3, 4\), \(1/2\le \gamma \le 4\). Empirically, \(\max _k T_k = T_{\max }{\mathop {=}\limits ^{\mathrm {def}}}\max \{T_1, T_2\}\).
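
The closed forms \(T_1(\gamma ) = 2\gamma \) and \(T_2(\gamma ) =\gamma + 1/\gamma \) can be verified directly from \(T_k(\gamma ) = -2(\mathbf {A}_k^{-1})_{11}\), with \(\mathbf {A}_k\) assembled from Eq. (14); a small sketch:

```python
import numpy as np

def A_k(k, gamma):
    """Matrix A_k of Eq. (14): diagonal 0, -gamma, ..., -k*gamma,
    superdiagonal 1, ..., k, subdiagonal -k, ..., -1."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k + 1):
        A[i, i] = -i * gamma
        if i < k:
            A[i, i + 1] = i + 1
            A[i + 1, i] = -(k - i)
    return A

def T_k(k, gamma):
    """Leading IAcT coefficient T_k(gamma) = -2 (A_k^{-1})_{11}."""
    return -2.0 * np.linalg.inv(A_k(k, gamma))[0, 0]
```

For \(k = 1\) the \((1,1)\) entry of the inverse is \(-\gamma \), giving \(T_1 = 2\gamma \); for \(k = 2\) it is \(-(\gamma ^2+1)/(2\gamma )\), giving \(T_2 = \gamma + 1/\gamma \).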

Fig. 1

From top to bottom on the right: \(T_k(\gamma )\) vs. \(\gamma \), for \(k = 1, 2, 3, 4\)

Restoring the original variables, one has

$$\begin{aligned} \tau _{\mathrm {q},\max } = T_{\max }(\gamma /\omega )/(\omega \varDelta t) +\mathcal {O}(\omega \varDelta t). \end{aligned}$$

The leading term increases as \(\omega \) decreases, so \(\tau _{\mathrm {q},\max }\) depends on the lowest frequency \(\omega _1\). And \(\tau _{\mathrm {q},\max }\) is minimized at \(\gamma =\omega _1\), which is half of the critical value \(\gamma = 2\omega _1\). Contrast this with the result [38, Sec. 5] for the phase space maximum IAcT, which is minimized for \(\gamma =(\sqrt{6}/2)\omega _1\).

Remark 5

The result is consistent with related results from [4, 12] that consider optimal damping coefficients that maximise the speed of convergence measured in relative entropy. Specifically, call \(\eta _t=\mathcal {N}(\mu _t,\varSigma _t)\) the law of the solution to (13), with initial conditions \((Q_0,P_0)=(q,p)\); see Appendix A for details. Then, using [2, Thm. 4.9], we have

$$\begin{aligned} KL\left( \eta _t,\rho \right) \le M\exp (-2\alpha t), \end{aligned}$$

where \(M\in (1,\infty )\) and \(\alpha \) denotes the spectral abscissa of the matrix A in Appendix A, i.e. the negative of the real part of the eigenvalue that is closest to the imaginary axis. Here

$$\begin{aligned} KL(f,g) = \int \log \frac{f(z)}{g(z)}f(z)\,{\text {d}}z, \end{aligned}$$

denotes the relative entropy (or: Kullback–Leibler divergence) between two phase space probability densities f and g, assuming that

$$\begin{aligned} \int _{\{g(z)=0\}} f(z)dz = 0. \end{aligned}$$

(Otherwise we set \(KL(f,g)=\infty \).)

It is a straightforward calculation to show that the maximum value for \(\alpha \) (that gives the fastest decay of \(KL(\eta _t,\rho )\)) is attained at \(\gamma =2\), which is in agreement with the IAcT analysis. For analogous statements on the multidimensional case, we refer to [4].
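
This calculation is easy to reproduce numerically. Since Appendix A is not reproduced here, the sketch below assumes the drift matrix of Eq. (13) is \(A = \bigl [\begin{smallmatrix} 0 &{} 1 \\ -1 &{} -\gamma \end{smallmatrix}\bigr ]\) and scans the decay rate \(\alpha (\gamma )\) over a grid:

```python
import numpy as np

def decay_rate(gamma):
    """alpha(gamma) = -max Re(lambda) over eigenvalues of the drift
    matrix A = [[0, 1], [-1, -gamma]] of Eq. (13) (assumed form)."""
    A = np.array([[0.0, 1.0], [-1.0, -gamma]])
    return -np.max(np.linalg.eigvals(A).real)

gammas = np.arange(0.1, 6.0, 0.01)
best = gammas[int(np.argmax([decay_rate(g) for g in gammas]))]
```

For \(\gamma < 2\) the eigenvalues are complex with real part \(-\gamma /2\); for \(\gamma > 2\) the slow eigenvalue moves back toward the imaginary axis, so the maximum decay rate \(\alpha = 1\) is attained at the critical value \(\gamma = 2\).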

We should mention that there may be cases in which the optimal damping coefficient leads to a stiff Langevin equation, depending on the eigenvalue spectrum of the Hessian of the potential energy function. As a consequence, optimizing the damping coefficient may reduce the maximum stable step size \(\varDelta t\) that can be used in numerical simulations.

5.1 Application to more general distributions

Note that for the model problem, the matrix K can be extracted from the covariance matrix

$$\begin{aligned} \mathrm {Cov}[\mathbf {Q}] = (1/\beta )K^{-1}. \end{aligned}$$

Therefore, as a surrogate for the lowest frequency \(\omega _1\), and as a recommended value for \(\gamma \), consider using

$$\begin{aligned} \gamma ^*=\left( \lambda _{\min }\left( M^{-1}K\right) \right) ^{1/2} = \left( \beta \lambda _{\max }\left( \mathrm {Cov}[\mathbf {Q}]M\right) \right) ^{-1/2}. \end{aligned}$$
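
As a sketch of this recipe, one can draw Gaussian position samples with a known stiffness matrix and check that the covariance-based formula recovers \(\gamma ^*=(\lambda _{\min }(M^{-1}K))^{1/2}\). The matrices \(K\) and \(M\) below are arbitrary test choices, not taken from the article.

```python
import numpy as np

beta = 1.0
K = np.diag([0.25, 1.0, 4.0])   # test stiffness matrix (assumed)
M = np.eye(3)                   # unit masses (assumed)
rng = np.random.default_rng(1)

# Positions sampled from the Gibbs density ~ exp(-beta q^T K q / 2),
# whose covariance is (1/beta) K^{-1}
Q = rng.multivariate_normal(np.zeros(3), np.linalg.inv(K) / beta, size=200_000)

cov = np.cov(Q, rowvar=False)
lam_max = np.max(np.linalg.eigvals(cov @ M).real)
gamma_star = 1.0 / np.sqrt(beta * lam_max)
```

Here \(\lambda _{\min }(M^{-1}K) = 0.25\), so the estimate should be close to \(\gamma ^* = 0.5\) up to sampling error.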

5.2 Sanity check

As a test of the accuracy of acor and the analytical expression (16), the IAcT is calculated by acor for a time series generated by the exact analytical propagator (given in Appendix A) for the reduced model problem given by Eq. (12). For the preobservable, we choose

$$\begin{aligned} u(q) = He _3(q)/\sqrt{3!} - He _2(q)/\sqrt{2!}, \end{aligned}$$

where \( He _2(q) = q^2 - 1\) and \( He _3(q) = q^3 - 3q\) are Hermite polynomials of degree 2 and 3; as damping coefficient, we choose \(\gamma = 2\), which is the critical value; the time increment is \(\varDelta t = 0.5\), which is about 1/12th of a period.
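
This preobservable is straightforward to evaluate with NumPy's probabilists' Hermite basis (`numpy.polynomial.hermite_e`); a small sketch:

```python
import numpy as np
from math import factorial, sqrt
from numpy.polynomial import hermite_e as He

def u(q):
    """Preobservable u(q) = He_3(q)/sqrt(3!) - He_2(q)/sqrt(2!)."""
    he2 = He.hermeval(q, [0, 0, 1])     # He_2(q) = q^2 - 1
    he3 = He.hermeval(q, [0, 0, 0, 1])  # He_3(q) = q^3 - 3q
    return he3 / sqrt(factorial(3)) - he2 / sqrt(factorial(2))
```

For example, \(u(0) = 1/\sqrt{2}\) since \( He _3(0) = 0\) and \( He _2(0) = -1\).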

In this and the other results reported here, equilibrated initial values are obtained by running for 50000 burn-in steps.

As the dependence of the estimate on N is of interest here, we run \(M=10^3\) independent realisations for each value of N, from which we can estimate the relative error

$$\begin{aligned} \delta _N(\tau (u)) = \frac{\sqrt{\mathrm {Var}[\tau (u)]}}{\mathbb {E}[\tau (u)]}, \end{aligned}$$

which we expect to decay as \(N^{-1/2}\). Figure 2 shows the relative error in the estimated IAcT \(\tau (u)\) for \(N = 2^{13}\), \(2^{14}\), ..., \(2^{22}\). The least-squares fit of the log relative error as a function of \(\log N\) has slope \(m=0.4908\). Thus we observe a nearly perfect \(N^{-1/2}\) decay of the relative error, in accordance with the theoretical prediction.

Fig. 2

Relative error in estimated IAcT \(\tau \) as a function of sample size N. The relative error \(\delta _N = \sqrt{\mathrm {Var}[\tau ]}/\mathbb {E}[\tau ]\) has been computed by averaging over \(M=10^3\) independent realisations of each simulation

6 A simple example

The procedure to determine the optimal damping coefficient in the previous section is based on linear Langevin systems. Even though the considerations of Sect. 5 do not readily generalize to nonlinear systems, it is plausible to use the harmonic approximation as a proxy for more general systems, since large IAcT values are often due to noise-induced metastability, in which case local harmonic approximations inside metastable regions are suitable.

For estimating the maximum IAcT, the model problem therefore suggests the use of linear, quadratic and cubic functions of the coordinates, where the latter is suitable to capture the possible non-harmonicity of the potential energy wells in the metastable regime.

The first test problem, which is from Ref. [23], possesses an asymmetric multimodal distribution. It uses \(U(q) = \frac{1}{4} q^4 +\sin (1+5q)\) and \(\beta = 1\), and it generates samples using BAOAB with a step size \(\varDelta t = 0.2\), which is representative of step sizes used in Ref. [23]. Figure 3 plots the unnormalized probability density function with dotted lines.
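
A minimal BAOAB sketch for this potential is given below. Unit mass is assumed, and the function and parameter names are ours; this is an illustration of the splitting, not the code used for the reported experiments.

```python
import numpy as np

def force(q):
    # F(q) = -U'(q) for U(q) = q^4/4 + sin(1 + 5q)
    return -(q**3 + 5.0 * np.cos(1.0 + 5.0 * q))

def baoab(n_steps, dt=0.2, gamma=1.0, beta=1.0, q=0.0, p=0.0, seed=0):
    """BAOAB splitting for 1D Langevin dynamics with unit mass."""
    rng = np.random.default_rng(seed)
    c1 = np.exp(-gamma * dt)                      # OU decay factor
    c3 = np.sqrt((1.0 - c1**2) / beta)            # OU noise amplitude
    qs = np.empty(n_steps)
    for n in range(n_steps):
        p += 0.5 * dt * force(q)                  # B: half kick
        q += 0.5 * dt * p                         # A: half drift
        p = c1 * p + c3 * rng.standard_normal()   # O: exact OU step
        q += 0.5 * dt * p                         # A: half drift
        p += 0.5 * dt * force(q)                  # B: half kick
        qs[n] = q
    return qs

qs = baoab(100_000)
```

The trajectory stays bounded in the quartic well and visits the several modes of \(\exp (-U)\).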

6.1 Choice of basis

A first step is to find a preobservable that produces a large IAcT. It would be typical of actual practice to try to select a good value for \(\gamma \). To this end, choose \(\gamma =\gamma ^*= 1.276\); to obtain this value, do a run of sample size \(N = 2\cdot 10^6\) using \(\gamma = 1\), as in one of the tests in Ref. [23].

With a sample size \(N = 10^7\), the maximum IAcT is calculated for polynomials of increasing degree using the approach described in Sects. 4.2–4.3. Odd degrees produce somewhat greater maxima than even degrees. For cubic, quintic, and septic polynomials, \(\tau _{\max }\) has values 59.9, 63.9, 65.8, respectively. As a check that the sample size is adequate, the calculations are redone with half the sample size.

Figure 3 shows how the maximizing polynomial evolves as its degree increases from 3 to 5 to 7.

Fig. 3

In dotted lines is the unnormalized probability density function. From top to bottom on the right are the cubic, quintic, and septic polynomials that maximize the IAcT over all polynomials of equal degree

6.2 Optimal choice of damping coefficient

The preceding results indicate that septic polynomials are a reasonable set of functions for estimating \(\tau _{\mathrm {q},\max }\). For 25 values of \(\gamma \), ranging from 0.2 to 5, the value of \(\tau _{\mathrm {q},\max }\) was thus estimated, each run consisting of \(N = 10^7\) samples.

The optimal value is \(\gamma = 1.8 = 1.4\gamma ^*\), which is close to the heuristic choice \(\gamma ^*\) for a damping coefficient. Figure 4 plots \(\tau _{\mathrm {q},\max }\) vs. the ratio \(\gamma /\gamma ^*\).

With respect to this example, Ref. [23, Sec. 5] states, “We were concerned that the improved accuracy seen in the high \(\gamma \) regime might come at the price of a slower convergence to equilibrium”. The foregoing results indicate that the value \(\gamma = 1\) used in one of the tests is near the apparent optimal value \(\gamma = 1.8\). Hence, the superior accuracy of BAOAB over other methods observed in the low \(\gamma \) regime does not come at the price of slower convergence.

Fig. 4

\(\tau _{\mathrm {q},\max }\) vs. \(\gamma /\gamma ^*\) using septic polynomials as preobservables

7 Sum of three Gaussians

The next, perhaps more challenging, test problem uses the sum of three (equidistant) Gaussians for the distribution, namely

$$\begin{aligned}&\exp (-V(x, y))\\&\quad = \exp \left( -((x-d)^2 + y^2)/2\right) \\&\qquad +\exp \left( -\left( (x+d/2)^2 + \left( y-\sqrt{3}d/2\right) ^2\right) /2\right) \\&\qquad +\exp \left( -\left( (x+d/2)^2 + \left( y+\sqrt{3}d/2\right) ^2\right) /2\right) , \end{aligned}$$

where d is a parameter that measures the distance of the three local minima from the origin. The Langevin system is integrated using BAOAB with a step size \(\varDelta t = 0.5\), as for the model problem, which is what \(V(x, y)\) reduces to if \(d = 0\).
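
For concreteness, the unnormalized density can be coded directly from the formula above, together with a quick check that the three Gaussian centers are equidistant from the origin and, by symmetry, carry equal density values:

```python
import numpy as np

def density(x, y, d=4.8):
    """Unnormalized sum-of-three-Gaussians density exp(-V(x, y))."""
    g1 = np.exp(-((x - d)**2 + y**2) / 2)
    g2 = np.exp(-((x + d/2)**2 + (y - np.sqrt(3)*d/2)**2) / 2)
    g3 = np.exp(-((x + d/2)**2 + (y + np.sqrt(3)*d/2)**2) / 2)
    return g1 + g2 + g3

d = 4.8
centers = [(d, 0.0), (-d/2, np.sqrt(3)*d/2), (-d/2, -np.sqrt(3)*d/2)]
vals = [density(cx, cy, d) for cx, cy in centers]
```

Each center lies at distance d from the origin, so the three modes are separated by rotations of \(120^\circ \).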

Shown in Fig. 5 are the first \(8\cdot 10^4\) points of a trajectory where \(d = 4.8\).

Fig. 5

A typical time series for a sum of three Gaussians

7.1 Choice of basis

To compare \(\tau _{\max }\) for different sets of preobservables, choose \(\gamma =\gamma ^*= 0.261\), and with \(\gamma \) so chosen, run the simulation with \(d = 4.8\) for \(N = 10^7\) steps. To compute \(\gamma ^*\), run the simulation for \(N = 2\cdot 10^6\) steps with \(\gamma =1\) (which is optimal for \(d = 0\)).

Here are the different sets of preobservables and the resulting values of \(\tau _{\max }\):

  1. linear polynomials of x and y, for which \(\tau _{\max } = 18774\),

  2. quadratic polynomials of x and y, for which \(\tau _{\max } = 19408\),

  3. linear combinations of indicator functions \(\{1_A, 1_B, 1_C\}\) for the three conformations

     $$\begin{aligned} A= & {} \{(x, y)~:~ |y|\le \sqrt{3}x\},\\ B= & {} \{(x, y)~:~ y\ge 0 \text{ and } y\ge \sqrt{3}x\},\\ C= & {} \{(x, y)~:~ y\le 0 \text{ and } y\le -\sqrt{3}x\}, \end{aligned}$$

     for which \(\tau _{\max } = 18492\),

  4. \(1_A\) alone, for which \(\tau = 12087\),

  5. \(1_B\) alone, for which \(\tau = 5056\),

  6. \(1_C\) alone, for which \(\tau = 4521\).
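
The indicator preobservables can be sketched as follows; the check confirms that A, B, C cover the plane and overlap only on their boundaries, a set of measure zero for continuous samples. The function name is ours.

```python
import numpy as np

def conformation(x, y):
    """Membership of each point in the conformations A, B, C
    defined in the text (vectorized over arrays)."""
    in_A = np.abs(y) <= np.sqrt(3) * x
    in_B = (y >= 0) & (y >= np.sqrt(3) * x)
    in_C = (y <= 0) & (y <= -np.sqrt(3) * x)
    return in_A, in_B, in_C

rng = np.random.default_rng(2)
x, y = rng.normal(size=(2, 100_000))
a, b, c = conformation(x, y)
counts = a.astype(int) + b.astype(int) + c.astype(int)
```

Every sampled point belongs to at least one conformation, and (up to boundary ties of probability zero) to exactly one.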

As a consequence of these results, the following section uses quadratic polynomials to estimate \(\tau _{\mathrm {q},\max }\).

Fig. 6

\(\tau _{\mathrm {q},\max }\) vs. the ratio \(\gamma /\gamma ^*\)

7.2 Optimal choice of damping coefficient

Shown in Fig. 6 is a plot of \(\tau _{\mathrm {q},\max }\) vs. the ratio \(\gamma /\gamma ^*\). To limit the computing time, we set the parameter to \(d = 4.4\) rather than 4.8 as in Sect. 7.1; for \(d = 4.4\), we have \(\gamma ^*= 0.285\), obtained using the same protocol as in Sect. 7.1.

We consider \(0.05\le \gamma \le 2.2\) in increments of 0.01 from 0.05 to 0.2, and in increments of 0.1 from 0.2 to 2.2. Each data point is based on a run of \(N = 2\cdot 10^7\) time steps. Even though the variance of the estimator is not negligible for our choice of simulation parameters, it is clearly visible that the minimum of \(\tau _{\mathrm {q},\max }\) is attained at \(\gamma \approx \gamma ^*\).

8 Conclusions

We have discussed the question of how to choose the damping coefficient in (underdamped) Langevin dynamics that leads to efficient sampling of the stationary probability distribution or expectations of certain observables with respect to this distribution. Here, efficient sampling is understood as minimizing the maximum possible (worst case) integrated autocorrelation time (IAcT). We propose a numerical method that is based on the concept of phase space preobservables that span a function space over which the worst-case IAcT is computed using trajectory data; the optimal damping coefficient can then be chosen on the basis of this information.

Based on heuristics derived from a linear Langevin equation, we derive rules of thumb for choosing good preobservables for more complicated dynamics. The results for the linear model problem are in agreement with recent theoretical results on Ornstein–Uhlenbeck processes with degenerate noise, and they are shown to be a good starting point for a systematic analysis of nonlinear Langevin samplers.