Abstract
We present a complete framework for determining the asymptotic (or logarithmic) efficiency of estimators of large deviation probabilities and rate functions based on importance sampling. The framework relies on the idea that importance sampling in that context is fully characterized by the joint large deviations of two random variables: the observable defining the large deviation probability of interest and the likelihood factor (or Radon–Nikodym derivative) connecting the original process and the modified process used in importance sampling. We recover with this framework known results about the asymptotic efficiency of the exponential tilting and obtain new necessary and sufficient conditions for a general change of process to be asymptotically efficient. This allows us to construct new examples of efficient estimators for sample means of random variables that do not have the exponential tilting form. Other examples involving Markov chains and diffusions are presented to illustrate our results.
Notes
We could identify the new copies with a different symbol, say \({\tilde{{\mathbf {X}}}}_n^{(i)}\), since they are generated from a different distribution and so represent a different random variable. Here, we keep \({\mathbf {X}}_n^{(i)}\) but always specify the distribution, \(P_n\) or \(Q_n\), used. The same applies to the observable.
We use the same letter \(\lambda \) for the Legendre–Fenchel transform and for the SCGF in (23), since, as already mentioned, the Gärtner–Ellis theorem ensures that, under appropriate conditions, both functions coincide.
A corner in \(I_P(m)\) or \(I_Q(m)\) signals physically a dynamical phase transition in the fluctuations of \(M_n\). Here, we assume, for simplicity, that no such phase transition occurs. Note that a corner in the function \(I_Q^B(w)\) is not related to a dynamical phase transition, since this function is obtained by conditioning. It can have a corner, as the example of the exponential tilting shows, regardless of whether \(I_P(m)\) or \(I_Q(m)\) is smooth.
References
Shwartz, A., Weiss, A.: Large Deviations for Performance Analysis. Stochastic Modeling Series. Chapman and Hall, London (1995)
Wales, D.: Energy Landscapes: Applications to Clusters, Biomolecules and Glasses. Cambridge University Press, Cambridge (2004)
E, W., Ren, W., Vanden-Eijnden, E.: Minimum action method for the study of rare events. Commun. Pure Appl. Math. 57, 637–656 (2004)
Lelièvre, T., Rousset, M., Stoltz, G. (eds.): Free Energy Computations: A Mathematical Perspective. Imperial College Press, London (2010)
Ellis, R.S.: Entropy, Large Deviations, and Statistical Mechanics. Springer, New York (1985)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998)
den Hollander, F.: Large Deviations, Fields Institute Monograph. AMS, Providence (2000)
Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478, 1–69 (2009)
Garrahan, J.P., Jack, R.L., Lecomte, V., Pitard, E., van Duijvendijk, K., van Wijland, F.: Dynamical first-order phase transition in kinetically constrained models of glasses. Phys. Rev. Lett. 98, 195702 (2007)
Garrahan, J.P., Lesanovsky, I.: Thermodynamics of quantum jump trajectories. Phys. Rev. Lett. 104, 160601 (2010)
Espigares, C.P., Garrido, P.L., Hurtado, P.I.: Dynamical phase transition for current statistics in a simple driven diffusive system. Phys. Rev. E 87, 032115 (2013)
Bunin, G., Kafri, Y., Podolsky, D.: Cusp singularities in boundary-driven diffusive systems. J. Stat. Phys. 152, 112–135 (2013)
Tsobgni Nyawo, P., Touchette, H.: A minimal model of dynamical phase transition. Europhys. Lett. 116, 50009 (2016)
Lazarescu, A.: Generic dynamical phase transition in one-dimensional bulk-driven lattice gases with exclusion. J. Phys. A 50, 254004 (2017)
Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Lett. 74, 2694–2697 (1995)
Kurchan, J.: Fluctuation theorem for stochastic dynamics. J. Phys. A 31, 3719–3729 (1998)
Lebowitz, J.L., Spohn, H.: A Gallavotti-Cohen-type symmetry in the large deviation functional for stochastic dynamics. J. Stat. Phys. 95, 333–365 (1999)
Harris, R.J., Schütz, G.M.: Fluctuation theorems for stochastic dynamics. J. Stat. Mech. 2007, P07020 (2007)
Baiesi, M., Maes, C., Wynants, B.: Fluctuations and response of nonequilibrium states. Phys. Rev. Lett. 103, 010602 (2009)
Derrida, B.: Non-equilibrium steady states: Fluctuations and large deviations of the density and of the current. J. Stat. Mech. 2007, P07023 (2007)
Bertini, L., De Sole, A., Gabrielli, D., Jona-Lasinio, G., Landim, C.: Stochastic interacting particle systems out of equilibrium. J. Stat. Mech. 2007, P07014 (2007)
Harris, R.J., Touchette, H.: Large deviation approach to nonequilibrium systems. In: Klages, R., Just, W., Jarzynski, C. (eds.) Nonequilibrium Statistical Physics of Small Systems: Fluctuation Relations and Beyond, Reviews of Nonlinear Dynamics and Complexity, vol. 6, pp. 335–360. Wiley-VCH, Weinheim (2013)
Garrahan, J.P.: Aspects of non-equilibrium in classical and quantum systems: slow relaxation and glasses, dynamical large deviations, quantum non-ergodicity, and open quantum dynamics. Physica A 504, 130–154 (2018)
Sekimoto, K.: Stochastic Energetics, Lect. Notes Phys., vol. 799. Springer, New York (2010)
Seifert, U.: Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 75, 126001 (2012)
Seifert, U.: Stochastic thermodynamics: from principles to the cost of precision. Physica A 504, 176–191 (2018)
Ciliberto, S.: Experiments in stochastic thermodynamics: short history and perspectives. Phys. Rev. X 7, 021051 (2017)
Cérou, F., Guyader, A.: Adaptive multilevel splitting for rare event analysis. Stoch. Anal. Appl. 25, 417–443 (2007)
Dean, T., Dupuis, P.: Splitting for rare event simulation: a large deviation approach to design and analysis. Stoch. Proc. Appl. 119, 562–587 (2009)
Cérou, F., Guyader, A., Lelièvre, T., Pommier, D.: A multiple replica approach to simulate reactive trajectories. J. Chem. Phys. 134, 054108 (2011)
Cérou, F., Delyon, B., Guyader, A., Rousset, M.: On the asymptotic normality of adaptive multilevel splitting. SIAM J. Uncertain. Quant. 7, 1–30 (2019)
Cérou, F., Guyader, A., Rousset, M.: Adaptive multilevel splitting: historical perspective and recent results. Chaos 29, 043108 (2019)
Bréhier, C.-E., Lelièvre, T.: On a new class of score functions to estimate tail probabilities of some stochastic processes with adaptive multilevel splitting. Chaos 29, 033126 (2019)
Grassberger, P.: Go with the winners: a general Monte Carlo strategy. Comput. Phys. Commun. 147, 64–70 (2002)
Giardina, C., Kurchan, J., Peliti, L.: Direct evaluation of large-deviation functions. Phys. Rev. Lett. 96, 120603 (2006)
Lecomte, V., Tailleur, J.: A numerical approach to large deviations in continuous time. J. Stat. Mech. 2007, P03004 (2007)
Angeli, L., Grosskinsky, S., Johansen, A.M., Pizzoferrato, A.: Rare event simulation for stochastic dynamics in continuous time. J. Stat. Phys. 176, 1185–1210 (2019)
Torrie, G.M., Valleau, J.P.: Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J. Comput. Phys. 23, 187–199 (1977)
Juneja, S., Shahabuddin, P.: Rare-event simulation techniques: an introduction and recent advances, Chap. 11, pp. 291–350. Elsevier, Amsterdam (2006)
Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability. Springer, New York (2007)
Bucklew, J.A.: Introduction to Rare Event Simulation. Springer, New York (2004)
Sadowsky, J.S., Bucklew, J.A.: Large deviations theory techniques in Monte Carlo simulation. In: MacNair, E.A., Musselman, K.J., Heidelberger, P. (eds.) Proceedings of the 1989 Winter Simulation Conference, pp. 505–513. ACM, New York (1989)
Sadowsky, J.S., Bucklew, J.A.: On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inf. Theory 36, 579–588 (1990)
Bucklew, J.A., Ney, P., Sadowsky, J.S.: Monte Carlo simulation and large deviations theory for uniformly recurrent Markov chains. J. Appl. Prob. 27, 44–59 (1990)
Schlebusch, H.-J.: On the asymptotic efficiency of importance sampling techniques. IEEE Trans. Inf. Theory 39, 710–715 (1993)
Dieker, A.B., Mandjes, M.: On asymptotically efficient simulation of large deviation probabilities. Adv. Appl. Prob. 37, 539–552 (2005)
Efron, B., Truax, D.: Large deviations theory in exponential families. Ann. Math. Stat. 39, 1402–1424 (1968)
Touchette, H.: Asymptotic equivalence of probability measures and stochastic processes. J. Stat. Phys. 170, 962–978 (2018a)
Cottrell, M., Fort, J.-C., Malgouyres, G.: Large deviations and rare events in the study of stochastic algorithms. IEEE Trans. Autom. Control 28, 907–920 (1983)
Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, Grundlehren der Mathematischen Wissenschaften, vol. 260. Springer, New York (1984)
Graham, R.: Macroscopic potentials, bifurcations and noise in dissipative systems. In: Moss, F., McClintock, P.V.E. (eds.) Noise in Nonlinear Dynamical Systems, vol. 1, pp. 225–278. Cambridge University Press, Cambridge (1989)
Luchinsky, D.G., McClintock, P.V.E., Dykman, M.I.: Analogue studies of nonlinear systems. Rep. Prog. Phys. 61, 889–997 (1998)
Touchette, H.: Introduction to dynamical large deviations of Markov processes. Physica A 504, 5–19 (2018b)
Bertini, L., De Sole, A., Gabrielli, D., Jona-Lasinio, G., Landim, C.: Macroscopic fluctuation theory. Rev. Mod. Phys. 87, 593–636 (2015)
Touchette, H.: Equivalence and nonequivalence of ensembles: thermodynamic, macrostate, and measure levels. J. Stat. Phys. 159, 987–1016 (2015)
Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method. Springer, New York (2004)
Engel, A., Monasson, R., Hartmann, A.K.: On large deviation properties of Erdös-Rényi random graphs. J. Stat. Phys. 117, 387–426 (2004)
Hartmann, A.K.: Large-deviation properties of largest component for random graphs. Eur. Phys. J. B 84, 627–634 (2011)
Dewenter, T., Hartmann, A.K.: Large-deviation properties of resilience of power grids. New J. Phys. 17, 015005 (2015)
Guasoni, P., Robertson, S.: Optimal importance sampling with explicit formulas in continuous time. Financ. Stoch. 12, 1–19 (2008)
Vanden-Eijnden, E., Weare, J.: Rare event simulation of small noise diffusions. Commun. Pure Appl. Math. 65, 1770–1803 (2012)
Kundu, A., Sabhapandit, S., Dhar, A.: Application of importance sampling to the computation of large deviations in nonequilibrium processes. Phys. Rev. E 83, 031119 (2011)
Klymko, K., Geissler, P.L., Garrahan, J.P., Whitelam, S.: Rare behavior of growth processes via umbrella sampling of trajectories. Phys. Rev. E 97, 032123 (2018)
Whitelam, S.: Sampling rare fluctuations of discrete-time Markov chains. Phys. Rev. E 97, 032122 (2018)
Jacobson, D., Whitelam, S.: Direct evaluation of dynamical large-deviation rate functions using a variational ansatz. Phys. Rev. E 100, 052139 (2019)
Glasserman, P., Wang, Y.: Counterexamples in importance sampling for large deviations probabilities. Ann. Appl. Prob. 7, 731–746 (1997)
Puhalskii, A., Spokoiny, V.: On large-deviation efficiency in statistical inference. Bernoulli 4, 203–272 (1998)
Ellis, R.S., Haven, K., Turkington, B.: Large deviation principles and complete equivalence and nonequivalence results for pure and mixed ensembles. J. Stat. Phys. 101, 999–1064 (2000)
Varadhan, S.R.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19, 261–286 (1966)
Touchette, H.: A basic introduction to large deviations: theory, applications, simulations. In: Leidl, R., Hartmann, A.K. (eds.) Modern Computational Science 11: Lecture Notes from the 3rd International Oldenburg Summer School. BIS-Verlag der Carl von Ossietzky Universität Oldenburg, Oldenburg (2011)
Chetrite, R., Touchette, H.: Nonequilibrium Markov processes conditioned on large deviations. Ann. Henri Poincaré 16, 2005–2057 (2015a)
Harris, R.J., Touchette, H.: Current fluctuations in stochastic systems with long-range memory. J. Phys. A 42, 342001 (2009)
Küchler, U., Sørensen, M.: On exponential families of Markov processes. J. Stat. Plan. Inference 66, 3–19 (1998)
Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Springer, New York (1979)
Chetrite, R., Touchette, H.: Variational and optimal control representations of conditioned and driven processes. J. Stat. Mech. 2015, P12001 (2015b)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, New York (1998)
Borwein, J., Lewis, A.: Convex Analysis and Nonlinear Optimization, 2nd edn. Springer, New York (2006)
Acknowledgements
A.G. thanks Maxime Sangnier for fruitful discussions during the writing of this paper. We also thank Grégoire Ferré and Gabriel Stoltz for carefully reading the paper. H.T. is supported by Stellenbosch University (Establishment Funds) and the National Research Foundation of South Africa (Grant No. 96199).
Additional information
Communicated by Abhishek Dhar.
Appendices
Appendix A: Convex Analysis
We collect in this section basic results of convex analysis used in the paper in relation to the rate function \(I_Q^B(w)\), defined in (50), and its Legendre–Fenchel transform \(\lambda _Q^B(k)\), defined in (57). Both are functions of a single real variable, so we state the necessary results only for this simple case. We assume further that all convex functions are proper closed convex functions. For more general results and proofs, we refer to [76,77,78].
1.1 Subdifferentials
Let \(f:{\mathbb {R}}\rightarrow {\bar{{\mathbb {R}}}}\) be a real function taking values in the set of extended reals \({\bar{{\mathbb {R}}}}\). The subdifferential \(\partial f(x)\) of f at the point x is the set of all values \(k\in {\mathbb {R}}\) such that
$$f(y)\ge f(x)+k(y-x) \tag{A.1}$$
for all \(y\in {\mathbb {R}}\) [76, Sect. 23]. Put differently, and as illustrated in Fig. 7a, \(\partial f(x)\) is the set of slopes of all possible supporting lines of f at x. If f has no supporting line at x, then \(\partial f(x)=\emptyset \). We will see next that this may happen when f is nonconvex.
For convex functions, subdifferentials exist everywhere in the domain of f(x), except possibly at boundary points [76, Theorem 23.4]. For this class of functions, we have in fact \(\partial f(x) = [f'(x^-),f'(x^+)]\), where \(f'(x^-)\) is the left-derivative and \(f'(x^+)\) the right-derivative [76, Theorem 24.3]. If these are equal, f is differentiable at x so that \(\partial f(x) = \{f'(x)\}\) [76, Theorem 25.1]. In all cases, \(\partial f(x)\) is a closed convex interval [76, p. 215].
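As a minimal numerical sketch of the supporting-line characterization, the following Python snippet tests whether a slope k satisfies \(f(y)\ge f(x)+k(y-x)\) on a finite grid of points y. The test function \(f(x)=|x|+x^2/2\), the grid, and the helper name `is_subgradient` are illustrative assumptions and do not come from the paper; a finite grid can only approximate the condition "for all y".

```python
import numpy as np

def f(x):
    # Illustrative convex function with a corner at x = 0 (assumption, not from the paper)
    return np.abs(x) + 0.5 * x**2

def is_subgradient(k, x, grid, tol=1e-12):
    # k belongs to the subdifferential of f at x if the line of slope k through
    # (x, f(x)) stays below f at every grid point y
    return bool(np.all(f(grid) >= f(x) + k * (grid - x) - tol))

grid = np.linspace(-5.0, 5.0, 10001)

# Corner at x = 0: left-derivative -1 and right-derivative +1, so slopes in [-1, 1] pass
inside = [is_subgradient(k, 0.0, grid) for k in (-1.0, -0.5, 0.0, 0.5, 1.0)]
outside = [is_subgradient(k, 0.0, grid) for k in (-1.2, 1.2)]

# Smooth point x = 1: only the derivative f'(1) = 2 gives a supporting line
smooth = [is_subgradient(k, 1.0, grid) for k in (1.9, 2.0, 2.1)]

print(inside, outside, smooth)
```

For this choice of f, the slopes accepted at the corner fill the interval between the left- and right-derivatives, while a single slope is accepted at the differentiable point.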
1.2 Legendre–Fenchel Transforms
The Legendre–Fenchel transform of f is the real function defined by
$$f^*(k)=\sup _{x\in {\mathbb {R}}}\{kx-f(x)\}. \tag{A.2}$$
This function is also called the dual or conjugate of f and has the property of being convex [76, Theorem 12.2]. The double dual or biconjugate of f is the Legendre–Fenchel transform of \(f^*\):
$$f^{**}(x)=\sup _{k\in {\mathbb {R}}}\{kx-f^*(k)\}. \tag{A.3}$$
This is also a convex function, corresponding to the convex envelope or convex hull of f [77, Theorem 11.1], as illustrated in Fig. 7b.
With this geometric interpretation of \(f^{**}\), it is natural to say that x is a convex point of f if \(f(x)=f^{**}(x)\) and a nonconvex point of f if \(f(x)\ne f^{**}(x)\). An important result proved in [68, Lem. 4.1] is that the set of convex points coincides with the set of points admitting supporting lines, except possibly at boundary points. With this proviso, we then have \(f(x)=f^{**}(x)\) if and only if \(\partial f(x)\ne \emptyset \). This is illustrated in Fig. 7a. The same result also implies that, if \(f(x)=f^{**}(x)\), then \(\partial f(x)=\partial f^{**}(x)\).
In this paper, we deal with rate functions, which always have at least one global minimum. Denoting one such minimizer by \(x^*\), we then have \(0\in \partial f(x^*)\). Hence, \(x^*\) is a convex point such that \(f(x^*)=f^{**}(x^*)\) and \(\partial f(x^*)=\partial f^{**}(x^*)\).
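The relation between f, its conjugate, and its convex envelope can be checked numerically. The sketch below approximates the Legendre–Fenchel transform \(f^*(k)=\sup _x\{kx-f(x)\}\) by a maximum over a grid and applies it twice to the double-well function \(f(x)=(x^2-1)^2\). This function, the grid ranges, and the helper name `legendre_fenchel` are illustrative assumptions, not taken from the paper, and the grids truncate the suprema to bounded intervals.

```python
import numpy as np

def legendre_fenchel(values, points, slopes):
    # Discrete transform: for each slope s, maximize s * p - values(p) over the grid points p
    return np.max(slopes[:, None] * points[None, :] - values[None, :], axis=1)

x = np.linspace(-2.0, 2.0, 801)      # grid for the variable x
k = np.linspace(-30.0, 30.0, 3001)   # grid for the slopes; covers f'(x) on [-2, 2]
f = (x**2 - 1.0)**2                  # nonconvex between the two minima x = -1 and x = 1

f_star = legendre_fenchel(f, x, k)         # conjugate f*
f_bistar = legendre_fenchel(f_star, k, x)  # biconjugate f**, the convex envelope of f

# Nonconvex points: f** lies strictly below f (here f** is flat and equal to 0)
print(f[400], f_bistar[400])    # values at x = 0
# Convex points: f** coincides with f, in particular at the global minimizers
print(f[600], f_bistar[600])    # values at x = 1
```

In this example the envelope replaces the barrier between the two wells by an affine (here constant) segment, while the minimizers \(x^*=\pm 1\) remain convex points with \(f(x^*)=f^{**}(x^*)\).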
1.3 Duality
The proof of our main result, Theorem 4, is based on another important result about convex functions stating (see [76, Cor. 23.5.1] or [77, Prop. 11.3]) that
$$k\in \partial f(x) \quad \Longleftrightarrow \quad x\in \partial f^*(k). \tag{A.4}$$
This property expresses a form of duality or conjugacy between the slopes of f and the slopes of \(f^*\), illustrated in Fig. 8a. From this result, it is easy to see that convex, affine parts of f correspond to cusps of \(f^*\), and vice versa, as shown in Fig. 8b.
The duality in (A.4) also holds for \(f^{**}\), since this function is convex and is the Legendre–Fenchel transform of \(f^*\). Therefore,
$$k\in \partial f^{**}(x) \quad \Longleftrightarrow \quad x\in \partial f^*(k). \tag{A.5}$$
This result implies that \(f^*\) has a cusp also when f is nonconvex, as shown in Fig. 8, since \(f^{**}\) is affine where f is nonconvex. Thus, \(f^*\) has a cusp either if f is affine or f is nonconvex.
Since subdifferentials of f and \(f^{**}\) match at convex points, it is also clear from (A.5) that the first duality (A.4) holds locally at these points even if f is not globally convex. We use this result in this paper when dealing with the subdifferential of \(I_Q^B\) at its global minimum \(w^*\), which is a convex point, as mentioned. In this case, the first duality result can be applied at that point even though \(I_Q^B\) might be nonconvex at other points, as in Figs. 2c or 6.
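The slope duality can be illustrated on a simple convex example. The sketch below uses \(f(x)=|x|+x^2/2\), for which \(\partial f(0)=[-1,1]\) and \(f'(1)=2\), and computes its conjugate on a grid. It then checks that the slope of \(f^*\) at \(k=2\) returns \(x=1\), and that the corner of f at \(x=0\) shows up as a flat part of \(f^*\) on \([-1,1]\). The function, the grid, the step h, and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 12001)  # grid used to approximate the supremum over x

def f(t):
    # Illustrative convex function with a corner at 0 (assumption, not from the paper)
    return np.abs(t) + 0.5 * t**2

def f_star(k):
    # Grid approximation of the Legendre-Fenchel transform sup_x [k x - f(x)]
    return float(np.max(k * x - f(x)))

def slope(g, t, h=1e-2):
    # Central finite-difference estimate of the derivative of g at t
    return (g(t + h) - g(t - h)) / (2.0 * h)

k_at_x1 = slope(f, 1.0)        # slope of f at x = 1
x_at_k2 = slope(f_star, 2.0)   # slope of f* at k = 2, expected to give back x = 1

# Corner of f at x = 0 with slopes in [-1, 1]: f* is constant on that interval,
# so that 0 belongs to the subdifferential of f* at each such k
flat_part = [f_star(kk) for kk in (-1.0, -0.5, 0.0, 0.5, 1.0)]

print(k_at_x1, x_at_k2, flat_part)
```

Exchanging the roles of f and \(f^*\) gives the converse picture: the affine part of \(f^*\) on \([-1,1]\) corresponds to the corner of f at \(x=0\).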
Appendix B: Contraction Principle
The contraction principle is an important result in large deviation theory relating the rate functions of random variables that can be mapped to one another. Let \((A_n)_{n>0}\) be a sequence of random variables satisfying the LDP with good rate function \(I_A\) and let \((B_n)_{n>0}\) be another sequence such that \(B_n=f(A_n)\) with f continuous. Then \((B_n)_{n>0}\) also satisfies the LDP with good rate function
$$I_B(b)=\inf _{a:f(a)=b} I_A(a). \tag{A.6}$$
See [6, Theorem 4.2.1] for details.
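As a simple illustration of the contraction, take for \(A_n\) the pair of sample means of two independent sequences of standard Gaussian random variables, with joint rate function \(I_A(a_1,a_2)=(a_1^2+a_2^2)/2\), and contract it with \(f(a_1,a_2)=a_1+a_2\). The constrained infimum of \(I_A\) over \(\{a_1+a_2=b\}\) is then \(b^2/4\), which agrees with the fact that \(B_n\) is Gaussian with variance 2/n. The sketch below evaluates this infimum on a grid; the Gaussian example, the grid, and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rate_A(a1, a2):
    # Joint rate function of two independent standard Gaussian sample means
    return 0.5 * (a1**2 + a2**2)

def contracted_rate(b, a1_grid):
    # Infimum of rate_A over the constraint a1 + a2 = b,
    # obtained by eliminating a2 = b - a1 and minimizing over a1
    return float(np.min(rate_A(a1_grid, b - a1_grid)))

a1_grid = np.linspace(-10.0, 10.0, 20001)
b_values = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

numeric = np.array([contracted_rate(b, a1_grid) for b in b_values])
exact = b_values**2 / 4.0   # rate function of a Gaussian with variance 2/n

print(np.column_stack((b_values, numeric, exact)))
```

The same elimination of a constraint, followed by a minimization over the remaining variable, is what a contraction from a joint rate function to a marginal one amounts to in practice.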
Instead of considering a single continuous function f as the contraction, one can also consider a sequence \((f_n)_{n>0}\) of continuous functions. In this case, the contraction principle also applies provided that \(f_n\) is “close enough” to f with respect to \(P_n\). To be more precise, let \({\mathcal {A}}\) denote the space of \(A_n\) and define
$$\Gamma _{n,\delta }=\{a\in {\mathcal {A}}: \Vert f_n(a)-f(a)\Vert \ge \delta \} \tag{A.7}$$
as the set of points for which \(f_n\) differs from f by at least \(\delta >0\) with respect to any metric \(\Vert \cdot \Vert \) on \({\mathcal {B}}\), the space of \(B_n\). Then, according to [6, Cor. 4.2.21], \(B_n=f_n(A_n)\) satisfies the LDP with good rate function \(I_B\) given by (A.6) with f as the contraction if, for all \(\delta >0\),
$$\limsup _{n\rightarrow \infty }\frac{1}{n}\log P_n(\Gamma _{n,\delta })=-\infty . \tag{A.8}$$
This condition only means that the probability that \(f_n\) differs from f decreases faster than exponentially with n in the large deviation limit. This is met in most cases when \(f_n\) is smooth and \(I_A\) is a good rate function.
Two particular applications of this result are considered in the paper.
Example 4
Consider two real random variables \(A_n\) and \(B_n\) related by the simple rescaling \(B_n=c_n A_n\) with \(c_n\rightarrow 1\) as \(n\rightarrow \infty \). Here, the limit function is the identity \(f(a)=a\), so one expects \(A_n\) and \(B_n\) to have the same rate function. This is verified by noting that, for every \(M>0\), there exists \(n_0=n_0(M,\delta )\) such that for all \(n\ge n_0\), one has \(\Gamma _{n,\delta }\subseteq (-\infty ,-M]\cup [M,\infty )\). Therefore, from the definition of the LDP, we obtain
$$\limsup _{n\rightarrow \infty }\frac{1}{n}\log P_n(\Gamma _{n,\delta })\le -\inf _{|a|\ge M} I_A(a). \tag{A.9}$$
But, since the rate function \(I_A\) of \(A_n\) is good, it is coercive, so that
$$\lim _{M\rightarrow \infty }\inf _{|a|\ge M} I_A(a)=\infty . \tag{A.10}$$
Therefore, the limit on the left-hand side of (A.9) must give \(-\infty \), implying \(I_B(b) = I_A(b)\) from the condition (A.8).
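The rescaling argument can be checked on a case where tail probabilities are known in closed form. The sketch below assumes that \(A_n\) is the sample mean of n standard Gaussian random variables, so that \(A_n\) is Gaussian with variance 1/n and \(I_A(a)=a^2/2\), and takes \(c_n=1+1/n\). It compares the finite-n decay rates \(-\frac{1}{n}\log P(A_n\ge b)\) and \(-\frac{1}{n}\log P(c_nA_n\ge b)\). The Gaussian model, the choice of \(c_n\), and the helper name `decay_rate` are illustrative assumptions, not taken from the paper; n is kept moderate to avoid floating-point underflow of the tail probability.

```python
import math

def decay_rate(n, b, scale=1.0):
    # -(1/n) log P(scale * A_n >= b) for A_n Gaussian with mean 0 and variance 1/n,
    # using the exact Gaussian tail P(Z >= z) = erfc(z / sqrt(2)) / 2
    z = b * math.sqrt(n) / scale
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return -math.log(tail) / n

b = 1.0
for n in (10, 100, 1000):
    c_n = 1.0 + 1.0 / n
    rate_A = decay_rate(n, b)             # decay rate for A_n
    rate_B = decay_rate(n, b, scale=c_n)  # decay rate for B_n = c_n * A_n
    print(n, rate_A, rate_B, rate_A - rate_B)
```

Both columns approach \(b^2/2\) slowly, with corrections of order \((\log n)/n\), while their difference vanishes as \(c_n\rightarrow 1\), in line with \(I_B=I_A\).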
Example 5
Let \(B_n =f(A_n)+c_n\) with \(c_n\rightarrow c\). Then the rate function of \(B_n\) is obtained from (A.6) with the contraction \(B_n=f(A_n)+c\). This follows trivially because the distance between \(f(a)+c_n\) and \(f(a)+c\) is constant in a. Since \(c_n\rightarrow c\), there must be an n beyond which \(|c_n-c|<\delta \), leading to \(P_n(\Gamma _{n,\delta })=0\), so the condition (A.8) is also satisfied.
These results also hold if \(\Gamma _{n,\delta }\) is defined on a subset of \({\mathcal {A}}\), since any restriction or constraint on \(A_n\) can be included in the definition of \(f_n\). This arises, for example, when considering the contraction of \(J_Q(m,w)\) to \(I_Q^B(w)\), which involves the restriction \(m\in B\).
Cite this article
Guyader, A., Touchette, H. Efficient Large Deviation Estimation Based on Importance Sampling. J Stat Phys 181, 551–586 (2020). https://doi.org/10.1007/s10955-020-02589-x
DOI: https://doi.org/10.1007/s10955-020-02589-x