Sampling from Non-smooth Distributions Through Langevin Diffusion

Luu, Tung Duy; Fadili, Jalal; Chesneau, Christophe

doi:10.1007/s11009-020-09809-7

Published: 17 July 2020

Volume 23, pages 1173–1201 (2021)

Methodology and Computing in Applied Probability

Tung Duy Luu¹, Jalal Fadili¹ & Christophe Chesneau²

Abstract

In this paper, we propose proximal splitting-type algorithms for sampling from distributions whose densities are not necessarily smooth or log-concave. Our approach combines tools from variational analysis and non-smooth optimization on the one hand, and from stochastic diffusion equations, in particular the Langevin diffusion, on the other. We establish consistency guarantees for our algorithms, viewed as discretization schemes in this context. The algorithms are then applied to compute exponentially weighted aggregates for regression problems involving non-smooth penalties that are commonly used to promote some notion of simplicity or low complexity. Several popular penalties are detailed and implemented in numerical experiments.
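To make the idea of a proximal Langevin scheme concrete, the following is a minimal sketch and not the algorithm analysed in the paper. It assumes an illustrative potential 0.5·‖x − y‖² + λ‖x‖₁, replaces the non-smooth ℓ₁ term by its Moreau envelope (whose gradient is computed through soft-thresholding, the proximal operator of the ℓ₁ norm), and applies a plain Euler discretization of the Langevin diffusion. The function names and the parameters `lam`, `gamma`, `delta`, `n_iter` and `burn_in` are hypothetical choices made for this example only.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v (soft-thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def prox_langevin_step(x, y, lam, gamma, delta, rng):
    """One Euler step of the Langevin diffusion on a Moreau-smoothed potential.

    Assumed potential (illustration only): 0.5 * ||x - y||^2 + lam * ||x||_1.
    The l1 term is replaced by its Moreau envelope with parameter gamma.
    """
    new_x = []
    for xi, yi in zip(x, y):
        grad_smooth = xi - yi  # gradient of the quadratic term
        # Gradient of the Moreau envelope: (x - prox_{gamma * lam * |.|}(x)) / gamma
        grad_envelope = (xi - soft_threshold(xi, gamma * lam)) / gamma
        drift = -(grad_smooth + grad_envelope)
        noise = math.sqrt(2.0 * delta) * rng.gauss(0.0, 1.0)
        new_x.append(xi + delta * drift + noise)
    return new_x


def sample_mean(y, lam=1.0, gamma=0.05, delta=0.01,
                n_iter=20000, burn_in=2000, seed=0):
    """Average the iterates after burn-in to approximate the mean of the target."""
    rng = random.Random(seed)
    x = [0.0] * len(y)
    total = [0.0] * len(y)
    for k in range(n_iter):
        x = prox_langevin_step(x, y, lam, gamma, delta, rng)
        if k >= burn_in:
            total = [t + xi for t, xi in zip(total, x)]
    return [t / (n_iter - burn_in) for t in total]


if __name__ == "__main__":
    print(sample_mean([3.0, 0.0]))
```

In this toy setting the averaged iterates play the role of an aggregate computed as an expectation under the target distribution. The step size `delta` is kept well below 2 / (1 + 1/gamma) so that the explicit Euler step remains stable for the smoothed potential; both the smoothing and the discretization introduce a bias that this sketch does not correct.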

Acknowledgements

This work was supported by Conseil Régional de Basse-Normandie and partly by Institut Universitaire de France.

Author information

Authors and Affiliations

Normandie University, ENSICAEN, UNICAEN, CNRS, GREYC, Caen, France
Tung Duy Luu & Jalal Fadili
Normandie University, UNICAEN, CNRS, LMNO, Caen, France
Christophe Chesneau

Corresponding author

Correspondence to Christophe Chesneau.

About this article

Cite this article

Luu, T.D., Fadili, J. & Chesneau, C. Sampling from Non-smooth Distributions Through Langevin Diffusion. Methodol Comput Appl Probab 23, 1173–1201 (2021). https://doi.org/10.1007/s11009-020-09809-7

Received: 02 May 2018
Revised: 02 July 2020
Accepted: 07 July 2020
Published: 17 July 2020
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11009-020-09809-7
