Abstract
In this paper, we propose proximal splitting-type algorithms for sampling from distributions whose densities are not necessarily smooth or log-concave. Our approach combines tools from variational analysis and non-smooth optimization on the one hand, and from stochastic diffusion equations, in particular the Langevin diffusion, on the other. We establish consistency guarantees for our algorithms, viewed as discretization schemes in this context. The algorithms are then applied to compute exponentially weighted aggregates for regression problems involving non-smooth penalties that are commonly used to promote some notion of simplicity or low complexity. Several popular penalties are detailed and implemented in numerical experiments.
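To make the idea of combining a proximal step with a Langevin discretization concrete, the following is a minimal sketch and not the schemes analysed in the paper. It assumes a target density proportional to exp(-(f(θ) + λ‖θ‖₁)) with f(θ) = ½‖y − Xθ‖², replaces the non-smooth ℓ1 term by its Moreau envelope (whose gradient is available through soft-thresholding), and runs a plain Euler discretization of the Langevin diffusion. The names `soft_threshold`, `proximal_langevin`, `lam`, `gamma`, `step` and `burn_in` are hypothetical and chosen for illustration only. The average of the post-burn-in iterates serves as a Monte-Carlo approximation of the mean of the target, which is the kind of quantity an exponentially weighted aggregate requires.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def proximal_langevin(X, y, lam, gamma, step, n_iter, burn_in, seed=0):
    """Illustrative Moreau-envelope-smoothed unadjusted Langevin iteration.

    Assumed target: density proportional to
        exp(-(0.5 * ||y - X theta||^2 + lam * ||theta||_1)).
    The l1 term is smoothed by its Moreau envelope with parameter gamma.
    Returns the average of the iterates after burn-in.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    running_sum = np.zeros_like(theta)
    for k in range(n_iter):
        # Gradient of the smooth data-fidelity term.
        grad_smooth = X.T @ (X @ theta - y)
        # Gradient of the Moreau envelope of lam * ||.||_1.
        grad_envelope = (theta - soft_threshold(theta, gamma * lam)) / gamma
        # Euler step of the Langevin diffusion with Gaussian noise.
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * (grad_smooth + grad_envelope) + np.sqrt(2.0 * step) * noise
        if k >= burn_in:
            running_sum += theta
    return running_sum / (n_iter - burn_in)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 3))
    y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.standard_normal(20)
    estimate = proximal_langevin(X, y, lam=1.0, gamma=0.1, step=1e-3,
                                 n_iter=5000, burn_in=1000)
    print(estimate)
```

In this sketch the step size must stay small relative to the Lipschitz constant of the smoothed gradient (roughly ‖X‖² + 1/γ) for the iteration to remain stable, and a smaller γ gives a closer approximation of the non-smooth target at the cost of a smaller admissible step.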
Acknowledgements
This work was supported by Conseil Régional de Basse-Normandie and partly by Institut Universitaire de France.
Cite this article
Luu, T.D., Fadili, J. & Chesneau, C. Sampling from Non-smooth Distributions Through Langevin Diffusion. Methodol Comput Appl Probab 23, 1173–1201 (2021). https://doi.org/10.1007/s11009-020-09809-7
DOI: https://doi.org/10.1007/s11009-020-09809-7
Keywords
- Langevin diffusion
- Monte-Carlo
- Non-smooth distributions
- Proximal splitting
- Exponentially weighted aggregation