Approximation and sampling of multivariate probability distributions in the tensor train decomposition
Statistics and Computing ( IF 1.6 ) Pub Date : 2019-11-02 , DOI: 10.1007/s11222-019-09910-z
Sergey Dolgov , Karim Anaya-Izquierdo , Colin Fox , Robert Scheichl

General multivariate distributions are notoriously expensive to sample from, particularly the high-dimensional posterior distributions in PDE-constrained inverse problems. This paper develops a sampler for arbitrary continuous multivariate distributions that is based on low-rank surrogates in the tensor train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry. We build upon recent developments of the cross approximation algorithms in linear algebra to construct a tensor train approximation to the target probability density function using a small number of function evaluations. For sufficiently smooth distributions, the storage required for accurate tensor train approximations is moderate, scaling linearly with dimension. In turn, the structure of the tensor train surrogate allows sampling by an efficient conditional distribution method since marginal distributions are computable with linear complexity in dimension. Expected values of non-smooth quantities of interest, with respect to the surrogate distribution, can be estimated using transformed independent uniformly-random seeds that provide Monte Carlo quadrature or transformed points from a quasi-Monte Carlo lattice to give more efficient quasi-Monte Carlo quadrature. Unbiased estimates may be calculated by correcting the transformed random seeds using a Metropolis–Hastings accept/reject step, while the quasi-Monte Carlo quadrature may be corrected either by a control-variate strategy or by importance weighting. 
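The conditional distribution method described above can be sketched on a discretised density. The snippet below is a minimal illustration, not the paper's implementation: the TT cores are random positive arrays standing in for a genuine cross approximation of a target density, and the grid is hypothetical. It shows the key property the abstract mentions: with the density stored as TT cores, each one-dimensional conditional (and hence each marginal) is obtained by cheap core contractions, so a sample costs linear work in the dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-dimensional density discretised on n points per axis,
# stored as TT cores G_k of shape (r_{k-1}, n, r_k). Random positive
# cores stand in for a real TT cross approximation of a target density.
n, r = 32, 4
grid = np.linspace(-3.0, 3.0, n)
cores = [np.abs(rng.standard_normal((1, n, r))),
         np.abs(rng.standard_normal((r, n, r))),
         np.abs(rng.standard_normal((r, n, 1)))]

# Precompute right "marginalisation" vectors: summing a core over its
# grid index integrates out one variable, so all trailing marginals are
# available in O(d) core products rather than an exponential-cost sum.
right = [np.ones(1)]
for G in reversed(cores):
    right.insert(0, np.tensordot(G.sum(axis=1), right[0], axes=(1, 0)))

def sample_one():
    """Conditional-distribution (sequential) sampling on the grid."""
    left = np.ones(1)   # interface vector accumulating conditioned modes
    point = []
    for k, G in enumerate(cores):
        # Unnormalised 1-D conditional p(x_k | x_1, ..., x_{k-1}):
        # contract the fixed left interface, this core, and the
        # marginalised right interface.
        p = np.einsum('a,anb,b->n', left, G, right[k + 1])
        p = np.maximum(p, 0.0)
        p /= p.sum()
        i = rng.choice(n, p=p)
        point.append(grid[i])
        left = left @ G[:, i, :]    # condition on the drawn value
    return point

x = sample_one()
```

Replacing `rng.choice` with an inverse-CDF lookup seeded by a uniform (or quasi-Monte Carlo) point recovers the transformed-seed construction described in the abstract.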
We show that the error in the tensor train approximation propagates linearly into the Metropolis–Hastings rejection rate and the integrated autocorrelation time of the resulting Markov chain; thus, the integrated autocorrelation time may be made arbitrarily close to 1, implying that, asymptotic in sample size, the cost per effectively independent sample is one target density evaluation plus the cheap tensor train surrogate proposal that has linear cost with dimension. These methods are demonstrated in three computed examples: fitting failure time of shock absorbers; a PDE-constrained inverse diffusion problem; and sampling from the Rosenbrock distribution. The delayed rejection adaptive Metropolis (DRAM) algorithm is used as a benchmark. In all computed examples, the importance weight-corrected quasi-Monte Carlo quadrature performs best and is more efficient than DRAM by orders of magnitude across a wide range of approximation accuracies and sample sizes. Indeed, all the methods developed here significantly outperform DRAM in all computed examples.
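The Metropolis–Hastings correction above is an independence sampler: proposals are i.i.d. draws from the surrogate, and the acceptance ratio compares importance weights between the exact target and the surrogate. The sketch below uses stand-ins rather than the paper's machinery: a two-dimensional Rosenbrock-type log-density plays the target (one of the computed examples), and a Gaussian plays the TT surrogate proposal; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in exact target: a 2-D Rosenbrock-type log-density.
def log_target(x):
    return -(x[0] - 1.0) ** 2 - 5.0 * (x[1] - x[0] ** 2) ** 2

# Stand-in surrogate (a Gaussian here; a TT surrogate in the paper):
# we need independent draws and the surrogate log-density.
mu, sig = np.array([1.0, 1.0]), np.array([1.0, 2.0])

def sample_surrogate():
    return rng.standard_normal(2) * sig + mu

def log_surrogate(x):
    z = (x - mu) / sig
    return -0.5 * z @ z

def imh_chain(n_steps):
    """Independence Metropolis-Hastings with surrogate proposals.

    The log acceptance ratio is the difference of importance weights
    log(pi/tilde_pi), so a more accurate surrogate drives the rejection
    rate toward zero and the autocorrelation time toward 1.
    """
    x = sample_surrogate()
    lw = log_target(x) - log_surrogate(x)
    chain, accepts = [x], 0
    for _ in range(n_steps):
        y = sample_surrogate()
        lw_y = log_target(y) - log_surrogate(y)
        if np.log(rng.uniform()) < lw_y - lw:   # accept/reject step
            x, lw = y, lw_y
            accepts += 1
        chain.append(x)
    return np.array(chain), accepts / n_steps

chain, acc_rate = imh_chain(2000)
```

Each chain step costs one exact target evaluation plus one surrogate draw, matching the cost statement in the abstract; the importance weights `lw` are also exactly what the weight-corrected quasi-Monte Carlo estimator reuses.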

Updated: 2019-11-02