An adaptive Hessian approximated stochastic gradient MCMC method
Journal of Computational Physics (IF 4.1), Pub Date: 2021-01-21, DOI: 10.1016/j.jcp.2021.110150
Yating Wang , Wei Deng , Guang Lin

Bayesian approaches have been successfully integrated into the training of deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods, which have gained increasing interest due to their ability to handle large datasets and their potential to avoid overfitting. Although standard SG-MCMC methods have shown strong performance on a variety of problems, they may be inefficient when the random variables in the target posterior density have scale differences or are highly correlated. In this work, we present an adaptive Hessian approximated stochastic gradient MCMC method that incorporates local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner carries second-order information and can guide the random walk of the sampler efficiently. Instead of computing and storing the full Hessian of the log posterior, we use a limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector product in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, the proposed algorithm converges asymptotically to the target distribution with a controllable bias under mild conditions. To reduce the computational burden of training and testing, we adopt a magnitude-based weight pruning method to enforce sparsity in the network. Our method is user-friendly and demonstrates better learning results than standard SG-MCMC updating rules. The inverse-Hessian approximation reduces storage and computational cost for high-dimensional models. Numerical experiments are performed on several problems, including sampling from a 2D correlated distribution, synthetic regression problems, and learning numerical solutions of heterogeneous elliptic PDEs. The numerical results demonstrate substantial improvement in both convergence rate and accuracy.
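The limited-memory inverse Hessian-vector product mentioned in the abstract is in the spirit of the L-BFGS two-loop recursion, which reconstructs the action of the inverse Hessian from stored parameter differences s_k and gradient differences y_k. A minimal NumPy sketch of that recursion (function name and details are our illustration, not the paper's code):

```python
import numpy as np

def lbfgs_inv_hessian_vec(vec, s_list, y_list):
    """Approximate H^{-1} @ vec with the L-BFGS two-loop recursion.

    s_list: recent parameter differences s_k = x_{k+1} - x_k
    y_list: recent gradient differences  y_k = g_{k+1} - g_k
    (most recent pair last). Only O(m * d) memory is needed,
    never the full d x d Hessian.
    """
    q = np.array(vec, dtype=float)
    rhos = [1.0 / (s @ y) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: run backward through the stored pairs.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial scaling H0 = gamma * I from the most recent pair.
    s, y = s_list[-1], y_list[-1]
    gamma = (s @ y) / (y @ y)
    r = gamma * q
    # Second loop: run forward, re-applying the corrections.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += s * (a - b)
    return r
```

For a quadratic target with Hessian A, the exact pairs satisfy y_k = A s_k, and with enough independent pairs the recursion reproduces A^{-1} v; in a sampler, the resulting product preconditions the stochastic-gradient update direction.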
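The magnitude-based weight pruning the authors adopt can be illustrated with a small standalone sketch (a simplified, hypothetical helper, not the paper's implementation): zero out the smallest-magnitude fraction of the weights and return a mask so the pruned entries stay frozen in subsequent updates.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of entries with smallest |w|.

    Returns (pruned_weights, keep_mask); multiplying future gradient
    updates by keep_mask keeps the pruned entries at zero.
    """
    w = np.asarray(weights, dtype=float)
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    flat = np.abs(w).ravel()
    # k-th smallest magnitude serves as the pruning threshold.
    thresh = np.partition(flat, k - 1)[k - 1]
    keep_mask = np.abs(w) > thresh
    return w * keep_mask, keep_mask
```

At 50% sparsity, for example, `magnitude_prune(np.array([0.1, -0.5, 0.03, 2.0]), 0.5)` keeps only the two largest-magnitude weights, -0.5 and 2.0.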




Updated: 2021-02-04