Diagonal Preconditioning: Theory and Algorithms
arXiv - CS - Machine Learning Pub Date : 2020-03-17 , DOI: arxiv-2003.07545
Zhaonan Qu, Yinyu Ye, Zhengyuan Zhou

Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix, and of how that translates into improvements in optimization, are rare. In this paper, we first use random matrix theory to analyze a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number. We then identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. Next, we study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix, and provide a bisection algorithm and a potential reduction algorithm, both with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence on regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, which is essential in training machine learning models.
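The column-standard-deviation preconditioning analyzed in the paper can be sketched in a few lines of numpy. The design matrix, its column scales, and the dimensions below are illustrative assumptions, not from the paper; the sketch only shows the mechanism: right-multiplying the design matrix $X$ by $D^{-1}$, where $D$ is the diagonal matrix of column standard deviations, and checking the condition number before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix whose columns live on very different scales,
# which inflates the condition number.
n, p = 200, 5
scales = np.array([1.0, 10.0, 100.0, 0.5, 50.0])
X = rng.standard_normal((n, p)) * scales

# Diagonal preconditioning by column standard deviation:
# X_pre = X @ D^{-1}, with D = diag(std of each column of X).
D_inv = np.diag(1.0 / X.std(axis=0))
X_pre = X @ D_inv

print("cond(X)     =", np.linalg.cond(X))
print("cond(X_pre) =", np.linalg.cond(X_pre))
```

On matrices of this kind (well-behaved columns distorted by heterogeneous scaling), the preconditioned condition number is much smaller, which is the regime the paper's random-matrix analysis makes precise.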

Updated: 2020-03-26