Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Foundations of Computational Mathematics (IF 2.5). Pub Date: 2019-08-05. DOI: 10.1007/s10208-019-09429-9
Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g., trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This “implicit regularization” feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e., phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a by-product, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control—measured entrywise and by the spectral norm—which might be of independent interest.
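For concreteness, the following is a minimal NumPy sketch of the kind of vanilla gradient descent procedure the abstract refers to, applied to real-valued phase retrieval with spectral initialization. The dimensions, step size, and iteration count are illustrative assumptions rather than the paper's exact parameter choices.

```python
# A minimal sketch: unregularized gradient descent for real-valued phase retrieval.
# Constants (n, m, step size, iteration count) are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n, m = 100, 1000                      # signal dimension, number of measurements
x_star = rng.standard_normal(n)       # ground-truth signal
A = rng.standard_normal((m, n))       # Gaussian sampling vectors a_i as rows
y = (A @ x_star) ** 2                 # phaseless measurements y_i = (a_i^T x*)^2

def grad(x):
    """Gradient of the empirical loss f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2."""
    Ax = A @ x
    return A.T @ ((Ax ** 2 - y) * Ax) / m

# Spectral initialization: leading eigenvector of (1/m) * sum_i y_i a_i a_i^T,
# rescaled by the estimated signal norm sqrt(mean(y)).
Y = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)
x = np.sqrt(y.mean()) * eigvecs[:, -1]

eta = 0.1 / np.linalg.norm(x) ** 2    # constant step size (assumed value)
for _ in range(200):                  # plain gradient descent, no explicit regularization
    x -= eta * grad(x)

# Report error up to the unavoidable global sign ambiguity of phase retrieval.
dist = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(f"relative error: {dist / np.linalg.norm(x_star):.2e}")
```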

Updated: 2019-08-05