Provable accelerated gradient method for nonconvex low rank optimization
Machine Learning (IF 7.5) Pub Date: 2019-06-26, DOI: 10.1007/s10994-019-05819-w
Huan Li, Zhouchen Lin

Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $\min_{\mathbf{U}\in\mathbb{R}^{n\times r}} g(\mathbf{U})=f(\mathbf{U}\mathbf{U}^T)$ under the assumptions that $f(\mathbf{X})$ is restricted $\mu$-strongly convex and $L$-smooth on the set $\{\mathbf{X}:\mathbf{X}\succeq 0,\ \mathrm{rank}(\mathbf{X})\le r\}$. We propose an accelerated gradient method with alternating constraint that operates directly on the $\mathbf{U}$ factors, and show that the method has a local linear convergence rate with the optimal $\sqrt{L/\mu}$ dependence on the condition number. Globally, our method converges to a critical point with zero gradient from any initializer. Our method also applies to the asymmetric factorization $\mathbf{X}=\widetilde{\mathbf{U}}\widetilde{\mathbf{V}}^T$, and the same convergence results can be obtained. Extensive experimental results verify the advantage of our method.
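To make the factored formulation concrete, the following is a minimal sketch of Nesterov-style accelerated gradient descent applied directly to the $\mathbf{U}$ factors of $g(\mathbf{U})=f(\mathbf{U}\mathbf{U}^T)$. It is not the paper's method (the alternating constraint is omitted), and the objective $f(\mathbf{X})=\tfrac{1}{2}\Vert\mathbf{X}-\mathbf{M}\Vert_F^2$, the step size, and the momentum value are illustrative assumptions:

```python
import numpy as np

def grad_f(X, M):
    """Gradient of the illustrative objective f(X) = 0.5 * ||X - M||_F^2."""
    return X - M

def accelerated_factored_gd(M, r, step=1e-3, momentum=0.9, iters=5000, seed=0):
    """Nesterov-style accelerated gradient on g(U) = f(U U^T).

    A sketch only: fixed step and momentum, no alternating constraint,
    no restart scheme.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    U = 0.1 * rng.standard_normal((n, r))   # arbitrary small initializer
    V = U.copy()                            # extrapolation (momentum) point
    for _ in range(iters):
        # Chain rule: grad g(U) = 2 * grad_f(U U^T) @ U when grad_f is symmetric.
        G = 2.0 * grad_f(V @ V.T, M) @ V
        U_new = V - step * G
        V = U_new + momentum * (U_new - U)  # Nesterov extrapolation
        U = U_new
    return U

# Recover the factors of a rank-2 PSD matrix M = B B^T.
rng = np.random.default_rng(1)
B = rng.standard_normal((20, 2))
M = B @ B.T
U = accelerated_factored_gd(M, r=2)
err = np.linalg.norm(U @ U.T - M) / np.linalg.norm(M)
```

Because the gradient of $g$ is evaluated at the extrapolated point $\mathbf{V}$ rather than at $\mathbf{U}$, each iteration costs the same as plain gradient descent while the momentum term accelerates the local linear convergence described in the abstract.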

Updated: 2019-06-26