Poly-time universality and limitations of deep learning
arXiv - CS - Computational Complexity. Pub Date: 2020-01-07. DOI: arXiv:2001.02992
Emmanuel Abbe and Colin Sandon

The goal of this paper is to characterize the function distributions that deep learning can or cannot learn in poly-time. A universality result is proved for SGD-based deep learning and a non-universality result for GD-based deep learning; this also gives a separation between SGD-based deep learning and statistical query algorithms:

(1) {\it Deep learning with SGD is efficiently universal.} Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD from a poly-time initialization, with poly-many steps, poly-rate, and possibly poly-noise. Deep learning therefore provides a universal learning paradigm: it was known that the approximation and estimation errors can be controlled with poly-size neural nets using ERM, which is NP-hard; this new result shows that the optimization error can also be controlled, with SGD, in poly-time. The picture changes for GD with large enough batches:

(2) {\it Result (1) does not hold for GD.} Poly-size neural nets trained with GD (full gradients or large enough batches), from any initialization, with poly-many steps, poly-range, and at least poly-noise, cannot learn any function distribution of super-polynomially small {\it cross-predictability}, where the cross-predictability measures the ``average'' correlation between functions drawn from the distribution; relations to and distinctions from the statistical dimension are discussed. In particular, GD under these constraints can efficiently learn monomials of degree $k$ if and only if $k$ is constant.

Thus (1) and (2) point to an interesting contrast: SGD is universal even with some poly-noise, while full GD or SQ algorithms are not (e.g., on parities).
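For intuition, cross-predictability is commonly formalized as the mean squared correlation between two independent draws from the function distribution (a sketch of the standard definition; the notation here is assumed, not quoted from the paper):

```latex
% Cross-predictability of a distribution P over functions f : X -> {-1, +1},
% with inputs drawn from a distribution D on X:
\[
  \mathrm{Pred}(P, D) \;=\;
  \mathbb{E}_{f, f' \sim P}\!\left[
    \left( \mathbb{E}_{x \sim D}\big[ f(x)\, f'(x) \big] \right)^{2}
  \right],
\]
% where f and f' are independent draws from P. A super-polynomially small
% value means that two typical functions from P are nearly uncorrelated on
% average, which is what blocks GD/SQ-style algorithms.
```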
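The parity example can be made concrete with a small sketch (not from the paper; the helper names are mine). Degree-$k$ monomials over $\{-1,+1\}^n$ are the parity functions $\chi_S(x) = \prod_{i \in S} x_i$ with $|S| = k$; distinct parities are exactly orthogonal under the uniform input distribution, so the cross-predictability of the uniform distribution over them is $1/\binom{n}{k}$, which is super-polynomially small once $k$ grows with $n$:

```python
import itertools


def parity(S, x):
    # chi_S(x) = product of x_i for i in S, with x in {-1, +1}^n
    p = 1
    for i in S:
        p *= x[i]
    return p


def correlation(S, T, n):
    # E_x[chi_S(x) * chi_T(x)] over the uniform hypercube {-1, +1}^n,
    # computed exactly by enumerating all 2^n points.
    total = 0
    for x in itertools.product((-1, 1), repeat=n):
        total += parity(S, x) * parity(T, x)
    return total / 2 ** n


n, k = 6, 2
subsets = list(itertools.combinations(range(n), k))
m = len(subsets)  # C(n, k) degree-k parities

# Mean squared correlation of two independent uniform draws: only the m
# "diagonal" pairs (S == T) contribute 1, every other pair contributes 0,
# so the cross-predictability equals 1 / C(n, k).
cp = sum(correlation(S, T, n) ** 2 for S in subsets for T in subsets) / m ** 2
```

For $n = 6$, $k = 2$ this gives $1/15$; letting $k$ scale with $n$ drives the value below any inverse polynomial, matching the claim that GD learns degree-$k$ monomials efficiently only for constant $k$.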

Updated: 2020-01-10