Tight Hardness Results for Training Depth-2 ReLU Networks
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-11-27, DOI: arxiv-2011.13550
Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman

We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is already NP-hard for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing the squared error (for $k>1$), even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We also obtain lower bounds on the running time in terms of the desired additive error $\epsilon$. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest $\kappa$-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in $1/\epsilon^2$. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on $\epsilon$.
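For concreteness, the training problem described above can be written as the following empirical risk minimization (a sketch of one standard formulation; exact conventions such as whether bias terms are allowed follow the paper itself): given a training set $(x_1, y_1), \ldots, (x_m, y_m)$ with $x_j \in \mathbb{R}^d$ and $y_j \in \mathbb{R}$, find coefficients $a_1, \ldots, a_k \in \mathbb{R}$ and weight vectors $w_1, \ldots, w_k \in \mathbb{R}^d$ minimizing
$$ \sum_{j=1}^{m} \Big( \sum_{i=1}^{k} a_i \, \max\big(0, \langle w_i, x_j \rangle\big) - y_j \Big)^{2}. $$
The case $k=1$ is fitting a single ReLU, which the abstract states is already NP-hard; the realizable setting is the special case in which some depth-2 ReLU network attains zero loss on the training set.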

Updated: 2020-12-01