Early-stopped neural networks are consistent
arXiv - CS - Machine Learning. Pub Date: 2021-06-10. DOI: arXiv-2106.05932. Authors: Ziwei Ji, Justin D. Li, Matus Telgarsky
This work studies the behavior of neural networks trained with the logistic
loss via gradient descent on binary classification data where the underlying
data distribution is general, and the (optimal) Bayes risk is not necessarily
zero. In this setting, it is shown that gradient descent with early stopping
achieves population risk arbitrarily close to optimal in terms of not just
logistic and misclassification losses, but also in terms of calibration,
meaning the sigmoid mapping of its outputs approximates the true underlying
conditional distribution arbitrarily finely. Moreover, the necessary iteration,
sample, and architectural complexities of this analysis all scale naturally
with a certain complexity measure of the true conditional model. Lastly, while
it is not shown that early stopping is necessary, it is shown that any
univariate classifier satisfying a local interpolation property is necessarily
inconsistent.
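The setting described above can be sketched in code. The following is a minimal illustration, not the paper's construction: it trains a linear model (standing in for the network) by gradient descent on the logistic loss over synthetic data with nonzero Bayes risk, stops early when held-out logistic risk plateaus, and then checks calibration by comparing the sigmoid of the model's outputs against the true conditional probabilities. The data model, the patience-based stopping rule, and all numeric settings are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data with nonzero Bayes risk: labels are noisy
# draws from a true conditional model P(y=1|x) = sigmoid(x . w_true).
n, d = 400, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
p_cond = 1.0 / (1.0 + np.exp(-X @ w_true))
y = (rng.uniform(size=n) < p_cond).astype(float)  # labels in {0, 1}

# Train / validation split; validation risk drives early stopping.
X_tr, X_va = X[:300], X[300:]
y_tr, y_va = y[:300], y[300:]

def logistic_loss(w, X, y):
    # Mean of log(1 + exp(-s * z)) with signed labels s in {-1, +1},
    # computed stably via logaddexp.
    s = 2 * y - 1
    return np.mean(np.logaddexp(0.0, -s * (X @ w)))

def grad(w, X, y):
    # Gradient of the mean logistic loss: -s * sigmoid(-s * z) * x.
    s = 2 * y - 1
    sig = 1.0 / (1.0 + np.exp(s * (X @ w)))
    return -(X * (s * sig)[:, None]).mean(axis=0)

w = np.zeros(d)
lr = 0.5
best_w, best_va = w.copy(), np.inf
patience, bad = 50, 0

for t in range(5000):
    w -= lr * grad(w, X_tr, y_tr)
    va = logistic_loss(w, X_va, y_va)
    if va < best_va - 1e-6:
        best_va, best_w, bad = va, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:  # stop once validation risk stops improving
            break

# Calibration check: sigmoid of the early-stopped model's outputs
# should approximate the true conditional distribution.
p_hat = 1.0 / (1.0 + np.exp(-(X_va @ best_w)))
p_true = 1.0 / (1.0 + np.exp(-(X_va @ w_true)))
cal_err = np.mean(np.abs(p_hat - p_true))
```

Because the labels are intrinsically noisy, the training loss cannot reach zero; the early-stopped iterate nonetheless tracks the true conditional probabilities on held-out data, which is the calibration property the abstract refers to.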
Updated: 2021-06-11