Distributional Generalization: A New Kind of Generalization
arXiv - CS - Neural and Evolutionary Computing | Pub Date: 2020-09-17 | DOI: arxiv-2009.08092
Preetum Nakkiran, Yamini Bansal

We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats on the *test set* as well, while leaving other classes unaffected. This behavior is not captured by classical generalization, which would only consider the average error and not the distribution of errors over the input domain. Our formal conjectures, which are much more general than this example, characterize the form of distributional generalization that can be expected in terms of problem parameters: model architecture, training procedure, number of samples, and data distribution. We give empirical evidence for these conjectures across a variety of domains in machine learning, including neural networks, kernel machines, and decision trees. Our results thus advance our empirical understanding of interpolating classifiers.
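As a rough illustration of the mislabeling experiment described above, the sketch below injects 30% "dog"-to-"cat" label noise into a training set and checks whether an interpolating classifier reproduces that noise rate on held-out data. It substitutes a synthetic dataset for CIFAR-10 and a decision tree (one of the model families the paper studies) for a ResNet; the class indices DOG and CAT and all other specifics are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of the label-noise experiment, assuming a synthetic
# 3-class dataset in place of CIFAR-10 and an interpolating decision
# tree in place of a ResNet. DOG/CAT indices are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic 3-class problem standing in for CIFAR-10.
X, y = make_classification(n_samples=20000, n_features=20,
                           n_informative=10, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

DOG, CAT, NOISE_RATE = 0, 1, 0.3  # hypothetical class indices

# Mislabel 30% of class-DOG training points as CAT.
y_noisy = y_train.copy()
dog_idx = np.where(y_train == DOG)[0]
flip = rng.choice(dog_idx, size=int(NOISE_RATE * len(dog_idx)),
                  replace=False)
y_noisy[flip] = CAT

# An unrestricted decision tree interpolates: it fits the noisy
# training labels exactly.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_noisy)
assert clf.score(X_train, y_noisy) == 1.0

# Distributional generalization predicts roughly NOISE_RATE of *test*
# dogs come out CAT, while other classes are largely unaffected.
pred = clf.predict(X_test)
test_dogs = y_test == DOG
other = (y_test != DOG) & (y_test != CAT)
print("test dogs predicted CAT:    ", (pred[test_dogs] == CAT).mean())
print("other classes predicted CAT:", (pred[other] == CAT).mean())
```

On real CIFAR-10 with an interpolating ResNet, the abstract reports the same effect: roughly the injected 30% noise rate reappears as dog-to-cat errors on the test set, with other classes unaffected.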

Updated: 2020-10-16