A Geometric Analysis of Neural Collapse with Unconstrained Features
arXiv - CS - Information Theory Pub Date : 2021-05-06 , DOI: arxiv-2105.02375 Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
We provide the first global optimization landscape analysis of
$Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the
last-layer classifiers and features of neural networks during the terminal
phase of training. As recently reported by Papyan et al., this phenomenon
implies that ($i$) the class means and the last-layer classifiers all collapse
to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and
($ii$) cross-example within-class variability of last-layer activations
collapses to zero. We study the problem based on a simplified
$unconstrained\;feature\;model$, which isolates the topmost layers from the
classifier of the neural network. In this context, we show that the classical
cross-entropy loss with weight decay has a benign global landscape, in the
sense that the only global minimizers are the Simplex ETFs while all other
critical points are strict saddles whose Hessians exhibit negative curvature
directions. In contrast to existing landscape analyses for deep neural
networks, which are often disconnected from practice, our analysis of the
simplified model not only explains what kind of features are learned in the
last layer, but also shows why they can be efficiently optimized in this
simplified setting, matching the empirical observations in practical deep
network architectures. These findings could have profound implications for
optimization, generalization, and robustness of broad interest. For example,
our experiments demonstrate that one may set the feature dimension equal to the
number of classes and fix the last-layer classifier to be a Simplex ETF for
network training, which reduces memory cost by over $20\%$ on ResNet18 without
sacrificing the generalization performance.
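To make the Simplex ETF geometry concrete, the following is a minimal numpy sketch (not code from the paper; the function name `simplex_etf` is illustrative) that constructs the standard K x K Simplex Equiangular Tight Frame and checks its two defining properties: unit-norm columns whose pairwise inner products all equal -1/(K-1).

```python
import numpy as np

def simplex_etf(K: int) -> np.ndarray:
    """Return the standard K x K Simplex Equiangular Tight Frame.

    Its columns are K unit-norm vectors with pairwise inner product
    -1/(K-1) -- the maximally separated configuration that Neural
    Collapse drives the class means and classifiers toward.
    """
    # Center the identity, then rescale so every column has unit norm.
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

if __name__ == "__main__":
    K = 10
    M = simplex_etf(K)
    G = M.T @ M  # Gram matrix of the frame vectors
    print(np.allclose(np.diag(G), 1.0))                      # unit norms
    print(np.allclose(G[~np.eye(K, dtype=bool)], -1 / (K - 1)))  # equiangular
```

Fixing the last-layer classifier to such a (suitably scaled) frame, with the feature dimension set equal to the number of classes, is the memory-saving training variant described at the end of the abstract.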
Last updated: 2021-05-07