A Geometric Analysis of Neural Collapse with Unconstrained Features
arXiv - CS - Information Theory. Pub Date: 2021-05-06. arXiv: 2105.02375
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

We provide the first global optimization landscape analysis of $Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that ($i$) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and ($ii$) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified $unconstrained\;feature\;model$, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized in the simplified setting, matching the empirical observations in practical deep network architectures. These findings could have profound implications for optimization, generalization, and robustness of broad interest. For example, our experiments demonstrate that one may set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF for network training, which reduces the memory cost by over $20\%$ on ResNet18 without sacrificing generalization performance.
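To make the setting concrete: under the unconstrained feature model, each last-layer feature $h_{k,i}$ (the $i$-th example of class $k$) is treated as a free optimization variable rather than the output of a network, and the regularized objective takes, schematically, the form
$$\min_{W,\,H,\,b}\;\; \frac{1}{nK}\sum_{k=1}^{K}\sum_{i=1}^{n}\mathcal{L}_{\mathrm{CE}}\big(W h_{k,i}+b,\,y_k\big)\;+\;\frac{\lambda_W}{2}\|W\|_F^2\;+\;\frac{\lambda_H}{2}\|H\|_F^2\;+\;\frac{\lambda_b}{2}\|b\|_2^2,$$
where $W$ collects the classifier weights, $H$ collects all features, and the $\lambda$'s are weight-decay parameters (the exact normalization follows the paper's setup).

The experiment mentioned at the end can be sketched in a few lines. The snippet below (a minimal illustration assuming NumPy and PyTorch; `simplex_etf` is a hypothetical helper name, not from the paper) constructs a Simplex ETF whose $K$ unit-norm columns have pairwise inner product $-1/(K-1)$, and installs it as a frozen last-layer classifier with feature dimension equal to the number of classes:

```python
import numpy as np
import torch
import torch.nn as nn

def simplex_etf(num_classes: int) -> np.ndarray:
    """Columns form a Simplex ETF in R^K: unit norm,
    pairwise inner product exactly -1/(K-1)."""
    K = num_classes
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

K = 10
M = simplex_etf(K)

# Sanity check: Gram matrix is 1 on the diagonal, -1/(K-1) off it.
G = M.T @ M
assert np.allclose(np.diag(G), 1.0)
assert np.allclose(G[~np.eye(K, dtype=bool)], -1.0 / (K - 1))

# Frozen last-layer classifier with feature dimension == K: its weights
# are never updated and carry no optimizer state during training.
classifier = nn.Linear(K, K, bias=False)
with torch.no_grad():
    classifier.weight.copy_(torch.from_numpy(M).float())  # rows = class vectors (M is symmetric)
classifier.weight.requires_grad_(False)
```

Only the backbone producing the $K$-dimensional features is then trained; with the feature dimension shrunk to $K$ and the classifier fixed, those parameters and their optimizer state drop out of the memory footprint, which is the source of the reported savings on ResNet18.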

Updated: 2021-05-07