Prevalence of neural collapse during the terminal phase of deep learning training.
Proceedings of the National Academy of Sciences of the United States of America (IF 9.4). Pub Date: 2020-10-06. DOI: 10.1073/pnas.2015509117
Vardan Papyan, X Y Han, David L Donoho

Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means, or in other words to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier’s decision collapses to simply choosing whichever class has the closest train class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
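The four NC phenomena are stated in terms of simple statistics of the last-layer activations. As a minimal sketch (not the authors' measurement code), the following hypothetical function computes three such diagnostics from an activation matrix `H` and labels `y`: the average within-class variability (NC1), the pairwise cosines of the centered class means, which for a simplex ETF all equal -1/(K-1) (NC2), and the nearest-class-center predictions (NC4).

```python
import numpy as np

def nc_metrics(H, y, num_classes):
    """Neural-collapse diagnostics on last-layer activations.

    H: (n, d) array of last-layer activations; y: (n,) integer labels.
    Returns a within-class variability proxy (NC1), the pairwise cosine
    matrix of centered class means (NC2), and NCC predictions (NC4).
    """
    global_mean = H.mean(axis=0)
    means = np.stack([H[y == c].mean(axis=0) for c in range(num_classes)])

    # NC1: mean squared distance of each activation to its class mean;
    # collapse means this tends to zero during TPT.
    within_var = np.mean([np.sum((H[y == c] - means[c]) ** 2, axis=1).mean()
                          for c in range(num_classes)])

    # NC2: center and normalize the class means; for a simplex ETF with
    # K classes, every off-diagonal cosine equals -1/(K-1).
    M = means - global_mean
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    cosines = M @ M.T

    # NC4: nearest class center (NCC) decision rule.
    dists = np.linalg.norm(H[:, None, :] - means[None, :, :], axis=2)
    ncc_pred = dists.argmin(axis=1)
    return within_var, cosines, ncc_pred
```

On a fully collapsed toy example (every activation equal to its class mean, with means at the standard basis vectors of R^3), the within-class variability is zero, every off-diagonal cosine is exactly -1/2 = -1/(3-1), and the NCC rule reproduces the labels.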




Updated: 2020-10-07