Physica D: Nonlinear Phenomena. Pub Date: 2021-07-18. DOI: 10.1016/j.physd.2021.132952. Gene Ryan Yoo, Houman Owhadi.
We introduce a new regularization method for Artificial Neural Networks (ANNs) based on the Kernel Flow (KF) algorithm. The algorithm was introduced in Owhadi and Yoo (2019) as a method for kernel selection in regression/kriging, based on the minimization of the loss of accuracy incurred by halving the number of interpolation points in random batches of the dataset. Writing $f_\theta = f^{(L)}_{\theta_L} \circ \cdots \circ f^{(1)}_{\theta_1}$ for the functional representation of the compositional structure of the ANN (where $\theta_i$ are the weights and biases of layer $i$), the inner-layer outputs $h^{(i)} = f^{(i)}_{\theta_i} \circ \cdots \circ f^{(1)}_{\theta_1}$ define a hierarchy of feature maps and a hierarchy of kernels $k^{(i)}(x, x') = \exp\bigl(-\gamma_i \|h^{(i)}(x) - h^{(i)}(x')\|^2\bigr)$. When combined with a batch of the dataset, these kernels produce KF losses $\rho^{(i)}$ (defined as the regression error incurred by using a random half of the batch to predict the other half) depending on the parameters $\theta_1, \dots, \theta_i$ of the inner layers (and $\gamma_i$). The proposed method simply consists of aggregating (as a weighted sum) a subset of these KF losses with a classical output loss (e.g., cross-entropy). We test the proposed method on Convolutional Neural Networks (CNNs) and Wide Residual Networks (WRNs) without altering their structure or their output classifier, and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift, without a significant increase in computational complexity relative to standard CNN and WRN training (with Dropout and Batch Normalization). We suspect that these results might be explained by the fact that, while conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterization), the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN beyond regressing the data with that kernel.
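The KF loss and its weighted-sum aggregation with the output loss can be sketched as follows. This is a minimal NumPy sketch, assuming a Gaussian kernel on an inner layer's feature-map outputs and the ratio form of the KF loss from Owhadi and Yoo (2019); the function names, the regularization constant, and the default `gamma` are illustrative, not the paper's implementation:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    # Pairwise Gaussian kernel matrix: k(x, x') = exp(-gamma * ||x - x'||^2).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kf_loss(H_b, y_b, rng, gamma=1.0, reg=1e-8):
    # KF loss of one batch: rho = 1 - ||v_c||_K^2 / ||v_b||_K^2, where the
    # sub-batch c is a random half of the batch b.  H_b holds an inner layer's
    # feature-map outputs h(x) on the batch; y_b holds the batch labels.
    n = len(H_b)
    idx = rng.choice(n, n // 2, replace=False)
    H_c, y_c = H_b[idx], y_b[idx]
    K_b = gaussian_kernel(H_b, H_b, gamma) + reg * np.eye(n)
    K_c = gaussian_kernel(H_c, H_c, gamma) + reg * np.eye(n // 2)
    return 1.0 - (y_c @ np.linalg.solve(K_c, y_c)) / (y_b @ np.linalg.solve(K_b, y_b))

def total_loss(output_loss, kf_losses, weights):
    # Weighted-sum aggregation of the classical output loss (e.g. cross-entropy)
    # with the KF losses of a selected subset of inner layers.
    return output_loss + sum(w * l for w, l in zip(weights, kf_losses))
```

In actual training, `H_b` would be the inner-layer activations on the current batch and the aggregated loss would be differentiated with respect to the network parameters; the sketch only shows the forward computation, in which the KF loss always lies between 0 and 1 (up to regularization).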
Title: Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows