Robust classification via MOM minimization
Machine Learning (IF 4.3) Pub Date: 2020-04-27, DOI: 10.1007/s10994-019-05863-6
Guillaume Lecué, Matthieu Lerasle, Timothée Mathieu

We present an extension of Chervonenkis and Vapnik's classical empirical risk minimization (ERM) in which the empirical risk is replaced by a median-of-means (MOM) estimator of the risk. The resulting new estimators are called MOM minimizers. While ERM is sensitive to corruption of the dataset for many classical loss functions used in classification, we show that MOM minimizers behave well in theory, in the sense that they achieve Vapnik's (slow) rates of convergence under weak assumptions: the functions in the hypothesis class are only required to have a finite second moment, and some outliers may also have corrupted the dataset. We propose algorithms, inspired by MOM minimizers, which may be interpreted as MOM versions of block stochastic gradient descent (BSGD). The key point of these algorithms is that the block of data on which a descent step is performed is chosen according to its "centrality" among the other blocks. This choice of "descent block" makes these algorithms robust to outliers; it is also the only extra step added to classical BSGD algorithms. As a consequence, classical BSGD algorithms can easily be turned into robust MOM versions. Moreover, MOM algorithms perform a smart subsampling that may substantially reduce computation time and memory usage when applied to nonlinear algorithms. These empirical performances are illustrated on both simulated and real datasets.
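To make the "central block" selection concrete, below is a minimal NumPy sketch of a MOM version of block gradient descent for the logistic loss. It is not the authors' implementation: the function names, block count, learning rate, and choice of loss are illustrative assumptions; only the median-risk block selection follows the idea described in the abstract.

```python
import numpy as np

def logistic_risk(w, X, y):
    # Empirical logistic risk on a block; labels y are in {-1, +1}.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    # Gradient of the logistic risk on a block.
    s = -y / (1.0 + np.exp(y * (X @ w)))
    return X.T @ s / len(y)

def mom_block_gd(X, y, n_blocks=10, lr=0.5, n_iter=300, seed=0):
    """Sketch of a MOM version of block gradient descent.

    At each iteration the data are split at random into blocks, the risk of
    the current iterate is evaluated on every block, and the descent step is
    performed only on the block whose risk is the median among all blocks
    (the most "central" block). Blocks dominated by outliers tend to have
    extreme risks and are therefore not selected for the descent step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        blocks = np.array_split(rng.permutation(n), n_blocks)
        risks = [logistic_risk(w, X[b], y[b]) for b in blocks]
        central = blocks[np.argsort(risks)[len(risks) // 2]]
        w -= lr * logistic_grad(w, X[central], y[central])
    return w
```

Replacing the median-risk selection with a uniformly random block recovers standard block SGD, which illustrates the abstract's point that the block choice is the only modification needed to obtain the robust MOM variant.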

Updated: 2020-04-27