A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off
arXiv - CS - Machine Learning Pub Date : 2021-06-10 , DOI: arxiv-2106.05522 Ou Wu, Weiyao Zhu, Yingjun Deng, Haixiang Zhang, Qinghu Hou
A common assumption in machine learning is that samples are independently and
identically distributed (i.i.d.). However, the contributions of different
samples to training are not identical: some samples are difficult to learn,
and some are noisy. These unequal contributions have a considerable effect on
training performance. Studies that focus on unequal sample contributions
(e.g., easy, hard, and noisy samples) in learning are usually grouped under
robust machine learning (RML). Weighting and regularization are two common
techniques in RML. Numerous learning algorithms have been proposed, but their
strategies for dealing with easy/hard/noisy samples differ or even contradict
one another. For example, some strategies prioritize hard samples first,
whereas others prioritize easy samples first. A clear comparison of existing
RML algorithms in handling different samples is difficult owing to the lack of
a unified theoretical framework for RML. This study attempts to construct a
mathematical foundation for RML based on bias-variance trade-off theory. A
series of definitions and properties is presented and proved. Several
classical learning algorithms are also explained and compared, and
improvements of existing methods are obtained from this comparison. Finally, a
unified method that combines two classical learning strategies is proposed.
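The easy-first versus hard-first contrast mentioned in the abstract can be made concrete with per-sample weights. The sketch below is a minimal, hypothetical illustration (not the paper's method): `self_paced_weights` mimics the easy-first idea of self-paced learning by keeping only low-loss samples, while `focal_weights` mimics the hard-first idea of focal-style weighting by up-weighting samples the model predicts poorly. The threshold, `gamma`, and the toy loss-to-confidence mapping are all illustrative assumptions.

```python
import math

def self_paced_weights(losses, threshold):
    """Easy-first (self-paced learning style): keep only samples whose
    current loss is below a threshold; harder samples get weight zero."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]

def focal_weights(probs, gamma=2.0):
    """Hard-first (focal-loss style): up-weight samples on which the model
    assigns low probability to the true class."""
    return [(1.0 - p) ** gamma for p in probs]

losses = [0.1, 0.5, 2.0]                 # toy per-sample training losses
probs = [math.exp(-l) for l in losses]   # toy mapping from loss to confidence

print(self_paced_weights(losses, threshold=1.0))  # [1.0, 1.0, 0.0]
print(focal_weights(probs))
```

Under the easy-first scheme the hardest sample is ignored until the model improves, whereas the hard-first scheme gives that same sample the largest weight; the two schemes thus assign opposite priorities to the identical batch, which is exactly the kind of contradiction a unified framework would need to explain.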
Updated: 2021-06-11