A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off
arXiv - CS - Machine Learning Pub Date : 2021-06-10 , DOI: arxiv-2106.05522 Ou Wu, Weiyao Zhu, Yingjun Deng, Haixiang Zhang, Qinghu Hou
A common assumption in machine learning is that samples are independently and
identically distributed (i.i.d.). However, the contributions of different
samples to training are not identical: some samples are difficult to learn,
and some are noisy. These unequal contributions have a considerable effect on
training performance. Studies that focus on unequal sample contributions
(e.g., easy, hard, and noisy samples) in learning are usually grouped under
robust machine learning (RML). Weighting and regularization are two common
techniques in RML. Numerous learning algorithms have been proposed, but their
strategies for dealing with easy/hard/noisy samples differ or even contradict
one another. For example, some strategies prioritize hard samples first,
whereas others prioritize easy samples first. A clear comparison of existing
RML algorithms in handling different samples is difficult owing to the lack of
a unified theoretical framework for RML. This study attempts to construct a
mathematical foundation for RML based on bias-variance trade-off theory. A
series of definitions and properties is presented and proved. Several
classical learning algorithms are also explained and compared, and
improvements of existing methods are obtained from this comparison. Finally, a
unified method that combines two classical learning strategies is proposed.
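The easy-first versus hard-first contrast mentioned in the abstract can be made concrete with per-sample weights. The sketch below is a minimal, hypothetical illustration (not the paper's method): `self_paced_weights` mimics the easy-first idea of self-paced learning by keeping only low-loss samples, while `focal_weights` mimics the hard-first idea of focal-style weighting by up-weighting samples the model predicts poorly. The threshold, `gamma`, and the toy loss-to-confidence mapping are all illustrative assumptions.

```python
import math

def self_paced_weights(losses, threshold):
    """Easy-first (self-paced learning style): keep only samples whose
    current loss is below a threshold; harder samples get weight zero."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]

def focal_weights(probs, gamma=2.0):
    """Hard-first (focal-loss style): up-weight samples on which the model
    assigns low probability to the true class."""
    return [(1.0 - p) ** gamma for p in probs]

losses = [0.1, 0.5, 2.0]                 # toy per-sample training losses
probs = [math.exp(-l) for l in losses]   # toy mapping from loss to confidence

print(self_paced_weights(losses, threshold=1.0))  # [1.0, 1.0, 0.0]
print(focal_weights(probs))
```

Under the easy-first scheme the hardest sample is ignored until the model improves, whereas the hard-first scheme gives that same sample the largest weight; the two schemes thus assign opposite priorities to the identical batch, which is exactly the kind of contradiction a unified framework would need to explain.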
Updated: 2021-06-11