Risk-Monotonicity via Distributional Robustness
arXiv - CS - Machine Learning | Pub Date: 2020-11-28 | DOI: arxiv-2011.14126
Zakaria Mhammedi, Hisham Husain

Acquisition of data is a difficult task in most applications of Machine Learning (ML), and it is only natural to hope and expect lower population risk (better performance) as the number of data points increases. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms, such as the Empirical Risk Minimizer (ERM). Non-monotonic behaviour of the risk and instability in training have appeared in the popular deep learning paradigm under the name of double descent. These problems not only highlight our lack of understanding of learning algorithms and generalization, but may also render our efforts at data acquisition in vain. It is therefore crucial to pursue this concern and provide a characterization of such behaviour. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, thereby resolving an open problem (Viering et al., 2019) on how to avoid non-monotonic risk curves. Our algorithms make use of Distributionally Robust Optimization (DRO) -- a technique that has shown promise in other challenging areas of deep learning, such as adversarial training. Our work makes a significant contribution to the topic of risk-monotonicity, which may be key to resolving empirical phenomena such as double descent.
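For readers unfamiliar with DRO, the sketch below shows one common instantiation: a CVaR-style objective that averages the worst fraction of per-example losses instead of all of them, in contrast to plain ERM. This is purely illustrative; the paper's actual construction uses its own ambiguity set and guarantees, and the function and parameter names here are hypothetical.

```python
import numpy as np

def cvar_dro_loss(losses, alpha=0.1):
    """CVaR surrogate for a DRO objective: average over the worst
    alpha-fraction of per-example losses. This is one standard
    instantiation of distributional robustness, not necessarily
    the exact formulation used in the paper."""
    k = max(1, int(np.ceil(alpha * len(losses))))
    worst = np.sort(losses)[-k:]  # the k largest per-example losses
    return worst.mean()

# Plain ERM averages all losses; the DRO surrogate upweights the
# hardest examples, which is the kind of robustness the paper
# leverages to obtain risk-monotonic behaviour.
losses = np.array([0.1, 0.2, 0.05, 1.3, 0.4, 2.1])
print("ERM loss:", losses.mean())
print("DRO loss:", cvar_dro_loss(losses, alpha=0.3))
```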

Updated: 2020-12-01