The Benchmark Lottery
arXiv - CS - Information Retrieval Pub Date : 2021-07-14 , DOI: arxiv-2107.07002
Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

Empirical machine learning (ML) relies heavily on benchmarks to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and the potentially fallacious interpretations derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them, using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.
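The claim that rankings can shift with the choice of benchmark tasks can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the model names, task names, and scores are invented (they are not results from the paper), and ranking by mean score is just one simple aggregation rule, not necessarily the one the authors use.

```python
from itertools import combinations

# Hypothetical scores (higher is better). Invented for illustration only;
# these are NOT results reported in the paper.
SCORES = {
    "model_a": {"task_1": 0.95, "task_2": 0.50, "task_3": 0.70, "task_4": 0.65},
    "model_b": {"task_1": 0.70, "task_2": 0.80, "task_3": 0.72, "task_4": 0.75},
    "model_c": {"task_1": 0.75, "task_2": 0.65, "task_3": 0.85, "task_4": 0.60},
}


def rank_models(scores, tasks):
    """Rank models by mean score over the given tasks, best first."""
    means = {
        model: sum(per_task[t] for t in tasks) / len(tasks)
        for model, per_task in scores.items()
    }
    return sorted(means, key=means.get, reverse=True)


def winners_by_subset(scores, subset_size):
    """Map every task subset of the given size to its top-ranked model."""
    tasks = sorted(next(iter(scores.values())))
    return {
        subset: rank_models(scores, subset)[0]
        for subset in combinations(tasks, subset_size)
    }


if __name__ == "__main__":
    all_tasks = sorted(SCORES["model_a"])
    print("All tasks:", rank_models(SCORES, all_tasks))
    for size in (1, 2):
        for subset, winner in winners_by_subset(SCORES, size).items():
            print(subset, "->", winner)
```

With these invented numbers, each of the three models comes out on top for at least one single-task benchmark, while the full four-task average favors only one of them. The same kind of dependence on task selection is what the abstract describes.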

Updated: 2021-07-19