Accounting for Variance in Machine Learning Benchmarks
arXiv - CS - Machine Learning | Pub Date: 2021-03-01 | DOI: arxiv-2103.03098
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another algorithm B ideally calls for multiple trials that optimize the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in light of this variance. We show a counter-intuitive result: adding more sources of variation to an imperfect estimator better approaches the ideal estimator, at a 51-times reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
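To illustrate the idea of benchmarking over multiple sources of variation, below is a minimal sketch (not the paper's protocol or code) of comparing two models while randomizing both the data split and the parameter initialization across trials, then reporting score distributions rather than a single-run number. The dataset, models, and trial counts are placeholder assumptions chosen only to make the example self-contained with scikit-learn.

```python
# A minimal sketch, assuming scikit-learn; the models and dataset are
# arbitrary placeholders, not those used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

def one_trial(make_model, seed):
    """One benchmark trial; the seed drives both the train/test split
    (data-sampling variance) and the model (initialization variance)."""
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)  # data-sampling variance
    model = make_model(seed)                     # initialization variance
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

def benchmark(make_model, n_trials=20):
    """Repeat trials with fresh seeds to sample the benchmark's variance."""
    rng = np.random.default_rng(42)
    seeds = rng.integers(0, 2**31 - 1, size=n_trials)
    return np.array([one_trial(make_model, int(s)) for s in seeds])

scores_a = benchmark(lambda s: MLPClassifier(
    hidden_layer_sizes=(64,), max_iter=300, random_state=s))
scores_b = benchmark(lambda s: RandomForestClassifier(
    n_estimators=100, random_state=s))

# Report mean +/- std over trials so the comparison accounts for
# variance rather than relying on a single lucky (or unlucky) run.
print(f"A: {scores_a.mean():.3f} +/- {scores_a.std():.3f}")
print(f"B: {scores_b.mean():.3f} +/- {scores_b.std():.3f}")
```

Randomizing several sources of variation per trial, as sketched here, is the kind of imperfect but cheap estimator the abstract contrasts with the ideal (and far more expensive) procedure of re-optimizing hyperparameters in every trial.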

Updated: 2021-03-05