I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data, Neuroscience & Biobehavioral Reviews


Neuroscience & Biobehavioral Reviews (IF 7.5), Pub Date: 2020-10-06, DOI: 10.1016/j.neubiorev.2020.09.036
Mahan Hosseini₁, Michael Powell₂, John Collins₃, Chloe Callahan-Flintoft₄, William Jones₁, Howard Bowman₅, Brad Wyble₆


Machine learning has enhanced the abilities of neuroscientists to interpret information collected through EEG, fMRI, and MEG data. With these powerful techniques comes the danger of overfitting of hyperparameters, which can render results invalid. We refer to this problem as ‘overhyping’ and show that it is pernicious despite commonly used precautions. Overhyping occurs when analysis decisions are made after observing analysis outcomes and can produce results that are partially or even completely spurious. It is commonly assumed that cross-validation is an effective protection against overfitting or overhyping, but this is not actually true. In this article, we show that spurious results can be obtained on random data by modifying hyperparameters in seemingly innocuous ways, despite the use of cross-validation. We recommend a number of techniques for limiting overhyping, such as lock boxes, blind analyses, pre-registrations, and nested cross-validation. These techniques are common in other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques in the neurosciences.
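The mechanism described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' own analysis: it assumes a toy nearest-centroid classifier, treats the choice of a three-feature subset as the "hyperparameter", and uses hypothetical names (`cv_accuracy`, `run_demo`) and sizes chosen only for illustration. Features and labels are pure noise, so any accuracy above chance is spurious. The sketch picks the configuration with the best cross-validated accuracy and then scores it once on a lock box that played no part in the selection.

```python
import random


def nearest_centroid_predict(train_X, train_y, test_X, features):
    """Predict labels by squared distance to per-class means over `features`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    preds = []
    for x in test_X:
        dists = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        preds.append(min(dists, key=dists.get))
    return preds


def cv_accuracy(X, y, features, k=5):
    """Plain k-fold cross-validated accuracy for one feature subset."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
            features,
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def run_demo(seed=0, n=40, n_features=20, n_configs=200):
    """Return (best CV accuracy, mean CV accuracy, lock-box accuracy)."""
    rng = random.Random(seed)

    def make_noise_data():
        # Gaussian noise features with balanced, shuffled labels: no real signal.
        X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
        y = [0, 1] * (n // 2)
        rng.shuffle(y)
        return X, y

    X, y = make_noise_data()            # data the analyst iterates on
    lock_X, lock_y = make_noise_data()  # lock box: untouched until the very end

    # "I tried a bunch of things": many candidate configurations,
    # each scored with cross-validation on the same data.
    configs = [tuple(rng.sample(range(n_features), 3)) for _ in range(n_configs)]
    scores = [cv_accuracy(X, y, c) for c in configs]
    best_score = max(scores)
    best_config = configs[scores.index(best_score)]

    # One-shot evaluation of the selected configuration on the lock box.
    preds = nearest_centroid_predict(X, y, lock_X, best_config)
    lock_acc = sum(p == t for p, t in zip(preds, lock_y)) / n
    return best_score, sum(scores) / len(scores), lock_acc


if __name__ == "__main__":
    best, mean, lock = run_demo()
    print(f"best CV accuracy over all configurations: {best:.2f}")
    print(f"mean CV accuracy over all configurations: {mean:.2f}")
    print(f"lock-box accuracy of the selected one:    {lock:.2f}")
```

Because the winning configuration is chosen after looking at its cross-validated score, that score is a biased estimate even though every individual score came from cross-validation. The lock-box figure is the one to report, and it should be computed only once. Nested cross-validation applies the same idea inside each outer fold instead of relying on a single held-out set.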


Updated: 2020-11-06
