The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models
Sensors (IF 3.4), Pub Date: 2021-05-06, DOI: 10.3390/s21093225
Alexander Kamrud 1 , Brett Borghetti 1 , Christine Schubert Kabban 1

EEG-based deep learning models have trended toward designs intended to classify data from any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed when partitioning data into training, validation, and testing sets so that cross-participant models do not overestimate their accuracy. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing pre-partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning, and the resulting improper training, validation, and testing of a cross-participant model, lead to overestimated model accuracy. We demonstrate this both mathematically and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and show that model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that when these guidelines are not followed, error rates of cross-participant models can be underestimated by between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models.
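
The abstract does not spell out the partitioning guidelines themselves. One common reading is that every participant's data should fall in exactly one of the training, validation, and testing sets, so that the model is always evaluated on people it has never seen. The Python sketch below illustrates that idea only and is not the authors' procedure. The function name `split_by_participant`, the split fractions, and the toy participant labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_participant(participant_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Assign whole participants (not individual samples) to train/val/test.

    participant_ids: one participant label per EEG sample (hypothetical layout).
    Returns a dict mapping split name -> list of sample indices.
    """
    # Group sample indices by the participant they were recorded from.
    samples_of = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        samples_of[pid].append(idx)

    # Shuffle participants, not samples, so each person lands in exactly one split.
    participants = sorted(samples_of)
    random.Random(seed).shuffle(participants)

    n = len(participants)
    n_test = max(1, round(n * test_frac))
    n_val = max(1, round(n * val_frac))
    if n_test + n_val >= n:
        raise ValueError("not enough participants for three disjoint splits")

    groups = {
        "test": participants[:n_test],
        "val": participants[n_test:n_test + n_val],
        "train": participants[n_test + n_val:],
    }
    # Expand each group of participants back into sample indices.
    return {name: [i for pid in pids for i in samples_of[pid]]
            for name, pids in groups.items()}


# Toy data (assumed): 10 participants with 5 samples each.
ids = [f"P{p:02d}" for p in range(10) for _ in range(5)]
splits = split_by_participant(ids)
```

Shuffling individual samples instead would place recordings from the same person on both sides of the train/test boundary, which is the kind of leakage the abstract associates with overestimated accuracy.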

Updated: 2021-05-06