The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models
Sensors ( IF 3.9 ) Pub Date : 2021-05-06 , DOI: 10.3390/s21093225
Alexander Kamrud , Brett Borghetti , Christine Schubert Kabban

EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed when partitioning data into training, validation, and testing sets so that cross-participant models do not overestimate model accuracy. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing pre-partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning, and the resulting improper training, validation, and testing of a cross-participant model, leads to overestimated model accuracy. We demonstrate this both mathematically and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and demonstrate that model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that when these guidelines are not followed, error rates of cross-participant models can be underestimated by between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models.
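The abstract does not spell out the partitioning guidelines themselves. A common reading is that every participant's data should fall into exactly one of the training, validation, or testing sets, so that the test set only contains people the model has never seen. The sketch below illustrates that reading only; the function name `split_by_participant`, the 60/20/20 fractions, and the toy participant IDs are assumptions for illustration, not the authors' procedure.

```python
import random


def split_by_participant(participant_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole participants (not individual samples) to splits.

    participant_ids: one ID per sample, e.g. one per EEG epoch.
    Returns (splits, groups):
      splits maps split name -> list of sample indices,
      groups maps split name -> set of participant IDs in that split.
    """
    unique_ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)  # shuffle participants, not samples

    n = len(unique_ids)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    groups = {
        "train": set(unique_ids[:n_train]),
        "val": set(unique_ids[n_train:n_train + n_val]),
        "test": set(unique_ids[n_train + n_val:]),
    }

    # Every sample follows its participant into a single split.
    splits = {name: [] for name in groups}
    for idx, pid in enumerate(participant_ids):
        for name, members in groups.items():
            if pid in members:
                splits[name].append(idx)
                break
    return splits, groups


# Toy data (hypothetical): 10 participants with 5 samples each.
participant_ids = [p for p in range(10) for _ in range(5)]
splits, groups = split_by_participant(participant_ids)

# No participant contributes samples to more than one split.
assert groups["train"].isdisjoint(groups["test"])
assert groups["train"].isdisjoint(groups["val"])
assert groups["val"].isdisjoint(groups["test"])
```

By contrast, shuffling individual samples before splitting would place data from the same person in both the training and testing sets, which is the kind of partitioning the abstract says leads to overestimated accuracy.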

Updated: 2021-05-06