Neural Networks

Volume 129, September 2020, Pages 55–74

Interpretable and lightweight convolutional neural network for EEG decoding: Application to movement execution and imagination

https://doi.org/10.1016/j.neunet.2020.05.032

Highlights

  • A parsimonious and interpretable convolutional NN is proposed for EEG decoding.

  • Sinc- and depthwise convolutions are used for temporal and spatial filtering.

  • A gradient-based technique is designed to interpret the learned features.

  • The network outperforms a traditional machine learning algorithm and other CNNs.

  • The learned spectral–spatial features match well-known EEG motor-related activity.

Abstract

Convolutional neural networks (CNNs) are emerging as powerful tools for EEG decoding: by automatically learning the features relevant for class discrimination, these techniques improve EEG decoding performance without relying on handcrafted features. Nevertheless, the learned features are difficult to interpret, and most existing CNNs introduce many trainable parameters. Here, we propose a lightweight and interpretable shallow CNN (Sinc-ShallowNet), obtained by stacking a temporal sinc-convolutional layer (designed to learn band-pass filters, each having only the two cut-off frequencies as trainable parameters), a spatial depthwise convolutional layer (reducing channel connectivity and learning spatial filters tied to each band-pass filter), and a fully-connected layer finalizing the classification. This convolutional module limits the number of trainable parameters and allows direct interpretation of the learned spectral–spatial features via simple kernel visualizations. Furthermore, we designed a post-hoc gradient-based technique to enhance interpretation by identifying the more relevant and more class-specific features. Sinc-ShallowNet was evaluated on benchmark motor-execution and motor-imagery datasets and against different design choices and training strategies. Results show that (i) Sinc-ShallowNet outperformed a traditional machine learning algorithm and other CNNs for EEG decoding; (ii) the learned spectral–spatial features matched well-known EEG motor-related activity; (iii) the proposed architecture performed better with a larger number of temporal kernels, while still maintaining a good compromise between accuracy and parsimony, and with a trialwise rather than a cropped training strategy. In perspective, the proposed approach, thanks to its interpretative capacity, can be exploited to investigate cognitive/motor aspects whose EEG correlates are still scarcely known, potentially characterizing their relevant features.

Introduction

Approaches based on machine learning algorithms provide powerful tools to analyse and decode brain activity from electroencephalographic (EEG) data, both in research and application areas. In particular, machine learning techniques have been exploited in many EEG-based Brain–Computer Interfaces (BCIs). In these systems, a feature extraction stage (McFarland, Anderson, Muller, Schlogl, & Krusienski, 2006) extracts the meaningful characteristics of the pre-processed (Bashashati, Fatourechi, Ward, & Birch, 2007) EEG signals, and a downstream classification stage (Lotte et al., 2018) makes a decision based on the extracted characteristics to provide the appropriate feedback to the user (Mak & Wolpaw, 2009). One popular and well-performing feature extraction algorithm is the filter bank common spatial pattern (FBCSP) (Ang, Chin, Zhang, & Guan, 2008), which applies a bank of band-pass filters (selected a priori) and extracts features for each frequency band via spatial filtering. FBCSP has been widely used as an EEG feature extraction method and won several competitions, such as BCI competition IV datasets 2a and 2b (Ang, Chin, Wang, Guan, & Zhang, 2012), related to EEG decoding of imagined movements.
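To make this pipeline concrete, the following is a minimal sketch of the FBCSP idea for a two-class problem. The band edges, filter order, and number of CSP components are illustrative assumptions, and the full FBCSP of Ang et al. (2008) additionally includes a feature-selection step before classification.

```python
# Minimal sketch of the FBCSP idea for two classes (illustrative choices, not
# the exact pipeline of Ang et al., 2008): band-pass filter bank + CSP +
# log-variance features.
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh

def csp_filters(x1, x2, n_comp=4):
    """x1, x2: trials of shape (n_trials, n_channels, n_samples), one class each.
    Returns n_comp spatial filters (rows) from a generalized eigenproblem."""
    c1 = np.mean([np.cov(t) for t in x1], axis=0)   # class-1 channel covariance
    c2 = np.mean([np.cov(t) for t in x2], axis=0)   # class-2 channel covariance
    w, v = eigh(c1, c1 + c2)                        # ascending eigenvalues
    order = np.argsort(w)
    # The most discriminative filters sit at both ends of the spectrum.
    idx = np.concatenate([order[:n_comp // 2], order[-n_comp // 2:]])
    return v[:, idx].T

def fbcsp_features(trials, labels, fs, bands=((4, 8), (8, 12), (12, 16), (16, 20))):
    """Log-variance CSP features per band; 4-Hz bands are illustrative."""
    feats = []
    for lo, hi in bands:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filt = filtfilt(b, a, trials, axis=-1)      # band-pass the whole set
        w = csp_filters(filt[labels == 0], filt[labels == 1])
        proj = np.einsum("fc,ncs->nfs", w, filt)    # spatially filtered trials
        feats.append(np.log(proj.var(axis=-1)))     # log-variance per component
    return np.concatenate(feats, axis=1)            # (n_trials, n_bands * n_comp)
```

The resulting feature matrix would then feed a classifier such as the rLDA mentioned below.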

However, the traditional machine learning pipeline described above performs feature extraction and classification in separate steps. Furthermore, it strongly relies on a priori knowledge in the design of the feature extraction stage (e.g. the filters' cut-off frequencies in the FBCSP) and prevents other potentially relevant (but unknown) features from being extracted and used for decoding. For this reason, this approach may also have a negative impact on decoding accuracy. Recently, machine learning innovations proposed in the computer vision field and represented by convolutional neural networks (CNNs) have been transposed to EEG decoding tasks (Roy et al., 2019), mitigating the need for manual feature extraction. CNNs automatically learn features in a hierarchical structure from the input data in an end-to-end fashion, i.e. without separating the feature extraction, selection and classification steps. Thus, in the field of EEG decoding, CNNs can be trained by feeding EEG signals as input to the neural network, obtaining as output the corresponding predicted label. Accordingly, CNNs do not need any a priori knowledge about the meaningful characteristics of the signals for the specific decoding task and have the potential to discover the relevant features (even so-far unknown ones) by using all input information.

An efficient way to provide EEG signals as input to CNNs is to design a 2D input representation with the electrodes along one dimension and time steps along the other (Borra et al., 2020a, Borra et al., 2020b, Cecotti and Graser, 2011, Farahat et al., 2019, Lawhern et al., 2018, Leeuwen et al., 2019, Manor and Geva, 2015, Schirrmeister et al., 2017, Shamwell et al., 2016, Tang et al., 2017, Zeng et al., 2019, Zhao et al., 2019), preserving the original (non-transformed) EEG representation. Other input representations, e.g. transformed representations such as time–frequency decompositions (Bashivan et al., 2015, Sakhavi et al., 2015, Tabar and Halici, 2016), generally increase data dimensionality, requiring more training data and/or regularization to learn meaningful features. CNNs with a non-transformed representation are typically designed by stacking individual temporal and spatial convolutional layers, or a single spatio-temporal convolutional layer, possibly followed by deeper convolutional layers that learn patterns on the filtered activations. CNNs based on these architectures have been successfully applied to several EEG decoding tasks, such as P300 detection (Borra et al., 2020a, Cecotti and Graser, 2011, Farahat et al., 2019, Lawhern et al., 2018, Manor and Geva, 2015, Shamwell et al., 2016), motor imagery and execution decoding (Borra et al., 2020b, Lawhern et al., 2018, Schirrmeister et al., 2017, Tang et al., 2017, Zhao et al., 2019), anomaly detection (Leeuwen et al., 2019), and emotion classification (Zeng et al., 2019), and they have generally proven to outperform traditional machine learning approaches. Despite these effective applications of CNNs in EEG decoding, a number of critical issues still require further investigation. Indeed, CNNs introduce a large number of trainable parameters, requiring large training datasets to obtain a good fit; they have longer training times compared to simpler models; they introduce many hyper-parameters (e.g. number of kernels, kernel sizes, number of layers, type of activation functions, etc.); and the automatically learned features are difficult to interpret. In particular, techniques that increase the interpretability of the learned features are receiving growing interest as key ingredients to achieve a more robust validation of CNNs (Montavon, Samek, & Müller, 2018). In the field of CNN-based EEG decoding, increasing interpretability may be particularly relevant for neuroscientists in at least three respects: (i) checking correct learning, by verifying that the models rely on neurophysiological rather than artefactual sources; (ii) understanding which EEG features best discriminate the investigated classes; (iii) potentially characterizing new features exploited by the network for classification, thus increasing insight into the neural correlates underlying the classified behaviours.
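As a concrete illustration of this non-transformed 2D layout, the snippet below arranges a set of EEG trials into the 4D tensor expected by a 2D convolution; the trial, channel, and sample counts are illustrative (loosely BCI competition IV-2a-like), not values taken from the paper.

```python
# Hedged sketch: the non-transformed 2D EEG representation (electrodes x time),
# batched as a 4D tensor for a 2D convolutional network (sizes are illustrative).
import torch

n_trials, n_channels, n_samples = 288, 22, 1000
eeg = torch.randn(n_trials, n_channels, n_samples)  # pre-processed EEG epochs
x = eeg.unsqueeze(1)                                 # (N, 1, C, T): one "image" per trial
# Temporal kernels then have shape (1, k_t); spatial kernels have shape (C, 1).
```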

Several efforts have been made to increase CNN interpretability via post-hoc interpretation techniques (i.e. techniques that analyse the trained model). These techniques include temporal and spatial kernel visualizations (Cecotti and Graser, 2011, Lawhern et al., 2018), kernel ablation tests (i.e. selective removal of single kernels) (Lawhern et al., 2018), saliency maps (i.e. maps showing the gradient of the CNN prediction with respect to its input example) (Farahat et al., 2019), gradient-weighted class activation mapping (Jonas et al., 2019), and correlation maps between input features and outputs of given layers (Schirrmeister et al., 2017). Some of these works address the interpretability issue together with the other key issues cited above, such as model complexity (in terms of number of layers and number of trainable parameters) and the size of the training dataset. Schirrmeister et al. (2017) tested both a deeper CNN (DeepConvNet, with 5 convolutional layers and one fully-connected layer) and a shallower CNN (ShallowConvNet, with 2 convolutional layers and one fully-connected layer) for decoding movement execution and motor imagery, analysed the effect of increasing the amount of training examples (via cropped training), and used correlation maps to interpret the learned features. Lawhern et al. (2018) designed a shallow and lightweight CNN (EEGNet, with 3 convolutional layers and one fully-connected layer) by introducing depthwise and separable convolutions that reduce the number of parameters to fit, tested it on a range of EEG decoding tasks with various training sizes, and interpreted the learned features via kernel visualization and ablation.

Besides post-hoc techniques, network interpretability may be increased by introducing directly interpretable layers within the network architecture; importantly, these layers may intrinsically reduce the number of trainable parameters too, promoting CNNs that are more interpretable and, at the same time, lightweight. Very recently, a few studies have explored this approach in CNNs for EEG decoding. Zhao et al. (2019) introduced a time–frequency convolutional layer, in an architecture inspired by ShallowConvNet (Schirrmeister et al., 2017), to learn time–frequency filters defined by real-valued Morlet wavelets. In a previous preliminary work (Borra et al., 2020b), for the first time we used a temporal sinc-convolutional layer (Ravanelli & Bengio, 2018) for EEG decoding, included in an architecture based on DeepConvNet (Schirrmeister et al., 2017), to learn temporal filters defined by parametrized sinc functions that implement band-pass filters. Instead of learning all the kernel values as in a traditional convolutional layer, in both the wavelet- and the sinc-convolutional layer only two parameters per kernel need to be learned, and they are directly interpretable: the bandwidth of the Gaussian and the wavelet central frequency in one case (Zhao et al., 2019), and the two cutoff frequencies of the band-pass filter in the other (Borra et al., 2020b). While this approach appears promising, its use in EEG decoding is still limited, and the CNNs proposed so far (Borra et al., 2020b, Zhao et al., 2019) have some limitations. Indeed, apart from a single directly interpretable convolutional layer, the rest of these CNNs consists of traditional, less interpretable convolutional layers. This not only may hinder the overall interpretability of the learned features, but also introduces a large number of trainable parameters, yielding models more prone to overfitting; this is especially true for the deep CNN we previously proposed (Borra et al., 2020b). Furthermore, each of these CNNs has been tested on a single decoding task only (movement imagination in Zhao et al., 2019, and movement execution in Borra et al., 2020b), and the ability of each network to generalize across motor paradigms has not been verified.
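To make the idea concrete, the following is a minimal PyTorch sketch of a sinc-convolutional layer in the spirit of SincNet (Ravanelli & Bengio, 2018): each temporal kernel is a band-pass filter fully determined by two trainable scalars. The kernel length, sampling rate, windowing, and initialization are illustrative assumptions, not the paper's exact settings.

```python
# Hedged sketch of a sinc-convolutional layer: each kernel is a windowed ideal
# band-pass filter parametrized by a trainable low cutoff and bandwidth (Hz).
import torch
import torch.nn as nn

class SincConv(nn.Module):
    def __init__(self, n_filters=32, kernel_size=65, fs=250.0):
        super().__init__()
        self.kernel_size = kernel_size
        # Only two trainable parameters per filter (illustrative initialization).
        self.f_low = nn.Parameter(torch.linspace(1.0, 40.0, n_filters))
        self.band = nn.Parameter(torch.full((n_filters,), 4.0))
        t = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1) / fs
        self.register_buffer("t", t)  # time axis of the kernel, in seconds
        self.register_buffer("window", torch.hamming_window(kernel_size, periodic=False))

    def forward(self, x):  # x: (N, 1, C, T)
        f1 = torch.abs(self.f_low)
        f2 = f1 + torch.abs(self.band)
        # Ideal band-pass = difference of two low-pass sinc filters, then windowed.
        def lp(f):  # (n_filters, kernel_size) low-pass impulse responses
            return 2 * f.unsqueeze(1) * torch.sinc(2 * f.unsqueeze(1) * self.t)
        kernels = (lp(f2) - lp(f1)) * self.window
        kernels = kernels.view(-1, 1, 1, self.kernel_size)  # (F, 1, 1, k_t)
        return nn.functional.conv2d(x, kernels, padding=(0, self.kernel_size // 2))
```

After training, the learned pass-bands can be read directly from `f_low` and `band`, which is what makes this layer interpretable by construction.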

The purpose of this work is to contribute to the recent developments of CNN-based EEG decoding by designing and analysing a novel CNN that includes interpretable and optimized layers, able to increase the overall interpretability of the network, reduce the number of trainable parameters and, at the same time, ensure good performance compared to existing state-of-the-art (SOA) algorithms. The CNN proposed here is a lightweight shallow CNN, named Sinc-ShallowNet, obtained by stacking two convolutional layers that extract spectral and spatial EEG features respectively, followed by a fully-connected layer finalizing the classification. The two convolutional layers are specifically devised to increase interpretability and decrease the number of trainable parameters, and consist of a temporal sinc-convolutional layer and a spatial depthwise convolutional layer. The spatial depthwise convolutional layer ties spatial filters to each particular band-pass filter learned by the temporal sinc-convolutional layer, enabling the learning of spatial features related to specific frequency ranges. The proposed architecture was applied to decode sensorimotor rhythms during both motor execution (ME) and motor imagery (MI) using public benchmark datasets. Moreover, an extensive analysis of Sinc-ShallowNet was performed, covering the following aspects:
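A structural sketch of such a module, reusing the SincConv layer sketched above, is given below; the kernel counts, depth multiplier, pooling size, and input dimensions are illustrative assumptions rather than the hyper-parameters reported in the paper.

```python
# Hedged structural sketch of the described module (assumes the SincConv sketch
# above): temporal sinc filtering, depthwise spatial filtering tied to each
# band, pooling, and a fully-connected classifier. Sizes are illustrative.
import torch
import torch.nn as nn

class SincShallowSketch(nn.Module):
    def __init__(self, n_channels=22, n_samples=1000, n_classes=4,
                 n_temporal=32, depth_mult=2, pool=125):
        super().__init__()
        self.sinc = SincConv(n_filters=n_temporal, kernel_size=65, fs=250.0)
        # Depthwise conv: depth_mult spatial filters per band-pass filter,
        # with no mixing across bands (groups = n_temporal).
        self.spatial = nn.Conv2d(n_temporal, n_temporal * depth_mult,
                                 kernel_size=(n_channels, 1),
                                 groups=n_temporal, bias=False)
        self.bn = nn.BatchNorm2d(n_temporal * depth_mult)
        self.act = nn.ELU()
        self.pool = nn.AvgPool2d((1, pool), stride=(1, pool))
        out_t = n_samples // pool
        self.fc = nn.Linear(n_temporal * depth_mult * out_t, n_classes)

    def forward(self, x):                      # x: (N, 1, C, T)
        h = self.pool(self.act(self.bn(self.spatial(self.sinc(x)))))
        return self.fc(h.flatten(1))           # class scores
```

Because the depthwise convolution uses groups equal to the number of band-pass filters, each spatial filter reads from exactly one band, which is what makes the spectral–spatial pairing directly inspectable.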

  • i. Comparison of the decoding performance of Sinc-ShallowNet with SOA decoding algorithms, including a traditional machine learning pipeline based on FBCSP coupled with regularized Linear Discriminant Analysis (rLDA), and three other CNNs (ShallowConvNet and DeepConvNet, Schirrmeister et al., 2017; EEGNet, Lawhern et al., 2018).

  • ii. Assessment of some design choices on Sinc-ShallowNet performance, in a post-hoc hyper-parameter evaluation procedure inspired by Schirrmeister et al. (2017). The evaluated design choices concern: the number of temporal band-pass filters, the number of spatial filters per temporal filter, the introduction of an optional recombination of the spatial activations, and the size of the activation aggregation (average pooling) before the fully-connected layer.

  • iii. Evaluation of the effect of increasing the training data size via cropped training compared to trialwise training (a cropping sketch is given after this list). Indeed, the effect of cropped training on different CNN architectures is still unclear. Schirrmeister et al. (2017) found that cropped training significantly increased the performance of deep architectures (DeepConvNet), whereas no significant effect was obtained with shallow architectures (ShallowConvNet). Despite this, other shallow architectures (Zhao et al., 2019) were trained with a cropped strategy. Therefore, we evaluated the effect of the training strategy on the performance of Sinc-ShallowNet and of the re-implemented SOA CNNs.

  • iv. Feature interpretation. Since the trainable parameters of the temporal sinc-convolutional layer are the cutoff frequencies of the learned band-pass filters, the learned spectral features can be directly visualized and interpreted once training ends. Furthermore, inspired by saliency maps (Simonyan, Vedaldi, & Zisserman, 2013), we designed a post-hoc interpretation technique named "temporal sensitivity analysis" (as it acts on the kernels of the temporal sinc-convolutional layer; a gradient sketch is given after this list). This technique enables the identification of the more relevant and more class-specific band-pass filters; the spatial features (learned in the depthwise convolutional layer) related to these band-pass filters can then be visualized.
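Regarding item iii, a hedged sketch of the cropped-training idea follows: each trial is cut into overlapping windows that inherit the trial's label, multiplying the number of training examples. Crop length and stride are illustrative choices, not the paper's settings.

```python
# Hedged sketch of cropped training: overlapping fixed-length crops per trial,
# each inheriting the trial label (crop length and stride are illustrative).
import torch

def make_crops(trials, labels, crop_len=500, stride=50):
    """trials: (N, 1, C, T) tensor; labels: length-N sequence of class indices.
    Returns cropped examples (N * n_crops, 1, C, crop_len) and their labels."""
    crops, crop_labels = [], []
    for x, y in zip(trials, labels):
        for start in range(0, x.shape[-1] - crop_len + 1, stride):
            crops.append(x[..., start:start + crop_len])
            crop_labels.append(int(y))
    return torch.stack(crops), torch.tensor(crop_labels)
```

Regarding item iv, the following is a gradient sketch in the spirit of the described temporal sensitivity analysis, assuming a model shaped like the SincShallowSketch above; the aggregation into one score per band (mean absolute gradient of a class score with respect to the band-filtered activations) is an assumption, not the paper's exact formulation.

```python
# Hedged sketch of a saliency-style band relevance measure: gradient of a class
# score with respect to each sinc band-pass filter's output, averaged to one
# scalar per band (the paper's exact definition may differ).
def band_sensitivity(model, x, target_class):
    """x: (N, 1, C, T). Returns a (n_filters,) tensor of per-band sensitivities."""
    h = model.sinc(x)                 # band-filtered activations, (N, F, C, T)
    h.retain_grad()                   # keep gradients of this non-leaf tensor
    out = model.pool(model.act(model.bn(model.spatial(h))))
    model.fc(out.flatten(1))[:, target_class].sum().backward()
    return h.grad.abs().mean(dim=(0, 2, 3))
```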

Section snippets

Methods

This section is devoted to the description of the proposed CNN for EEG motor decoding. First, we frame EEG decoding within supervised classification learning via CNNs and introduce the notation used in the following description. Subsequently, we illustrate the benchmark datasets used to train and test the CNNs (the proposed one and the SOA CNNs), the architecture of the proposed CNN, the training procedure, and finally the post-hoc interpretation technique.

Classification performance and comparison with state-of-the-art approaches

In this section, the performance of the basal Sinc-ShallowNet (trained with the trialwise strategy) is compared with that of the traditional machine learning algorithm and of the three re-implemented CNNs (also trained with the trialwise strategy).

Fig. 3 reports the confusion matrices obtained with the proposed architecture and with the machine learning algorithm FBCSP + rLDA, for ME- and MI-EEG signals. Each of these matrices represents the confusion matrix across the subject-specific classifiers.
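As a hedged illustration of how a single matrix "across the subject-specific classifiers" can be assembled, the snippet below pools each subject's test-set predictions before computing one confusion matrix; the row-wise normalization (proportions per true class) is an assumption about the reported format.

```python
# Hedged sketch: pooling per-subject predictions into a single confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

def pooled_confusion(true_per_subject, pred_per_subject, n_classes):
    y_true = np.concatenate(true_per_subject)   # all subjects' true labels
    y_pred = np.concatenate(pred_per_subject)   # all subjects' predictions
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(n_classes))
    return cm / cm.sum(axis=1, keepdims=True)   # rows sum to 1 (per true class)
```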

Discussion

In this study, Sinc-ShallowNet, a novel lightweight and interpretable CNN for EEG decoding, was designed and applied to motor execution and imagery tasks. The use of a convolutional layer specialized for band-pass filtering (the sinc-convolutional layer) and of a spatial filtering layer with reduced CNN channel connectivity (the depthwise convolutional layer) enables the learning of band-pass filters and of directly associated spatial filters. Thus, the proposed CNN is fully interpretable and optimized in its number of trainable parameters.

Conclusions

In conclusion, we proposed a novel CNN named Sinc-ShallowNet, characterized by an interpretable and efficient (in terms of number of trainable parameters) convolutional module. This module includes a temporal sinc-convolutional layer, forcing the learning of band-pass filters with only two trainable parameters per filter, and a spatial depthwise convolutional layer that learns spatial features tied to each band-pass filter. The proposed design provides direct interpretability of the learned spectral and spatial features.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN V used for this research. This work is part of the "Department of Excellence" Project of the Department of Electrical, Electronic and Information Engineering, University of Bologna, funded by the Italian Ministry of Education, Universities and Research (MIUR).

References (50)

  • Tang, Z., et al. (2017). Single-trial EEG classification of motor imagery using deep convolutional neural networks. Optik.

  • Zhao, D., et al. (2019). Learning joint space–time–frequency features for EEG decoding on small labeled data. Neural Networks.

  • Ang, K. K., et al. (2012). Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Frontiers in Neuroscience.

  • Ang, K. K., et al. (2008). Filter bank common spatial pattern (FBCSP) in brain–computer interface.

  • Bashashati, A., et al. (2007). A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals. Journal of Neural Engineering.

  • Bashivan, P., et al. (2015). Learning representations from EEG with deep recurrent-convolutional neural networks.

  • Benjamini, Y., et al. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Statistical Methodology).

  • Blankertz, B., et al. (2008). Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine.

  • Borra, D., et al. (2020a). Convolutional neural network for a P300 Brain-Computer Interface to improve social attention in autistic spectrum disorder.

  • Borra, D., et al. (2020b). EEG motor execution decoding via interpretable sinc-convolutional neural networks.

  • Cecotti, H., et al. (2011). Convolutional neural networks for P300 detection with application to Brain-Computer Interfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Chin, Z. Y., et al. Multi-class filter bank common spatial pattern for four-class motor imagery BCI.

  • Chollet, F. Xception: Deep learning with depthwise separable convolutions.

  • Clevert, D.-A., et al. (2015). Fast and accurate deep network learning by exponential linear units (ELUs).

  • Crone, N. E., et al. (1998). Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. II. Event-related synchronization in the gamma band. Brain: A Journal of Neurology.