Introduction

A brain-computer interface (BCI), also called a mind-machine interface, is a bridge between the human brain and an external device, forming a new and still-expanding communication pathway. The interface consists of a set of sensors and signal processing components that directly convert brain activity into a series of communication or control signals. In such a system, brain waves must first be captured with a recording apparatus; in most cases, electroencephalography (EEG) is used because of its high temporal resolution, low cost, and availability. Through a BCI, a computer either receives commands from the brain or sends signals to it [1].

In recent years, visual evoked potential based BCIs have gained a unique place among BCI systems and are used for a variety of purposes, owing to features such as highly reliable detection, low training requirements, high information transfer rates, and improved voluntary control by the user. Many approaches have been proposed for processing and frequency identification in steady-state visually evoked potential (SSVEP) based BCI systems [2]. Chevallier et al. [3] applied Riemannian geometry and covariance matrices to extract features from SSVEP signals, whereas Cao et al. [4] used fuzzy features and fuzzy entropy and then performed an assessment. Sözer et al. [5] presented a combined approach in which a reference signal is obtained by combining the electrode signals and multiple regression analysis determines the optimal coefficients for this combination. In contrast, Aznan et al. [6] used a convolutional neural network to classify EEG signals.

In this paper, seven features are extracted using the wavelet transform and five features are selected using the genetic algorithm (GA). The GA chooses the best features for SSVEP signal classification so that they can be used in BCI systems. Genetic algorithms are simple to implement, and it has been shown that they can stochastically search for the optimal solution to a problem. Many feature selection methods use metaheuristic algorithms to avoid increasing computational complexity, and a GA can solve the feature selection problem with appropriate accuracy within an acceptable time [7].

Materials and methods

This paper proposes a model comprising feature extraction, feature selection and, finally, classification. The block diagram of Fig. 1 presents the conceptual model of the proposed approach. In the first step, the data are collected and then processed, which includes noise elimination and preparation for the subsequent steps. In the second step, features are extracted from the signal, and in the third step, the best features for SSVEP signal detection are selected using an evolutionary approach. Finally, several classifiers are applied to these signals to determine which stimulus the subject is attending to. Each step is described separately and in more detail below.

Fig. 1 The block diagram of the proposed algorithm

Data acquisition

A device made by Biosemi Inc., Netherlands, was used for data recording. The data were recorded at the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute. A total of 128 active electrodes recorded electroencephalogram signals from 4 subjects. SSVEP stimulation was delivered with a small checkerboard at 3 stimulation frequencies (8, 14 and 28 Hz), and the signals were sampled at 256 Hz. The subjects sat at a 90-cm distance from the monitor. SSVEP stimulation begins 5 s after the start of each recording and ends 20 s after it, so 15 s of SSVEP are recorded per trial from each of the 4 subjects. Five SSVEP visual stimuli are displayed to each subject, consisting of images of personal emotions such as grief, happiness and anger [8, 11].

Pre-processing

In the data pre-processing stage, the external noise caused by mains electricity or eye blinks must be eliminated [25]. A signal histogram is used to display the distribution of potentials at each frequency, and a high-pass filter is applied to remove the excess values at each frequency. The signals are then denoised according to these filters to regenerate the original signal. To remove the effect of the electrodes on the signal, spatial filtering is applied, using a common average reference (CAR) filter together with large and small Laplacian filters.
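As an illustration, the following minimal sketch shows one plausible implementation of this stage; the filter order (4) and the 1 Hz cutoff are assumptions, since the paper does not specify them.

```python
# Sketch of pre-processing: high-pass filtering to suppress slow drifts and
# blink energy, followed by a common average reference (CAR) spatial filter.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256          # sampling rate (Hz), as reported in the paper
HP_CUTOFF = 1.0   # assumed high-pass cutoff (Hz)

def preprocess(eeg):
    """eeg: array of shape (n_channels, n_samples)."""
    # Zero-phase Butterworth high-pass filter applied to every channel.
    b, a = butter(4, HP_CUTOFF / (FS / 2), btype="highpass")
    filtered = filtfilt(b, a, eeg, axis=1)
    # CAR: subtract the instantaneous mean over all electrodes.
    return filtered - filtered.mean(axis=0, keepdims=True)

# Example: 128 channels, 15 s of SSVEP data.
eeg = np.random.randn(128, 15 * FS)
clean = preprocess(eeg)
```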

Feature extraction

EEG signals are non-stationary and transient; their analysis therefore calls for transform approaches such as the wavelet transform [12] to be applied to the signals under study. The wavelet transform is presented in Eq. 1:

$$W(a,b) = \int\limits_{ - \infty }^{ + \infty } {X(t)\frac{1}{\sqrt a }\psi \left( {\frac{t - b}{a}} \right)dt} .$$
(1)

In the above equation, ψ is the mother wavelet function. b is the translation parameter, indicating the shift in time, and a is the scale parameter, which changes the time scale of the probe function ψ: if a > 1 the wavelet is dilated, and if a < 1 it is compressed [16, 17, 19–21, 23, 24].

Since the EEG signal contains no useful components at frequencies above 30 Hz, five levels are used for decomposition [22]. For each subject, 128 sensors are utilized and each signal is divided into 5 components, so for each subject 640 components are investigated. A variety of approaches exist for extracting features from these signals [28, 29]; a sketch of the decomposition is given below.
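For concreteness, the following is a minimal sketch of the five-level decomposition, assuming the PyWavelets package and a Daubechies-4 mother wavelet (the paper does not name the wavelet used).

```python
# Sketch of the five-level discrete wavelet decomposition of one channel.
# pywt.wavedec returns coefficients ordered [A5, D5, D4, D3, D2, D1].
import numpy as np
import pywt

def decompose(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    names = ["A5", "D5", "D4", "D3", "D2", "D1"]
    return dict(zip(names, coeffs))

x = np.random.randn(256 * 15)          # one 15 s channel at 256 Hz
bands = decompose(x)
# Keep A5 and D2..D5, per Table 1; D1 (the highest-frequency band) is discarded.
selected = {k: bands[k] for k in ["A5", "D5", "D4", "D3", "D2"]}
```

The band power is measured using the following equations.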

$$p\left( n \right) = x^{2} \left( n \right),$$
(2)
$$\overline{p}\left( n \right) = \frac{1}{\omega }\mathop \sum \limits_{k = 1}^{\omega } p\left( {n - k} \right).$$
(3)

Equations 2 and 3 indicate that the band power feature at sample n equals the mean power over the ω previous samples. The final value of the feature is obtained from Eq. 4, where the natural logarithm is applied to improve the classification.

$${\text{BandPower}} = \ln \overline{p}\left( n \right).$$
(4)
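A minimal sketch of this feature follows, under the assumption of a 64-sample window ω (the paper does not specify the window length).

```python
# Sketch of the band power feature of Eqs. 2-4.
import numpy as np

def band_power(x, w=64):
    """Log band power of 1-D signal x with an assumed window of w samples."""
    p = x ** 2                                   # Eq. 2: instantaneous power
    kernel = np.ones(w) / w
    # Eq. 3: moving average of the instantaneous power over w samples.
    p_bar = np.convolve(p, kernel, mode="valid")
    return np.log(p_bar + 1e-12)                 # Eq. 4 (epsilon avoids log(0))

x = np.random.randn(256 * 15)
bp = band_power(x)
```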

Every signal contains both entropy and redundancy. Useful and valuable information can be acquired from highly varying data such as signals; this information content is measured by entropy, which is highest for chaotic and completely random data [9]. Let the random variable X express the states of the system; its values are as shown in Eq. 5.

$$X = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\},\quad n > 0.$$
(5)

and the probability of each value is

$$P = \left\{ {p_{1} ,p_{2} , \ldots ,p_{n} } \right\}\quad 0 \le p_{i} \le 1,\quad i = 1, 2, \ldots ,n,$$
(6)

where,

$$\mathop \sum \limits_{i = 1}^{n} p_{i} = 1.$$
(7)

Therefore, the entropy of this system is obtained from Eq. 8.

$$H = - \mathop \sum \limits_{i = 1}^{n} p_{i} \ln p_{i} .$$
(8)

Spectral entropy is one such important entropy; it estimates the complexity of a time series. Several transforms can be used to obtain the spectrum of a signal. The Fourier transform is one approach to measuring the spectral density, expressing its distribution as a function of signal frequency, while the Walsh transform was applied to acquire a further feature, the sequence entropy. The Fourier transform of the signal yields the spectral component X(w_i) at point i, from which the power spectral density is acquired:

$$\hat{P}\left( {w_{i} } \right) = \frac{1}{N}\left| {X\left( {w_{i} } \right)} \right|^{2} .$$
(9)

The power spectral density is then normalized using Eq. 10:

$$P_{i} = \frac{{\hat{P}\left( {w_{i} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{n} \hat{P}\left( {w_{j} } \right)}}.$$
(10)
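A minimal sketch combining Eqs. 8–10, assuming an FFT-based periodogram estimate of the power spectral density:

```python
# Sketch of spectral entropy: the PSD is estimated with the FFT, normalized
# to a probability distribution, and its Shannon entropy is returned.
import numpy as np

def spectral_entropy(x):
    X = np.fft.rfft(x)
    psd = (np.abs(X) ** 2) / len(x)        # Eq. 9
    p = psd / psd.sum()                    # Eq. 10: normalized PSD
    p = p[p > 0]                           # drop zero bins before the log
    return -np.sum(p * np.log(p))          # Eq. 8

x = np.random.randn(256 * 15)
h = spectral_entropy(x)
```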

In total, 7 features were extracted, including those obtained from the wavelet decomposition. According to the EEG frequency bands, the sub-bands A5, D2, D3, D4 and D5 were chosen, as shown in Table 1. The feature extraction parameters used to represent the experimental signals are listed in Table 2.

Table 1 Decomposition coefficients
Table 2 Extracted features

Feature selection

After feature extraction, feature selection is performed using the GA. In general, feature selection consists of choosing the best subset of all available features. Figure 2 presents the general process of feature selection using the GA.

Fig. 2 The flowchart for feature selection using the GA

First, different subsets of features with different dimensions are formed. These subsets are evaluated by the fitness function written into the GA, and the best subset of features is selected [18]. The GA selects features by operating on coded decision variables rather than on the features themselves. It begins with a set of candidate solutions rather than one specific solution, uses randomized operators to approach the best solution, and requires few computational rules, relying on the information of the fitness function rather than on derivatives or other auxiliary information. The initial population size is 100, the crossover rate is 0.75, and the mutation rate is 0.02; elitism is applied when forming the next generation. To find the best features with the GA, the fitness function is designed to select the set of features that yields the best reconstruction of the signal, i.e., the lowest mean square error:

$$\text{Fitness} = w_{a} \times \text{Sig-Accuracy} + w_{f} \times \left( {\sum\limits_{i = 1}^{{n_{f} }} {g_{f}^{i} } } \right)^{ - 1} ,$$
(11)

where w_a and w_f are the weights of the two terms and Sig-Accuracy is the accuracy of the signal reconstructed using the selected features. In total, the 7 features extracted from the signals are candidates for selection. In this approach, each feature is represented as a gene in a chromosome, and the algorithm searches for the best subset of features with which to reconstruct the output signal. One open issue is how many features to select. To resolve it, 3, 4 and 5 features are selected in turn: first, 3 of the 7 features are selected and the mean square error (MSE) is measured; in the second step, 4 features are selected and the MSE is determined; in the last step, 5 features are selected. Finally, the best output is chosen according to the MSE. It turns out that the MSE does not vary much among 3, 4 and 5 features.
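To make the procedure concrete, the following is a minimal GA sketch under the stated settings (population 100, crossover rate 0.75, mutation rate 0.02, elitism). Here evaluate_subset is a hypothetical placeholder for the signal reconstruction step that yields Sig-Accuracy, and the weights W_A and W_F are assumed values.

```python
# Sketch of GA feature selection (Fig. 2, Eq. 11). Each chromosome is a
# binary string of length 7; a 1 keeps the corresponding feature, a 0 drops it.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, POP_SIZE, N_GEN = 7, 100, 50
P_CROSS, P_MUT = 0.75, 0.02
W_A, W_F = 1.0, 0.1   # assumed weights of the two fitness terms

def evaluate_subset(genes):
    """Placeholder: reconstruct the signal from the selected features and
    return its accuracy (random here, for illustration only)."""
    return rng.random()

def fitness(genes):
    n_sel = genes.sum()
    if n_sel == 0:
        return 0.0
    # Eq. 11: weighted accuracy plus a term favoring fewer features.
    return W_A * evaluate_subset(genes) + W_F / n_sel

pop = rng.integers(0, 2, size=(POP_SIZE, N_FEATURES))
for _ in range(N_GEN):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[scores.argmax()].copy()              # elitism
    # Binary tournament selection of parents.
    idx = rng.integers(0, POP_SIZE, size=(POP_SIZE, 2))
    winners = np.where(scores[idx[:, 0]] > scores[idx[:, 1]],
                       idx[:, 0], idx[:, 1])
    parents = pop[winners]
    # Single-point crossover with probability P_CROSS.
    children = parents.copy()
    for i in range(0, POP_SIZE - 1, 2):
        if rng.random() < P_CROSS:
            cut = rng.integers(1, N_FEATURES)
            children[i, cut:], children[i + 1, cut:] = (
                parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # Bit-flip mutation with probability P_MUT per gene.
    flips = rng.random(children.shape) < P_MUT
    children[flips] ^= 1
    children[0] = elite                              # keep the best individual
    pop = children

best = pop[np.array([fitness(ind) for ind in pop]).argmax()]
print("selected feature mask:", best)
```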

Classification

SVM and Bayes classifiers were used for classification [13]. The Bayes classifier estimates the class probability distributions from the attributes, so that learning is reduced to probability estimation; this distinguishes it from other classification approaches. The algorithm is widely used owing to its high speed and simplicity. The structure of a Bayesian network expresses the internal relationships between the features and can also be applied to incomplete data sets; experts can easily understand the structure and change it when better predictions are necessary. Learning in such a network consists of two main tasks: learning the graph structure, and then learning the parameters for that structure. The algorithm originates from Bayes' theorem, which is given as follows:

$$p\left( {c_{i} |x} \right) = \frac{{p\left( {x|c_{i} } \right) \cdot p\left( {c_{i} } \right)}}{p(x)}.$$
(12)

In Eq. 12, c_i denotes class i and x is the vector of attribute values. Under the naïve assumption that the attributes are conditionally independent given the class, we have

$$P(x) = \prod\limits_{j = 1}^{n} {p\left( {x_{j} } \right)} ,$$
(13)
$$P\left( {x|c_{i} } \right) = \prod\limits_{j = 1}^{n} {p\left( {x_{j} |C_{i} } \right)} ,$$
(14)
$$p\left( {C_{i} |x} \right) = \frac{{p\left( {C_{i} } \right)\prod\nolimits_{j = 1}^{n} {P\left( {x_{j} |C_{i} } \right)} }}{{P(x)}}.$$
(15)
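As a toy illustration of the decision rule implied by Eqs. 12–15 (with assumed prior and likelihood values that are not taken from the paper):

```python
# Sketch of the naive Bayes decision rule: the class score is the prior
# times the product of per-feature likelihoods; the evidence P(x) cancels
# when comparing classes, so it can be ignored for the argmax.
import numpy as np

priors = {"target": 0.5, "non-target": 0.5}          # assumed p(c_i)
likelihoods = {
    "target":     [0.8, 0.6, 0.7],                   # assumed p(x_j | target)
    "non-target": [0.3, 0.4, 0.2],                   # assumed p(x_j | non-target)
}

scores = {c: priors[c] * np.prod(likelihoods[c]) for c in priors}
label = max(scores, key=scores.get)                  # Eq. 15 argmax
print(label, scores)
```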

Finally, the class with the highest posterior probability is selected. The naïve Bayes model rests on this conditional independence assumption, whereas a general Bayes network can also represent dependences between variables: the network is made up of nodes corresponding to variables chosen from the feature set, and the edges of each node indicate the dependences among the variables. The SVM performs a nonlinear mapping on the data to transform them into a higher-dimensional space and searches there for a separating linear hyperplane; it can be used both for classification and for prediction [10, 14, 15, 18]. The goal is to find the best line, plane, or hyperplane for separating unseen data in all possible cases. The SVM does this with the maximal marginal hyperplane (MMH), whose margins are parallel to the separating hyperplane and equidistant from it on both sides.

The hyperplane equation is:

$$W \cdot X + b = 0.$$
(16)

where w = {w1, w2, …, wn} is a weight vector, n is the number of attributes, and b is a scalar bias. For example, in a 2D feature space, the linear equation of the separator is

$$W_{0} + W_{1} \cdot X_{1} + W_{2} \cdot X_{2} = 0.$$
(17)

Any point falling above this line satisfies

$$W_{0} + W_{1} \cdot X_{1} + W_{2} \cdot X_{2} > 0.$$
(18)

and any point below the line satisfies

$$W_{0} + W_{1} \cdot X_{1} + W_{2} \cdot X_{2} < 0.$$
(19)
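A minimal sketch of Eqs. 16–19 with scikit-learn, using synthetic 2D data (an assumption for illustration): the fitted linear SVM exposes w and b, and the sign of w·x + b tells on which side of the hyperplane a point lies.

```python
# Sketch of the separating hyperplane: fit a linear SVM and evaluate
# the sign of w.x + b for a new point.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)),   # class 0 cloud
               rng.normal(+1, 0.5, (50, 2))])  # class 1 cloud
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]          # Eq. 16: w.x + b = 0

point = np.array([0.8, 0.9])
side = np.sign(w @ point + b)                   # Eqs. 18-19: above or below
print("w =", w, "b =", b, "side =", side)
```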

Results

For performance analysis, the EEG dataset is divided into a training set and a testing set according to a threefold cross-validation scheme. Feature optimization, including the GA-based method, is performed on the training set [26, 27], and the classification accuracy obtained on the testing data is used for performance comparison. The groups of three, four and five extracted features are given to the classifier in turn and the MSE is measured; according to the MSE, the best output is selected, as shown in Table 3.
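A minimal sketch of this evaluation protocol, assuming scikit-learn and stand-in data in place of the GA-selected feature matrix and stimulus labels:

```python
# Sketch of threefold cross-validated scoring of the two classifiers.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_features = rng.standard_normal((120, 5))      # stand-in: 5 GA-selected features
labels = rng.integers(0, 3, size=120)           # stand-in: 3 stimulation frequencies

for name, clf in [("Bayes", GaussianNB()), ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X_features, labels, cv=3, scoring="accuracy")
    print(f"{name}: mean accuracy = {acc.mean():.3f}")
```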

Table 3 The number of features and the MSE computed in each case

Based on the results of the SVM and Bayes classifiers, we compare the proposed method with other studies, as shown in Fig. 3. Those studies used approaches such as the decision tree and the Bhattacharyya criterion for feature selection with a Bayes classifier, and the continuous wavelet transform (CWT) and the spectral F-test (SFT) for feature selection with an SVM [11, 13].

Fig. 3 Comparison of the proposed method with other studies. a Feature selection approaches compared with the proposed method using Bayes classification, based on the accuracy criterion. b Feature selection approaches based on CWT and SFT compared with the proposed method using SVM classification, based on the accuracy criterion

Figure 4 shows a comparison between classification with and without feature selection, based on the classification accuracy criterion and the program execution time.

Fig. 4 A comparison of the program execution time using the Bayes classifier, with GA-based feature selection and without feature selection

Discussion

This study proposes a simple and straightforward algorithmic method to improve the classification of BCI data. The proposed method reduces the dimensionality of the input feature space and restructures the features to optimize the classifier, so the training time is reduced and the accuracy is improved. Heidari et al. [11] used the same database as this study and applied the wavelet transform for feature extraction with Bayes and SVM classifiers, as shown in Fig. 3. The proposed model is more accurate than their two feature selection models, decision tree and Bhattacharyya; moreover, the number of features selected by the GA in the proposed method is lower and the selected features are closer to optimal. Next, following Anupama et al. [12], the proposed model is compared with the SVM classification approach; the SVM classifier is very sensitive to the feature selection strategy [13]. In their study, the approaches used for feature extraction and selection were CWT and SFT, as shown in Fig. 3.

In this paper, seven features were extracted using the wavelet transform and five were selected using the genetic algorithm. Chromosomes with binary genes are used, with the chromosome length (7) equal to the number of features: a gene value of zero eliminates a feature, while a value of one includes it in the classification. The GA thus chooses the best features for SSVEP signal classification in BCI systems, with appropriate accuracy and within an acceptable time. Based on the output results, it can be concluded that the proposed model is suitable for feature selection and can be used to reduce the processing size as well as BCI costs [28, 29].

Finally, classification is compared on the basis of classification accuracy, program execution time, and database size. After feature extraction using the wavelet transform, the total size of the data is 950 MB: each tested signal from one sensor is decomposed into 5 signals, and an entropy feature is extracted from each signal. After feature selection, the total size of the data in the database is reduced to 240 MB. Because the features selected by the fitness function are the best ones, the original signal can be reconstructed from them with high accuracy. The execution time of the classification algorithm is approximately 8 min with the selected features and almost 25 min without feature selection. Although applying the Bayes algorithm slightly lowers the classification accuracy, the reduced execution time, together with the decreased database size, makes GA-based feature selection effective for classification. Nevertheless, feature extraction and selection remain a challenging issue for researchers.

Conclusion

This paper used the wavelet transform, the GA, and Bayes and SVM classification to improve the performance of a BCI. Based on the output results, it can be concluded that the proposed model is suitable for feature selection and can be used to reduce the processing size as well as BCI costs. Furthermore, the classification accuracy using the SVM classifier reached 90.4%, better than the Bayes approach; the SVM classifier is very sensitive to the feature selection strategy. The results of this study indicate better feature selection and effective dimensionality reduction of the features. For future studies, it is recommended to use other databases, the empirical mode decomposition approach and other signal processing methods, and to apply optimal feature selection approaches such as the cuckoo algorithm, the colonial competitive algorithm, fuzzy logic and neuro-fuzzy networks, as well as to use several classifiers for classification.