Abstract

Optical burst switching (OBS) networks are frequently compromised by attackers who flood them with burst header packets (BHPs), causing a denial of service (DoS) attack known as a BHP flooding attack. Machine learning (ML) methods have recently been embedded into OBS core switches to detect these attacks. However, because BHP data contain redundant features and OBS core switches have limited capability, the existing technology still requires major improvements to work effectively and efficiently. In this paper, an efficient and effective ML-based security approach is proposed for detecting BHP flooding attacks. The proposed approach consists of a feature selection phase and a classification phase. The feature selection phase uses the information gain (IG) method to select the most important features, enhancing the efficiency of detection. In the classification phase, a decision tree (DT) classifier is built on the selected BHP features, reducing the overfitting problem and improving the accuracy of detection. A set of experiments is conducted on a public dataset of OBS networks using 10-fold cross-validation and holdout techniques. Experimental results show that the proposed approach achieves a classification accuracy of 100% by using only three features.

1. Introduction

Optical burst switching (OBS) has become an important dynamic sub-wavelength switching technique and a solution for developing a new type of Internet backbone infrastructure [1]. An OBS network consists mainly of three types of nodes: ingress nodes, core nodes, and egress nodes. The core nodes are the intermediate nodes, which are designed to reduce the processing and buffering of optical data bursts by using control packets that carry specific information, namely, burst header packets (BHPs) [2].

In a network with bursty traffic, OBS plays an essential role in packet switching, with a higher level of necessary detail than other existing network switching techniques. However, this type of switching still suffers from several challenges, such as security and quality of service (QoS), due to BHP flooding attacks. The function of a BHP in OBS is to reserve an unused channel for the arrival of a data burst (DB). Attackers can exploit this function by sending fake BHPs without DB acknowledgment. Such fake BHPs degrade network performance by decreasing bandwidth utilization and increasing data loss, leading to a denial of service (DoS) attack [3], which is one of the most crucial security threats to networks.

Several methods have been proposed in the literature to tackle DoS and BHP flooding attacks on OBS networks and have achieved satisfactory results [4–6]. However, due to the limited capability of OBS core switches, developing a lightweight method that can attain high accuracy with a small number of features is still a challenging issue for developers and researchers.

In this research, an effective and efficient approach is proposed for securing OBS networks. The main objective of the work is to develop a lightweight ML model for detecting BHP flooding attacks based on the information gain (IG) feature selection method and a decision tree (DT) classifier. To achieve this objective, two key research questions are addressed throughout this study. The first is whether the feature selection method improves the effectiveness of the DT model in detecting BHP flooding attacks. The second is whether the feature selection method improves the efficiency of the DT model in detecting BHP flooding attacks. The lightweight property of the model comes from the fact that only a small number of features are used to build the classifier. The model is evaluated on a public OBS dataset using a set of performance metrics: accuracy, precision, recall, and F-measure.

The remainder of the research is organized as follows:
(i) In Section 2, related works are introduced to give details about the proposed approaches and methods for detecting DoS attacks on different networks.
(ii) Section 3 presents the architecture of the proposed approach for detecting BHP flooding attacks on OBS networks.
(iii) Section 4 explains the experimental setup and results in more detail.
(iv) Section 5 presents the conclusion of the study.

2. Related Work

Nowadays, machine learning (ML) methods are used in many intrusion detection systems (IDSs) to detect several types of network attacks. Feature selection methods are also used to select the significant features of network traffic without reducing the performance of the IDSs [7]. Feature selection is the process of selecting the set of features that is most effective for a classification task [8, 9]. A high number of features may decrease the performance and accuracy of many classification problems [10, 11].

In the field of optimization, feature selection methods are classified into three main approaches: embedded, wrapper, and filter methods [12]. For the filter methods, there are two major types of evaluation: subset feature evaluation and groups of individual feature evaluation. In the groups of individual feature evaluation, heuristic or metaheuristic filter methods, or a hybrid of them, are utilized for ranking the features, and the best features are then selected based on some threshold [11, 13]. In contrast, the subset feature evaluation methods find the subset of candidate features using a certain measure or a certain strategy. They compare the previous best subset with the current subset to find the candidate subset of features. In the groups of individual feature evaluation methods, redundant features are kept in the final subset of selected features according to their relevance, whereas the subset feature evaluation methods remove features with similar ranks. In general, the filter methods are considered classifier-independent approaches [13]. The wrapper methods are classifier-dependent approaches that repeatedly take a subset of the full feature set and calculate the accuracy of a classifier on it to find the best subset. Therefore, they are time consuming compared with filter methods [14]. The embedded methods combine wrapper and filter methods [15]. In this study, a filter-based method is used for feature selection.

In the intrusion detection literature, ML and deep learning (DL) methods have been widely used to detect different types of attacks [16–20]. Related works have also been proposed for detecting BHP flooding attacks using different ML methods, such as the decision tree (DT) method in [21]. That work evaluated the performance of the adopted method using different metrics and reported a 93% accuracy rate in classifying the classes of BHP flooding attack. Liao et al. [22] introduced a classification approach to classify the access patterns of various users using sparse vector decomposition (SVD) and rhythm matching methods. Their study demonstrated that the approach is able to distinguish between intruders and legitimate users in the application layer.

Xiao et al. [23] offered an effective scheme for detecting a distributed DoS (DDoS) attack using the correlation of the information generated by the data center and the k-nearest neighbors (KNN) method. They analyzed the flows of data traffic at the center to identify normal and abnormal flows. In [24], the authors proposed an approach for detecting DDoS attacks based on seven features, using an artificial neural network (ANN) method with a radial basis function (RBF). This NN-RBF approach classifies the data traffic into attack or normal classes by sending the IP addresses of the incoming packets from the source nodes to be filtered in the alarm modules, which then decide whether these data packets can be sent to the destination nodes.

The authors in [25] applied a data mining method for detecting a DDoS attack using the fuzzy clustering method (FCM) and the Apriori association algorithm to categorize the data traffic patterns and the status of the network. Another ML approach in [26] used a DT method with grey relational analysis for detecting DDoS attacks. The authors also applied a pattern matching technique to the data flows to trace back the estimated location of the attackers.

Alshboul [27] investigated the use of rule induction nodes for BHP classification in OBS networks. The author applied a set of data mining methods to the public OBS network dataset. He reported that the repeated incremental pruning to produce error reduction (RIPPER) rule induction algorithm, Naïve Bayes (NB), and Bayes Net were able to achieve a predictive accuracy of 98%, 69%, and 85%, respectively.

Chen et al. [28] developed a detection method to identify a DDoS attack using ANN. A set of different simulated DoS attacks were used for training the ANN model to recognize abnormal behaviors. Li et al. [29] offered different types of ANN models, including learning vector quantization (LVQ) models, to differentiate traffic associated with DDoS attacks from normal traffic. The authors converted the values of the dataset features into a numerical format before feeding them into the ANN model.

In [30], the authors presented a probabilistic ANN approach for classifying the different types of DDoS attacks. They categorized the DDoS attacks and normal traffic by applying a radial basis function neural network (RBF-NN) coupled with a Bayes decision rule. Nevertheless, the approach concentrated on separating flash crowd events from DoS attacks.

Li and Liu [31] proposed a technique that integrates the network intrusion prevention system with a support vector machine (SVM) to improve the accuracy of detection and reduce the incidence of false alarms. In [32], Ibrahim offered a dynamic approach based on a distributed time-delay ANN with soft computing methods. This approach achieved a fast conversion rate, high speed, and a high rate of anomaly detection for network intrusions.

Gao et al. [33] introduced a data mining method for analyzing the piggybacked packets of the network protocol to detect DDoS attacks. The advantage of this method is that it retains a high rate of detection without manual data construction. Hasan et al. [34] proposed a deep convolutional neural network (DCNN) model to detect BHP flooding attacks on OBS networks. They reported that the DCNN model works better than traditional machine learning models (e.g., SVM, Naïve Bayes, and KNN). However, due to the small number of samples in the dataset and the resource constraints of OBS switches, such deep learning models are not effective tools for detecting BHP flooding attacks, and they are not computationally efficient to run in such networks.

3. Proposed Approach

The proposed approach in this paper consists of two main phases: feature selection and classification. The input of the approach is a set of OBS dataset features collected from network traffic. The output of the approach is a class label of the BHP flooding attacks. The flowchart of the proposed approach is illustrated in Figure 1.

In the feature selection phase of the approach, the input features of OBS network traffic are processed by the information gain (IG) feature selection method. The purpose of IG is to rank the features and determine the merit of each of them according to the information gain computed from the entropy function. The output of the feature selection phase is a scored ranking of features in decreasing order of merit, so that each additional feature taken from the list has a lower merit than those before it.

This is then followed by the classification phase, in which the dataset with selected features will be used to train and test the DT classifier to detect attacks on OBS networks. The output of the classification phase is a DT trained model that is able to classify the BHP flooding attacks and return the class label of that attack. The following sections explain the methods used in the two phases of the proposed approach.

3.1. Information Gain (IG) Feature Selection Method

Information gain (IG) is a statistical method used to measure the essential information for a class label of an instance based on the absence or presence of the feature in that instance. IG computes the amount of uncertainty that can be reduced by including the features. The uncertainty is usually calculated by using Shannon’s entropy (E) [35] as

$$E(D) = -\sum_{i=1}^{n} P_i \log_2 P_i,$$

where $n$ represents the number of class labels and $P_i$ is the probability that an instance in a dataset $D$ is labeled with the class label $c_i$, computed as the proportion of instances that belong to that class label:

$$P_i = \frac{|D_{c_i}|}{|D|}.$$

A selected feature $F$ divides the training set $D$ into subsets $D_1, D_2, \ldots, D_v$ according to the values of $F$, where $F$ has $v$ distinct values. The information required to get the exact classification is measured by

$$E_F(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, E(D_j),$$

where $|D_j|/|D|$ represents the weight of the $j$th subset, $|D|$ is the number of instances in the dataset $D$, $|D_j|$ is the number of instances in the subset $D_j$, and $E(D_j)$ is the entropy of the subset $D_j$. Therefore, the IG of every feature is calculated as

$$IG(F) = E(D) - E_F(D).$$

After calculating the IG for each feature, the top k features with the highest IG are selected as the feature set because they most reduce the information required to classify the flooding attack.
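The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Weka implementation used in the experiments: it assumes every feature takes discrete values (numeric features would need to be discretized first), and the function names, feature names, and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the whole set minus the weighted entropy of the subsets
    obtained by grouping the instances on one discrete feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def top_k_features(columns, labels, k):
    """Rank feature names by IG in decreasing order and keep the first k."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical, not taken from the OBS dataset).
labels = ["block", "block", "no block", "no block"]
columns = {
    "flood_status": ["high", "high", "low", "low"],  # separates the two classes
    "node_id": ["a", "b", "a", "b"],                 # carries no class information
}
print(top_k_features(columns, labels, k=1))
```

In this toy example the first feature removes all uncertainty about the class, while the second removes none, so only the first would survive a top-1 selection.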

3.2. Decision Tree Method

A decision tree (DT) is a tree-like model of decisions and their possible consequences that is commonly used in the fields of data mining, statistics, and machine learning [36]. In machine learning, the goal of a DT is to build a model that predicts or classifies the value of a target class based on a learning process from several input features. A tree model whose target class label takes discrete values is called a classification tree model. In this model, the tree leaves represent the values of the class label and the tree branches represent conjunctions of features that lead to this class label.

DT learning is a simple process for representing the features used to predict or classify instances. A DT model is created by splitting the input set, which forms the root node of the tree, into subsets that form its successor child nodes. Based on a set of splitting rules on the values of the features, the splitting process is repeated on each derived subset in a recursive manner [36]. The recursion stops when splitting no longer adds value to the predictions or when all instances in the subset at a node have the same value of the target class label.
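The recursive splitting and its two stopping conditions can be sketched as follows. This is a simplified ID3-style illustration on discrete features, assuming information gain as the splitting rule; it is not C4.5 or Weka's J48 (there is no gain ratio, no numeric thresholds, and no pruning), and all names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain(rows, labels, feature):
    """Information gain of splitting the rows on one discrete feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())


def build_tree(rows, labels, features):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: every instance has the same class label, or no features remain.
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    # Stop: the best available split no longer adds information.
    if gain(rows, labels, best) <= 1e-12:
        return majority
    remaining = [f for f in features if f != best]
    node = {"feature": best, "default": majority, "children": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node


def predict(tree, row):
    """Follow the branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Toy records (hypothetical, not taken from the OBS dataset).
rows = [
    {"flood_status": "high", "node_status": "NB"},
    {"flood_status": "high", "node_status": "B"},
    {"flood_status": "low", "node_status": "B"},
    {"flood_status": "low", "node_status": "NB"},
]
labels = ["block", "block", "no block", "no block"]
tree = build_tree(rows, labels, ["flood_status", "node_status"])
print(predict(tree, {"flood_status": "low", "node_status": "B"}))
```

Here the root splits on the feature with the largest gain, and both children immediately become leaves because each subset contains a single class.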

The DT can be described also as a mathematical model to support the categorization, description, and generalization of a given dataset.

Assume the dataset comes in the form of records as follows:

$$(\mathbf{x}, y) = (x_1, x_2, \ldots, x_k, y),$$

where the variable $y$ is the dependent target variable that we need to generalize or classify. The vector $\mathbf{x}$ consists of the features $x_1, x_2, \ldots, x_k$, which are used to predict the variable $y$.

In principle, the DT used here is based on the C4.5 algorithm [37], which is an updated version of the ID3 algorithm [38]. C4.5 can avoid the overfitting problem of ID3 by using the rule post-pruning technique, which converts the built tree into a set of rules.

DT is used in the proposed approach because it is simple, very intuitive, and easy to implement. Furthermore, it deals with missing values, requires less effort in terms of data preprocessing, and does not need to scale or normalize the data [36].

4. Experiments and Discussion

The experiments of this research are implemented using a popular open source tool, the Waikato Environment for Knowledge Analysis (Weka) [39], which offers a rich toolbox of machine learning and data mining methods for preprocessing, analysis, clustering, and classification, together with Java-based graphical user interfaces (GUIs). The implementation was performed on a laptop with a 2.0 GHz Intel Core i7 CPU, 8 GB of RAM, and a 64-bit Windows 10 operating system. Due to the scarcity of OBS historical data, the experiments were conducted on a public optical burst switching (OBS) network dataset [1].

4.1. OBS Network Dataset Description

The OBS network dataset is a public dataset, available from the UCI Machine Learning Repository [1]. It contains a number of BHP flooding attacks on OBS networks. There are 1,075 instances with 21 attributes plus the target class label. The target label has four classes: NB-no block (not behaving-no block), block, no block, and NB-wait (not behaving-wait). All dataset features have numeric values except for the node status feature, which takes one of three categorical values: B (behaving), NB (not behaving), and PNB (potentially not behaving). The description of the dataset features is given in Table 1.

Table 2 shows the number of instances for each class in the dataset, while Figure 2 shows the distribution of instances over different types of BHP flooding attacks. This figure is deduced from the dataset.

4.2. Evaluation Measures

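The experimental results are evaluated using four evaluation measures: precision, recall, F-measure, and accuracy. The following equations show how these evaluation measures are computed:

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where FP is the number of false positives, FN is the number of false negatives, TP is the number of true positives, and TN is the number of true negatives.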
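As a small worked illustration of these four measures, the following Python sketch computes them from the four confusion counts. It assumes the standard definitions (with F-measure as the harmonic mean of precision and recall) and a one-vs-rest view in which one class is treated as positive; the function name and the example counts are hypothetical and are not results from this study.

```python
def evaluation_measures(tp, fp, fn, tn):
    """Compute precision, recall, F-measure, and accuracy from confusion counts.

    Ratios with an empty denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for one class treated as "positive".
scores = evaluation_measures(tp=8, fp=2, fn=2, tn=8)
print(scores)
```

For a four-class problem such as this one, the counts would be taken per class from the confusion matrix and the per-class scores then averaged; how the averaging is weighted is a design choice not fixed by this sketch.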

4.3. Results and Comparisons

In this section, the experimental results for both the feature selection and classification phases of the proposed approach are given in detail. The average rank score and average merit of features from the IG feature selection method are shown in Table 3 and are based on a 10-fold cross-validation with stratified sampling in order to guarantee that both training and testing sets have the same ratio of classes.

In Table 3, the dataset features are ranked in decreasing order according to their significance to target classes. The reason behind this variation in the feature significance is that the target class has four categorical labels, and for each label, different values for each feature are assigned. Therefore, the rank score from the IG method determines how much each feature contributes to the target class label.

The rank scores in Table 3 show that the “packet received,” “10-run-AVG-drop-rate,” and “flood status” features have higher scores than all the other features. Thus, the hypothesis that those first three features (packet received, 10-run-AVG drop-rate, and flood status) are more influential and more correlated to the labels of target class will be checked experimentally in the following paragraphs.

To accept or reject this hypothesis, the evaluation results of the DT method are presented using all features and the combinations of the three selected features. These evaluation results are reported based on the holdout and 10-fold cross-validation techniques. For the holdout technique, the dataset is divided into 75% for training and 25% for testing. Before applying the DT method for classifying the types of BHP flooding attacks and getting the results, an analysis of the DT parameters is investigated to tune and select the best values of these parameters.
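The two evaluation protocols can be sketched with the Python standard library alone. This is an illustrative sketch, not the procedure Weka runs internally: the holdout split is assumed to be a plain random 75%/25% split, the fold assignment is a simple round-robin stratification, and the function names and seeds are hypothetical.

```python
import random
from collections import defaultdict


def holdout_split(n_instances, train_fraction=0.75, seed=0):
    """Shuffle instance indices and cut them into training and testing parts."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = int(n_instances * train_fraction)
    return indices[:cut], indices[cut:]


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the instances of this class to the folds in round-robin order.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# With 1,075 instances, a 75% cut gives 806 training and 269 testing instances.
train_idx, test_idx = holdout_split(1075)
print(len(train_idx), len(test_idx))

# Toy labels (hypothetical): each of the 10 folds receives 2 "a" and 1 "b".
toy_labels = ["a"] * 20 + ["b"] * 10
folds = stratified_folds(toy_labels, k=10)
```

In 10-fold cross-validation each fold serves once as the testing set while the other nine are used for training, and the reported score is the average over the ten runs.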

Practically, the DT classifier (J48) in Weka performs the pruning process based on a set of parameters: subtree raising, the confidence factor, and the minimal number of objects. The default values of these parameters are true, 0.25, and 2, respectively. Subtree raising controls whether a node of the tree can be moved upwards towards the root to replace other nodes during the pruning process. The confidence factor is a threshold on the acceptable error used when pruning the DT; smaller values lead to more pruning. In the proposed approach, subtree raising and the confidence factor are kept at their default values. The minimal number of objects is a very important parameter that sets the minimal number of instances in a single leaf. It is used to obtain smaller and simpler decision trees, depending on the nature of the problem. To tune the minimal number of objects, we tried a set of different values and selected the best one. Figure 3 shows the accuracies of the proposed approach at different values of the minimal number of objects in the range from 2 to 5. These accuracies are obtained using the holdout technique with 75% training and 25% testing.

As shown in Figure 3, the best values of the minimal number of objects in a single leaf are 1 and 2, which generate a simple and accurate DT model. The value of this parameter is set to 2 to keep the DT model moderately simple.

Once the values of the DT parameters are selected, the evaluation results of the proposed approach are reported in the following tables and figures. Table 4 presents the evaluation results of the holdout technique for classifying BHP flooding attacks using all features in the dataset. Figure 4 shows the confusion matrix of classification for the 25% testing set.

Table 5 illustrates the evaluation results of the holdout technique for classifying the BHP flooding attacks using the first three selected features (packet received, 10-run-AVG-drop-rate, and flood status) of the dataset, and Figure 5 shows the confusion matrix of this evaluation result.

From Tables 4 and 5, as well as from Figures 4 and 5, it is clear that the selected features improved the values of evaluation measures for the DT method to classify the BHP flooding attacks. Moreover, for efficiency, detecting attacks using only three features is more efficient for the OBS core switches, which have limited resources.

To validate the evaluation results, other experiments for the DT classification method based on the 10-fold cross-validation technique were conducted using all features and using the first three selected features from the IG feature selection method. Table 6 shows the evaluation results, and Figure 6 shows the confusion matrix for classifying the BHP flooding attacks using all features based on the 10-fold cross-validation technique.

Similarly, Table 7 and Figure 7 present the evaluation results and the confusion matrix, respectively, for classifying the BHP flooding attacks using the first three selected features based on the 10-fold cross-validation technique.

The evaluation results in Tables 6 and 7 and Figures 6 and 7, obtained with the 10-fold cross-validation technique, validate the holdout results and confirm the remarkable performance of the proposed approach. After further investigation, the evaluation results of the DT classification method using one and two of the first three selected features are compared with the previous results of the holdout and the 10-fold cross-validation techniques and are shown in Figure 8.

Table 8 shows and summarizes a comparison between the proposed approach and the recent related works on the OBS network dataset. In this comparison, we can see that the proposed work achieves the highest accuracy result with a small number of features compared to all these recent works.

The results presented in Figure 8 and Table 8 support the hypothesis that the first three features selected by the IG method are more influential and more correlated to the labels of BHP flooding attacks than any of the other features.

4.4. Result Analysis

To analyze the results and link them with the conclusion, we show how the proposed feature selection method improves the model from three different angles: reducing overfitting, improving accuracy, and reducing training and testing (prediction) time.

By definition, overfitting occurs when the training errors are low or very low and the validation errors are high or very high. Therefore, reducing the overfitting problem requires reducing the gap between the training and validation errors. To show how the proposed method reduces the overfitting problem, we plot the training error against the validation error in Figure 9 for different sets of features, which are ordered according to the rank scores given in Table 3. The training percentage is set to 75%, and the validation percentage is 25%. We notice that the gap between the training and validation errors decreases as the number of features decreases, until it reaches approximately zero when using the three selected features of the proposed method. We also notice that the overfitting problem is eliminated with 14 and 7 features. In our opinion, this is because of the implicit pruning functionality implemented by the decision tree algorithm used (J48). In addition, it is clear that the accuracy is improved by the three selected features.

To evaluate the efficiency of the proposed feature selection approach, the average time of building and testing the DT model is computed. The DT model is trained on 75% of the dataset which consists of 806 instances and tested on 25% of the dataset which consists of 269 instances. Table 9 shows the computed average time of training and testing the DT model using all features and using our three selected features.

As shown in Table 9, the DT model has a lower average time for training and testing using our three selected features than using all features. In terms of time complexity, expressed in O notation, the overall average time of the DT method is O (m × n), where m is the number of features and n is the number of instances [40]. Because the number of features in classification problems is limited, the running time is O (C × n), where C is a constant. Therefore, the time complexity of the DT method is O (n) for classification problems. The advantage of the proposed approach is that it reduces the number of features to three (reducing C), which leads to a faster running time compared with using all features. This confirms that the approach is able to detect the attacks more efficiently, especially in a congested network with limited computing resources.
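As a rough worked illustration of the O (m × n) model, and not a measurement, the product m × n can be compared for the 806-instance training set with all 21 features and with the three selected features. Constant factors and the exact cost of J48 are ignored here, so only the ratio is meaningful.

```latex
\begin{align*}
m \times n \;(\text{all features})   &= 21 \times 806 = 16{,}926,\\
m \times n \;(\text{three features}) &= 3 \times 806 = 2{,}418,\\
\frac{16{,}926}{2{,}418} &= 7.
\end{align*}
```

Under this simplified model the three-feature classifier does about one seventh of the work per pass over the training data, which gives the direction of the saving but not the measured running times.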

We can conclude that reducing the features to three and using the pruning process of the DT classifier helped the proposed approach to reduce the overfitting problem and classify the OBS flooding attacks. Consequently, all performance results demonstrate the effectiveness and efficiency of the DT model based on the selected features in classifying BHP flooding attacks. This indicates that the proposed approach is accurate and suitable for real-time detection under the limited computing capability of OBS core switches.

5. Conclusion and Future Work

In this paper, an effective and efficient approach using the information gain (IG) feature selection method and the decision tree (DT) classifier is proposed to detect BHP flooding attacks on OBS networks. The approach starts with selecting the most important features of OBS network traffic to improve the accuracy and efficiency of attack detection in OBS switches that have limited resources. A set of experiments is conducted on an OBS network dataset using 10-fold cross-validation and holdout techniques to evaluate and validate the approach. The experimental results demonstrate that the proposed approach can classify the class labels of OBS nodes with 100% accuracy by using only three features. The comparison with recent related works reveals that the proposed approach is suitable for OBS network security in terms of effectiveness and efficiency.

One of the limitations of the proposed approach is the lack of evaluation on additional OBS datasets that vary in size and types of attacks, because no public OBS datasets are available other than the one used in the experiments of this study. Moreover, because the proposed approach is based on the decision tree method for classification, the training time is relatively expensive for large training datasets. However, given the reduced number of features used by the proposed approach and the emergence of high-speed processors, this limitation is no longer a major problem. In future work, a large set of OBS network data will be collected for further evaluation of the proposed approach and will be made available to researchers in the field.

Data Availability

The OBS-network dataset used in this study is publicly available at the UCI Machine Learning Repository [1].

Conflicts of Interest

The author declares that there are no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Acknowledgments

This study was supported by the Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University.