Abstract

The evolution of cellular technology has led to explosive growth in cellular network traffic. Accurate time-series models for predicting cellular mobile traffic have become very important for improving the quality of service (QoS) of a network. Modelling and forecasting cellular network load play an important role in achieving favourable resource allocation through appropriate bandwidth provisioning while preserving high network utilization. The novelty of the proposed research is a model that can intelligently predict traffic load in a cellular network. In this paper, a model that combines single-exponential smoothing with long short-term memory (SES-LSTM) is proposed to predict cellular traffic. Min-max normalization was used to scale the network load. The single-exponential smoothing method was applied to adjust the volumes of network traffic, because network traffic is very complex and takes different forms. The output of the single-exponential smoothing model was then processed by an LSTM model to predict the network load. The system was evaluated using real cellular network traffic from a dataset published on Kaggle. The experimental results showed that the proposed method achieved R-square values of 88.21%, 92.20%, and 89.81% for three one-month time intervals, respectively. The predicted values were very close to the observations. A comparison of the prediction results between the existing LSTM model and our proposed system is presented; the proposed system achieved superior performance in predicting cellular network traffic.

1. Introduction

With the rapid development of smartphone technology, cellular traffic is also expected to grow rapidly. In addition, the coexistence of services with completely different needs can result in perpetually dynamic traffic patterns and network capacity requirements. The mobile Internet not only augments people’s lives with entertainment but also provides an increasing amount of necessary information and access to services needed for daily living. Ericsson has estimated a 54% increase in global mobile traffic over 2020, which is the biggest challenge to telecommunication companies in managing the large network flow while increasing the QoS [1]. An accurate estimate of the base traffic load in a cellular network in which the number of users varies can greatly help predict the incidence of network congestion, which permits network resources to be allocated efficiently. This is crucial for competitive network support, routine maintenance, and the scheduling of resources. There are several applicable reviews on the traffic forecasting of base stations, in particular in public locations where the number of users is always changing [2].

Both the speed of telecommunication technology and the number of users accessing the mobile Internet have been increasing, which presents many challenges to a cellular network. The presence of many, varied users at densely populated locations (high-speed rail stations, tourist attractions, business centres, playgrounds, sports competitions, concert venues, and many others) can create rapidly increasing cellular traffic that puts massive stress on the network structure [3–6]. Modelling and predicting mobile network traffic can help companies find ways to enhance the QoS of the network.

Traffic-exchange prediction is primarily based on an hourly granularity and is used to help control the on-demand allocation of network resources in order to decrease network operation costs. For fairs or other large-scale events, the number of users and the volume of mobile traffic at public locations ought to be predicted rapidly and appropriately based on the change in users and the tidal effect of the traffic. Modelling and prediction can help operators anticipate upcoming congestion and make network expansions, adjustments, and optimizations earlier; also, confined Wi-Fi services can be used to absorb network peaks. Network planning has to evolve in order to allow entry to clients without degrading service in the case of unexpected increases in traffic. Because of the impact of congestion and blocking on a large-scale network, traffic and routing must be scheduled in a timely manner to ensure that the network maintains a proper entry rate, that network connections are free in crucial regions, and that user access is maintained. Therefore, predicting the cellular network traffic of a base station that serves many users, so as to maintain connectivity in densely populated public locations, is of great importance to network safety [7].

Cellular network traffic prediction plays an important role in the design, management, and optimization of a telecommunication network. The prediction of cellular traffic supports the capacity planning of a network and the improvement of a network’s QoS. At present, the prediction of 4G Long-Term Evolution (LTE) and 5G traffic is of significant interest for enhancing QoS in telecommunications. The prediction of cellular network traffic can be divided into two categories: long-range and short-range prediction. Long-range prediction provides a projection for a long period and is used for validating a detailed predicting network and providing network traffic patterns that make networks easier to design. Short-range prediction provides projections for a short period and can help improve networks. Artificial intelligence models have been widely used in many industrial applications, such as developing a prediction model to handle cellular network traffic for the current year. For example, linear regression was used in [7], and support vector machine regression (SVMR) was applied in [8] to predict cellular network traffic. A number of studies have presented advanced prediction models based on deep learning (such as LSTM) [9] for cellular network traffic. Shu et al. [10] proposed a convolutional neural network (STDenseNet) to predict cellular traffic.

In the literature, early work covers traffic prediction for circuit-switched networks by developing statistical time-series models based on observation data, such as autoregressive integrated moving average (ARIMA) models [11, 12]. Additionally, a number of modern models handle packet data traffic prediction in mobile networks with advanced time-series models based on artificial intelligence [13, 14]. A number of time-series models have been introduced for predicting short-term traffic (in minutes and seconds) by employing deep learning [15, 16]. Some researchers designed a model to predict radio frequency planning [17]. Time-series models have also been applied to predict traffic load in telecommunication networks. Traditional time-series models like ARIMA were used to estimate short-term network traffic demand, and seasonal ARIMA (SARIMA) was used to predict seasonal traffic [18]; some studies used exponential smoothing models (such as the Holt–Winters method) [19, 20] for finding trends and seasonality in traffic demand. Researchers have extended the linear time-series model ARMA with the generalized autoregressive conditional heteroskedasticity (GARCH) technique [12] to predict long-range dependencies. Dietterich [21] proposed a hybrid wavelet-based deep learning framework to predict the number of users connected to a mobile network. Linear regression has also been used (ARIMA) [22, 23].

Currently, advanced time-series models are used to predict cellular network traffic, along with Bayesian linear regression (BLR) [24], advanced learning machines [25], support vector regression (SVR) [26], and artificial neural networks (ANNs) [27–30]. Qiang et al. [31] employed support vector machine regression to forecast daily tourist traffic. In addition, SVR has been applied to toxicity assessment [32], battery life forecasting [33, 34], chemical prediction [35, 36], and financial support [37, 38], and to increasing agricultural production through the use of a prediction model [39, 40]. However, research has found it much more challenging to predict packet load [41]. Machine-learning models have been used to classify abnormalities in circuit-switched traffic. In [42], the short-term traffic volume in a cellular 3G network was predicted by using traditional time-series models like Kalman filtering. In [43], an ARIMA model was applied to predict the use rate in the volume of mobile traffic. Deep learning based on LSTM units has also been used [44–46]. In [47], a convolutional neural network was used for predicting and modelling the spatial dependencies of traffic, the same as the approach in [48]. As indicated in [49], deep learning schemes, such as LSTM [50], convolutional neural networks [51], and recurrent neural networks [46], have also been applied to coarser time resolutions (e.g., an hour) to extend the forecasting horizon to several days. Artificial neural network (ANN) models have been introduced to predict network traffic in the short term (minutes and seconds) [52, 53]. The models were used for dynamic radio resource management [54].

In this study, a hybrid model is proposed to predict cellular network traffic, specifically three one-month series of rush-hour data traffic per cell. The mobile network traffic data had been collected from a real live 4G LTE network. The main contributions of this research are as follows:
(1) Network traffic data is very complex, with many sources of noise and data formats; this makes it a big challenge for researchers to find an accurate model. We have developed a system that can help predict cellular network traffic more intelligently.
(2) We have developed an intelligent system to predict LTE network traffic with superior prediction performance.

2. Materials and Methods

Figure 1 shows the framework of the proposed system to predict 4G mobile network traffic.

2.1. Dataset

The LTE 4G network traffic dataset was identified and downloaded from Kaggle; the data had been collected from 4G cell traffic (i.e., the radio transmitter serving as the device was a 4G cell). All the LTE network traffic was generated from individuals using the mobile cells (although they are not uniquely identified in the data). In the current research, we have utilised three months of the data to examine the proposed system. Table 1 shows the data samples. Figure 2 shows the cellular traffic for the three months being examined. The public dataset is available at https://www.kaggle.com/naebolo/predict-traffic-of-lte-network.

2.2. Normalization

LTE network traffic data is very complex and is composed of underlying signals with very different characteristics. Capturing the transformation behaviour of cellular networks can therefore aid in improving network traffic prediction models. To prevent traffic loads with greater numeric values from dominating those with smaller numeric values, the data are scaled; this also increases the processing speed of the model while maintaining good accuracy. A min-max method was used to transform the data to values between zero and one. The two main advantages of scaling are avoiding instances of greater numeric ranges dominating those with smaller numeric ranges and preventing numerical difficulties during the prediction. The transformation is accomplished as follows:

$$z = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\left(\mathrm{new}_{\max} - \mathrm{new}_{\min}\right) + \mathrm{new}_{\min}, \tag{1}$$

where $x$ is an observed traffic value, $z$ is its scaled value, $x_{\min}$ is the minimum of the data, and $x_{\max}$ is the maximum of the data. $\mathrm{new}_{\min}$ is the minimum number zero, and $\mathrm{new}_{\max}$ is the maximum number one.
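As an illustration of the min-max scaling described above, the following is a minimal NumPy sketch rather than the authors' implementation. The function names `min_max_scale` and `min_max_inverse` and the sample values are hypothetical; the inverse mapping is included because predictions made on the scaled data usually have to be mapped back to traffic units.

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Scale a 1-D series linearly into [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant series has no range; map every value to new_min.
        return np.full_like(x, new_min)
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

def min_max_inverse(z, x_min, x_max, new_min=0.0, new_max=1.0):
    """Map scaled values back to the original traffic units."""
    z = np.asarray(z, dtype=float)
    return (z - new_min) / (new_max - new_min) * (x_max - x_min) + x_min

# Made-up hourly loads, used only to exercise the functions.
traffic = np.array([120.0, 340.0, 90.0, 510.0, 275.0])
scaled = min_max_scale(traffic)  # smallest value maps to 0, largest to 1
restored = min_max_inverse(scaled, traffic.min(), traffic.max())
```

In practice the minimum and maximum would normally be taken from the training portion only and reused for the test portion, so that no information from unseen data leaks into the scaling; the text does not state which convention was followed.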

2.3. Single-Exponential Smoothing (SES) Model

The single-exponential smoothing (SES) model is one of the common statistical algorithms used to predict data without a trend or seasonality. The model uses one significant parameter (alpha) to adjust the weight of the observation data in the obtained prediction data. The value of this parameter is selected according to the evaluation metrics. The model is defined as follows:

$$s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}, \qquad t = 1, \ldots, N, \tag{2}$$

where $s_{t-1}$ is the level of the trend, $x_t$ is the input sample, $N$ is the number of samples in the dataset, and $s_t$ is the output. The alpha values satisfy 0 ≤ α ≤ 1 for smoothing the training data.
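The smoothing step can be sketched in a few lines of NumPy. This uses the standard SES recursion, in which each output is alpha times the current sample plus (1 − alpha) times the previous smoothed level. Initialising the level with the first sample is an assumption, since the text does not say how the recursion is started, and the function name is hypothetical.

```python
import numpy as np

def single_exponential_smoothing(x, alpha, s0=None):
    """Return the SES-smoothed series for a 1-D input.

    s0 is the initial level; using the first sample is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    x = np.asarray(x, dtype=float)
    smoothed = np.empty_like(x)
    level = x[0] if s0 is None else float(s0)
    for t, x_t in enumerate(x):
        # New level: weighted blend of the current sample and the old level.
        level = alpha * x_t + (1.0 - alpha) * level
        smoothed[t] = level
    return smoothed

series = np.array([0.2, 0.8, 0.4, 0.6])  # made-up, already scaled to [0, 1]
smoothed = single_exponential_smoothing(series, alpha=0.5)
# smoothed is approximately [0.2, 0.5, 0.45, 0.525]
```

With alpha = 1 the output reproduces the input, and with alpha = 0 it stays at the initial level, which is why intermediate values trade responsiveness against noise suppression.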

2.4. Long Short-Term Memory (LSTM)

The LSTM layer contains a series of LSTM units that together are called the LSTM model [54, 55]. Each LSTM unit contains three multiplicative gates. First, the input gate controls how much information from the present is memorised. Second, the output gate controls what is exposed as the result. Third, the forget gate selects which information from the past is forgotten. Each gate consists of a sigmoid function and an element-wise multiplication. The sigmoid function has a range between zero and one, and the multiplication determines the amount of information to transfer: if the gate value is zero, no information is transferred, whereas the information is transmitted in full when the gate value is one. The model is described as follows:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \tag{3}$$

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \tag{4}$$

$$\tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right), \tag{5}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{6}$$

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \tag{7}$$

$$h_t = o_t \odot \tanh\left(c_t\right), \tag{8}$$

where $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, respectively, and $h_t$ is the hidden state of the cell. The weights of the neural network are represented by $W_f$, $W_i$, $W_c$, and $W_o$, and $c_t$ is the internal memory cell for the hidden layer. The biases of the neural network are indicated by $b_f$, $b_i$, $b_c$, and $b_o$; $x_t$ is the network traffic data.

Equation (3) represents the forget gate, which passes the input at time t and the previous hidden state through the activation function to produce its output. Equation (4) represents the input gate, and the parameters are analogous to those in equation (3). Equation (5) calculates the candidate value for the memory, where “tanh” is the activation function. Equation (6) combines memories of the past and the present. Equation (7) represents the output gate, and the parameters are analogous to those in equation (3). Equation (8) represents the cell output, where “tanh” is the activation function. W represents the weight matrices, and b represents the bias vectors. The parameters of the LSTM model and their values are shown in Table 2.
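To make the gate computations concrete, the sketch below implements one time step of a standard LSTM cell in NumPy: forget gate, input gate, candidate memory, cell-state update, output gate, and cell output, in that order. It is the textbook formulation, not the authors' trained network; the weight layout (one matrix per gate acting on the concatenation of the previous hidden state and the current input), the hidden size, and the random initialisation are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps a gate name ('f', 'i', 'c', 'o') to a matrix of shape
    (hidden, hidden + inputs); b maps it to a vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])   # candidate memory
    c_t = f_t * c_prev + i_t * c_hat       # blend past and present memory
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    h_t = o_t * np.tanh(c_t)               # cell output (hidden state)
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, inputs = 4, 1
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}

h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.2, 0.5, 0.45, 0.525]:          # a short smoothed, scaled sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

A trained model would learn `W` and `b` by backpropagation through time and add a dense output layer that maps the final hidden state to the predicted load; those parts are omitted here.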

2.5. Model Evaluation Criteria

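The mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (R), and squared correlation (R²) metrics are employed as evaluation criteria. The evaluation equations are used to find the difference between the observed and predicted data and are described in the following:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \tag{9}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{10}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \tag{11}$$

$$R = \frac{n\sum_{i=1}^{n} y_i \hat{y}_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} \hat{y}_i}{\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} \hat{y}_i^2 - \left(\sum_{i=1}^{n} \hat{y}_i\right)^2}}, \tag{12}$$

$$R^2 = \left(R\right)^2, \tag{13}$$

where $y_i$ are the observed responses, $\hat{y}_i$ are the estimated responses, and $n$ is the total number of observations.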
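The evaluation criteria can be computed directly with NumPy, as in the following sketch. It treats R as Pearson's correlation coefficient and R² as its square, which is an assumption based on the term "squared correlation"; this quantity is not always equal to the coefficient of determination reported by some libraries. The function name and sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, Pearson R, and squared correlation R2."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    mse = np.mean(err ** 2)
    r = np.corrcoef(y, p)[0, 1]  # Pearson correlation of observed vs predicted
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "R": r,
        "R2": r ** 2,            # squared correlation (assumed definition)
    }

observed = np.array([0.10, 0.40, 0.35, 0.80])   # made-up scaled loads
predicted = np.array([0.12, 0.38, 0.40, 0.75])
metrics = regression_metrics(observed, predicted)
# MSE is 0.00145 and MAE is 0.035 for these values
```

Because the errors are computed on min-max-scaled data, MSE, RMSE, and MAE are unitless and depend on the scaling range, so they are comparable only between models evaluated on identically scaled series.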

3. Experiment Results

In this section, the results of the proposed SES-LSTM model for predicting network traffic are presented.

3.1. Environment Setup

The proposed framework was evaluated using different hardware and software environments. Table 3 shows the equipment used to develop the proposed system.

3.2. Analysis of Results

The cellular network traffic was gathered from a real 4G LTE network over a time interval of three months (01/01/2018 to 30/03/2018) and was used for testing the proposed system. The LSTM model was applied to predict the load of the cellular traffic derived from the network. Min-max normalization was used to scale the data into an appropriate range. Because the network traffic is bursty and highly complex, a single-exponential smoothing method was used to adjust the weighting of the observation values to obtain the new output. Single-exponential smoothing was used to handle overlapping values in order to improve the LSTM results. The SES model depends on the smoothing constant alpha. The values of alpha considered ranged from 0.1 to 0.5; according to the MSE metric, 0.5 was an appropriate value for obtaining a good prediction. The datasets were divided into 80% training and 20% testing. The hybrid model obtained superior results; the prediction values were very close to the observation values according to the evaluation metrics. Table 4 shows the numbers of samples in the training and testing stages.
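The 80/20 division and the framing of a time series as supervised samples for an LSTM can be sketched as follows. The split is chronological, so the test portion is always later than the training portion. The look-back length of 24 steps is a hypothetical choice, since the text does not state the window length, and both function names are illustrative.

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a series into an earlier training part and a later test part."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def make_windows(series, lookback):
    """Build (samples, lookback, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback
    if n <= 0:
        raise ValueError("series is shorter than the look-back window")
    # Window k holds series[k : k + lookback]; its target is series[k + lookback].
    X = np.stack([series[k:k + lookback] for k in range(n)])
    y = series[lookback:]
    return X[..., np.newaxis], y

# Placeholder for a scaled, smoothed traffic series of 100 samples.
smoothed_series = np.linspace(0.0, 1.0, 100)
train, test = chronological_split(smoothed_series)   # 80 and 20 samples
X_train, y_train = make_windows(train, lookback=24)
# X_train.shape == (56, 24, 1) and y_train.shape == (56,)
```

The trailing axis of size one is the feature dimension that recurrent layers in common deep learning libraries expect for a univariate series.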

3.2.1. Training of the Hybrid Model

Eighty percent of the cellular network traffic dataset was used for the training process. The empirical results of the hybrid system in the training phases were superior in predicting the loading traffic in the cellular network.

Table 5 presents the prediction results of the SES-LSTM model during the training process. The prediction results were close to the observation data, according to the evaluation criteria. The MSE values were 0.00017, 0.00104, and 8.1547 × 10⁻⁵ for the months of January, February, and March 2018, respectively.

Figure 3 shows the time-series plot of the hybrid model for predicting traffic load. While the target (x-axis) values represent the errors of the model, the output (y-axis) values represent the numbers in the sample. The prediction errors were small according to the evaluation metrics, namely, the MSE, RMSE, and NRMSE. The prediction errors of the January, February, and March 2018 input data were MSE = 8.93 × 10⁻⁵, MSE = 0.000104, and MSE = 3.1547 × 10⁻⁵, respectively.

Figure 4 illustrates the error histograms obtained from the SES-LSTM model in the training phase for predicting the traffic load. An error histogram shows the distribution of the differences between the observation and prediction data. In the training phase, the mean error in the histogram is 0.00192 for the training data of January 2018, as shown in Figure 4(a); for February 2018, the mean error is 0.0025, as shown in Figure 4(b); and for March 2018, the mean error is 5.44 × 10⁻⁵, as shown in Figure 4(c).
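An error histogram of this kind, together with the mean and standard deviation of the errors, can be produced as in the sketch below. The sign convention (observed minus predicted), the number of bins, and the sample values are assumptions for illustration.

```python
import numpy as np

def error_histogram(observed, predicted, bins=10):
    """Return the mean, standard deviation, bin counts, and bin edges of the errors."""
    errors = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    counts, edges = np.histogram(errors, bins=bins)
    return errors.mean(), errors.std(), counts, edges

observed = np.array([0.10, 0.40, 0.35, 0.80])   # made-up scaled loads
predicted = np.array([0.12, 0.38, 0.40, 0.75])
mean_err, std_err, counts, edges = error_histogram(observed, predicted, bins=5)
```

A mean error close to zero indicates that the model does not systematically over- or under-predict, while the standard deviation describes how widely individual errors spread around that mean.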

3.2.2. Testing of the Hybrid Model

The testing phase was used to validate and evaluate the SES-LSTM model in predicting the load of cellular network traffic. The testing stage uses unseen data to forecast future traffic. Table 6 presents the testing results of the proposed system for the three-month time period of the data. According to the evaluation metrics, the proposed system achieved MSE values of 0.000175, 8.6238 × 10⁻⁵, and 2.9927 × 10⁻⁵ for the three months (January, February, and March 2018, respectively) in the testing stage.

The time-series plots of the SES-LSTM model in predicting traffic load are presented in Figure 5. The prediction values were very close to the observation values according to the evaluation metrics.

In addition, Figure 6 displays the error histograms obtained from the hybrid SES-LSTM model. The histogram for the testing process shows the differences between the observations and the predictions for unseen data, which stand in for future traffic load. The means and standard deviations of the errors are shown at the tops of the graphs. It was noted that the error of the SES-LSTM model was very low for forecasting future load. The maximum mean error (0.00380) is shown in Figure 6(a). The error histograms of the testing phase demonstrated the effectiveness and efficiency of the proposed system.

4. Results and Discussion

The autonomous prediction of cellular network traffic demand will be a key function for future telecommunication companies. Because e-business, banking, and industrial enterprises communicate sensitive and valuable information over a network, network traffic analysis is significant for achieving suitable information security. Cellular network traffic analysis and prediction is a proactive strategy for maintaining a healthy system; the network is also monitored to make sure that security breaches do not arise inside it. Cellular network traffic prediction is an important step in developing a successful system, protecting it, preventing congestion through control schemes, and discovering abnormal packets in the network traffic. The significance of this subject and our desire to contribute to solving the research problem of intelligent cellular traffic prediction are the essential motivation for this study.

Modelling and predicting network traffic can help in updating the polling on a cellular network. In previous studies, researchers used statistical approaches to predict the network traffic load. In this study, we have developed a hybrid SES-LSTM model to predict traffic load for a 4G LTE network. Single-exponential smoothing was applied to adjust the observation values in the computations. The values obtained from the SES method were then processed by a deep learning model.

Table 7 shows the empirical results of the SES-LSTM model and the existing LSTM model; the proposed SES-LSTM model was superior to the existing deep learning LSTM model. According to the individual correlation metrics, in the training phase the prediction accuracy was R² = 88.21% for the January 2018 data, R² = 95.09% for the February 2018 data, and R² = 89.81% for the March 2018 data. Figure 7 shows the correlation plots in the training phase for the prediction of cellular traffic load by using our proposed SES-LSTM model. In addition, Figure 8 shows the regression plots for the cellular traffic load predicted by the existing LSTM model in the training phase. These plots show the relationship between the predicted and the actual values by using Pearson’s correlation coefficient. It was observed that the SES-LSTM model outperformed the existing system.

The hybrid model was appropriate for predicting unseen traffic load in a cellular network, and the experimental results of the proposed model in the testing phase were good. In the testing phase, the prediction accuracy was R² = 88.20% for the January 2018 data, R² = 86.16% for the February 2018 data, and R² = 87.24% for the March 2018 data. Figure 9 shows the regression plots of the SES-LSTM model for the prediction of cellular traffic load. The graphical representations of the prediction results of the existing LSTM system are displayed in Figure 10. Overall, the SES-LSTM model achieved better results on the unseen data than the existing LSTM model. We believe the efficiency and effectiveness of our proposed system will help improve network traffic by preventing congestion and providing good planning for any network.

5. Conclusion

Network traffic modelling and forecasting play an important role in determining network performance. These models can also help to obtain accurate data for interpreting the important characteristics of traffic, which requires very efficient analytical study. Thus, modelling network traffic has become an essential part of assisting the design of networks and controlling bandwidth waste. A good network traffic prediction model should be able to capture prominent traffic characteristics, such as long-range dependence (LRD), short-range dependence (SRD), and self-similarity. In this study, a hybrid SES-LSTM model was proposed to predict network traffic from real cellular 4G LTE network data. In conclusion, we can draw the following points:
(i) Measuring 4G LTE network behaviours can be attained only if an accurate model is designed. Our system can intelligently enhance the quality of service (QoS) of a cellular network for best future performance.
(ii) Real 4G LTE network data were used to evaluate and examine the proposed system.
(iii) The proposed system was novel in that it combined a statistical SES model with an advanced artificial intelligence LSTM model to improve the accuracy of the prediction values.
(iv) The hybrid SES-LSTM model has shown good results with fewer prediction errors.
(v) The results of the proposed system were compared with an existing LSTM model system; it was noted that the proposed hybrid achieved superior prediction results.
(vi) We believe that the proposed system can be used in any real-time application for predicting future demand.

Data Availability

The public dataset is available at https://www.kaggle.com/naebolo/predict-traffic-of-lte-network.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Faisal University for funding this research work through project no. 216016.