Attack sample generation algorithm based on data association group by GAN in industrial control dataset☆
Introduction
Industrial control system products are widely used in critical information infrastructure, especially in the energy, electric power and transportation sectors, and the security of these products affects the stable operation of the economy and society [1], [2], [3], [4], [5]. In June 2010, the Stuxnet worm caused extensive damage to the centrifuges at Iran's nuclear facilities. Stuxnet worked by hijacking business data, which makes it very important to analyze the underlying business data, such as sensor readings. Current research on industrial control network security focuses on anomaly detection and situational awareness. Traditional intrusion detection for industrial control systems analyzes network-layer data [6], and only a few studies are based on business datasets. A great deal of research has been done on intrusion detection, where the traditional methods include misuse detection, anomaly detection and hybrid detection [7], [8], [9]. After investigating 30 public datasets, we found that only one of them (the BATADAL dataset [10]) consists purely of business data, while the others, such as the Mississippi SCADA dataset (the Mississippi State University gas pipeline dataset), are network-layer datasets. In the Mississippi SCADA dataset, each sample has 27 dimensions, only one of which is related to business data. The scarcity of industrial control datasets containing attack samples therefore seriously limits research on anomaly detection in industrial control networks. To address this scarcity, this paper proposes an attack sample generation algorithm for industrial control datasets based on data association grouping and a Generative Adversarial Network (GAN).
To group the different dimensions of the original dataset, a degree-of-membership function is used to relate the data distribution to the association degree of the dataset [11]. Fuzzy set theory has also been used for grouping [12], [13], [14]. The membership function can therefore be used to calculate the association degree between different dimensions of strongly coupled datasets.
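The exact membership function is not given in this excerpt, so the following is only a sketch under stated assumptions: the association degree of two dimensions is taken as the membership of their absolute Pearson correlation in a fuzzy set "strongly associated", modelled by a hypothetical S-shaped (sigmoid) membership function with assumed parameters `center` and `slope`. The function names and sensor names are illustrative and do not come from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant dimension carries no association information
    return cov / (sx * sy)


def sigmoid_membership(r, center=0.5, slope=10.0):
    """Assumed S-shaped membership of |r| in the fuzzy set 'strongly associated'."""
    return 1.0 / (1.0 + math.exp(-slope * (abs(r) - center)))


def association_degrees(columns, center=0.5, slope=10.0):
    """Map each (sorted) pair of dimension names to an association degree in (0, 1)."""
    return {
        (a, b): sigmoid_membership(pearson(columns[a], columns[b]), center, slope)
        for a, b in combinations(sorted(columns), 2)
    }
```

With the default parameters, two perfectly correlated dimensions receive a degree of about 0.993, while two uncorrelated dimensions receive about 0.007.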
Attack samples are commonly constructed with false data injection (FDI) attacks [15], [16]. Common FDI attacks fall into three types: surge attacks, bias attacks and geometric attacks [17]. Sinusoidal attacks have been proposed to overcome the limitations and poor concealment of these common FDI attacks [18]. In studies of anomaly detection on business data, attack data is generated by applying FDI attacks to the original data.
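The attack types named above can be sketched as offsets added to a sensor series over an attack window. The exact formulations of [17] and [18] are not given in this excerpt, so the parameterizations below (a constant bias, a short surge followed by a smaller bias, a geometrically growing offset, and a sine-wave offset) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math


def inject(series, start, end, offset_fn):
    """Return a copy of `series` with offset_fn(k) added at indices start..end-1,
    where k is the number of steps since the attack began."""
    attacked = list(series)
    for k, t in enumerate(range(start, min(end, len(series)))):
        attacked[t] += offset_fn(k)
    return attacked


# The four parameterizations below are illustrative assumptions.
def bias_attack(series, start, end, bias):
    # Constant offset over the whole attack window.
    return inject(series, start, end, lambda k: bias)


def surge_attack(series, start, end, peak, surge_len, bias):
    # Large offset for the first `surge_len` steps, then a smaller bias.
    return inject(series, start, end, lambda k: peak if k < surge_len else bias)


def geometric_attack(series, start, end, initial, ratio):
    # Offset that grows geometrically: initial * ratio**k.
    return inject(series, start, end, lambda k: initial * ratio ** k)


def sinusoidal_attack(series, start, end, amplitude, period):
    # Periodic offset oscillating around the true value.
    return inject(series, start, end,
                  lambda k: amplitude * math.sin(2 * math.pi * k / period))
```

For instance, `bias_attack([10.0] * 8, 2, 5, 1.5)` returns `[10.0, 10.0, 11.5, 11.5, 11.5, 10.0, 10.0, 10.0]`; the original series is left unmodified.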
A GAN can expand a small set of samples into a large one [19], [20], and GANs are commonly used to expand small-sample datasets.
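The paper reports using PyTorch, but this excerpt does not describe its GAN architecture. As a hedged sketch of how a GAN expands a small sample set, the code below trains a minimal fully connected GAN on a tensor of real samples and then draws as many synthetic rows as needed. The layer sizes, learning rate, epoch count and the helper names `train_gan` and `expand` are assumptions.

```python
import torch
from torch import nn


def train_gan(real, noise_dim=8, hidden=32, epochs=200, lr=1e-3, seed=0):
    """Fit a small fully connected GAN to `real` (n_samples x n_features).
    Architecture and hyperparameters are illustrative assumptions."""
    torch.manual_seed(seed)
    n, n_features = real.shape
    G = nn.Sequential(nn.Linear(noise_dim, hidden), nn.ReLU(),
                      nn.Linear(hidden, n_features))
    D = nn.Sequential(nn.Linear(n_features, hidden), nn.LeakyReLU(0.2),
                      nn.Linear(hidden, 1))  # outputs a logit
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    for _ in range(epochs):
        # Discriminator step: label real rows 1 and generated rows 0.
        fake = G(torch.randn(n, noise_dim)).detach()
        loss_d = bce(D(real), ones) + bce(D(fake), zeros)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
        # Generator step: push D to label generated rows as real.
        loss_g = bce(D(G(torch.randn(n, noise_dim))), ones)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
    return G


def expand(G, n, noise_dim=8):
    """Draw n synthetic samples from a trained generator."""
    with torch.no_grad():
        return G(torch.randn(n, noise_dim))
```

Full-batch training keeps the sketch short; with a real dataset one would normally iterate over mini-batches and normalize each dimension before training.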
This paper proposes an attack sample generation algorithm for industrial control datasets based on data association grouping and GAN. First, the association degree between the dimensions of the original industrial control dataset is calculated with the membership function. Then the association groups are formed according to the association degree and the weight coefficients given by expert experience. According to the frequency of each association group, strong and weak association groups are obtained. Attack samples are then generated by FDI attacks based on the grouping result. Finally, the negative (attack) samples are expanded by the GAN into a larger set, completing the generation of the negative sample dataset.
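The grouping steps can be sketched as follows, under assumptions that are not taken from the paper: the membership-based degree and the expert weight of each pair are blended linearly with a hypothetical coefficient `alpha`, pairs whose blended score reaches a hypothetical `threshold` are linked, linked dimensions are merged into groups with a union-find, and a group counts as strong if it recurs in at least `min_freq` data windows. All names and parameters are illustrative.

```python
from collections import Counter


def group_dimensions(dims, degree, expert_weight, alpha=0.5, threshold=0.7):
    """Link two dimensions when alpha*degree + (1-alpha)*expert_weight >= threshold
    (an assumed combination rule) and return the connected components with
    more than one member as frozensets (the association groups)."""
    parent = {d: d for d in dims}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for (a, b), mu in degree.items():
        score = alpha * mu + (1 - alpha) * expert_weight.get((a, b), 0.0)
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups
    groups = {}
    for d in dims:
        groups.setdefault(find(d), set()).add(d)
    return {frozenset(g) for g in groups.values() if len(g) > 1}


def split_by_frequency(groupings, min_freq):
    """Given the groups found in several data windows, label a group 'strong'
    if it recurs in at least min_freq windows and 'weak' otherwise."""
    counts = Counter(g for groups in groupings for g in groups)
    strong = {g for g, c in counts.items() if c >= min_freq}
    return strong, set(counts) - strong
```

Running `group_dimensions` once per data window and passing the resulting sets to `split_by_frequency` yields the strong groups that would then be targeted by the FDI attacks.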
Industrial control system
To address the shortage of attack samples in industrial control networks, it is necessary to understand the framework of the industrial control network and the intrusion attacks it may face. Fig. 1 shows the model of a spatially distributed industrial control system.
The controlled process is governed by the controller, which receives the measured values from sensors distributed across different regions and transmits control signals to the spatially distributed actuators using the
Dataset processing
The business data in the BATADAL datasets (hereinafter the BATADAL dataset) and a business dataset from an oil depot (hereinafter the oil depot dataset) were selected for the experiments. Details of the two datasets are given in Table 2.
In each dataset, the readings of all sensors at one time point form the sample for that moment, and the readings of one sensor over all time points form the corresponding dimension. Before the experiments, the datasets require several preprocessing steps:
1. Remove
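The sample/dimension layout described above amounts to transposing per-sensor time series into rows indexed by time point. A minimal sketch with a hypothetical helper `to_samples` is shown below; the actual preprocessing steps are not listed in this excerpt and are not reproduced here.

```python
def to_samples(sensor_series):
    """Turn {sensor: [v_t0, v_t1, ...]} into (dimension names, sample rows),
    where row t holds every sensor's reading at time point t."""
    names = list(sensor_series)
    lengths = {len(v) for v in sensor_series.values()}
    if len(lengths) != 1:
        raise ValueError("all sensors must cover the same time points")
    rows = [list(values) for values in zip(*(sensor_series[n] for n in names))]
    return names, rows
```

For example, `to_samples({"s1": [1, 2, 3], "s2": [4, 5, 6]})` yields the dimensions `["s1", "s2"]` and the samples `[[1, 4], [2, 5], [3, 6]]`.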
Experimental environment
Operating system: Windows 10
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz
Memory: 16 GB
Development environment: Python 3.5.6, PyTorch 1.3.0
Experimental results
Attack samples were generated from the BATADAL dataset with 50,000 samples, the BATADAL dataset with 100,000 samples, and the oil depot dataset. The numbers of attack samples generated by the proposed algorithm are shown in Table 4.
The generated samples were compared with the original data using DED, TFD, SVM and XGBoost. The results are shown in Table 5.
It can be seen from
Conclusion
This paper studies the lack of business data in industrial control systems and proposes an attack sample generation algorithm. First, association grouping results are obtained from the expert weights and the membership distribution; then the strong association groups are attacked to obtain the attack samples. Finally, the GAN is used to expand the samples. Attack samples were generated from an open dataset and an oil depot dataset, and the coincident degree and trend fitting
CRediT authorship contribution statement
Wen Zhou: Supervision, Funding acquisition. Xiang-min Kong: Methodology, Software, Validation, Writing - original draft. Kai-li Li: Formal analysis, Investigation, Writing - original draft. Xiao-ming Li: Writing - review & editing. Lin-lin Ren: Writing - review & editing. Yong Yan: Writing - review & editing. Yun Sha: Resources. Xue-ying Cao: Methodology, Data curation, Visualization. Xue-jun Liu: Project administration, Conceptualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (32)
- et al. A double-ended Raman temperature measurement method for hazardous chemicals warehouse. Optik (2018).
- et al. Fairness-based packing of industrial IoT data in permissioned blockchains. IEEE Trans. Ind. Inf. (2020).
- et al. Complementarity reformulations for false data injection attacks on PMU-only state estimation. Electr. Power Syst. Res. (2020).
- et al. Information generative Bayesian adversarial networks: A representation learning model for transmission gear parameters. IEEE/ASME Trans. Mechatronics (2019).
- et al. 2019 China Internet Network Security Report (2020).
- et al. Safety analysis of industrial control system. Netinfo Secur. (2012).
- Research on Corners Detection of Hazardous Chemicals Stackings and Distance Measurement Algorithm Based on Piecewise Straight-Line Fitting (2019).
- et al. A review on intrusion detection of industrial control systems. J. Commun. (2017).
- Research on Intrusion Detection Analysis Method Based on Modbus TCP Industrial Control Network (2017).
- et al. Modbus TCP network intrusion detection method based on multiple types of attacks. Inf. Technol. (2020).
- Research on bidirectional matching algorithm of variable threshold SIFT based on DBSCAN. J. Chem. Eng. Japan.
- Battle of the attack detection algorithms: Disclosing cyber attacks on water distribution networks. J. Water Resour. Plann. Manag.
- Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning. IEEE Trans. Syst. Man Cybern. B.
- Influence of machine learning membership functions and degree of membership function on each input parameter for simulation of reactors. Sci. Rep.
- Distributivity of implication operator on overlap and grouping functions in interval-valued fuzzy set. J. Jilin Univ. (Sci. Ed.).
- Tuning membership functions of kernel fuzzy classifiers by maximizing margins. Memetic Comput.
☆ This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFC0824801) and CNAF KJ2019003.