A hybrid quantum ensemble learning model for malicious code detection

Qibing Xiong; Xiaodong Ding; Yangyang Fei; Xin Zhou; Qiming Du; Congcong Feng; Zheng Shan

doi:10.1088/2058-9565/ad40cb

1. Introduction

In recent years, machine learning has been increasingly used in many fields such as image processing, text analysis, and speech recognition, which has greatly promoted the intelligence and efficiency of people's work and life [1–3]. Ensemble learning as a mainstream model in machine learning, is often used to improve the performance and generalization ability of multiple weak classifiers, and has been widely used in many fields [4, 5]. The basic idea of ensemble learning is to construct different single and simple machine learning models by using different strategies, which can be the same type or different types, at the end combine these machine learning models into stronger models to make the final decision.

With the rapid development of the Internet of Things, cloud computing and mobile Internet, the data scale is growing geometrically, posing a serious challenge to the existing computer processing capabilities. Quantum computing is a new computing model with parallel computing capability and information carrying capacity that is difficult to reach by classical computing technology, which can effectively improve the speed of computation and realize the exponential acceleration of computation, and is expected to be a feasible solution to the arithmetic bottleneck of classical computing [6, 7]. Advances in hardware accelerate quantum computing into reality [8, 9]. The combination of quantum computing and machine learning has achieved a series of results, such as quantum support vector machine (QSVM), quantum K-nearest neighbors, quantum neural network, quantum reinforcement learning, quantum ensemble learning, and quantum cluster algorithms [10–18]. Classical-quantum hybrid algorithms introduced into the computational process of classical machine learning can effectively rely on quantum computing for acceleration, which has become one of the current research hotspots in the field and gradually gained the attention of scholars [19–26].

Cyberspace security is becoming more and more important with the deeper application of the technology of the network. Malicious code has become the main carrier of network attacks, seriously threatening the security of cyberspace. July 2023, AV-TEST statistics show that the number of malicious code has exceeded 10 billion, also the scale of malicious code on windows platform has reached more than 796 million, accounting for as high as 75.38% [27]. The current mainstream CPU+GPU model in classical machine learning is difficult to effectively deal with the rapid growth of malicious code, and there is an urgent need to propose new solutions.

For this reason, this paper designs a hybrid quantum ensemble learning model and applies it to the field of malicious code detection. The main innovative work is as follows:

Designed a hybrid classical-quantum machine learning model. Classical machine learning layer, constructed by classical machine learning methods such as MLPC (multilayer perceptron classifier), AdaBoost (Adaptive Boosting), GBDT (Gradient Boosting Decision Tree), used the stacking method to combine the above models. Quantum machine learning layer, constructed by heterogeneous quantum classifiers such as variational quantum classifier (VQC), QSVM and quantum K nearest neighbors (QKNN).
In the quantum machine learning layer, the two methods of parameter optimization and bagging method are used to achieve the improvement of quantum classifier performance. Firstly, a more accurate QKNN distance metric named polar distance is realized, is used to achieve an accurate measure of distance in quantum scenarios. Secondly, the Bayesian parameter optimization strategy is incorporated to obtain the optimal parameter combination of QSVM. Thirdly, Bagging ensemble learning method is used to realize the performance of QSVM and VQC again.
In the classical machine learning layer, the stacking method is used to cross-learning train a variety of heterogeneous classical machine learning models to enhance the feature extraction process. At the same time, it achieves the effect of dimensionality reduction of the data, and ensures that the model has a strong generalization ability.
Experimental validation on malicious code dataset, realizing the effective processing of high latitude data. We use the hybrid quantum ensemble learning model of this paper to the field of malicious code detection, and explore the current feasible quantum machine learning achievable path and application tasks.

The rest of this paper is organized as follows. Section 2 presents related works on classical machine learning, QSVM, QKNN, VQC, and so on. Section 3 provides the description and architecture of the hybrid quantum ensemble learning. In section 4, the datasets and experiments are described, and the comparative results are presented. Finally, section 5 concludes the paper and presents future work.

2. Related works

2.1. Classical machine learning

AdaBoost is a learning algorithm that uses Boosting enhancement by adding a new base classifier to each round of training until a predetermined error rate or a specified maximum number of iterations is reached [28]. It is adaptive reflected by that the samples wrongly classified by the previous base classifier are enhanced, i.e. they receive higher weight values, and the weighted whole samples are again used to train the next base classifier.

GBDT is an iterative decision tree algorithm [29]. The basic idea is to iteratively train a series of decision trees and combine them to form a strong classifier. In each iteration, GBDT will calculate the residual (the difference between the actual value and the predicted value) according to the prediction results of the previous round of models, and use the residual as the training target of the next round of models.

Random Forests consists of many decision trees, each of which is unrelated to the other [30]. A forest is built in a randomized way. After building the forest, when a new sample enters, each decision tree will predict it separately and then give the classification result based on the voting method.

SVM is a supervised learning algorithm whose decision-making process is a maximum-margin hyperplane for the solution of the learning samples [31]. The main steps are: find a number of data on the edge of the set (called Support Vector), use these points to find a plane (called the decision surface), so that the distance from the support vector to the plane is maximized.

MLPC is a feed-forward neural network based on classifier, which is a neural network model with forward propagation mechanism containing one or more layers of hidden nodes between the input and output layers [32]. It is learning process uses BP algorithm and the optimization problem is abstracted into logistic loss function.

2.2. QSVM

Kernel methods are an important class of methods in machine learning, and support vector machine is a typical representative of kernel methods. On classical computers, when the feature space of the data increases, the computational cost of kernel function estimation grows tremendously, which makes the advantages of kernel methods decrease and restricts their application scope. For this reason, some scholars have proposed to combine quantum computing and kernel methods in classical machine learning to fully utilize quantum computing to realize the acceleration of the inner product operation of data vectors, combined with the characteristics of high-latitude Hilbert Space, to improve the performance and advantages of the support vector machine.

QSVM contains two mainstream implementations, one based on variational quantum circuit and the other based on quantum kernel estimation methods [10]. In [33], the authors have applied the VQA method to the support vector machine algorithm solve the SVM optimization problem and transmit the optimized parameters to efficiently classify new data. Yang et al [34] proposed a QSVM model based on optimized HHL quantum circuits containing optimized preprocessing units as well as a result readout method for kernel matrix generating circuit, which reduces the depth of the quantum circuit [35].

At the heart of the QSVM algorithm is a non-sparse matrix power arithmetic technique that efficiently performs matrix inversion on the inner product matrix of the training data. In [34], the authors change the original SVM problem, a quadratic programming problem, into a problem of solving a linear equation system:

$\begin{align}F\left( {\begin{array}{*{20}{c}} b \\ {\vec{\alpha} } \end{array}} \right) \equiv \left( {\begin{array}{*{20}{c}} 0&{{{\vec{1}}^T}} \\ {\vec{1}}&{K + {\gamma ^{ - 1}}I} \end{array}} \right)\left( {\begin{array}{*{20}{c}} b \\ {\vec{\alpha} } \end{array}} \right) = \left( {\begin{array}{*{20}{c}} 0 \\ {\vec{y}} \end{array}} \right)\end{align} \tag{ 1 }$

here $K$ is the kernel matrix, and ${K_{ij}} = x_i^T \cdot {x_j}$ . Also, the SVM problem is transformed into a matrix exponentiation and matrix inverse problems. A similar HHL algorithm quantum circuit can be constructed for solving the two problems.

2.3. QKNN

In the classical KNN algorithm, the classification performance of the nearest neighbors method based on the distance metric is generally better and has been widely used. However, the computational complexity of the classical KNN algorithm is high, so it is of high theoretical and applied value to introduce the acceleration effect of quantum computing into the KNN algorithm.

Hamming distance and Manhattan distance are the two commonly used distance metrics. Some scholars have tried to realize the calculation of the two distances with the help of quantum circuit can effectively accelerate the distance metric process and achieve a quadratic acceleration effect [36–38].

In [39], the authors proposed a QKNN algorithm based on quantum fidelity measure, which uses quantum circuit with integrated switching tests, loads two N-dimensional states and computes the fidelity between them. In [11], the authors have proposed polar distance to enhance QKNN classification performance. Due to the quantum solution by Euclidean distance, there is a large error and a classical computational part. The algorithm is designed to implement a better distance metric using quantum data as an alternative to the Euclidean distance with better results.

According to polar coordinates, the polar distance in [11] combines cosine similarity and module length similarity, which is better suited to quantum distance metrics. The formula for calculating the polar distance is shown equation (2). The polar distance quantum circuit is shown in figure 1.

**Figure 1.** The polar distance quantum circuit, we take the calculation of the similarity between two samples $\left| x \right\rangle$ and $\left| v \right\rangle$ as an example.
Download figure:
Standard image High-resolution image

$\left| x \right\rangle $ — **Figure 1.** The polar distance quantum circuit, we take the calculation of the similarity between two samples $\left| x \right\rangle$ and $\left| v \right\rangle$ as an example.
Download figure:
Standard image High-resolution image

$\begin{align}d = {d_c} \cdot \left( {1 - w} \right) + {d_r} \cdot w \quad \left( {{d_c},{d_r},w \in \left[ {0,1} \right]} \right) .\end{align} \tag{ 2 }$

Among them, ${d_c}$ and ${d_r}$ represent cosine similarity and module length similarity, respectively, $w$ represents an adjustable parameter to better improve the performance of classification.

2.4. VQC

VQC is a hybrid quantum classical algorithm based on variational optimization for NISQ (Noisy Intermediate-Scale Quantum) [23], which is the most promising algorithmic direction to realize quantum dominance in the near future. The approximation and solution of relevant problems without complete error correction is achieved by fully utilizing the expressive power of a large number of noise-containing quantum bits. VQCs are characterized by the fact that their parameters can be adjusted and are also called parameterized quantum classifiers [40].

Variational quantum algorithm is essentially a parameterized Unitary transform characterized by the inclusion of learnable free parameters. In this case, the Unitary transform can be expressed as $U = {\text{ }}{e^{iH}}$ and $H$ is the Ermie matrix. The Ermie matrix can be transformed into a linear combination of the tensor product of the Pauli matrices ${\text{ }}\left\{ {{\sigma _x},{\sigma _y},{\sigma _z}} \right\}$ and the unit matrix I. The Pauli matrices correspond to a quantum gate. Consequently, the quantum variational algorithm can be expressed as:

$\begin{align}U\left( \theta \right) = {U_N}\left( {{\theta _N}} \right)V \cdots {U_2}\left( {{\theta _2}} \right)V{U_1}\left( {{\theta _1}} \right).\end{align} \tag{ 3 }$

Among them, ${U_i}\left( {{\theta _i}} \right)$ represents a parametric quantum gate, V represents a non-parametric quantum gate, and N represents the layers of quantum circuit. According to the quantum noise, the effective circuit width (the number of quantum bits) and depth (the layers of quantum circuit) that quantum devices can provide in the near future will be limited.

2.5. Ensemble learning

In the research related to ensemble learning, the main frameworks can be categorized into three types: Boosting method, Bagging method and Stacking method [4, 5]. Since Bagging and Stacking methods are used in this paper, only these two methods are introduced.

Bagging method is mainly used to construct several homologous base models by bootstrap operation on the original data distribution and utilizing multiple sample sets after sampling. In this way, those base models can be obtained from different training sample sets without changing the base algorithm, and the final prediction result can be obtained by combining multiple base classifiers through voting or other strategies.

Stacking method is used to minimize the generalization error of one or more learners, and their core idea is similar to a complex version of cross-validation. In Stacking model, first carries out a K-fold cross-validation on the dataset, and uses multiple base classifiers to make predictions on these data to get the output results, and then combines the predictions into new features, and trains on the new features using new learners. The individual base learners in Stacking should generally be independently accurate and vary from one base learner to another.

In quantum ensemble learning, [14] proposed a quantum classifier integration framework for quantum classifier fusion and each classifier is weighted according to its performance in classifying training data. Macaluso et al [41] combined quantum superposition, entanglement, and interference in order to build an ensemble of classification models to form a model for quantum integrated classification, also with exponential speedup.

2.6. Features of malicious code

Currently, the mainstream malicious code detection and identification methods mainly include static feature analysis method, dynamic feature analysis method, and hybrid feature analysis method. Static feature analysis method refers to not actually executing the malicious code itself, but scanning the file to obtain the content of the sample, or obtaining the code information of the sample through disassembly tools (e.g. IDA Pro, OllyDump, etc), and extracting the structure, statistical values, images, as the sample features from them [42–44]. In this paper, we mainly use the statics features, such as the opcode sequence, PE file structure, and the string statistical features.

2.6.1. Opcode sequence

Opcode sequence is a commonly used feature in malicious code detection, which contains a lot of information about how the code works. The code needs to be compiled into a single instruction in the computer to perform the function of the code. An instruction usually contains two parts: the opcode and the operand, where the opcode indicates the operation function of the instruction.

We use the disassembly tool IDA Pro to disassemble PE files to obtain the opcode and operand information of the samples, and together with the N-gram algorithm, we generate the OPCODE sequence features of PE files. The N-gram algorithm is often used for semantic extraction of sequence features. In this paper, we apply the algorithm to OPCODE sequences in order to realize the extraction of the semantic information of the code.

2.6.2. String statistical features

Special string sequences sometimes play an excellent effect in malicious code detection. In this paper, after parsing the printable strings in PE files, we use regular expressions to extract data such as registry, URL, path, etc, and count the total number of strings, calculate the entropy value of strings as string statistical features.

2.6.3. PE file structure

The PE file stores the code's file structure attributes and execution logic and other contents. In this paper, the PE file is parsed by LIEF tool, and analyzed from the perspectives of file attributes to obtain the PE file structure features.

The file attributes including file header information, export table, import table and section information. In more detail, the header information contains time_date_stamps, machine, subsystem, sizeof_code, sizeof_headers, dll_characteristics_lists, etc. The section information contains section name, section size, entropy, virtual_size, properties, etc.

3. Methodology

In this paper, we design a hybrid quantum ensemble learning model, including two levels of classical machine learning and quantum machine learning. The classical machine learning layer incorporates the Stacking ensemble learning method, and the quantum machine learning layer incorporates the quantum weak classifier performance improvement strategy. The formal description of the hybrid quantum ensemble learning model is shown in equation (4)

$\begin{align}y = \mathop \sum \nolimits_{x = 1}^n \left\{ {{f_{c1}}\left( x \right)\mathop \cup \nolimits {f_{c2}}\left( x \right)\mathop \cup \nolimits \cdots \mathop \cup \nolimits {f_{cn}}\left( x \right)} \right\} \cdot \left\{ {{U_{z1}}\left( x \right)\mathop \cup \nolimits {U_{z2}}\left( x \right)\mathop \cup \nolimits \cdots {U_{zn}}\left( x \right)} \right\} \cdot \left\{ {{U_{\theta 1}}\left( x \right)\mathop \cup \nolimits {U_{\theta 2}}\left( x \right)\mathop \cup \nolimits \cdots \mathop \cup \nolimits {U_{\theta n}}\left( x \right)} \right\}.\end{align} \tag{ 4 }$

Here, ${f_{c1}}\left( x \right),{f_{c2}}\left( x \right), \cdots ,{f_{cn}}\left( x \right) \in {\text{Set}}\left( {{\text{ Classical}}{\text{Machine Learning}}{\text{Models}}} \right)$ ,

${U_{z1}}\left( x \right),{U_{z2}}\left( x \right), \cdots ,{U_{zn}}\left( x \right) \in {\text{Set}}\left( {{\text{Quantum}}{\text{Feature Map}}} \right)$ , and

${U_{\theta 1}}\left( x \right),{U_{\theta 2}}\left( x \right), \cdots ,{U_{\theta n}}\left( x \right) \in {\text{Set}}\left( {{\text{Quantum Classifiers}}} \right)$ .

In the classical machine learning layer, heterogeneous learning models such as AdaBoost, GBDT, MLPC, Random Forests, SVM are selected, and 5-fold cross-validation is carried out using Stacking method to realize the enhancement and dimensionality reduction of data features. In the quantum machine learning layer, quantum classifiers like QSVM, VQC, and QKNN are selected, and Bayesian parameter optimization method and parameter enhancement method are used to obtain the best parameter combination and achieve the performance enhancement of quantum classifiers. Further, we use the bagging method to integrate the above parameter-enhanced models. Table 1 shows the components of classical machine learning layer and quantum machine learning layer. Figure 2 shows the framework for hybrid quantum ensemble learning model.

**Figure 2.** The hybrid quantum ensemble model, contains classical machine learning layer and quantum machine learning layer.
Download figure:
Standard image High-resolution image

Table 1. Components of classical machine learning and quantum machine learning.

Model	Classical machine learning	Quantum machine learning
0	SVM	VQC
1	MLPC	QKNN
2	AdaBoost	QSVM
3	Random Forests
4	GBDT

3.1. Classical machine learning

The main work of the classical machine learning layer is to preprocess the data features to obtain new features with high differentiation and effectiveness. Using Stacking method, the features such as PE file structure, string statistics values, and opcode sequences of the malicious code are fed into heterogeneous learning models such as AdaBoost, GBDT, MLPC, Random Forests, and SVM for 5-fold cross-validation, and the obtained the corresponding predicted values, as the new features of the dataset, and used as inputs to the next layer (quantum machine learning layer). After the ensemble learning of the heterogeneous learning models, the validity of the new features can be effectively ensured, and the dimensionality reduction is achieved, and at the same time, the generalization ability of the model is also improved. The parameters used in AdaBoost, GBDT, Random Forests, SVM, MLPC are shown in table 2. Algorithm 1 demonstrates the classical machine learning process.

Table 2. The parameters used in AdaBoost, GBDT, Random Forests, SVM, MLPC.

Models	Key parameter1 (base estimator/Kernel function/optimizer)	Key parameter2 (the number of base estimator/penalty parameter/hidden layers)	Key parameter3 (loss function/division criterion)
AdaBoost	DecsionTreeClassifier	50	None
GBDT	CART regression tree	100	Cross entropy
Random Forests	DecisionTreeClassifier	100	Gini impurity
SVM	Radial Basis Function	1.0	None
MLPC	Adam	(128, 64, 32)	Cross entropy

Algorithm 1. Classical machine learning (stacking method).

Input: traindata,trainlabel,testdata

Output: newfeatures 1: for feature in traindata,testdata: 2: total_feature = pe_feature + opcode_featrue + str_feature 3: return total_feature 4: split traindata to five parts 5: for model in models: #models[AdaBoost, GradientBoosting, MLPC, RandomForest, SVM] 6: data_list = [data0,data1,data2,data3,data4]: 7: for data_i in data_list: 8: model.fit(only except data_i,label) 9: pred_triandata = model.predict(data_i) 10: pred_testdata = model.predict(testdata) 11: model_traindata = pred_traindata0+...+pred_traindata4 12: model_testdata = average(pred_testdata0+...+predtestdata4) 13: newtraindata = concat(model_traindata) 14: newtestdata = concat(model_testdata) 15: return newtraindata,newtestdata

3.2. Quantum machine learning

The quantum machine learning layer mainly contains the processes of data quantum state coding, quantum circuit used for quantum classifiers, augmented quantum classifiers, and prediction result aggregation. Firstly, data quantum state coding, which uses common amplitude coding and quantum coding circuits 14 proposed in [45] to realize the data output in quantum states. Secondly, the quantum circuit used for quantum classifiers like QSVM, VQC, and QKNN. Thirdly, augmented quantum classifiers, which adopts the parameter optimization method to optimize the parameters for the three kinds of quantum classifiers, and uses the bagging method in ensemble learning to achieve integrated enhancement of the classification performance of QSVM and VQC. Finally, Prediction result aggregation, the prediction results of the above three enhanced quantum classifiers are aggregated using a voting strategy as the final prediction result.

3.2.1. Data quantum state coding

Quantum state coding is the transformation of classical data into quantum state data for use in quantum classifiers. QKNN model requires the test set to be compared with the entire training set for result prediction, which has lots of data computation. Before running the QKNN model, amplitude coding is used in this paper to minimize the sub-bit overhead.

Before running the QSVM and VQC models, the quantum coding circuit 14 proposed in [45] is used to encode the classical data to output the quantum state through the quantum circuit, which adopts the circuit block structure, and the entanglement method adopts the neighboring bit entanglement with the addition of the first and the last long-distance entanglement, and this long-distance entanglement is shifted with each shift of the circuit block by one step, which is demonstrated by the experiment that the circuit structure. The quantum state coding circuit is shown in figure 3, and we take 5 quantum bits as an example. Experiments show that this circuit structure has good data expression and entanglement ability, and is a better quantum state coding circuit in the NISQ era.

**Figure 3.** The quantum state coding circuit 14 proposed in [45], we take 5 quantum bits as an example.
Download figure:
Standard image High-resolution image

3.2.2. The quantum circuit used for VQC, QKNN, QSVM

The quantum circuit used for VQC is shown in figure 4, and we take 5 quantum bits as an example. The quantum circuit is mainly composed of RY and CNOT quantum gates, the depth of quantum circuit is 6 and the number of trainable parameters is 10. Also, the quantum circuit of QKNN is based on the quantum circuit extension of figure 1 (The polar distance quantum circuit), and the depth is 21. The QSVM uses the quantum circuit is same to the IBM qiskit's QSVM which is already implemented in the IBM qiskit, and the depth is 68.

3.2.3. Enhancement of quantum classifiers

The classification performance of quantum classifiers is not good enough in the current NISQ era. There are several methods or strategies to enhance weak classifiers in classical machine learning. In this paper, the parameter optimization method and bagging method are introduced into the learning and training of quantum classifiers to enhance the classification performance of quantum classifiers.

Parameter optimization is mainly to obtain the parameter combination that makes the quantum classifier performance optimal through the parameter optimization method to enhance the quantum classifier. In this paper, metrics such as accuracy and precision are selected for parameter optimization of quantum classifiers. For QSVM, the quantum kernel is fixed, and the classical hyperparameter optimization Bayesian parameter optimization is introduced to optimize the penalty parameter c and gamma parameter.

For QKNN, VQC, experimental validation in a small-scale dataset is used to get optimal parameter combinations. The optimization parameters of VQC include: loss function and optimizer. The optimization parameters of QKNN include: k-value, and the weight parameter w (the weighting percentage of the cosine distance and module length distance). Algorithm 2-1 demonstrates the parameter optimization process for the quantum machine learning layer.

Algorithm 2-1. Enhanced quantum classifier.

Input: models, traindata,labels,parameter_set

Output: models(best parameters) 1: for model in models: 2: If model is QSVM: by BayesSearchCV 3: model:parameter paremeter_set 4: BayessearchCV: model 5: model.fit(data,label) 6: compute(Accuracy) 7: Accuracy:sorted 8: else: 9: for para1 in list[parameter1_values]: 10: for para2 in list[parameter2_values]: 11: model(QKNN,VQC) 12: model.fit(data,label) 13: compute(Accuracy) 14: Accuracy:sorted 15: return model_best_parameters

Bagging method mainly further enhances the parameter-enhanced QSVM, VQC by utilizing the bagging integration strategy. Taking QSVM as an example, by constructing ten parameter optimizer parameter-enhanced QSVMs, the training set is split into 10 groups, which are respectively inputted into the ten QSVMs above for learning and training. Finally, the test set data are separately trained ten QSVMs to get ten prediction results, which are voted to generate the prediction results of BaggingQSVM.

For the QKNN model, the parameters obtained in the parameter optimization stage are directly applied to construct the classifier, and the dataset is fed into the model for classification to obtain the prediction as the predictions of the quantum machine learning layer. Algorithm 2-2 demonstrates the bagging ensemble learning process in the quantum machine learning layer.

Algorithm 2-2. Bagging quantum classifier.
Input: ParameterEnhanceQC,traindata,testdata
Output: Predicts 1: for model in ParameterEnhaceQC: 2: If model is QSVM or VQC: by BaggingEnsemble 3: traindata split to ten subtrainset 4: model train subtrainset 5: model predict testdata 6: voting(predicts) 7: QSVM_pred,VQC_pred 8: else: 9: QKNN train traindata 10: QKNN predict 11: QKNN_pred 12: return QSVM_pred,VQC_pred,QKNN_pred

3.2.4. Results aggregation

After Stacking ensemble learning, the new features of the dataset are obtained, and the corresponding prediction results can be obtained by inputting the quantum classifier enhanced by Bayesian optimization strategy and bagging method. In result aggregation, we use the voting method to integrate the predicted results. Finally, we vote the predict the result of QSVM, VQC, and QKNN enhanced by parameter optimization and Bagging method as the final prediction result.

4. Experiments and discussions

4.1. DataCon datasets

A malicious code dataset derived from a big data security analysis competition, co-sponsored by QIANXIN GROUP and Tsinghua University [46]. The dataset comes from a large number of mining and non-mining malicious codes captured from the existing network every day. In our experiment, we selected 2000 samples from the open dataset, contains 1000 mining malicious code samples and 1000 non-mining malicious code samples were selected for testing in the dataset. The experimental setup follows the conventional 7:3 ratio for training data/test data. In this paper, python, IBM qiskit, PyQpanda, and TensorFlow Quantum are used to build the experimental environment for model evaluation.

In this paper, four commonly used metrics such as accuracy, precision, recall and F1-score of binary classification task species are selected for model evaluation. The specific formulas for the four metrics are shown in equations (5)–(8). Among them, TP indicates that positive samples are correctly classified as positive samples, FP indicates that negative samples are incorrectly classified as positive samples, FN indicates that positive samples are incorrectly classified as negative samples, and TN indicates that negative samples are correctly classified as negative samples

$\begin{align}{\text{Accuracy}} & = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}\end{align} \tag{ 5 }$

$\begin{align}{\text{Precision}} & = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}\end{align} \tag{ 6 }$

$\begin{align}{\text{Recall }} & = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}\end{align} \tag{ 7 }$

$\begin{align}{\text{F1}} & = 2{{*}}\frac{{{\text{Precision}}*{\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}.\end{align} \tag{ 8 }$

4.2. The process of model enhancement realization

In conducting the model enhancement experiments, the range of parameter values and optimal combinations of quantum classifiers (QSVM, QKNN, VQC) are shown in table 3. To enhance the QSVM model, the set of values [0.1,0.5,1,5,10,100] is selected for penalty parameter c, and the set of values [0.01,0.05,0.1,0.5,1,3,5] is selected for the gamma parameter. To enhance the VQC model, L1Loss, L2Loss and CrossEntropyLoss are selected for loss function, COYBYLA, P_BFGS, SPSA, etc are selected for optimizers. To enhance the QKNN model, the set of integers [3,4,5,6,7,8,9] is selected for k-value, and the set of fractions (0,1) is selected for the weight parameter w.

Table 3. Quantum classifier and parameter enhanced result.

Model	Parameter	Parameter_valuesets	Best parameter
QSVM	C	[0.1, 0.5, 1, 5, 10, 100]	0.1
QSVM	gamma	[0.01, 0.05, 0.1, 0.5, 1, 3, 5]	3.0
VQC	loss	L1Loss, L2Loss, CrossEntropyLoss	CrossEntropyLoss
VQC	optimizer	COBYLA, P_BFGS, L_BFGS_B, GLSLS, SLSQP, SPSA	COYBYLA
QKNN	k	Int[3,...,9]	6
QKNN	w	Real(0,1)	0.38

4.3. Experimental results and analysis

In order to verify the performance of the model in this paper, we select QSVM and VQC of IBM qiskit, QKNN in [39] and two implementations in [26]. (named HQC_TTN and HQC_MERA), as well as DecisionTreeClassifier in classical machine learning as the comparison models to carry out the experiments. In this paper, four metrics such as accuracy, precision, recall, and F1-score are selected for model evaluation. The experimental results are shown in table 4. For the precision (accuracy, F1-score) rate, our model reaches 98.9% (94.5%, 94.24%), which is the highest among the seven models. Compared with the second best model, the precision (accuracy, F1-score) rate improves nearly 3% (0.5%, 0.24%). However, for the recall rate, our model is relatively low and not satisfactory enough.

Table 4. The experiment results.

Models	Accuracy	Precision	Recall	F1-score
QSVM_qiskit	0.925	0.9022	0.9533	0.9271
VQC_qiskit	0.915	0.9164	0.9133	0.9149
QKNN [39]	0.91	0.8994	0.9233	0.9112
HQC_TTN [26]	0.93	0.9607	0.8967	0.9276
HQC_MERA [26]	0.94	0.9342	0.9404	0.94
DecisionTreeClassifier	0.935	0.9065	0.97	0.9150
OurModel	0.9450	0.9890	0.9000	0.9424

In the NISQ era, the numbers of quantum bits, trainable parameters as well as quantum layers are important parameters in resource analysis. In this paper, we counted the above three resources required for all models in the experiment in table 5.

Table 5. Statistical analysis of resources.

Models	Qubits	Trainable parameters	qlayers
QSVM_qiskit	5	None	68
VQC_qiskit	5	10	6
QKNN [39]	15	None	18
HQC_TTN [26]	8	14	14
HQC_MERA [26]	8	22	22
DecisionTreeClassifier	None	None	None
OurModel	10	10	68

As shown in table 5, the resource requirements for our model is mediocre. Both the numbers of quantum bits and trainable parameters of OurModel are 10. The number of quantum layers is 68, which is the same as QSVM but higher than other models.

Overall, our model is higher than the comparison models (contains 5 quantum classifiers and 1 classical machine learning) in three metrics, including accuracy, precision, and F1-score, in which precision rate is improved by nearly 3 percentage points, which achieves better results and is important for the efficient and precise identification of malicious code. At the same time, compared to other models, our model has relatively low resource requirements. In particular, combined with the acceleration of quantum computing, our model is able to cope with the growing trend of the scale of malicious code and prevent the threats brought by malicious code more effectively, which is of great significance to the security of cyberspace.

In the experiments, we use multi-feature fusion means to ensure the richness and effectiveness of features, such as OPCODE sequences, String Statistical features and PE file structure. At the same time, Stacking method is used to integrate 5 kinds of heterogeneous learning models to ensure the effectiveness of the features while downgrading the data. The experimental results also illustrate that the Bayesian parameter optimization and bagging ensemble learning in classical machine learning are also used for quantum classifiers with good results.

In order to further display the experimental result, we display the four metrics under accuracy, precision, recall, and F1-score as bar charts of QSVM_qiskit, VQC_qiskit, QKNN, HQC_TTN, HQC_MERA, DTClassifier (DecsionTreeClassifier), and OurModel (the model in this paper). Figure 5 shows the effect of the above seven models in the form of bar charts.

**Figure 5.** The accuracy, precision, recall, and F1-score of QSVM_qiskit, VQC_qiskit, QKNN [39], HQC_TTN [26], HQC_MERA [26], DTClassifier, and OurModel in the form of bar charts.
Download figure:
Standard image High-resolution image

5. Conclusion and future work

In this paper, a novel hybrid classical and quantum ensemble learning model is designed, incorporating multiple optimization methods of classical machine learning, such as stacking, bagging and Bayesian parameter optimization. In the classical machine learning layer, heterogeneous learning models such as AdaBoost, GBDT, MLPC, and Random Forests are selected, and Stacking method is used for model training. The quantum machine learning layer adopts a variety of quantum state encoding methods, explores the application of classical optimization methods such as Bayesian optimization method and bagging method to quantum weak classifiers, and conducts experimental validation in the field of malicious code detection. The experimental result shows that the method can run effectively, and the accuracy, precision, and F1-score in malicious code binary classification are significantly improved compared with the unoptimized QSVM, QKNN, and VQC. The model still has some shortcomings, it does not implement Bayesian optimization for QKNN and VQC, and bagging method for QKNN, and the common method of probabilistic voting in soft voting can be considered for result aggregation.

Further research directions: It is exciting to look for ways to enhance the performance of quantum weak classifiers by using quantum optimization method or quantum computation, such as quantum genetic algorithms, quantum particle swarm optimization method, and so on. In ensemble learning, to realize the aggregation of predicted results, commonly used and probabilistic voting, as VQC and other quantum classifiers have not yet been designed probabilistic value output, can design some kind of measurement for probabilistic result output, in order to obtain more forms of results, which can deeply expand the application tasks of quantum machine learning.

Acknowledgments

This work is supported by the Major Science and Technology Projects in Henan Province, China, Grant No. 221100210600.

Data availability statement

No new data were created or analyzed in this study.

A hybrid quantum ensemble learning model for malicious code detection

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction