1 INTRODUCTION

Many researchers have studied industrial wastes and substances that are hazardous to the environment and human health. The effects of hazardous substances are seen in many areas such as health, safety, the military and industry. Liquids, which are readily available in everyday life, also threaten human and environmental safety in another way. They are a preferred means for terrorist attacks in crowded places such as airports, train stations, transportation points, political rallies, shopping malls, concerts and other cultural events attended by thousands of people. Therefore, hazardous liquids must be detected in order to prevent these attacks.

In recent years, due to the increase in the number of terrorist attacks, some researchers have examined ways to detect hazardous substances and illegal objects, and have analyzed the existing systems, the related techniques, and their advantages and limitations. In this way, a vision of what can be done to counter these threats has been created [1]. The main focus of this research has been the development of systems that can detect explosives automatically, without the intervention of an operator. Accordingly, in the last few years, significant progress has been made in the development of X-ray imaging systems for the detection of explosives. In addition to X-ray imaging systems, the use of nuclear quadrupole resonance (NQR) for explosive detection has been heavily investigated [2, 3]. NQR is a spectroscopic technique that can detect explosives with high chemical specificity [2]. The nuclear magnetic resonance (NMR) method has been used to investigate and classify the liquid contents of closed nonmetallic containers [4]. An ultra-low-field magnetic resonance imaging technique has also been proposed for the detection of liquid explosives [5].

In the literature, different techniques, including nuclear magnetic resonance and X-ray, have been proposed to detect explosives [6, 7]. Among these techniques, X-ray systems are the most commonly used [7]. X-ray systems have also been proposed to analyze unknown solid samples that may contain explosives and to analyze peroxide-based explosives [8]. In addition to nuclear magnetic resonance and X-ray, liquid detection and identification can be performed using THz time-domain spectroscopy [9]. However, although these approaches can readily detect certain peroxide compounds, they cannot distinguish many types of liquids used in daily life. Therefore, there is a need for a system that can distinguish these liquids [6].

Microwave measurement methods are widely used in industrial and safety applications. For instance, they have been applied to reduce the environmental impact of industrial wastes and hazardous materials, with satisfactory results [10, 11]. They have also been used for sludge stabilization [11]. In addition, coaxial probe measurement, one of the microwave measurement methods, has been proposed to address biofilm defects and wall thinning problems [12]. Microwaves propagate quite differently in liquids than in air. Moreover, both the frequency-dependent velocity and the attenuation of microwaves vary from liquid to liquid, depending on the molecular composition of the liquid. Consequently, the complex permittivity and the reflection and transmission coefficients differ from one liquid to another. Microwave and millimeter wave frequency bands can be used to determine the complex permittivity and the reflection and transmission coefficients of both solids and liquids [13]. A formula model optimized with the artificial bee colony (ABC) algorithm has been presented to calculate the relative permittivity of materials [14]. Microwave measurements can also be used to determine other properties such as chemical concentration, bio-content, and moisture content [15]. These properties can be used to characterize liquids. Material characterization is important not only in safety related applications but also in food, medical, bioengineering, construction and military related research and applications [15, 16]. It has also been used to calculate the permittivity of liquids, the reflection coefficient, S11, and the transmission coefficient, S21 [17, 18]. Although a vector network analyzer can measure phase and magnitude over a wide microwave frequency range, it is very expensive. Therefore, some researchers prefer simulation-based studies [16].

In the last decade, machine learning techniques have been used for different purposes such as predicting the compressive strength of concrete [19], diagnosing cancer and thyroid diseases [20, 21], classifying drugs according to their milk/plasma concentrations [22], automatically classifying good and defective agricultural products and raw materials such as rice, coffee and green tea [23], classifying gasoline [24], and estimating the botanical and geographical origin of honey [24]. In contrast to these uses, in this study different machine learning algorithms are used to classify liquids based on S parameter measurements. The remainder of this paper is organized as follows. The second section introduces the methodology and experimental setup, together with the classification algorithms and the metrics used in the performance evaluation. The third section presents the results of the performance evaluation study. Finally, the fourth section concludes the paper.

2 EXPERIMENTAL SETUP AND METHODOLOGY

Different measurement techniques can be used to obtain the dielectric properties of materials. Material state (gas, liquid, or solid), frequency range and temperature (high or low) are important factors in selecting the most appropriate measurement method [26]. In the coaxial probe method, the electromagnetic wave penetrates into the liquid with minimum reflection [27]. Although the coaxial probe method can be used for liquid measurements, it is generally not practical, and sometimes dangerous, to dip a probe into a hazardous liquid or even to open the lid of its container. In contrast, the noncontact measurement platform used in this study allows measurement without opening the lid of the container and without immersing a probe in the liquid. The experimental setup used in this study for liquid classification with a microwave patch antenna is shown in Fig. 1. It consists of a circular microwave patch antenna connected to a vector network analyzer in order to measure the reflection coefficient of the electromagnetic wave. To build the experimental setup, an antenna with a resonant frequency of 1.5 GHz was designed. The design was constructed on an FR4 based dielectric substrate with a height of 1.6 mm, a relative permittivity of 4.4, and a 10 × 10 cm² ground plane beneath it. The antenna is fed by a 50 Ohm SMA (SubMiniature version A) feed probe. The geometry of the antenna is illustrated in Fig. 2 and photos of the antenna are shown in Fig. 3.

Fig. 1. The experimental setup for liquid classification.

Fig. 2. The geometry of the antenna.

Fig. 3. The front and back views of the antenna.

The radius of the antenna patch is calculated using Eqs. (1) and (2).

$$F = \frac{{8.791 \times {{{10}}^{9}}}}{{{{f}_{r}}\sqrt {{{\varepsilon }_{r}}} }},$$
(1)
$$a = \frac{F}{\left\{ 1 + \frac{2h}{\pi \varepsilon_r F}\left[ \ln \left( \frac{\pi F}{2h} \right) + 1.7726 \right] \right\}^{1/2}},$$
(2)

where εr  is relative permittivity of the substrate, fr  is the resonant frequency, h is the height of the substrate, and a is the radius of the patch.
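As a rough numerical check of the design values quoted above, the following Python sketch evaluates Eqs. (1) and (2). It assumes the standard textbook form of these equations, in which the bracketed logarithmic term multiplies 2h/(π εr F), the whole braced expression is raised to the power 1/2, and h is expressed in centimeters so that a is returned in centimeters. The function name `circular_patch_radius_cm` is introduced here for illustration only.

```python
import math


def circular_patch_radius_cm(f_r_hz, eps_r, h_cm):
    """Patch radius a (cm) from Eqs. (1) and (2); h is assumed to be in cm."""
    # Eq. (1): F = 8.791e9 / (f_r * sqrt(eps_r))
    F = 8.791e9 / (f_r_hz * math.sqrt(eps_r))
    # Eq. (2): fringing-field correction term inside the braces
    correction = 1.0 + (2.0 * h_cm / (math.pi * eps_r * F)) * (
        math.log(math.pi * F / (2.0 * h_cm)) + 1.7726
    )
    return F / math.sqrt(correction)


# Values stated in the text: 1.5 GHz, FR4 with eps_r = 4.4, h = 1.6 mm = 0.16 cm
radius_cm = circular_patch_radius_cm(f_r_hz=1.5e9, eps_r=4.4, h_cm=0.16)
print(f"patch radius: {radius_cm:.2f} cm")
```

Under these assumptions the radius comes out at roughly 2.7 cm, which fits comfortably on the 10 × 10 cm² ground plane.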

The overall process was carried out as follows. The reflection coefficient of the electromagnetic wave was measured for each liquid while keeping a distance of approximately 5 mm between the antenna and the bottle. Then, a database was created from the values measured for each liquid. This data set was later used for liquid classification. First, the entire data set was used when classifying the liquids, which tested how well the algorithms classify liquids that are already in the database. Then, in order to test the success of the algorithms on unseen data, 10-fold and 5-fold cross validation were performed. Finally, the performance of the classification algorithms was analyzed using different evaluation metrics. The methodology described here is illustrated in Fig. 4.

Fig. 4. Proposed methodology.

2.1 Classification Algorithms and Performance Metrics

Machine learning uses mathematical and statistical methods to build a model from existing data and then uses this model to determine, as accurately as possible, which class a new data sample belongs to. In this study, naive Bayes, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM), sequential minimal optimization (SMO), and K-nearest neighbors (KNN) were used as classifiers. In order to evaluate the performance of each classifier, confusion matrices were created.

The K-fold cross validation technique was chosen for the performance evaluation of the proposed system and the classification algorithms. It divides the data set into training and test sets in order to avoid possible overfitting and to understand how the model performs on data that it has not seen before. In the overfitting problem, the model gives good results on the data set it was trained on but makes unsuccessful predictions on new data that it has never seen. K-fold cross validation randomly divides the data set into k segments. In each round, k – 1 segments are used for training and the remaining segment is used as the test set; this is repeated k times so that each segment serves as the test set once. The values obtained in each round are combined, and the performance of the model is evaluated. The value of k is usually 10 or 5, as in this study. Several metrics should be used to evaluate how well a classifier performs at the end of the classification process. In this study, Kappa, RMS, the confusion matrix and accuracy are used to evaluate the performance of the classification algorithms.
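The fold rotation described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the feature vectors are synthetic stand-ins rather than measured reflection coefficients, the classifier is a 1-nearest-neighbor rule chosen for brevity (the value of K used for KNN is not stated in the text), and the helper names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN: return the label of the closest training vector (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def cross_validate(features, labels, k):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            if nearest_neighbor_predict(train_x, train_y, features[i]) == labels[i]:
                correct += 1
    return correct / len(features)


# Synthetic stand-in data (NOT measured values): 12 "hazardous", 24 "nonhazardous"
rng = random.Random(42)
features = [[rng.gauss(-10.0, 1.0), rng.gauss(-20.0, 1.0)] for _ in range(12)]
features += [[rng.gauss(-25.0, 1.0), rng.gauss(-5.0, 1.0)] for _ in range(24)]
labels = ["hazardous"] * 12 + ["nonhazardous"] * 24

for k in (10, 5):
    print(f"{k}-fold accuracy: {cross_validate(features, labels, k):.3f}")
```

Because every sample is held out exactly once, the count of correct predictions is accumulated over all folds and divided by the total number of samples.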

Kappa. This value measures the agreement between the predicted and observed classifications on a group of data. The calculation of the Kappa value is given in (3). P(a) is the accuracy of the classifier, and P(e) is the weighted average of the expected accuracy of a classifier making random estimates on the same data set. The Kappa value lies between –1 and 1, where –1 indicates a complete mismatch, i.e. an inverse relationship, and 1 indicates a perfect fit. The closer the value is to 1, the better the agreement between the predicted and observed classes. The interpretation of the Kappa value is listed in Table 1.

$$K = \frac{{P(a) - P(e)}}{{1 - P(e)}}.$$
(3)
Table 1.   Kappa value
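To make Eq. (3) concrete, the sketch below computes Kappa from a confusion matrix. It assumes the usual definition in which P(e) is obtained from the row and column totals of the matrix, and the function name `cohen_kappa` is introduced here for illustration. The example counts are those reported in Section 3 for naive Bayes on the entire data set (all 12 hazardous liquids correct, 6 of the 24 nonhazardous liquids labeled hazardous); the value computed this way is not guaranteed to match the tool output summarized in Table 10.

```python
def cohen_kappa(confusion):
    """Kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # P(a): observed agreement, i.e. the share of samples on the diagonal
    p_a = sum(confusion[i][i] for i in range(n)) / total
    # P(e): agreement expected by chance, from row totals times column totals
    p_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    return (p_a - p_e) / (1.0 - p_e)


# Rows: actual hazardous, actual nonhazardous; columns: predicted in same order
naive_bayes_full = [[12, 0],
                    [6, 18]]
kappa = cohen_kappa(naive_bayes_full)
print(f"kappa = {kappa:.3f}")
```

Here P(a) = 30/36 and P(e) = 0.5, so Kappa is about 0.667, noticeably lower than the raw accuracy because chance agreement is discounted.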

Root mean squared error (RMS). It quantifies the differences between the actual and predicted values. It is determined by taking the square root of the mean squared error, as given in (4), where P represents the estimated values, a represents the real values, and n is the number of samples. The closer the RMS value is to zero, the more accurate the estimates of the classifier.

$$RMS = \sqrt {\frac{(P_1 - a_1)^2 + \ldots + (P_n - a_n)^2}{n}} .$$
(4)
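Eq. (4) translates directly into code. The sketch below is only illustrative: it assumes the classes are encoded numerically (1 for hazardous, 0 for nonhazardous), which the text does not specify, and some tools compute this quantity from predicted class probabilities instead of hard labels. The sample values are hypothetical.

```python
import math


def rms_error(predicted, actual):
    """Root mean squared error between predicted and actual values, Eq. (4)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical encoding: 1 = hazardous, 0 = nonhazardous
actual = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]  # one of the four samples is misclassified
rms = rms_error(predicted, actual)
print(f"RMS = {rms:.2f}")
```

With one error in four samples the mean squared error is 0.25, so the RMS is 0.5; a perfect classifier would give 0.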

Confusion matrix. A confusion matrix contains information about the actual classes and the classes predicted by a classification system. The diagonal elements of the matrix give the number of correctly classified objects.

Accuracy. Accuracy is the simplest and most popular measure of model performance. As given in (5), it is the fraction of all samples that are correctly classified.

$${\text{Accuracy}} = \frac{{{\text{true positive}} + {\text{true negative}}}}{{{\text{number of instances}}}}.$$
(5)
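The relationship between the confusion matrix and Eq. (5) can be illustrated as follows. This is a minimal sketch with hypothetical labels; the function names are introduced here for illustration. For two classes, the diagonal of the matrix holds the true positives and true negatives, so accuracy is the diagonal sum divided by the number of instances.

```python
def confusion_matrix(actual, predicted, classes):
    """Count samples per (actual, predicted) pair; rows are actual classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Eq. (5): correctly classified samples (diagonal) over all samples."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical labels for five samples
classes = ["hazardous", "nonhazardous"]
actual = ["hazardous", "hazardous", "nonhazardous", "nonhazardous", "nonhazardous"]
predicted = ["hazardous", "nonhazardous", "nonhazardous", "nonhazardous", "hazardous"]

matrix = confusion_matrix(actual, predicted, classes)
print(matrix, accuracy(matrix))
```

In a screening setting the off-diagonal cells are not equally costly: a hazardous liquid predicted as nonhazardous (first row, second column) is the more serious error, which accuracy alone does not reveal.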

3 PERFORMANCE EVALUATION

For performance evaluation, the experimental setup described in Section 2 was used to classify a set of 36 liquids. Table 2 lists these liquids; 12 of them are hazardous and 24 are nonhazardous liquids used in daily life, including alcoholic beverages. For the measurements, a thin 0.5 liter PET bottle was chosen because it has low reflectance and is frequently used in daily life. Filling the bottle to a height of approximately 7 cm provides enough liquid for the analysis. For consistency, the same bottle was used for all of the measurements, at the same room temperature. The results are shown in Fig. 5. As listed in Table 3, the health hazard and flammability of the liquids in the hazardous group are rated from 0 to 4, where 0 means no hazard and 4 means the highest hazard. Health hazard applies to direct oral use or skin contact. Highly flammable materials, if handled recklessly, can start a fire or cause an explosion that endangers human life.

Table 2.   Liquids used in this study
Fig. 5. Frequency-dependent reflection coefficient measurements of liquids.

Table 3. Properties of a set of selected hazardous liquids

The confusion matrix of the naive Bayes algorithm (see Table 4) shows that the hazardous liquids were largely classified correctly. In the table, the green cells give the number of correctly classified liquids and the red cells give the number of incorrectly classified liquids. In particular, when the entire training set was used, naive Bayes correctly classified all of the hazardous liquids but classified 6 of the nonhazardous liquids as hazardous. When cross validation was applied, naive Bayes classified 1 hazardous liquid as nonhazardous and 5 nonhazardous liquids as hazardous.

Table 4.   Confusion matrix–naive Bayes

When the entire training set was used, LDA correctly classified all 12 hazardous liquids, while 5 of the nonhazardous liquids were incorrectly classified as hazardous. In the case of cross validation, 1 hazardous liquid and 6 nonhazardous liquids were classified incorrectly (see Table 5). The confusion matrix of the QDA algorithm (see Table 6) is quite similar to that of the LDA algorithm. However, QDA is more stable than LDA because it gives the same results on the training set and under cross validation. The confusion matrix of the SVM algorithm (see Table 7) indicates that SVM failed to form a useful model: it classified all of the liquids as nonhazardous. Compared to SVM, the SMO algorithm obtained better results. On the training set, SMO correctly classified 11 of the hazardous liquids and 23 of the nonhazardous liquids. When cross validation was performed for SMO, the number of correct classifications decreased (see Table 8). Among all of the classification algorithms, KNN achieved the highest accuracy. KNN correctly classified all of the hazardous and nonhazardous liquids when all the training data was used in the classification process. When cross validation was performed, 10-fold cross validation resulted in 1 incorrectly classified liquid and 5-fold cross validation resulted in 3 incorrectly classified liquids (see Table 9).

Table 5.   Confusion matrix–LDA
Table 6.   Confusion matrix–QDA
Table 7.   Confusion matrix–SVM
Table 8.   Confusion matrix–SMO
Table 9.   Confusion matrix–KNN

Table 10 lists the accuracy, Kappa and RMS values of all the classification algorithms when the entire training set was used and when 10-fold and 5-fold cross validation were applied. The correctly and incorrectly classified instances for all of the classification algorithms are shown in Fig. 6. As can be seen in Fig. 6, the KNN algorithm provided the highest number of correct predictions and the lowest number of incorrect predictions. Table 10 shows that the SVM algorithm provided the lowest accuracy and the highest RMS value. The naive Bayes, LDA and QDA algorithms obtained similar results. The SMO algorithm obtained a high accuracy of 94.4% on the training data set; however, its accuracy decreased when cross validation was applied. The KNN algorithm obtained the highest accuracy both on the training set and when cross validation was applied. In addition, KNN obtained the lowest RMS of all the algorithms. The Kappa value of KNN was 1 for the training set and close to 1 when cross validation was applied. This confirms the success of the KNN algorithm.
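As a quick arithmetic check, Eq. (5) can be applied to the counts reported above for the 36 liquids: 34 correct classifications for SMO on the training set (11 hazardous and 23 nonhazardous), and 35 and 33 correct classifications for KNN under 10-fold and 5-fold cross validation, respectively. The KNN percentages below are derived here from the stated misclassification counts and are offered only as a consistency check:

$$\frac{34}{36} \approx 94.4\%, \qquad \frac{35}{36} \approx 97.2\%, \qquad \frac{33}{36} \approx 91.7\%.$$

The first value agrees with the SMO training accuracy quoted in the text.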

Table 10.   Comparison of all of the classification algorithms
Fig. 6. Average number of correctly and incorrectly classified samples according to the classification algorithms.

4 CONCLUSIONS

In recent years, the increase in the number of terrorist attacks using liquid explosives has necessitated the development of systems that can easily and effectively distinguish the liquids that can be used in these explosives from nonhazardous liquids. In this study, a noncontact hazardous liquid detection approach has been proposed and the performance of the classification algorithms that could be used in the proposed approach has been evaluated. The novelty of the proposed approach is that the cap of the bottle does not need to be opened or removed while a classification is being made. Once a prototype system based on the proposed approach has been developed, it can be used in airports, shopping malls and other places. Because the detection process is easy and quite fast, the proposed approach is not expected to cause queuing and loss of time at security points. In addition to proposing a novel approach to detect hazardous liquids, this study has analyzed the performance of six different classification algorithms used to identify hazardous liquids in terms of accuracy and time requirement. The results indicate that KNN is the most appropriate classification algorithm for hazardous liquid detection.