Abstract

To automatically detect plastic gasket defects, a plastic gasket defect visual detection device based on GoogLeNet Inception-V2 transfer learning was designed and built in this study. The GoogLeNet Inception-V2 deep convolutional neural network (DCNN) was adopted to extract and classify the defect features of plastic gaskets, addressing the problem that their surface defects are numerous and their features are difficult to extract and classify. Deep learning applications require a large amount of training data to avoid model overfitting, but few datasets of plastic gasket defects exist. To address this issue, data augmentation was applied to our dataset. Finally, the performance of three convolutional neural networks was comprehensively compared. The results showed that the GoogLeNet Inception-V2 transfer learning model achieved better performance in less time, that is, higher accuracy, reliability, and efficiency on the dataset used in this paper.

1. Introduction

Compared with metal gaskets, plastic gaskets have excellent insulation, corrosion resistance, and nonmagnetic properties, and they are lightweight, which makes them very common in the semiconductor, automobile, and aerospace industries, interior decoration, and other fields. However, immature processes or imperfect production conditions lead to defective products during manufacturing. To ensure product quality, the manufacturer needs to sort the plastic gaskets and remove unqualified products. Traditional plastic gasket sorting is mainly performed by human eyes. Manual visual observation and sorting not only have low detection efficiency and high labor intensity but are also prone to being affected by the subjective factors of the testing personnel, resulting in false and missed detections. Moreover, long-term manual detection causes a certain amount of damage to the inspectors' vision. In addition, a manual method is not conducive to database management and storage.

Currently, noncontact detection is widely adopted owing to great advances in imaging technology [1]. Surface defect detection technology has been used in pharmaceutical packaging, textile, leather, steel plate surface, and machine part testing. Convolutional neural networks (CNNs) have already been widely used in image recognition and classification because they can effectively extract data features [2]. In this respect, a CNN is somewhat like a soft sensor [3–5]. However, directly applying a convolutional neural network to detect and classify plastic gaskets is challenging because of the wide variety of plastic gasket defects, as shown in Figure 1. To solve this problem, a pretrained network was applied to our dataset, which is composed of two types of plastic gaskets, namely NG (defective products) and OK (nondefective products), based on the well-known technique of transfer learning [6, 7]. To meet industrial requirements and realize automatic detection of plastic gaskets, one of the first popular pretrained networks, GoogLeNet Inception-V2 [8], was applied. In this work, data augmentation [9] was applied to our data (Section 5.1) to generate new images from the original pictures. The hyperparameters of the model were then modified to fit our data. The use of transfer learning also saves a great deal of model training time and reduces the probability of model overfitting caused by the small dataset. In this paper, a GoogLeNet [10] classification network for plastic gasket images is proposed (Section 4).

In the following section, the current research status of combining convolutional neural networks with various disciplines to solve problems in the corresponding fields is discussed. Then, the design of the defect detection device is introduced in Section 3. In Section 4, the software research methods used in this study are described. In Section 5, the experimental process is described, the experimental results are analyzed, and GoogLeNet Inception-V3 [11] and MobileNet-V1 [12] are compared with the chosen deep convolutional neural network (DCNN). Finally, Section 6 provides the conclusion.

2. Related Work

Research on image classification using DCNNs to solve specific needs has been prominent [13–21], as shown in Table 1. In Zhou et al.'s paper [15], a visual perception technology (VPT) framework based on deep learning was proposed, which relied on an image preprocessing (IP) scheme and the DCNN WR-IPDCNN. The framework was based on an improved DCNN [22] derived from LeNet-5 [23] to mine the newly established WR dataset, which significantly improved the automation and intelligence level of steel wire surface damage detection. The work reported in Apostolopoulos et al.'s paper [16] implemented VGG-19 [24], MobileNet-V2 [25], Inception [10], Xception [26], and Inception ResNet-V2 [27] as pretrained CNNs [28] to detect COVID-19 from X-ray images. Using images of COVID-19, bacterial pneumonia, viral pneumonia, and healthy cases as datasets, the pretrained DCNNs were applied to binary and three-class classification of cases. In Alqahtani et al.'s paper [17], an autonomous method was proposed for the detection and classification of fatigue crack damage, and the associated risk assessment, of machinery components that are often made of ductile materials. The underlying algorithms of the proposed method were built upon the concept of the DCNN, where the execution time is much shorter than that of visual inspection, and the detection and classification process is expected to be significantly less error-prone than visual inspection. In Fujioka et al.'s paper [18], a DCNN was used to distinguish between benign and malignant lesions on maximum intensity projections of dynamic contrast-enhanced breast magnetic resonance imaging (MRI), and the model showed comparable diagnostic performance. In Alencastre-Miranda et al.'s paper [19], computer vision and deep learning networks were used to select and plant healthy billets, which increased the plant population and yield per hectare of sugarcane planting. In that study, DCNNs and transfer learning were used to process image datasets so that the results could be extended to different sugarcane varieties.

The aforementioned studies proved that using a DCNN for feature extraction and classification is an effective approach that can be applied to various production and daily-life needs. In this study, based on machine vision technology, DCNN transfer learning was used to identify and classify plastic gasket surface defects so as to achieve automatic and intelligent detection of the product.

3. Design of the Detection Device

The front and left views of the plastic gasket are shown in Figure 2. Plastic gasket products are prone to defects such as scratches, cracks, unfilled corners, and pits. These defects vary in form and have no obvious distribution. To improve the detection efficiency and accuracy for these defect characteristics and to meet the actual detection requirements of the products, a set of software and hardware devices for automatic online detection and sorting of plastic gaskets based on machine vision was designed.

The device consisted of a hardware part and a software part. The hardware part was divided into the main test bench and the subtest bench. The details of the device are presented in Figure 3 and Table 2. The software part mainly included the software platform, image acquisition, and image processing algorithms.

The lower end face of the gasket was taken as the detection object; the detection method for the upper end face is the same, so it is not repeated here. Because the image features collected from defective gaskets were numerous and inconsistent, GoogLeNet [10] transfer learning was used for image feature recognition and sorting to meet the detection requirements. The entire process of plastic gasket defect detection and classification is shown in Figure 4.

4. Gasket Defect Classification Based on Transfer Learning

4.1. Problem Statement

The original dataset used in this study is shown in Figure 1. It was characterized by a wide variety of defects, some of which were difficult to distinguish from normal products. Therefore, the accuracy and efficiency of the original manual detection method were extremely low. Deep learning can solve this problem by automatically extracting features for image recognition and classification. In addition, the transfer learning method can greatly shorten the model training time, make the model converge faster, and improve efficiency. However, deep learning requires a large amount of data for training, and because of practical limitations, our dataset was insufficient. Therefore, we expanded the dataset (Section 5.1). Even so, the amount of data in the expanded dataset was still limited. To prevent the resulting network overfitting, the dropout rate of the dropout layer was set to 0.5.

4.2. Overview of the GoogLeNet Inception-V2 Structure

GoogLeNet is a deep learning architecture proposed by Szegedy et al. in 2014 [10]. Before it, AlexNet [29], VGG-19, and other architectures all increased the depth of the network to achieve better training results, but the increased number of layers brought many downsides, such as overfitting, vanishing gradients, and exploding gradients. The Inception module improves training results from a different perspective, namely by using computational resources more efficiently and extracting more features for the same amount of computation. GoogLeNet Inception is currently available in four versions, a classic one being Inception-V2. Inception-V2 was derived from the original Inception-V1 by replacing the 5 × 5 convolution in the third branch with two stacked 3 × 3 convolutions, as shown in Figure 5. Owing to the introduction of batch normalization (BN), the network can use a higher learning rate and converge faster. Furthermore, the training samples can be shuffled more thoroughly, so the network achieves a higher validation accuracy.
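
To make the factorization concrete, the following minimal sketch (written with tf.keras, which the paper does not specify; the filter count and feature-map size are illustrative assumptions) builds the Inception-V2-style branch in which the original 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions, each followed by BN:

```python
import tensorflow as tf
from tensorflow.keras import layers

def two_3x3_branch(x, filters=64):
    # Two stacked 3x3 convolutions cover the same 5x5 receptive field as one
    # 5x5 convolution but with fewer parameters; BN follows each convolution.
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

# Example: apply the branch to a dummy 28x28x192 feature map.
inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = two_3x3_branch(inputs)
```

Two stacked 3 × 3 kernels use 9 + 9 = 18 weights per input-output channel pair instead of the 25 needed by a single 5 × 5 kernel, which is where the efficiency gain comes from.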

4.3. Pretraining and Fine-Tuning Learning

Transfer learning [30] is widely used in computer vision because it can accelerate model convergence. Initializing a network with transferred parameters can improve its generalization performance, even when the target task differs significantly from the source task. Many studies have demonstrated the feasibility of this approach [31], showing how trained model parameters can be transferred to a new model to aid its training. In this study, the initial weight parameters came from the pretrained GoogLeNet network model, and we only changed the last layer so that it corresponded to our number of labels. In a binary classification problem, the softmax cross-entropy loss reduces to the sigmoid cross-entropy loss, and the two are mathematically equivalent. Nevertheless, in our network training experiments, we found that the sigmoid cross-entropy loss performed better than the softmax cross-entropy loss. Therefore, we chose the sigmoid cross-entropy loss for training and the sigmoid function for the output layer. Then, depending on the network training situation, we adjusted the choice of optimizer (Section 4.4). We chose a learning rate of 0.01, which remained unchanged over subsequent iterations. The batch size used in training was 16, and the maximum number of iterations was 20 000, corresponding to 320 epochs.
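
As a rough illustration of this fine-tuning setup (a sketch only: Inception-V2 is not shipped with tf.keras.applications, so InceptionV3 ImageNet weights stand in for the pretrained backbone, and the 224 × 224 × 3 input follows Section 5.1):

```python
import tensorflow as tf

# Pretrained backbone with the original classification head removed.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")

# Replace only the last layer: dropout (rate 0.5, Section 4.1) plus one
# sigmoid unit for the binary NG/OK decision.
x = tf.keras.layers.Dropout(0.5)(base.output)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(base.input, out)

# Sigmoid cross-entropy loss with momentum SGD (lr 0.01, momentum 0.9,
# Section 4.4); the batch size of 16 would be passed to model.fit.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"])
```

Since only the final layer differs from the pretrained model, nearly all weights start from transferred values, which is what makes convergence fast.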

4.4. Choice of Optimizer

The goal of the DCNN training process is to minimize the loss value, and once the loss function is defined, the optimizer comes into play. Many different types of optimizers can be chosen. This study focused on comparing two optimizers, namely the gradient descent method and the momentum optimization method. Their update rules are as follows:

(1) Gradient descent:
$$\theta_{t+1} = \theta_t - \eta \nabla_\theta J(\theta_t).$$

(2) Momentum:
$$v_{t+1} = \gamma v_t + \eta \nabla_\theta J(\theta_t), \qquad \theta_{t+1} = \theta_t - v_{t+1},$$

where $\theta_t$ denotes the model parameters at step $t$, $\eta$ is the learning rate, $J$ is the loss function, $v_t$ is the accumulated velocity term, and $\gamma$ is the momentum coefficient.

Comparing the two formulas, one finds that for the gradient descent optimizer, the update of the model parameters depends only on the current gradient of the loss function with respect to those parameters. Consequently, a model trained with this optimizer converges relatively slowly and easily falls into a local optimum. The momentum optimization algorithm introduces a momentum term that accumulates historical gradient information, which adequately addresses these problems. In the actual training process, the model did occasionally fall into a local optimum, resulting in low accuracy and a loss value that could not be reduced; with the momentum optimization method, the model escaped the local optimum. The momentum coefficient was set to 0.9.
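
A toy sketch of the two update rules (illustrative code, not the paper's implementation; in practice, TensorFlow's built-in momentum SGD performs the second update):

```python
def gd_step(theta, grad, lr=0.01):
    # Plain gradient descent, Eq. (1): step along the current gradient only.
    return theta - lr * grad

def momentum_step(theta, v, grad, lr=0.01, gamma=0.9):
    # Momentum, Eq. (2): accumulate historical gradients in the velocity v,
    # which helps the iterate roll out of shallow local optima.
    v = gamma * v + lr * grad
    return theta - v, v
```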

4.5. Indicators for Model Evaluation

The seven indicators used to evaluate the performance of the DCNNs were accuracy (ACC), sensitivity (SEN), specificity (SPE), F1 score, the receiver operating characteristic (ROC) curve, the precision-recall (PR) curve, and the area under the curve (AUC) values [32].

The classification accuracy (ACC) is defined as follows:

$$\mathrm{ACC} = \frac{TP + TN}{P + N}, \tag{3}$$

where $TP$ refers to the number of correctly classified defective products, $FP$ refers to the number of incorrectly labeled nondefective products, $TN$ refers to the number of correctly classified nondefective products, $FN$ refers to the number of incorrectly labeled defective products, $P = TP + FN$ refers to the number of defective products, and $N = TN + FP$ refers to the number of nondefective products.

Sensitivity (SEN) is defined as the percentage of defective products correctly identified as defective and is expressed as follows:

$$\mathrm{SEN} = \frac{TP}{TP + FN} = \frac{TP}{P}. \tag{4}$$

Specificity (SPE) is defined as the percentage of nondefective products correctly classified as nondefective:

$$\mathrm{SPE} = \frac{TN}{TN + FP} = \frac{TN}{N}. \tag{5}$$

The F1 score is defined as the harmonic mean of precision and sensitivity (recall):

$$F_1 = \frac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}}, \tag{6}$$

where the precision rate $\mathrm{PRE} = TP/(TP + FP)$ is the proportion of correctly classified defective products among all products labeled defective by the classifier, and the recall rate $\mathrm{REC} = TP/(TP + FN)$ is the proportion of correctly classified defective products among all defective products.

The ROC curve connects the points given by the false-positive rate (FPR) (7) and the true-positive rate (TPR) (8):

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \tag{7}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}. \tag{8}$$

For the ROC curve, the closer the curve is to the upper left of the graph, the better the performance of the model. The AUC is the area between the ROC curve and the horizontal axis; the larger its value, the better the model, with an optimal value of 1. The PR curve connects the points given by recall and precision. For the PR curve, the closer the curve is to the upper right of the graph, the better the performance of the model. In general, when one model evaluates better than another under both the ROC and PR curves, its performance can be considered superior.
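
For reference, the following sketch (an illustrative helper, not the paper's code) computes these scalar indicators from the four confusion-matrix counts defined above:

```python
def evaluate(tp, fp, tn, fn):
    # Indicator definitions follow Eqs. (3)-(8).
    acc = (tp + tn) / (tp + fp + tn + fn)   # Eq. (3), since P + N = TP+FN+TN+FP
    sen = tp / (tp + fn)                    # Eq. (4); also the TPR of Eq. (8)
    spe = tn / (tn + fp)                    # Eq. (5)
    pre = tp / (tp + fp)                    # precision
    f1 = 2 * pre * sen / (pre + sen)        # Eq. (6)
    fpr = fp / (fp + tn)                    # Eq. (7)
    return {"ACC": acc, "SEN": sen, "SPE": spe,
            "F1": f1, "FPR": fpr, "TPR": sen}
```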

5. Experimental Results

5.1. Image Preprocessing

In deep learning, a large amount of data must be used for training to suppress model overfitting. Generally, geometric transformations can be used to enlarge datasets. This paper used the Python language and the OpenCV library in Visual Studio to expand the dataset. The main methods used were brightness enhancement, contrast enhancement, angle rotation, and image flipping; the transformations are shown in Figure 6. Some pictures lost their original features after brightness enhancement and could not be used as training samples, so they were removed.

In addition, the pictures contained large blank regions that carried no useful information; these increased the computational load of the neural network, slowed down training, and caused the program to occupy too much GPU memory, which could easily lead to memory overflow and program crashes. When an image was rotated, black bars appeared on both sides of the image, which could easily cause feature extraction errors and reduce the accuracy of the neural network. Moreover, the length and width of the pictures were inconsistent, so subsequently resizing them could easily distort them. To solve these problems, all pictures were cropped.

In the actual training process, we found that the accuracy of the model on the validation set improved after data augmentation. We therefore concluded that when the dataset is too small, proper data augmentation can improve network performance and reduce overfitting. In total, there were 1666 processed images.
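
A minimal sketch of the four augmentation operations named above, using OpenCV (the specific brightness, contrast, and rotation parameters are illustrative assumptions, not the paper's values):

```python
import cv2

def augment(img):
    """Return brightened, contrast-enhanced, rotated, and flipped variants."""
    h, w = img.shape[:2]
    bright = cv2.convertScaleAbs(img, alpha=1.0, beta=40)    # brightness shift
    contrast = cv2.convertScaleAbs(img, alpha=1.5, beta=0)   # contrast stretch
    # Rotation leaves black borders at the image edges, which is why all
    # images were cropped afterwards.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    flipped = cv2.flip(img, 1)                               # horizontal flip
    return [bright, contrast, rotated, flipped]
```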

Because the input to the neural network must have three channels, with a required size of 224 × 224, a single-channel picture was copied across channels when read to generate a three-channel RGB picture, and the picture was resized to meet the network's input size requirement. In addition, if the images had been read in a fixed order, the network tended to overfit: its accuracy on the training set was high from the beginning, yet the loss gradient could not descend further. Therefore, the pictures were randomly shuffled when the TFRecord files were created and read.
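
The input pipeline described above might look like the following sketch (tf.data API; the TFRecord feature names "image" and "label" and the JPEG encoding are assumptions about the file layout, not taken from the paper):

```python
import tensorflow as tf

def parse(example):
    feats = tf.io.parse_single_example(example, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    # channels=3 replicates a single-channel image into three channels.
    img = tf.io.decode_jpeg(feats["image"], channels=3)
    img = tf.image.resize(img, (224, 224))   # match the network input size
    return img, feats["label"]

ds = (tf.data.TFRecordDataset("train.tfrecord")
      .shuffle(buffer_size=1024)             # random scrambling of examples
      .map(parse)
      .batch(16))
```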

5.2. Results and Discussion

The final number of pictures used in the experiment was 1666, including 1350 NG pictures and 316 OK pictures. They were divided into training, validation, and test sets (Table 3) at a ratio of 8 : 2 : 2. During training, validation, and testing, the computer ran a 64-bit Windows 10 operating system, the GPU was an NVIDIA GeForce GTX 1050 Ti, the deep learning framework was TensorFlow, and the programming language was Python.

In general, the network model parameters used for comparison were set according to those recommended for GoogLeNet Inception-V2. It can be seen from Figure 7 that when the three models reached the 1000th iteration, that is, the 16th epoch, the accuracy on the validation set reached approximately 80%. At the end of training, the loss values converged toward 0, the accuracy on the training set converged to 100%, and the accuracy on the validation set reached approximately 90%. GoogLeNet Inception-V3 converged the fastest.

Figures 8(a)–8(c) show the confusion matrices of GoogLeNet Inception-V2, GoogLeNet Inception-V3, and MobileNet-V1, respectively. Figure 9(a) shows the ROC curves of the three DCNNs. All three curves project toward the upper left corner; the curve of GoogLeNet Inception-V2 is closest to the upper left corner of the graph and completely encloses the ROC curves of GoogLeNet Inception-V3 and MobileNet-V1, whereas the latter two curves intersect each other. Figure 9(b) shows the corresponding PR curves of the three DCNNs. All three curves project toward the upper right corner; the curve of GoogLeNet Inception-V2 is closest to the upper right corner of the graph and completely encloses the PR curves of GoogLeNet Inception-V3 and MobileNet-V1, whereas the latter two curves intersect each other.

Table 4 shows the accuracy, sensitivity, specificity, F1 score, and AUC values obtained by the three DCNNs, together with the network training time and the recognition (test) time. In terms of accuracy, specificity, F1 score, and AUC, GoogLeNet Inception-V2 had the highest values; its accuracy was as high as 95.495%, and its specificity reached 100%. GoogLeNet Inception-V3 was second, whereas MobileNet-V1 was always the lowest, with a specificity of only 44.444%. In terms of sensitivity, MobileNet-V1 reached the highest value, followed by GoogLeNet Inception-V2 and GoogLeNet Inception-V3. Owing to the application of transfer learning, all three network models converged in a very short time. MobileNet-V1 had the shortest identification time, followed by GoogLeNet Inception-V2 and GoogLeNet Inception-V3. The recognition time of GoogLeNet Inception-V2 for each image was 12.455 ms, only 7.795 ms longer than that of MobileNet-V1, which fully meets the requirements of industrial production.

Based on the above analysis, although GoogLeNet Inception-V3 is an improvement of GoogLeNet Inception-V2, on our dataset this improvement manifested only as faster convergence because the dataset's features were not complicated. In the overall performance evaluation, GoogLeNet Inception-V2 was better than the other two networks.

6. Conclusion and Future Work

In this study, a surface defect visual classification device for plastic gaskets was designed based on GoogLeNet Inception-V2 transfer learning. The results show that even when datasets are small, a satisfactory training effect can still be achieved through transfer learning of a network model. This proves that when sufficient training data cannot be obtained, transfer learning can be used to resolve the dilemma. Comparing the performance indexes of the three networks on the test set shows that the GoogLeNet Inception-V2 network model had higher accuracy, reliability, and efficiency on the dataset used in this paper. This indicates that the device designed in this study can fully meet production needs and earn higher profits for manufacturers in this industry.

Since our experimental results essentially meet current industrial production testing requirements, we will not pursue further extensive research on the defect detection of plastic gaskets. In the next stage, we plan to conduct research on the surface defects of camshafts, in which we will classify each defect type to enable subsequent automatic repair and rejection. In future studies, CNNs can be integrated with different disciplines (such as environmental science and biomedical science) to solve key problems in the corresponding disciplines and make such research more socially valuable.

Data Availability

All the picture data in this study were taken from the experimental equipment. The code of this paper is not available to the public.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this study.

Acknowledgments

This work was supported by the National University Students Science and Technology Innovation Project (201910345063X) and the Key Research and Development Program of Zhejiang Province under Grant 2020C01153 and Grant 2019C01134, and in part by the National Social Science Foundation of China under Grant 17BGL086.