Abstract

Defects in a solar cell component (SCC) affect its service life and power generation efficiency. In this paper, defect images of the SCC were acquired by the photoluminescence (PL) method and processed by an improved lightweight convolutional neural network (CNN). Firstly, to handle defect detection in high-pixel SCC images, each silicon wafer image was segmented based on the local difference extremum of edge projection (LDEEP). Secondly, to detect defects with small size or weak edges in the silicon wafer, an improved lightweight CNN model was proposed, with a deepened backbone feature extraction network, an enhanced feature fusion layer, and a three-scale feature prediction layer, so that the model provides more feature detail. The experimental results showed that the improved model achieves a good balance between detection accuracy and detection speed: the mean average precision (mAP) reaches 87.55%, which is 6.78% higher than the original algorithm, and the detection speed reaches 40 frames per second (fps), meeting the requirements of precise, real-time detection. The proposed method completes the defect detection task of the SCC well and lays the foundation for automatic detection of SCC defects.

1. Introduction

Among all kinds of renewable energy, solar energy is expected to become the fastest growing one, with the obvious advantages of being clean, safe, and inexhaustible [1]. The solar cell component (SCC) is the key part of a photovoltaic power generation system, converting solar energy into electric energy. The quality of the SCC directly affects the output power and service life of photoelectric conversion [2]. In the SCC production process, due to the influence of materials, processes, and human factors, the silicon wafers in the SCC inevitably develop various defects, such as cracks, scratches, and black spots. These defects not only reduce the yield of the SCC but also affect its service life and photoelectric conversion efficiency. Therefore, defect detection of silicon wafers is critical for quality improvement of finished SCCs.

At present, there are two defect detection approaches for silicon wafers: physical inspection and visual inspection. Physical inspection relies on personal experience and has more limitations in efficiency and precision, while visual inspection has advantages in accuracy, effectiveness, and stability and is widely used in quality detection [3]. For vision-based solar cell quality detection, the imaging schemes mainly include electroluminescence (EL) imaging and photoluminescence (PL) imaging. EL technology needs to contact the solar cell for powered-on detection, which may cause secondary damage to the cell by the electric current, and also has constrained detection efficiency [4]. PL, in contrast, irradiates the cell with light sources of different wavelengths so that transitions in the silicon produce luminescence for imaging, without touching the solar cell. The PL method not only can image the surface and internal defects of the solar cell at the same time but also can inspect process sheets (nonfinished cells), which is more conducive to the quality control of the product. Therefore, PL technology has gradually become the main imaging technology for solar cell detection [5]. Figure 1 shows the process of PL imaging.

For solar cell defect detection, Chen et al. [6] proposed a cell crack defect detection scheme based on structure perception. By designing a structure similarity measure (SSM) function and using nonmaximum suppression to extract candidate crack defects, the proposed SSM function achieves stronger crack defect protrusion and suppression of randomly distributed grains, which provides efficient preparation for further extraction of crack defects. Experimental results showed that this method has a good effect on the detection of prominent cracks, but the parameters in the SSM cannot be selected automatically, which limits its use for other types of defect detection. Tsai et al. [7] proposed an automatic defect detection scheme based on Haar-like feature extraction and a new clustering technique. In the training process, only defect-free images are selected as training samples, and a simple distance threshold is automatically determined for each cluster. Experimental results showed that this method can effectively detect various defects in solar cells, but the algorithm has limitations on the size of the image to be detected. Chen et al. [8] designed a visual defect detection method based on a multispectral deep convolutional neural network (CNN). By adjusting the depth and width of the model, the impact of model depth and kernel size on the recognition results was evaluated. Experimental results showed that the multispectral deep CNN model can effectively detect surface defects of solar cells, with higher accuracy and stronger adaptability to large-area defects, but has weak feature extraction capability for small-area and linear defects. Liu et al. [9] proposed a support vector machine algorithm with a radial basis kernel function. This method used integral projection and a gray barycenter algorithm to obtain the geometric characteristic parameters of solar cell defects and used these parameters as the input of the support vector machine.
After learning, the accuracy of the support vector machine in identifying common defects was over 90%, but the real-time performance of the detection method needs to be improved. Wang et al. [10] proposed a solar cell surface defect detection algorithm based on deep learning (DL), which reconstructed the image through a deep belief network and compared the reconstruction with the real defect image to realize defect detection. Lichun et al. [11] proposed a solar cell surface quality detection method based on machine vision and an artificial neural network in response to the low efficiency and accuracy of solar cell surface quality detection. For the tiny defect of a broken grid, the correct recognition rate reached 98.57% by training a regularized RBF classifier. Finally, the classifier was used in a defect detection system; however, the system cannot identify internal defects such as hidden cracks and black spots.

The defects in solar cells are small and blurred, and general detection methods cause false and missed detections. In the field of target detection, algorithms based on CNNs have strong feature extraction ability and high accuracy, and in recent years they have emerged and been widely used [12]. On the whole, these algorithms can be divided into two branches: two-stage algorithms and one-stage algorithms. The two-stage algorithms first generate a series of candidate regions (region proposals) through a feature extraction network [13] and then classify and regress the candidate regions on this basis. The R-CNN series of algorithms are typical representatives of two-stage algorithms, such as R-CNN, Fast R-CNN, and Faster R-CNN [14–16]. The R-CNN series has higher detection accuracy, but the detection speed is slower due to the larger number of network levels. The main idea of the one-stage algorithms is to directly classify and regress the target through the CNN, removing the candidate region step of the two-stage algorithms. Representative algorithms are YOLO [17], SSD [18], and so on. Compared with two-stage algorithms, one-stage algorithms have higher real-time performance with relatively lower accuracy. However, with continuous development in recent years, one-stage algorithms have greatly improved in detection accuracy while retaining real-time performance, making it easier to meet industrial application requirements. Therefore, the one-stage algorithm has gradually become the focus of current research. Li et al. [19] used a fully convolutional YOLO detection network to provide an end-to-end solution for strip steel surface defect detection. The network was evaluated on six types of defects and achieved a 97.55% mAP and a 95.86% recall rate. Qiu et al. [20] developed an autonomous visual detection system for various defects of wind turbine blades, combining CNN and YOLO models.
To detect small-size defects, they proposed a small-target detection method based on YOLO that fuses multilayer features with a multiscale feature pyramid. Experimental results showed that this method was superior to existing methods in detection accuracy and reliability, with an average accuracy rate of 91.3%. Compared with other one-stage detection algorithms, the YOLO series has the characteristics of fast speed, simplicity and efficiency, high accuracy, and strong generalization ability [21]. As a simplified version of YOLOv3, YOLOv3-Tiny has a simple model structure, lower hardware requirements, and faster detection speed; meanwhile, its simplified backbone network brings decreased detection accuracy and an increased missed detection rate [22]. Yi et al. [23] addressed the problem that YOLOv3-Tiny real-time pedestrian detection often loses part of the detection accuracy: they deepened the YOLOv3-Tiny network, enhanced the feature extraction ability for the target, and used k-means clustering on the training set to find the best prior boxes. Experimental results show that this method has high detection accuracy under the premise of meeting real-time performance.

In this paper, defect detection is carried out on the PL image of the SCC. The defect types include crack type1, crack type2, scratch, and black spot (as shown in Figure 2). In order to detect tiny defects in the high-pixel SCC image, a local difference extremum of edge projection (LDEEP) method is proposed to segment and extract the solar cell units in the SCC. Owing to the small size of the defects and the background interference, common one-stage algorithms cannot meet production demands. In view of this, an improved lightweight convolutional neural network model was established. This model deepens the backbone feature extraction network structure, enhances the feature fusion layer, and adds a third prediction scale, thereby strengthening the model's feature extraction and small-defect detection performance. The experimental results prove that the lightweight convolutional neural network model proposed in this paper improves the accuracy of solar cell defect detection while meeting the real-time requirement. The method is suitable for detecting crack type1, crack type2, scratch, and black spot defects of solar cells and can complete the detection tasks for these defect types in the SCC. Figure 2 shows the PL image of the SCC and a partially enlarged view of a defect.

The rest of this article is as follows: Section 2 introduces the SCC unit segmentation. Section 3 introduces the defect recognition based on lightweight deep neural network. Section 4 introduces the experimental results and analysis. Finally, Section 5 provides the conclusion of this article.

2. Solar Cell Component Unit Segmentation

An SCC image contains more than 70 million pixels, whereas a defect usually occupies only a small area of a single silicon wafer in the SCC image (as shown in Figure 2). Detecting the SCC image directly would therefore involve a large amount of calculation and be time-consuming. Meanwhile, since the defects occupy a relatively small proportion of the image, common detection methods are prone to missing them. A viable solution, under the premise of ensuring efficiency, is to extract each single silicon wafer, detect it separately, and then transfer the defect positions back to the whole coordinate system of the SCC image. Under this strategy, single silicon wafer segmentation must be performed first.

2.1. Local Extremum Neighborhood Difference Based on Edge Projection

From Figure 2, it can be seen that the SCC image has a black background. In order to obtain each silicon wafer region, the component region should be extracted first and then segmented with a priori knowledge of the wafer layout. The minimum bounding rectangle (MBR) is adopted for the component region extraction, as shown in Figure 3.

As shown in Figure 3, the whole gray SCC image is first converted to a binary image to separate target and background, and then the binary image is processed with the MBR to obtain the target region of the SCC silicon wafers. Hence, the position coordinates of the four corners A, B, C, and D of the rectangle are also determined. Given the position coordinates of the upper-left corner point A, the coordinates of points B, C, and D follow as shown in Table 1.

In Table 1, ws and hs are the width and height of the silicon wafer region image, respectively. The SCC image area is then segmented and extracted to obtain the silicon wafers.
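The corner relationships in Table 1 reduce to simple offsets from point A. Below is a minimal sketch; the corner order A → B → C → D, the axis-aligned rectangle, and the y-down image coordinate convention are our assumptions, since Table 1's exact entries are not reproduced here:

```python
def wafer_corners(ax, ay, ws, hs):
    """Corner coordinates of the MBR region, following the idea of Table 1.

    A = (ax, ay) is the assumed upper-left corner; ws and hs are the width
    and height of the silicon wafer region (as defined in the text).
    """
    a = (ax, ay)             # upper left (given)
    b = (ax + ws, ay)        # upper right
    c = (ax + ws, ay + hs)   # lower right
    d = (ax, ay + hs)        # lower left
    return a, b, c, d
```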

As the silicon wafers are arranged on the moving board by a robot arm, there are prominently distributed horizontal and vertical lines. In order to extract the straight lines in the image and facilitate subsequent edge projection positioning, Sobel edge detection is used to perform differential operations that highlight boundary changes. The binary Sobel edge image (with column index between 0 and 6333 and row index between 0 and 3809, as shown in Figure 4) is projected in the horizontal and vertical directions, respectively, and then scanned row by row or column by column to count the valid pixel points. By analyzing the peaks in the projection, the position of each single wafer in the silicon wafer area image can be determined. The projection of the binarized edge image is shown in Figure 5.
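The row/column projection step can be sketched as follows. This is a minimal illustration with NumPy; the `edge_projections` name and the toy image are ours, not the paper's, and the Sobel/thresholding stage is assumed to have been applied already:

```python
import numpy as np

def edge_projections(binary_edge):
    """Project a binary edge image onto the horizontal and vertical axes.

    binary_edge: 2-D array of 0/1 values (e.g., a thresholded Sobel image).
    Returns (row_profile, col_profile): counts of edge pixels per row and
    per column; peaks in these profiles mark wafer boundary lines.
    """
    edge = np.asarray(binary_edge, dtype=np.int64)
    row_profile = edge.sum(axis=1)  # one value per row -> horizontal boundaries
    col_profile = edge.sum(axis=0)  # one value per column -> vertical boundaries
    return row_profile, col_profile

# toy image: one horizontal edge line on row 2 and one vertical line on column 1
img = np.zeros((5, 6), dtype=np.uint8)
img[2, :] = 1
img[:, 1] = 1
rows, cols = edge_projections(img)
```

The positions of the maxima in `rows` and `cols` recover the line locations of the toy image.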

It can be seen from Figures 5(a) and 5(b) that the vertical and horizontal projections of the image clearly reflect the distribution of the silicon wafer region and the edge positions (as E1 shown in Figures 5(a) and 5(b)). Due to the existence of the laser holes (as shown in Figure 6) and the unevenness of the silicon wafer properties, interference pulses remain near the actual edges of the silicon wafers (E2 in Figures 5(a) and 5(b)), which reduce the stability of silicon wafer boundary detection.

Actually, the interference projection values show a certain neighborhood similarity, while a real boundary appears as a sudden value. Many methods use an adjacent difference to obtain the edge of the target [24, 25]. However, a fixed difference distance is difficult to determine and could weaken the value of the edge itself, especially on a gradient. Hence, a local difference extremum of edge projection (LDEEP) is proposed. The main idea is to find the maximum contrast within a neighborhood range for the difference, as shown in

D(i) = P(i) − min{P(j) : j ∈ [i − Δ, i + Δ]},

where D(i) is the difference between the projection value P(i) at the current position and the extremum within the local range, j is the index in the neighborhood range, and Δ represents the local search range.

Figures 5(c) and 5(d) are the LDEEP results obtained with Δ = 12. It can be seen that most of the interference peaks have been reduced, effectively retaining the edge values of the silicon wafer arrangement. Although the neighborhood difference reduces the absolute peak value of the edge at the E1 silicon wafer, the interference peaks decrease much more significantly. That is, the contrast between the real edges and the interference becomes more obvious, which is conducive to obtaining the edge position of each column and row of silicon wafers.
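A sketch of the LDEEP idea in Python, assuming the local minimum within ±Δ is used as the contrast reference; this implementation is our interpretation of the method described above, not the authors' code:

```python
import numpy as np

def ldeep(profile, delta=12):
    """Local difference extremum of edge projection (LDEEP), a sketch.

    For each position i, subtract the minimum projection value found in the
    local window [i - delta, i + delta]; real edges keep a large contrast
    while near-constant interference pulses are suppressed toward zero.
    """
    p = np.asarray(profile, dtype=np.float64)
    n = len(p)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - delta), min(n, i + delta + 1)
        out[i] = p[i] - p[lo:hi].min()
    return out
```

On a flat profile with one spike, only the spike survives with its full contrast.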

2.2. Single Silicon Wafer Segmentation Based on Selective Peak Quick Sort

With the LDEEP results, the positions with the largest values (one more than the number of silicon wafers in the horizontal or vertical direction) can be taken as the edge positions of the silicon wafers. The bubble sorting algorithm is the most commonly used, but it needs to scan all projection values, and its time complexity is O(n²). That is, for the silicon image area above, bubble sorting needs about 6333² comparisons in the horizontal direction and about 3809² comparisons in the vertical direction. Actually, to order the projection peaks of a specific silicon wafer area image, it is only necessary to obtain the first several boundary values in the horizontal direction and in the vertical direction, without sorting all of them. So a selective peak quick sorting (SPQS) algorithm is presented in this paper. The main idea is to examine the distance between two adjacent extreme values: when the distance is less than a threshold, the two extreme values are merged and the larger one is kept. Figure 7 uses the horizontal projection (from top to bottom in the silicon wafer region) as an example of the SPQS edge search process.

In Figure 7, the distance is measured between a position in the neighborhood difference projection vector and the position of the previously retained peak, over the full length of the vector. In one traversal, only a finite number of peak values need to be compared and bubbled up, so that the prior conditions (the regular arrangement of the silicon wafers) can be fully utilized to speed up the search. On the other hand, the LDEEP value at the edge of a silicon wafer is often much larger than at nonedge positions, so by additionally setting a threshold condition on the peak value, the number of searches can be greatly reduced. Table 2 shows the ranking results.
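The SPQS peak merging described above can be sketched as a simple greedy selection; the function name, threshold handling, and early-exit condition are our assumptions, not the paper's exact algorithm:

```python
def select_peaks(values, num_edges, min_dist):
    """Selective peak search, a sketch of the SPQS idea.

    Greedily take the largest values as candidate edges; when two candidates
    lie closer than min_dist, keep only the larger one, so the known wafer
    spacing prunes the search instead of fully sorting every projection value.
    """
    # candidate indices ordered by decreasing projection value
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept = []
    for i in order:
        # merge: skip a candidate that falls too close to an already kept peak
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(i)
        if len(kept) == num_edges:  # stop once all expected edges are found
            break
    return sorted(kept)
```

With the expected edge count fixed in advance (number of wafers plus one), the traversal stops early instead of ordering the whole vector.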

Table 2 shows the search results for the edge positions of the silicon wafers in an SCC image. According to these results, the boundary location of each silicon wafer is shown in Figure 8, and each single silicon wafer is then segmented.

3. Defect Recognition Based on Lightweight Deep Neural Network

When imaging in the PL mode, the laser holes and the surrounding grid lines cannot be excited to luminesce, so they appear as black dots and cross-shaped marks with corner features (as shown in Figure 6). Due to the fragility and stress concentration of the silicon wafer substrate, internal microcracks (called hidden cracks) are prone to occur around the laser holes (as shown in Figures 9(a) and 9(b)). In the PL image, a hidden crack usually presents as a darker diagonal line at 45° (crack type1) or as an “X”-shaped cross (crack type2), with a size of 0.5 mm~5 mm. Improper handling during manual inspection, with varying friction between sharp objects and the cell, results in scratches on the surface of the cell, as shown in Figure 9(c). Partial shadow heating in the SCC generates a hot spot effect, causing black spots on the cell, as shown in Figure 9(d). These defects easily damage the solar cell and seriously affect its service life and conversion efficiency. Therefore, the detection of cracks, scratches, and black spots is an important part of solar cell inspection and is also the key and difficult point in realizing automatic defect recognition for the SCC.

In order to improve the accuracy of cell detection, this paper provides an improved deep network with the lightweight backbone of YOLOv3-Tiny to detect and recognize the defects. In the traditional YOLOv3-Tiny network structure, the backbone network uses 3×3 convolution kernels for feature extraction, and a pooling operation is performed after each feature extraction. Although the network structure is lightweight and has real-time performance, there are still some shortcomings. First, the backbone network has only seven convolutional layers, which is relatively insufficient for detecting hidden cracks that are weak and small. Second, the network model uses little detection scale information: only two scales, 13×13 and 26×26, are used to detect single-wafer defects, which easily leads to missed detection of small defects and low detection accuracy.

3.1. Improved Network Model with Shallow Features

The network structure is deepened to 39 layers with more 3×3 and 1×1 convolutions. The 1×1 convolution kernels are used to adjust the number of channels, and the 3×3 kernels are used to extract high-dimensional spatial features. Meanwhile, an upsampling layer is added on the basis of the original network's 13×13 and 26×26 defect predictions to form 13×13, 26×26, and 52×52 three-scale predictions, further improving the accuracy of defect detection. The improved network structure is shown in Figure 10.

The backbone network consists of 13 convolutional layers and 6 maximum pooling layers. Each convolutional layer (Conv) in the network consists of a two-dimensional convolution (Conv2d), a batch normalization layer (BN), and an activation function layer (Leaky ReLU) in sequence. The BN layer keeps the input of each batch in the effective region of the Leaky ReLU with an approximately normal distribution, and the Leaky ReLU reduces gradient vanishing during training.

The input image is first resized to 416×416 and becomes 13×13 at the output layer. This output layer has a deeper number of network layers and a larger receptive field, which is suitable for predicting large targets; the first prediction, Prediction1, is made on this feature map. The feature layer then uses a 1×1 convolution kernel to adjust the number of channels, and the output is upsampled. The upsampled feature is stacked with the thirteenth feature layer of the backbone network in the channel dimension, and an output layer with a size of 26×26 is obtained after a convolution operation. This output layer has a relatively shallow number of network layers and a smaller receptive field, which is suitable for small target prediction; Prediction2 is used for the second prediction on this feature map.

In order to make the proposed model suitable for defects of varying size, a third prediction layer, Prediction3, using shallow information is constructed. In the second prediction path, the feature layer uses a 1×1 convolution kernel to adjust the number of channels, the output is upsampled once, and the upsampled feature is stacked with the ninth feature layer of the backbone network in the channel dimension. After convolutional feature fusion, a new output layer with a size of 52×52 is constructed, as shown in the added prediction scale module in Figure 10. The feature information of this output layer comes from the fusion of shallow information and the second predicted feature map; it contains more low-dimensional feature information and a smaller receptive field and is suitable for predicting smaller targets. On this basis, the third prediction branch Prediction3 fuses different scales, so that a certain degree of deep semantic understanding is also provided in the shallow features, giving better recognition results for small target objects.

3.2. Prediction of Bounding Box

In the YOLOv3-Tiny algorithm, the picture input into the network is divided into cells according to the scale of the feature map; the scales of the two feature maps are 13×13 and 26×26, respectively. The first prediction layer generates 13×13 grids, so its prediction boxes are larger. The second prediction layer generates 26×26 grids, whose prediction boxes are smaller than those of the first prediction layer.

Crack type1, crack type2, and black spots are small in the solar cell, so an even smaller prediction box is needed. The scale of the third prediction layer constructed in this paper is 52×52, so the input image is finally divided into 52×52 grids. Each grid corresponds to the channel information of the prediction layer, and each prediction layer channel contains the final prediction parameters of the grid.

Taking the first prediction layer as an example, each channel is composed of the offsets tx and ty of the center point of the prediction box relative to the upper left corner of its grid; tw, th, and the prediction confidence, giving the width, height, and confidence of the prediction box; and four scores representing crack type1, crack type2, scratch, and black spot, respectively. Each grid in the prediction layer generates three preset bounding boxes according to the corresponding channel information. Each prediction box contains the above 9 parameters, so the output channel dimension of each prediction layer is 27. During training, the bounding box constantly adjusts in size, the predicted boxes in the grid are matched with the target box in the label, and the bounding box with the largest IOU (Intersection over Union) value is selected as the output result [26]. The schematic diagram of bounding box prediction is shown in Figure 11.

The actual coordinates of the bounding box are obtained as follows:

bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, bh = ph·e^th,

where bx and by are the coordinates of the center point of the bounding box, bw and bh are the width and height of the bounding box, cx and cy are the coordinates of the upper left corner of the grid cell containing the center of the target box, tx and ty are the offsets of the center coordinates relative to that corner (squashed by the sigmoid function σ), and pw and ph are the width and height of the preset bounding box.
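The decoding above matches the standard YOLOv3 formulation; a minimal sketch in plain Python (the `decode_box` name is hypothetical):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode a YOLO-style bounding box from raw network outputs.

    (tx, ty) are offsets squashed by a sigmoid so the box center stays inside
    the grid cell whose upper-left corner is (cx, cy); (tw, th) exponentially
    scale the preset anchor box (pw, ph).
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx          # center x
    by = sigmoid(ty) + cy          # center y
    bw = pw * math.exp(tw)         # width
    bh = ph * math.exp(th)         # height
    return bx, by, bw, bh
```

With zero offsets, the decoded box sits at the cell center with exactly the anchor's size.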

4. Experimental Results and Analysis

4.1. Sorting Out Single Silicon Wafer Defect Data Set

In this paper, a total of 3200 SCC images of different specifications were collected from a large-scale solar cell factory in Jiangsu Province, China. Each silicon wafer in the SCCs was segmented and extracted, yielding 182,400 images of solar cells. After sample screening, 6100 representative defect samples were selected and retained. In order to increase the diversity of the data, defective samples processed with brightness transformation, flipping, and other operations were added to augment the data set. Finally, a data set of 18,000 samples was formed with 4 types of defects: crack type1, crack type2, scratch, and black spot. The sample images and the defects they contain span a range of sizes. Part of the data set after augmentation is shown in Figure 12.
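Brightness scaling and flipping of the kind mentioned above can be sketched with NumPy; this is illustrative only, and the factor values and the `augment` name are ours:

```python
import numpy as np

def augment(image, brightness=1.2, flip_horizontal=True):
    """Simple augmentation used to enlarge a defect data set (a sketch).

    Scales pixel intensities by a brightness factor (clipped to the 8-bit
    range) and optionally mirrors the image left to right.
    """
    out = np.clip(image.astype(np.float64) * brightness, 0, 255).astype(np.uint8)
    if flip_horizontal:
        out = out[:, ::-1]  # mirror along the horizontal axis
    return out
```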

The defect sample data set was divided into training, validation, and test sets in a certain proportion. To facilitate defect detection, the data set was converted to the VOC format, and the images containing defects were annotated. The numbers in each data set after sorting are shown in Table 3.

4.2. Evaluation Index

In order to examine the performance of the improved model, we used evaluation indicators including precision (P), recall (R), and mean average precision (mAP). For the VOC data set, if the IOU (the intersection ratio of the predicted box and the real box) is greater than 0.5, the predicted object and the actual object are considered the same object; otherwise, they are not. Under this definition, the calculation formulas for precision and recall are

P = TP / (TP + FP), R = TP / (TP + FN),

where TP denotes a positive sample predicted as positive, FP a negative sample predicted as positive, and FN a positive sample predicted as negative.

With the formulas for precision and recall, a PR curve can be drawn for a certain type of target, with the recall rate as the abscissa and the precision rate as the ordinate. The area enclosed by the curve is defined as the AP, and the mAP is the average of the AP values over all classes:

AP = Σk P(k)·ΔR(k), mAP = (1/N) Σi APi,

where P(k) represents the precision when k pictures have been recognized, ΔR(k) represents the change in recall when the number of recognized pictures changes from k − 1 to k, and N is the number of target categories.
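The AP summation above can be sketched as follows; representing detections as (confidence, is-true-positive) pairs is our illustrative choice, and the IOU > 0.5 matching is assumed to have been done beforehand:

```python
def average_precision(detections, num_gt):
    """Compute AP as the area under the precision-recall curve (a sketch).

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth boxes of this class.
    AP accumulates P(k) * delta R(k) over detections sorted by confidence.
    """
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = prev_recall = 0.0
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # P(k) * delta R(k)
        prev_recall = recall
    return ap
```

The mAP then follows by averaging this value over the defect classes.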

4.3. Experimental Results and Analysis

The experiments were carried out under the Windows operating system, using the Darknet deep learning framework [27], on a machine with an NVIDIA GTX 1050 GPU, an Intel Core i5-7300HQ CPU, and 8 GB of memory; CUDA 10.0 and cuDNN 7.4.1 were installed for GPU acceleration, and the improved algorithm was implemented in Python. The maximum number of training iterations was set to 20,000; the initial learning rate was 0.001, reduced to 0.0001 after 10,000 iterations; the batch size was 64; and the subdivision was 8.

The PR curves of YOLOv3-Tiny and the improved algorithm for different types of defects are shown in Figure 13. The area under each PR curve is the AP value of the corresponding class: the larger the area under the curve, the higher the AP value and the better the detection performance.

The comparison of AP values between YOLOv3-Tiny and the improved network algorithm on the crack type1, the crack type2, scratch, and black spot is shown in Figure 14.

From Figures 13 and 14, it can be seen that the overall AP value of scratch is the lowest among all types. This is because the training data set for scratches is small, so the model is prone to overfitting and generalizes poorly on the test set. On the other hand, because of the diversified shapes of scratch defects, false detections are prone to occur during testing.

Furthermore, the scratch AP value shows the largest absolute increase, 0.17, which indicates that the multiscale feature fusion layers improve the detection accuracy for this defect target. The AP value of crack type2 shows the smallest increase, only 0.03, mainly because crack type2 has more training data, a single defect shape, and an already good detection effect. Overall, the improved algorithm has better detection performance and better generalization ability.

Generally, the experiments prove that increasing the depth of the network and the number of convolution kernels improves the feature extraction capability; meanwhile, it may consume more time. In order to inspect the real-time performance of the improved algorithm, the mAP value and speed were analyzed. Under the same experimental conditions, SSD, YOLOv3, YOLOv3-Tiny, and the improved network model were evaluated on the test data set. The results are shown in Table 4.

As shown in Table 4, the mAP values of SSD and YOLOv3 are 97.52% and 94.31%, respectively, much higher than those of YOLOv3-Tiny and the improved algorithm. However, these large network models have many parameters and a large amount of calculation, so their detection speed is slow [28], making it difficult to adapt to real-time detection of solar cells on the production line. The mAP value of the improved network model is 87.55%, 6.78% higher than that of the YOLOv3-Tiny network model, indicating that the detection performance of the improved network model has improved. Because the improved algorithm deepens the network structure and increases the computation of the network model to a certain extent, its detection speed is slightly lower than that of YOLOv3-Tiny. Generally, a detection speed above 25 fps meets the real-time requirements of target detection, and the improved algorithm still meets this requirement. It can be seen that the improved network model balances detection speed and accuracy and can better complete the task of solar cell defect detection.

In order to show the detection effect of the improved network model more intuitively, Figure 15 shows the comparison of the detection results before and after the model improvement.

In order to verify the feasibility of the improved model, various types of component images were collected on the SCC production line. These images were randomly collected at different times, locations, and lighting conditions. Taking a module composed of 60 cell silicon wafers as an example: the improved model's detection speed is 40 fps, the average time per cell is 0.025 s, and therefore the average time to complete one SCC detection is 1.5 s, which meets the real-time detection requirement. The effect of using the improved network model to detect different types of solar cell defects is shown in Figure 16. It can be seen that the improved network model can more accurately detect crack type1, crack type2, scratch, and black spot.

The detection model has now been successfully applied in the field and works well (as shown in Figure 17). On-site operation shows that the defect detection method ensures both the speed and accuracy of SCC defect detection and lays the foundation for automatic identification of defects in SCC products.

5. Conclusion

In the SCC PL image, the defect area accounts for less than one ten-thousandth of the image, which makes efficient detection and recognition of defects difficult. To improve the detection accuracy while ensuring efficiency, the SCC units are first segmented and extracted, which lays the foundation for the subsequent defect detection and recognition. To recognize weak and small defects, an improved network model based on YOLOv3-Tiny is proposed: we deepened the structure of the main feature extraction network, added a feature pyramid layer to realize three-scale prediction, and strengthened the semantic information of the shallow feature maps. On the solar cell defect test data set, the mAP reaches 87.55%, 6.78% higher than the original algorithm, and the detection speed is 40 fps, which meets the requirements of real-time detection. The experimental results show that the improved model improves detection accuracy, reduces the missed detection rate, and achieves a good balance between detection accuracy and speed.

However, due to the limited model parameters, the detection accuracy still needs to be improved when the environment is more complicated (such as occlusion or dim light). Improving the generalization ability of the model, on the premise of keeping both speed and accuracy, will be the main research direction in the future.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Plan of China (Grant No. 2018YFC1902400).