Adversarial autoencoder for detecting anomalies in soldered joints on printed circuit boards
28 April 2020
Abstract

The inspection of solder joints on printed circuit boards is a difficult task because defects inside the joints cannot be observed directly. In addition, because anomalous samples are rarely obtained in typical anomaly detection settings, many methods use only normal samples in the learning phase. However, a small number of anomalous samples is sometimes available for learning. For such situations, we propose a method that improves performance by using a small number of anomalous samples for training. Specifically, our method performs anomaly detection using an adversarial autoencoder (AAE) and Hotelling’s T-squared distribution. First, the AAE learns solder-joint features that follow the standard Gaussian distribution from a large number of normal samples and a small number of anomalous samples. Then, the anomaly score of each solder joint is calculated by Hotelling’s T-squared method from the features learned by the AAE. Finally, anomaly detection is performed by thresholding this anomaly score. In experiments, we show that our method performs anomaly detection with few false positives in such situations. Moreover, we confirm that our method outperforms the conventional method using handcrafted features and a one-class support vector machine.

1. Introduction

Inspection of the solder joints on a printed circuit board (PCB) is challenging because defects in the joints cannot be observed directly: the solder joints are sandwiched between the PCB and an integrated circuit (IC) chip. To solve this problem, automated x-ray inspection, which is nondestructive, is generally employed.1,2 In our method, we employed an automated x-ray inspection system that collects sliced images of the solder joints by x-ray computed tomography (CT) on the x-ray inspection machine and detects defects in the solder joints.

In recent years, automatic visual inspection systems using machine learning, especially deep learning, have been studied as a way to classify normal and anomalous samples. This is motivated by the fact that inspection by human experts is problematic: fatigue may cause an expert to miss anomalous samples. One of the most popular machine-learning methods for anomaly detection is the one-class support vector machine (OCSVM).3 This method requires features handcrafted by human experts in advance; the extracted features are input to the trained OCSVM, and each input is classified by the OCSVM output. The OCSVM is trained with only normal samples, but its features must be designed by human experts beforehand, and the feature extraction method must be redesigned whenever the product specification changes. With deep learning, product images are input directly to the neural network, so feature extraction by human experts is not required. Therefore, when the product specification changes, only the network needs to be retrained, which can greatly reduce operating costs. A common deep-learning approach to anomaly detection is to classify normal and anomalous samples with a binary classifier.4,5 However, in anomaly detection for industrial products, it is difficult to collect enough anomalous samples to train such a classifier because defects rarely occur on the production line. Therefore, anomaly detection is generally performed using only normal data.3,6 Nevertheless, a small number of anomalous samples is sometimes available for the learning phase, and adding them to the training dataset can be expected to improve performance. In our method, normal samples as well as a small number of anomalous samples are used for learning.
In particular, our method uses an adversarial autoencoder (AAE)7 to extract features that follow the standard Gaussian distribution from such imbalanced samples. Anomaly scores are then calculated from these features by Hotelling’s T-squared method8, and each solder joint is classified by thresholding its anomaly score. In our experiments, we show that our method outperforms the method using handcrafted features and an OCSVM on imbalanced samples. Our contribution is a method that detects defects from a large number of normal samples and a small number of anomalous samples during the quality inspection of industrial products.

2. Related Work

X-ray CT has recently become the main method for detecting anomalies in PCB solder joints, because the joints cannot be observed directly. X-rays pass through the PCB, which consists of materials with low atomic weight, whereas the solder joints, which have high atomic weight, appear in the image.9 For example, the solder ball portion of each solder joint has been represented as voxel data, reconstructed from two-dimensional x-ray CT images taken from multiple directions, to obtain the condition of the joint.10 The voxel data are input to a three-dimensional convolutional neural network and classified by the output of the network. However, in typical anomaly detection tasks, a neural network classifier requires both normal and anomalous samples for training, and its predictions are unstable for unknown anomalies not seen during training. Therefore, training methods that give satisfactory classification results using only normal samples, or normal samples plus a small number of anomalous samples, are needed.

A previously developed anomaly detection method applies an OCSVM to extracted features. However, this approach has a disadvantage: the feature extraction method must be designed beforehand and must be changed for every target. To solve this problem, an autoencoder,11 a neural network model, can extract features in a latent space from the input samples automatically. Anomaly detection methods using an autoencoder are based either on the reconstruction error6 or on a normal-condition model in the latent space.12 The proposed method belongs to the latter approach. In the former approach, the network is usually trained with only normal samples. As a result, it reconstructs normal samples with small errors, whereas it cannot reconstruct anomalous samples, whose reconstruction errors become large. Samples are therefore classified by thresholding the reconstruction error. In the latter approach, a normal model is defined in the latent space, and each input sample is classified by its likelihood under this model. In Ref. 12, test samples are classified by thresholds on both the reconstruction error and the likelihood of the AAE-extracted features under a Gaussian distribution. Our method differs in that it thresholds an anomaly score calculated by Hotelling’s T-squared method rather than the reconstruction error and likelihood.

3. Proposed Method

3.1. X-Ray Computed Tomography

Because the solder joints sandwiched between the PCB and the IC chip cannot be inspected directly, we obtain sliced images of the solder joints with x-ray CT. When the IC chip is joined to the PCB, many solder joints are formed. Our approach detects each solder joint and crops its region in advance so that sliced images of each solder joint can be captured. A total of λ sliced images is taken from each solder joint, and together these sliced images form the sample for that solder joint. Hence, on a PCB that contains anomalous solder joints, only those joints, not the whole board, are treated as anomalous samples. An overview of the method for capturing sliced images of a solder joint is shown in Fig. 1. We took eight sliced images (λ = 8) from each solder joint, and each image is assigned a layer number corresponding to its layer.

Fig. 1

Overview of obtaining sliced images with x-ray CT. Each image is given a layer number.


Examples of the captured sliced images of normal samples are shown in Fig. 2(a), and Fig. 2(b) shows anomalous samples.

Fig. 2

Examples of eight sliced images for normal and anomalous solder joints. Black dots in some normal sliced images are not signs of anomalies. Anomalous joints appear thinner than normal joints.


3.2. Hotelling’s T-Squared Method

Because the number of anomalous samples is much smaller than the number of normal samples, the normal model is defined from only normal samples or from normal samples plus a small number of anomalous samples. Assume that the normal model, generated from the dataset $Z = (\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n)$ with each $\mathbf{z} = (z_1, z_2, \ldots, z_d) \in \mathbb{R}^d$, is parameterized by θ. The anomaly score a(z) of an unknown sample z is then defined as its negative log-likelihood:

Eq. (1)

$$a(\mathbf{z}) = -\log q(\mathbf{z} \mid \theta).$$
In the normal model q(z | θ), the probability density is high for normal samples and low for anomalous samples. Therefore, normal samples have low anomaly scores and anomalous samples have high ones, so the two can be separated by thresholding the anomaly score. Hotelling’s T-squared method is an anomaly detection method that applies to data following a Gaussian distribution. In this case, a(z) is calculated as in Eq. (2) using the two parameters of the Gaussian distribution, the mean vector μ and the variance-covariance matrix Σ:

Eq. (2)

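$$a(\mathbf{z}) = -\log \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}, \Sigma) = -\log\left[\frac{1}{\sqrt{(2\pi)^d\,|\Sigma|}}\exp\left(-\frac{1}{2}(\mathbf{z}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{z}-\boldsymbol{\mu})\right)\right] \propto (\mathbf{z}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{z}-\boldsymbol{\mu}).$$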
The last term of Eq. (2), obtained by dropping the additive constant and the factor 1/2, is the squared Mahalanobis distance between z and μ. Moreover, if μ = 0 and Σ = I, the data follow the standard Gaussian distribution, and a(z) is calculated by the following equation:

Eq. (3)

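$$a(\mathbf{z}) = -\log \mathcal{N}(\mathbf{z} \mid \mathbf{0}, I) = -\log\left[\frac{1}{\sqrt{(2\pi)^d}}\exp\left(-\frac{1}{2}\mathbf{z}^\top \mathbf{z}\right)\right] \propto \mathbf{z}^\top \mathbf{z}.$$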
The last term of Eq. (3) is the squared Euclidean norm of z, that is, its squared distance from the origin. In Hotelling’s T-squared method, this a(z) follows the chi-square distribution with d degrees of freedom and a scale factor of 1. The chi-square distribution with d = 16 is shown in Fig. 3.
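The last terms of Eqs. (2) and (3) can be sketched in plain Python. This is a minimal illustration, not the authors’ implementation: the helper names are hypothetical, the general case assumes that μ and Σ are estimated from latent vectors of normal training samples, and, as in the equations, the additive constant and the factor 1/2 are dropped.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(samples: Sequence[Vector]) -> List[float]:
    """Sample mean of latent vectors (an estimate of mu)."""
    n, d = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(d)]


def covariance_matrix(samples: Sequence[Vector], mu: Vector) -> List[List[float]]:
    """Unbiased sample covariance (divides by n - 1), an estimate of Sigma."""
    n, d = len(samples), len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a: Sequence[Vector], b: Vector) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def mahalanobis_score(z: Vector, mu: Vector, sigma: Sequence[Vector]) -> float:
    """Last term of Eq. (2): (z - mu)^T Sigma^{-1} (z - mu)."""
    diff = [zi - mi for zi, mi in zip(z, mu)]
    return sum(di * xi for di, xi in zip(diff, solve(sigma, diff)))


def standard_gaussian_score(z: Vector) -> float:
    """Last term of Eq. (3): z^T z, the special case mu = 0 and Sigma = I."""
    return sum(zi * zi for zi in z)
```

With μ = 0 and Σ = I, `mahalanobis_score` reduces to `standard_gaussian_score`, which is why fixing the latent distribution to the standard Gaussian removes the need to estimate and invert Σ.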

Fig. 3

Plot of the chi-square distribution. The vertical axis is the density of the distribution, and the horizontal axis is the a(z) value. Degrees of freedom: d = 16.


Figure 3 shows the probability density of a(z) for z sampled from the normal model following the standard Gaussian distribution. When a(z) is high, the probability that the sample is normal is low, so the sample can be regarded as anomalous. Hence, normal and anomalous samples can be classified by choosing an upper-tail probability in advance and using the corresponding value of a(z) as a one-dimensional threshold.
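The threshold described above can be obtained from the chi-square upper-tail probability. The sketch below is illustrative and rests on stated assumptions: it uses the closed-form survival function that holds only for even degrees of freedom (sufficient for d = 16), finds the threshold by bisection, and uses hypothetical function names. It is not how the threshold was chosen in Sec. 4, where it was set to reach 100% TPR.

```python
import math


def chi2_upper_tail(x: float, d: int) -> float:
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom d.

    Closed form exp(-x/2) * sum_{k=0}^{d/2 - 1} (x/2)^k / k!, valid only for even d.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("closed form requires a positive even d")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, d // 2):
        term *= half / k          # (x/2)^k / k!, built incrementally
        total += term
    return math.exp(-half) * total


def chi2_threshold(upper_prob: float, d: int, tol: float = 1e-10) -> float:
    """Approximate a(z) value whose upper-tail probability equals upper_prob."""
    if not 0.0 < upper_prob < 1.0:
        raise ValueError("upper_prob must be in (0, 1)")
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, d) > upper_prob:   # grow hi until it brackets the root
        hi *= 2.0
    while hi - lo > tol:                         # bisection on the decreasing tail
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, d) > upper_prob:
            lo = mid
        else:
            hi = mid
    return hi


threshold = chi2_threshold(0.05, d=16)   # about 26.3 for d = 16
```

A sample would then be flagged as anomalous when its score from Eq. (3) exceeds `threshold`; a smaller upper-tail probability (for example 0.01, giving about 32.0 for d = 16) trades fewer false positives for a higher risk of missed anomalies.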

3.3. Adversarial Autoencoder

Although images are high-dimensional data, they can be compressed into lower-dimensional features in the latent space, because normal samples are assumed to share common features. An autoencoder is a neural network that extracts such low-dimensional features. It is composed of two networks: an encoder (En) and a decoder (De). En is trained to extract a latent vector z ~ q(z) from an input x ~ p_data(x), where p_data(x) is the data distribution of the input samples. De is trained to reconstruct the input x from z. The loss function is as follows:

Eq. (4)

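$$L_{\mathrm{AE}} = \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\left[\left\lVert \mathbf{x} - \mathrm{De}(\mathrm{En}(\mathbf{x})) \right\rVert_2^2\right].$$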

Principal component analysis13 is another conventional dimensionality reduction method, but it can only map linearly from the high-dimensional space to the low-dimensional latent space. The autoencoder enables nonlinear mapping through activation functions and deep layers. Because the projection functions En and De are more flexible, the model can extract more representative features of data with complex structure.

Although the autoencoder can acquire low-dimensional features of the input samples, the distribution of those features in the latent space cannot be specified. Therefore, to apply the Hotelling’s T-squared method described in Sec. 3.2 to the features extracted by the autoencoder, we employ an AAE, which consists of the autoencoder and a discriminator network, as shown in Fig. 4. The AAE matches the distribution of the latent space to an arbitrary distribution in an adversarial manner.7 To incorporate Hotelling’s T-squared method into this deep generative model, we train the AAE with an adversarial loss between the distribution of the encoded latent vectors and the standard Gaussian distribution. Furthermore, we assume the real-world situation in which a large number of normal samples and a small number of anomalous samples are available. Adversarial training with such imbalanced samples encourages normal samples to be mapped to the high-density region of the standard Gaussian distribution and anomalous samples to the low-density region. In other words, the AAE constructs a normal model that follows the standard Gaussian distribution in the latent space, so Hotelling’s T-squared method can be applied there. We choose the standard Gaussian distribution as the target distribution to simplify the anomaly score calculation described in Sec. 3.2.

Fig. 4

Architecture of an AAE consisting of an autoencoder and discriminator. In the autoencoder, the En extracts latent vector z from input images x sampled from pdata(x), and the De reconstructs x from z. The discriminator determines whether the input is sampled from standard Gaussian distribution p(z) or latent distribution q(z).


The discriminator is trained to determine whether its input vector is sampled from the latent distribution q(z) or from the standard Gaussian distribution p(z). In contrast, En is trained to bring q(z) close to p(z). This is called adversarial training and is defined by the loss function in Eq. (5): En is trained to minimize the function V, and the discriminator D is trained to maximize it. Here, E denotes the expectation of the bracketed term over the distribution given in its subscript.

Eq. (5)

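$$\min_{\mathrm{En}} \max_{D} V(D, \mathrm{En}) = \mathbb{E}_{\mathbf{z}\sim p}\left[\log D(\mathbf{z})\right] + \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\left[\log\left(1 - D(\mathrm{En}(\mathbf{x}))\right)\right].$$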

The discriminator (D) updates its own parameters to output D(z) = 1 when the input vector z is sampled from p(z) and D(z) = 0 when z is sampled from q(z). Therefore, when the discriminator maximizes Eq. (5), it can determine whether the input z is sampled from p(z) or q(z). The loss function of the discriminator, L_D, is obtained from Eq. (5) as follows:

Eq. (6)

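$$L_D = \max_{D}\; \mathbb{E}_{\mathbf{z}\sim p}\left[\log D(\mathbf{z})\right] + \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\left[\log\left(1 - D(\mathrm{En}(\mathbf{x}))\right)\right] = \max_{D}\; \mathbb{E}_{\mathbf{z}\sim p}\left[\log D(\mathbf{z})\right] + \mathbb{E}_{\mathbf{z}\sim q}\left[\log\left(1 - D(\mathbf{z})\right)\right].$$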

In contrast, Eq. (5) is minimized when En brings q(z) sufficiently close to p(z), and the loss function of En, L_En, is obtained from Eq. (5) as follows:

Eq. (7)

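$$L_{\mathrm{En}} = \min_{\mathrm{En}}\; \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\left[\log\left(1 - D(\mathrm{En}(\mathbf{x}))\right)\right] = \min_{\mathrm{En}}\; \mathbb{E}_{\mathbf{z}\sim q}\left[\log\left(1 - D(\mathbf{z})\right)\right].$$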
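On a minibatch, the expectations in Eqs. (6) and (7) are estimated by sample means of the discriminator outputs. The sketch below assumes D outputs probabilities in [0, 1] and clamps logarithm arguments with a small hypothetical constant `EPS` to avoid log(0); it only evaluates the objectives and does not perform the parameter updates.

```python
import math
from typing import Sequence

EPS = 1e-12  # hypothetical clamp that keeps log() finite when D outputs exactly 0 or 1


def safe_log(p: float) -> float:
    return math.log(max(p, EPS))


def discriminator_objective(d_prior: Sequence[float], d_encoded: Sequence[float]) -> float:
    """Sample estimate of the objective in Eq. (6); D is updated to MAXIMIZE this.

    d_prior:   D(z) for z drawn from the prior p(z), the standard Gaussian
    d_encoded: D(En(x)) for x drawn from the training data, i.e., z ~ q(z)
    """
    real_term = sum(safe_log(p) for p in d_prior) / len(d_prior)
    fake_term = sum(safe_log(1.0 - q) for q in d_encoded) / len(d_encoded)
    return real_term + fake_term


def encoder_objective(d_encoded: Sequence[float]) -> float:
    """Sample estimate of the objective in Eq. (7); En is updated to MINIMIZE this."""
    return sum(safe_log(1.0 - q) for q in d_encoded) / len(d_encoded)
```

A perfect discriminator (D = 1 on prior samples and D = 0 on encoded samples) reaches the maximum value 0 of the discriminator objective, whereas one that outputs 0.5 everywhere gives 2 log 0.5 ≈ -1.386. The encoder objective decreases as D(En(x)) approaches 1, that is, as the encoder fools the discriminator.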

To summarize, the AAE is trained to repeat the following procedure:

  • 1. update En and De parameters to minimize Eq. (4),

  • 2. update D parameters to maximize Eq. (6), and

  • 3. update En parameters to minimize Eq. (7).
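To make the three-step schedule concrete, the following toy sketch trains a one-dimensional AAE in plain Python with hand-derived gradients. Everything in it is an illustrative assumption rather than the paper’s setup: a linear encoder En(x) = w·x + b, a linear decoder De(z) = v·z + c, a discriminator D(z) = sigmoid(a·z² + g·z + e) (quadratic so that it can separate distributions by spread as well as location), synthetic scalar data, gradient clipping, and the learning rate. The actual model uses convolutional networks on 64×64×8 inputs.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    t = math.exp(s)
    return t / (1.0 + t)


def clip(grad: float, limit: float = 5.0) -> float:
    """Clip a gradient to [-limit, limit] so the toy updates stay bounded."""
    return max(-limit, min(limit, grad))


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def train_toy_aae(xs: Sequence[float], steps: int = 300, lr: float = 0.005,
                  seed: int = 0) -> Tuple[Tuple[float, ...], List[Tuple[float, float]]]:
    """Toy 1-D AAE; returns final parameters and per-step (recon loss, mean D(En(x)))."""
    rng = random.Random(seed)
    w, b, v, c = 0.5, 0.0, 0.5, 0.0      # encoder and decoder parameters
    a, g, e = 0.0, 0.0, 0.0              # discriminator parameters
    history = []
    for _ in range(steps):
        # Step 1: update En and De to minimize the reconstruction loss of Eq. (4).
        zs = [w * x + b for x in xs]
        rs = [v * z + c - x for x, z in zip(xs, zs)]            # De(En(x)) - x
        recon = mean([r * r for r in rs])                        # loss before this update
        grads = (mean([2 * r * v * x for r, x in zip(rs, xs)]),  # dL/dw
                 mean([2 * r * v for r in rs]),                  # dL/db
                 mean([2 * r * z for r, z in zip(rs, zs)]),      # dL/dv
                 mean([2 * r for r in rs]))                      # dL/dc
        w, b, v, c = (p - lr * clip(gr) for p, gr in zip((w, b, v, c), grads))

        # Step 2: update D to maximize Eq. (6): prior samples -> 1, encoded samples -> 0.
        zp = [rng.gauss(0.0, 1.0) for _ in xs]                   # z ~ p(z) = N(0, 1)
        zq = [w * x + b for x in xs]                              # z ~ q(z)
        dp = [sigmoid(a * z * z + g * z + e) for z in zp]
        dq = [sigmoid(a * z * z + g * z + e) for z in zq]
        # d log D / ds = 1 - D and d log(1 - D) / ds = -D, with s = a*z^2 + g*z + e.
        ga = (mean([(1 - d) * z * z for d, z in zip(dp, zp)])
              - mean([d * z * z for d, z in zip(dq, zq)]))
        gg = mean([(1 - d) * z for d, z in zip(dp, zp)]) - mean([d * z for d, z in zip(dq, zq)])
        ge = mean([1 - d for d in dp]) - mean(dq)
        a, g, e = a + lr * clip(ga), g + lr * clip(gg), e + lr * clip(ge)

        # Step 3: update En to minimize Eq. (7), i.e., mean log(1 - D(En(x))).
        zq = [w * x + b for x in xs]
        dq = [sigmoid(a * z * z + g * z + e) for z in zq]
        # d log(1 - D(z)) / dz = -D(z) * (2*a*z + g); dz/dw = x and dz/db = 1.
        gw = mean([-d * (2 * a * z + g) * x for d, z, x in zip(dq, zq, xs)])
        gb = mean([-d * (2 * a * z + g) for d, z in zip(dq, zq)])
        w, b = w - lr * clip(gw), b - lr * clip(gb)

        history.append((recon, mean(dq)))
    return (w, b, v, c, a, g, e), history


data_rng = random.Random(1)
xs = [data_rng.gauss(1.0, 0.5) for _ in range(64)]   # synthetic stand-in for normal data
params, history = train_toy_aae(xs)
```

Each iteration performs the three updates in the order listed above. The adversarial steps are intended to push the encoded values toward the N(0, 1) prior, while the reconstruction step keeps the decoder able to invert the encoder; clipping is only there to keep this toy numerically tame.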

4. Experiments

We performed experiments with the proposed method using the AAE and x-ray CT images of solder joints on PCBs. The anomaly detection procedure of the proposed method is as follows:

  • 1. Each solder joint in a PCB was detected and λ sliced images were captured by x-ray CT on an x-ray inspection machine.

  • 2. The λ sliced images were combined into one sample, which was input to the AAE network as λ channels. The AAE was trained with a large number of normal samples and a small number of anomalous samples.

  • 3. Test samples were input to the trained AAE, and the latent vector was obtained from the output of the En. The anomaly score for each latent vector was calculated by Hotelling’s T-squared method.

  • 4. Normal and anomalous samples were classified by setting an anomaly score threshold.

The architecture used in the experiments is shown in Fig. 5. Each sample consisted of eight sliced images (layers), as shown in Fig. 1. We resized the sliced images to 64×64 pixels and input the λ = 8 sliced images to the AAE network as eight channels.

Fig. 5

Parameters of the AAE autoencoder. Numbers following words are filter sizes. “s2” means a convolution with a stride of 2. The input of the discriminator is from the autoencoder’s latent vector at dense layer 16 or the vector sampled from the standard Gaussian distribution.


We compared our method with a method using handcrafted features and an OCSVM. The purpose was to show that our method, which uses features extracted automatically by the AAE, is superior to machine-learning classification using features designed by human experts. The handcrafted features were four-dimensional: the substrate area, head-in-pillow area, circularity, and luminance ratio.

The experimental results are shown in Table 1. In this table, the OCSVM with handcrafted features was trained with all normal samples; Table 2 shows the results of training the OCSVM with fewer normal samples. Table 2 shows that the false positive rate (FPR) decreased as the number of training samples increased, but the OCSVM remained inferior to the proposed method even when all normal samples were used for training. The inputs to the AAE network were 64×64×8. We trained the AAE with a batch size of 64 for 100 epochs, and we used a radial basis function kernel with γ = 0.11 for the OCSVM. Our code is available at https://github.com/rearwist3/aae_solder_tf.

We chose 100 epochs empirically by evaluating the model every 20 epochs over 200 epochs. Figure 6(a) shows all of the results, and Fig. 6(b) omits the results at 20 epochs to show the details of the FPR from 40 to 200 epochs. Because a low FPR was obtained at both 100 and 120 epochs with 10 anomalous training samples, we chose 100 epochs. Training for 100 epochs took 80 min on an RTX 2080 Ti GPU.

Table 1

Comparison of our AAE method (ours) with handcrafted features + OCSVM.

Method | Number of training samples (normal + anomalous) | Number of test samples (normal + anomalous) | TPR (%) | FPR (%)
OCSVM | 3,510,000 + 0 | 2,908,386 + 410 | 100 | 1.10
Ours | 40,000 + 10 | 2,908,386 + 400 | 100 | 0.07

Table 2

Comparison of results of handcrafted features + OCSVM with different numbers of samples used for training the OCSVM. The rows give the results for each training set size (small, medium, and large).

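Training set | Number of training samples (normal + anomalous) | Number of test samples (normal + anomalous) | TPR (%) | FPR (%)
Small | 1,404,000 + 0 | 5,014,386 + 410 | 100 | 2.52
Medium | 2,339,766 + 0 | 4,078,386 + 410 | 100 | 1.72
Large | 3,510,000 + 0 | 2,908,386 + 410 | 100 | 1.10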

Fig. 6

FPR evaluated every 20 epochs when 10 anomalous samples are included in the training dataset: (a) all results; (b) results from 40 to 200 epochs.


For both models, we set the threshold so that the true positive rate (TPR) was 100%, to avoid classifying anomalous samples as normal. The FPR of handcrafted features + OCSVM was 1.10% after training with 3,510,000 normal samples. In contrast, AAE + Hotelling’s T-squared method was trained with only 40,000 normal and 10 anomalous samples, yet it classified normal and anomalous samples with fewer false positives. To verify this result, we randomly selected 10 anomalous training samples three times and trained the network on each dataset for 100 epochs; the mean and standard deviation of the resulting FPR were 0.07 ± 0.01%. To show that including anomalous samples in the training dataset improves anomaly detection performance, and to find the best balance between normal and anomalous training samples, we also trained the network with 0, 20, 50, and 100 anomalous samples. The resulting FPRs were 5.15%, 0.93%, 0.10%, and 1.25%, respectively. These results confirm that including anomalous samples in the training dataset is effective, and that the case with 10 anomalous training samples gave the best performance while using the fewest anomalous samples among the cases that included any.
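One simple way to obtain the 100% TPR operating point described above, assuming labeled anomaly scores are available, is to place the threshold at the lowest score among the anomalous samples and count the normal samples at or above it. The sketch assumes that a higher score means more anomalous and that ties are classified as anomalous; the function names and example scores are hypothetical.

```python
from typing import Sequence, Tuple


def threshold_for_full_tpr(anomalous_scores: Sequence[float]) -> float:
    """Largest threshold that still flags every anomalous sample (score >= threshold)."""
    return min(anomalous_scores)


def tpr_fpr(normal_scores: Sequence[float], anomalous_scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (TPR %, FPR %) when scores >= threshold are classified as anomalous."""
    tp = sum(1 for s in anomalous_scores if s >= threshold)
    fp = sum(1 for s in normal_scores if s >= threshold)
    return 100.0 * tp / len(anomalous_scores), 100.0 * fp / len(normal_scores)


# Hypothetical scores, e.g., z^T z from Eq. (3) for each test sample.
normal_scores = [3.2, 7.5, 12.1, 40.3]
anomalous_scores = [35.0, 80.2]
thr = threshold_for_full_tpr(anomalous_scores)
print(tpr_fpr(normal_scores, anomalous_scores, thr))  # (100.0, 25.0)
```

Applied to the reported rates, an FPR of 0.07% on 2,908,386 normal test samples corresponds to roughly 2,000 false alarms, compared with roughly 32,000 at 1.10% for handcrafted features + OCSVM.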

We also compared our method with a binary classifier, which is a typical deep-learning approach to anomaly detection. This experiment shows the effectiveness of the proposed method when a sufficient number of anomalous samples for training a classifier cannot be obtained. Table 3 shows the result when the binary classifier is trained with a large number of normal samples and a small number of anomalous samples. The classifier could not separate normal and anomalous samples when the training set was imbalanced. Table 3 also shows the result for a balanced training set, in which we reduced the number of normal samples to match the number of anomalous samples (undersampling) before training the classifier. Neither result was as good as that of the proposed method, so we conclude that the proposed method is effective when the number of anomalous samples is small.

Table 3

The results when the samples were classified by the binary classifier. The first row denotes the results of training the classifier with the imbalanced dataset (without undersampling). The second row denotes the results of undersampling the dataset (with undersampling).

Number of training samplesNumber of test samplesTPR (%)FPR (%)
Without undersampling40,000+102,908,386+40099.300
With undersampling10+102,908,386+40098.070

5. Conclusion

In this paper, we proposed a method for inspecting solder joints on PCBs by anomaly detection using an AAE. We captured sliced images of solder joints using x-ray CT, and the AAE extracted features of the sliced images that follow the standard Gaussian distribution. Defects were detected by applying Hotelling’s T-squared method to these features. Experimental results showed that the proposed method could classify normal and anomalous samples with few false positives even with a small training set (40,000 normal and 10 anomalous samples). However, the number of latent dimensions needed to fully represent the high-dimensional input depends on the data, so the optimal number of latent dimensions must be selected for each dataset. Methods for statistically optimizing the number of latent dimensions will be studied in future work.

Acknowledgments

This work was supported by Aisin AW Co. Ltd.

References

1. A. Teramoto et al., “Development of high speed oblique x-ray CT system for printed circuit board,” SICE Trans. Ind. Appl., 6 (9), 72–77 (2007).

2. Z. C. Feng et al., “Characterization of solder defects on package on packages with AXI systems for inspection quality improvement,” (2015).

3. B. Schölkopf et al., “Estimating the support of a high-dimensional distribution,” Neural Comput., 13 (7), 1443–1471 (2001). https://doi.org/10.1162/089976601750264965

4. D. Soukup and R. Huber-Mörk, “Convolutional neural networks for steel surface defect detection from photometric stereo images,” in Int. Symp. Vis. Comput., 668–677 (2014).

5. D. Weimer, B. Scholz-Reiter, and M. Shpitalni, “Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection,” CIRP Ann., 65 (1), 417–420 (2016). https://doi.org/10.1016/j.cirp.2016.04.072

6. M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proc. MLSDA 2014 2nd Workshop Mach. Learn. for Sens. Data Anal. (2014).

7. A. Makhzani et al., “Adversarial autoencoders,” in Proc. Int. Conf. Learn. Represent. Workshop (2016).

8. H. Hotelling, “The generalization of Student’s ratio,” Breakthroughs in Statistics, 54–65, Springer, New York (1992).

9. G. Leinbach and S. Oresjo, “The Why, Where, What, How, and When of Automated X-Ray Inspection,” Agilent Technologies, Loveland, Colorado (2001).

10. B.-J. Lin et al., “Use 3D convolutional neural network to inspect solder ball defects,” in Int. Conf. Neural Inf. Process., 263–274 (2018).

11. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, 313 (5786), 504–507 (2006). https://doi.org/10.1126/science.1127647

12. L. Beggel, M. Pfeiffer, and B. Bischl, “Robust anomaly detection in images using adversarial autoencoders,” (2019).

13. H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol., 24 (6), 417 (1933). https://doi.org/10.1037/h0071325

Biography

Keisuke Goto is a graduate student at the Graduate School of Natural Science and Technology, Gifu University. He received his BS degree from the Faculty of Engineering, Gifu University, in 2018. His research interests are visual inspection and deep learning.

Kunihito Kato is an associate professor in the Faculty of Engineering, Gifu University. He received his BS and MS degrees and his PhD from the Faculty of Information Science, Chukyo University, in 1993, 1995, and 1996, respectively. He has been a faculty member at the University of Maryland Institute for Advanced Computer Studies. His research interests include image processing, pattern analysis, and computer vision.

Takaho Saito is with Aisin AW Co. Ltd.

Hiroaki Aizawa is a doctoral student at the Graduate School of Engineering, Gifu University. He received his BS and MS degrees from Gifu University in 2016 and 2018, respectively. His research interests are computer vision, machine learning, and deep learning.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Keisuke Goto, Kunihito Kato, Takaho Saito, and Hiroaki Aizawa "Adversarial autoencoder for detecting anomalies in soldered joints on printed circuit boards," Journal of Electronic Imaging 29(4), 041013 (28 April 2020). https://doi.org/10.1117/1.JEI.29.4.041013
Received: 1 October 2019; Accepted: 10 April 2020; Published: 28 April 2020
Keywords: Inspection; X-rays; X-ray imaging; X-ray computed tomography; Feature extraction; Statistical modeling; Binary data
