Elsevier

Journal of Manufacturing Systems

Volume 62, January 2022, Pages 811-822
3DSMDA-Net: An improved 3DCNN with separable structure and multi-dimensional attention for welding status recognition

https://doi.org/10.1016/j.jmsy.2021.01.017

Highlights

  • We incorporate time sequence information into deep learning-based welding status recognition to enhance accuracy.

  • We propose a 3DCNN-oriented convolution kernel separation method to obtain a lightweight time sequence model.

  • We propose a multi-dimensional attention mechanism to reduce the loss of accuracy caused by the separation operation.

  • We identify the globule transition mode and the types of molten pool.

Abstract

Vision-based welding status recognition (WSR) provides a basis for online welding quality control. Owing to the severe arc and fume interference in the welding area and the limited computational resources at welding edge nodes, mining the most discriminative features contained in welding images with a lightweight model is challenging. In this paper, we propose an improved three-dimensional convolutional neural network (3DCNN) with a separable structure and multi-dimensional attention (3DSMDA-Net) for WSR. The proposed 3DSMDA-Net uses a 3DCNN to adaptively extract abstract spatiotemporal features of the welding process and leverages this time sequence information to improve recognition accuracy. In addition, we decompose the classical 3D convolution into a depthwise convolution and a pointwise convolution to produce a lightweight model. A multi-dimensional attention mechanism is further proposed to compensate for the loss of accuracy caused by the separation operation. Experimental results reveal that the proposed method reduces the model size to 1/7 of the classical 3DCNN without sacrificing accuracy. Comparison experiments further indicate that the proposed method is more accurate and noise-resistant than conventional models.

Introduction

As arc welding is an important metal joining method, ensuring weld quality is critical to improving the reliability of a workpiece [1]. Once the welding materials and methods have been determined, controlling the welding process becomes the key to ensuring consistent welding quality [2]. A traditional welding workflow optimizes the craft through CAE simulation analysis before welding and destructive or non-destructive testing after welding [[3], [4], [5], [6], [7]]. However, this mode not only wastes resources but also lacks interaction with the welding process itself. In fact, skilled welders dynamically adjust their craft by observing the welding process, for example the globule transition mode and the shape of the molten pool. Monitoring the welding status with visual sensing and adjusting the welding craft through feedback is therefore an effective way of improving welding quality [[8], [9], [10]]. In vision-based welding status monitoring, the common monitoring objects are the molten pool shape [11], the penetration degree [12] and the globule transition mode [13].

A typical robotic welding system is shown in Fig. 1. The controller sends cooperative welding instructions to the robot and the positioner. After receiving a welding instruction, the wire feeding mechanism supplies welding wire to sustain the welding progress. In addition, to maintain a good welding environment, shielding gas is generally delivered to the base metal area. After the arc is ignited, intense heat is generated and the welding wire melts. Under the action of gravity, the globule transfers to the base metal. Under the high temperature of the arc and the globule, the base metal melts; this melted area is the molten pool. After the molten pool cools, it solidifies into a weld bead that joins the base metal. Therefore, the globule transition state, the molten pool shape and the penetration degree can directly reflect the welding quality. In our experiments, a CCD camera was fixed on the front of the robotic arm and moved with it to capture the welding process in real time. Vision-based welding status recognition (WSR) can be regarded as a pattern classification task. For welding images, such a task faces the following challenges arising from the characteristics of the welding process. Firstly, arc welding is accompanied by strong arc light and smoke interference, which makes it difficult for industrial cameras to obtain clear images of the welding process [14]. Further, vibrations during welding cause motion blur in the captured images [15,16]. Interference and blurring together directly limit the acquisition of high-quality weld images. Secondly, the differences among different classes of welding images are small, while the differences within one class are large. Therefore, the most discriminative features must be extracted to effectively identify the welding status [17].
Thirdly, in the context of the deep integration of information technology and manufacturing, a robotic welding system is equipped with multi-source perception modules (visual, acoustic, spectral, electrical signals, etc.) on top of traditional modules (motion, craft, wire feeding, etc.). This produces a massive amount of real-time data, and the traditional cloud-based centralized computing model has consequently shifted toward the edge computing model [18]. The storage and computing resources that can be allocated to the vision module in a welding edge node are therefore very limited. Among these three challenges, low image quality and difficult feature extraction severely limit the accuracy of welding image pattern recognition, while the limited storage and computing resources impose stringent lightweight requirements on the vision-based WSR model.

In reality, skilled welders rely not only on the current welding status but also on its recent history to make judgements. In practical welding tests, we obtained the three types of droplet transition shown in Fig. 2 by changing the welding process. It is difficult to distinguish the three images at a single time ti. However, a judgement can easily be made by using the information of the images over ti-7 ∼ ti.
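The windowing described above — predicting the label of frame ti from the clip ti-7 ∼ ti — can be sketched as a simple sliding-window operation. This is an illustrative sketch only; the stream length below is assumed, and the frame size matches the 64*64 input used by the network:

```python
import numpy as np

# Hypothetical stream of 100 grayscale welding frames, 64x64 pixels each.
frames = np.random.rand(100, 64, 64).astype(np.float32)

def make_clips(frames, length=8):
    """Stack every run of `length` consecutive frames into one clip,
    so the label of frame t_i is predicted from t_{i-7} .. t_i."""
    n = frames.shape[0] - length + 1
    # Each clip is a (length, H, W) slice of the stream.
    return np.stack([frames[i:i + length] for i in range(n)])

clips = make_clips(frames)
print(clips.shape)  # (93, 8, 64, 64)
```

Each clip shares 7 frames with its neighbor, which is also why the temporal window indirectly augments the data relative to single-image training.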

Previous studies in vision-based WSR have mainly focused on a single image, as shown in Fig. 3. To remedy this, our study considers the temporal correlation of the welding process illustrated in Fig. 2. However, DL models that incorporate temporal information are typically large, which poses a challenge for computation and storage at the edge. To meet the requirements of a welding system, we therefore also take the lightweight design of the time sequence model into account.

Motivated by the above issues, we propose an improved three-dimensional convolutional neural network (3DCNN) with a separable structure and multi-dimensional attention (3DSMDA-Net) for WSR. To improve the accuracy of WSR, the proposed method uses Resnet18 [19] as the backbone network and a 3DCNN to adaptively extract the complex spatiotemporal features contained in the welding process. Considering that 3DCNN models with large numbers of parameters are difficult to deploy at the edge, we further propose a 3DCNN-oriented separation method that yields a lightweight model and alleviates the storage and computation pressure at the edge. The authors of [20] visualized the decision-making basis of a network through an explainable method, showing that well-designed network structures can accurately locate the target area in an image. Therefore, to compensate for the loss of accuracy due to the separation operation, we incorporate a multi-dimensional attention mechanism (MDA) that explicitly models this capability according to the characteristics of the separation operation. To the best of our knowledge, this is the first work to address both time sequence information and model lightweighting in WSR.
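The parameter saving from separating a 3D convolution into depthwise and pointwise parts can be seen with back-of-the-envelope arithmetic. The kernel size and channel counts below are assumed for illustration (the paper's exact layer configurations are not reproduced here), so the per-layer ratio does not equal the reported whole-model 1/7 figure:

```python
def conv3d_params(k, c_in, c_out):
    # Standard 3D convolution: one k*k*k kernel per (input, output) channel pair.
    return k ** 3 * c_in * c_out

def separable_conv3d_params(k, c_in, c_out):
    # Depthwise part: one k*k*k kernel per input channel.
    depthwise = k ** 3 * c_in
    # Pointwise part: a 1*1*1 convolution that mixes channels.
    pointwise = c_in * c_out
    return depthwise + pointwise

# Assumed example layer: 3x3x3 kernel, 64 input and 64 output channels.
standard = conv3d_params(3, 64, 64)             # 110592 parameters
separable = separable_conv3d_params(3, 64, 64)  # 5824 parameters
print(standard, separable)
```

The per-layer reduction factor is roughly k^3 * c_out / (k^3 + c_out); the whole-model factor is smaller because other layers (e.g. the backbone's fully connected and pointwise layers) are not separated.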

In summary, the contributions of this paper are as follows: 1. We incorporate historical information into deep learning (DL)-based WSR to enhance recognition accuracy; 2. We propose a 3DCNN-oriented convolution kernel separation method as a lightweight time sequence model; and 3. We propose a multi-dimensional attention mechanism that reduces the loss of accuracy caused by the separation operation by exploiting the characteristics of that operation, without adding extra parameters.

The rest of this paper is organized as follows: Section 2 reviews the related work on vision-based WSR methods, DL-based sequence image recognition methods, and DL-oriented model lightweight methods. Section 3 presents our design of the overall architecture of the 3DSMDA-Net, the structure of 3DCNN with separable (3DS) operation, and the MDA mechanism for the lightweight method. Section 4 describes the experimental setup, followed by reporting numerical experiments of the proposed method on our self-built dataset and public dataset in Section 5. Finally, Section 6 concludes this paper and makes recommendations for future work.

Section snippets

Related work

In this section, we first review vision-based WSR methods and then summarize DL-based time sequence image pattern recognition methods. Because a welding state recognition model that incorporates time sequence information challenges the storage and computation capacity of welding edge nodes, we finally review lightweight methods for DL models.

The general framework of 3DSMDA-Net

Our general framework of the 3DSMDA-Net for WSR is illustrated in Fig. 4. In particular, 3DSMDA-Net uses the classical Resnet18 as its backbone network. The input is an image sequence of 8 frames, while the output is the label distribution of the image at the end of the sequence. The size of each frame is 64*64*1 (width, height, and channel). The input first goes through a classical 3DCNN with a stride of 1 to produce a feature tensor of size 64*64*64*8. Then four 3DCNN with 3DS
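The shape bookkeeping for that first layer can be checked with the standard convolution output-size formula. The stride of 1 is stated above; the 3x3x3 kernel and padding of 1 ("same" padding) are assumed here, since they are what preserve the 64*64 spatial extent and the 8-frame temporal extent:

```python
def conv_out(size, k=3, stride=1, pad=1):
    # Standard convolution output-size formula along one dimension.
    return (size + 2 * pad - k) // stride + 1

# Assumed input clip: 8 frames of 64x64 single-channel images, as in Fig. 4.
t, h, w = 8, 64, 64
# With stride 1 and padding 1, a 3x3x3 convolution keeps every dimension,
# so 64 output channels give the 64*64*64*8 (W, H, C, T) tensor above.
print(conv_out(t), conv_out(h), conv_out(w))  # 8 64 64
```

The same formula applies per dimension to any later strided layers (e.g. stride 2 would halve a 64-pixel extent to 32).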

Datasets

In our experiments, we use two datasets: a self-built one and a public one. We construct a globule image dataset (GID) with three transition types: streaming transfer (ST), projected transfer (PT) and short circuit transfer (SCT). The data statistics of GID are described in Table 1, and examples are shown in Fig. 9. For the public dataset, we use the molten pool image dataset SS304 published in [31] to further verify the performance of the proposed method. The data statistics and samples of

Performance evaluation

The loss and accuracies of the proposed method during training on GID are plotted in Fig. 11. All models are trained for 3 epochs. It can be seen from Fig. 10 that both 3DSMDA-Net and 3DCNN converge to an accuracy near 1 after about 2200 iterations. The 3DS and CNN-LSTM methods converge to 0.985 after about 2600 iterations. The 2DCNN, which uses no time sequence information, converges to only 0.945.

We use the validation set to test the performance of the proposed method. In particular, the performance of the proposed

Conclusion and future work

Inspired by how skilled welders observe a welding process, we introduced time sequence information into WSR in this paper. The use of time sequence information not only indirectly augments the data but also strengthens the basis on which the model judges the current welding status. In terms of accuracy-related metrics, the MDA enhances the spatiotemporal and channel features learned by the model, and 3DSMDA-Net achieves results comparable to 3DCNN on a

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgments

This work was supported by “the Fundamental Research Funds for the Central Universities and Graduate Student Innovation Fund of Donghua University” under Grant No. CUSF-DH-D-2020053.

References (52)

  • R. Lu et al.

    In-situ monitoring of the penetration status of keyhole laser welding by using a support vector machine with interaction time conditioned keyhole behaviors

    Opt Laser Eng

    (2020)
  • H. Chen et al.

    Effects of arc bubble behaviors and characteristics on droplet transfer in underwater wet welding using in-situ imaging method

    Mater Des

    (2019)
  • N. Wang et al.

    A robust weld seam recognition method under heavy noise based on structured-light vision

    Robot Cim-Int Manuf

    (2020)
  • C.V. Dung et al.

    A vision-based method for crack detection in gusset plate welded joints of steel bridges using deep convolutional neural networks

    Automat Constr

    (2019)
  • L. Hong et al.

    Vibration test on welding robot

    Procedia Comput Sci

    (2020)
  • B. Wang et al.

    Intelligent welding system technologies: state-of-the-art review and perspectives

    J Manuf Syst

    (2020)
  • J. Zapata et al.

    An adaptive-network-based fuzzy inference system for classification of welding defects

    NDT&E Int

    (2010)
  • M. Leo et al.

    Automatic visual monitoring of welding procedure in stainless steel kegs

    Opt Laser Eng

    (2018)
  • Z. Zhang et al.

    Weld image deep learning-based on-line defects detection using convolutional neural networks for Al alloy in robotic arc welding

    J Manuf Process

    (2019)
  • Z. Zhang et al.

    Real-time penetration state monitoring using convolutional neural network for laser welding of tailor rolled blanks

    J Manuf Syst

    (2020)
  • Y. Zhang et al.

    Welding defects detection based on deep learning with multiple optical sensors during disk laser welding of thick plates

    J Manuf Syst

    (2019)
  • R. Miao et al.

    Online defect recognition of narrow overlap weld based on two-stage recognition model combining continuous wavelet transform and convolutional neural network

    Comput Ind

    (2019)
  • D. Bacioiu et al.

    Automated defect classification of SS304 TIG welding process using visible spectrum camera and machine learning

    NDT&E Int

    (2019)
  • W. Cai et al.

    Application of sensing techniques and artificial intelligence-based methods to laser welding real-time monitoring: a critical review of recent literature

    J Manuf Syst

    (2020)
  • Q. Wang et al.

    Deep learning-empowered digital twin for visualized weld joint growth monitoring and penetration control

    J Manuf Syst

    (2020)
  • Z. Hu et al.

    3D separable convolutional neural network for dynamic hand gesture recognition

    Neurocomputing

    (2018)