Elsevier

Journal of Manufacturing Systems

Volume 62, January 2022, Pages 811-822
3DSMDA-Net: An improved 3DCNN with separable structure and multi-dimensional attention for welding status recognition

https://doi.org/10.1016/j.jmsy.2021.01.017

Highlights

  • We incorporate time sequence information into deep learning-based welding status recognition to enhance accuracy.

  • We propose a 3DCNN-oriented convolution kernel separation method to obtain a lightweight time sequence model.

  • We propose a multi-dimensional attention mechanism to reduce the loss of accuracy caused by the separation operation.

  • We identify the globule transition mode and the types of molten pool.

Abstract

Vision-based welding status recognition (WSR) provides a basis for online welding quality control. Owing to the severe arc and fume interference in the welding area and the limited computational resources at welding edge nodes, mining the most discriminative features contained in welding images with a lightweight model is challenging. In this paper, we propose an improved three-dimensional convolutional neural network (3DCNN) with a separable structure and multi-dimensional attention (3DSMDA-Net) for WSR. The proposed 3DSMDA-Net uses a 3DCNN to adaptively extract abstract spatiotemporal features of the welding process and leverages this time sequence information to improve recognition accuracy. In addition, we decompose the classical 3D convolution into a depthwise convolution and a pointwise convolution to produce a lightweight model. A multi-dimensional attention mechanism is further proposed to compensate for the loss of accuracy caused by the separation operation. Experimental results reveal that the proposed method reduces the model size to 1/7 of the classical 3DCNN without sacrificing accuracy. Comparison experiments further indicate that the proposed method is more accurate and noise-resistant than conventional models.

Introduction

As arc welding is an important metal joining method, ensuring weld quality is critical to improving the reliability of a workpiece [1]. Once the welding materials and methods have been determined, controlling the welding process becomes the key to ensuring consistent welding quality [2]. A traditional welding workflow optimizes the craft through CAE simulation analysis before welding and destructive or non-destructive testing after welding [[3], [4], [5], [6], [7]]. However, this mode not only wastes resources but also lacks interaction with the welding process itself. In fact, skilled welders dynamically adjust their craft by observing the welding process, for example the globule transition mode and the shape of the molten pool. Monitoring the welding status with visual sensing and adjusting the welding craft through feedback is therefore an effective way of improving welding quality [[8], [9], [10]]. In vision-based welding status monitoring, the common monitoring objects are the molten pool shape [11], the penetration degree [12] and the globule transition mode [13].

A typical robotic welding system is shown in Fig. 1. The controller sends cooperative welding instructions to the robot and the positioner. After receiving a welding instruction, the wire feeding mechanism supplies welding wire to sustain the welding progress. In addition, to maintain a good welding environment, shielding gas is generally delivered to the base metal area. After the arc is ignited, intense heat is generated and the welding wire melts. Under the action of gravity, the globule transfers to the base metal. Under the high temperature of the arc and the globule, the base metal melts; this melted area is the molten pool. After the molten pool cools, it solidifies into a weld bead that joins the base metal. Therefore, the globule transition state, the molten pool shape and the penetration degree can directly reflect the welding quality. In our experiments, a CCD camera was fixed on the front of the robotic arm and moved with it to capture the welding process in real time. Vision-based welding status recognition (WSR) can be regarded as a pattern classification task. For welding images, such a task faces the following challenges arising from the characteristics of the welding process. Firstly, arc welding is accompanied by strong arc light and smoke interference, which makes it difficult for industrial cameras to obtain clear images of the welding process [14]. Further, vibrations during welding cause motion blur in the captured images [15,16]. Interference and blurring together directly limit the acquisition of high-quality weld images. Secondly, the differences among different classes of welding images are small, while the differences within one class are large. Therefore, the most discriminative features must be extracted to effectively identify the welding status [17].
Thirdly, in the context of the deep integration of information technology and manufacturing, a robotic welding system is equipped with multi-source perception modules (visual, acoustic, spectral, electrical signals, etc.) on top of traditional modules (motion, craft, wire feeding, etc.). This produces a massive amount of real-time data, and the traditional cloud-based centralized computing model has consequently shifted toward the edge computing model [18]. The storage and computing resources that can be allocated to the vision module in a welding edge node are therefore very limited. Among these three challenges, low image quality and difficult feature extraction severely limit the accuracy of welding image pattern recognition, while the limited storage and computing resources impose stringent lightweight requirements on the vision-based WSR model.

In reality, skilled welders rely not only on the current welding status but also on its recent history to make judgements. In practical welding tests, we obtained the three types of droplet transition shown in Fig. 2 by changing the welding process. It is difficult to distinguish the three images at a single time ti. However, a judgement can easily be made by using the information of the images over ti-7 ∼ ti.
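The windowing described above — predicting the label of frame ti from the clip ti-7 ∼ ti — can be sketched as a simple sliding-window operation. This is an illustrative sketch only; the stream length below is assumed, and the frame size matches the 64*64 input used by the network:

```python
import numpy as np

# Hypothetical stream of 100 grayscale welding frames, 64x64 pixels each.
frames = np.random.rand(100, 64, 64).astype(np.float32)

def make_clips(frames, length=8):
    """Stack every run of `length` consecutive frames into one clip,
    so the label of frame t_i is predicted from t_{i-7} .. t_i."""
    n = frames.shape[0] - length + 1
    # Each clip is a (length, H, W) slice of the stream.
    return np.stack([frames[i:i + length] for i in range(n)])

clips = make_clips(frames)
print(clips.shape)  # (93, 8, 64, 64)
```

Each clip shares 7 frames with its neighbor, which is also why the temporal window indirectly augments the data relative to single-image training.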

Previous studies in vision-based WSR have mainly focused on a single image, as shown in Fig. 3. To remedy this, our study considers the temporal correlation of the welding process illustrated in Fig. 2. However, DL models that incorporate temporal information are typically large, which poses a challenge for computation and storage at the edge. To meet the requirements of a welding system, we therefore also take the lightweight design of the time sequence model into account.

Motivated by the above issues, we propose an improved three-dimensional convolutional neural network (3DCNN) with a separable structure and multi-dimensional attention (3DSMDA-Net) for WSR. To improve the accuracy of WSR, the proposed method uses Resnet18 [19] as the backbone network and a 3DCNN to adaptively extract the complex spatiotemporal features contained in the welding process. Considering that 3DCNN models with large numbers of parameters are difficult to deploy at the edge, we further propose a 3DCNN-oriented separation method that yields a lightweight model and alleviates the storage and computation pressure at the edge. The authors of [20] visualized the decision-making basis of a network through an explainable method, showing that well-designed network structures can accurately locate the target area in an image. Therefore, to compensate for the loss of accuracy due to the separation operation, we incorporate a multi-dimensional attention mechanism (MDA) that explicitly models this capability according to the characteristics of the separation operation. To the best of our knowledge, this is the first work to address both time sequence information and model lightweighting in WSR.
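The parameter saving from separating a 3D convolution into depthwise and pointwise parts can be seen with back-of-the-envelope arithmetic. The kernel size and channel counts below are assumed for illustration (the paper's exact layer configurations are not reproduced here), so the per-layer ratio does not equal the reported whole-model 1/7 figure:

```python
def conv3d_params(k, c_in, c_out):
    # Standard 3D convolution: one k*k*k kernel per (input, output) channel pair.
    return k ** 3 * c_in * c_out

def separable_conv3d_params(k, c_in, c_out):
    # Depthwise part: one k*k*k kernel per input channel.
    depthwise = k ** 3 * c_in
    # Pointwise part: a 1*1*1 convolution that mixes channels.
    pointwise = c_in * c_out
    return depthwise + pointwise

# Assumed example layer: 3x3x3 kernel, 64 input and 64 output channels.
standard = conv3d_params(3, 64, 64)             # 110592 parameters
separable = separable_conv3d_params(3, 64, 64)  # 5824 parameters
print(standard, separable)
```

The per-layer reduction factor is roughly k^3 * c_out / (k^3 + c_out); the whole-model factor is smaller because other layers (e.g. the backbone's fully connected and pointwise layers) are not separated.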

In summary, the contributions of this paper are as follows: 1. We incorporate historical information into deep learning (DL)-based WSR to enhance recognition accuracy; 2. We propose a 3DCNN-oriented convolution kernel separation method as a lightweight time sequence model; and 3. We propose a multi-dimensional attention mechanism that reduces the loss of accuracy caused by the separation operation by exploiting the characteristics of that operation, without adding extra parameters.

The rest of this paper is organized as follows: Section 2 reviews the related work on vision-based WSR methods, DL-based sequence image recognition methods, and DL-oriented model lightweight methods. Section 3 presents our design of the overall architecture of the 3DSMDA-Net, the structure of 3DCNN with separable (3DS) operation, and the MDA mechanism for the lightweight method. Section 4 describes the experimental setup, followed by reporting numerical experiments of the proposed method on our self-built dataset and public dataset in Section 5. Finally, Section 6 concludes this paper and makes recommendations for future work.

Section snippets

Related work

In this section, we first review vision-based WSR methods and then summarize DL-based time sequence image pattern recognition methods. Because a welding state recognition model that incorporates time sequence information challenges the storage and computation capacity of welding edge nodes, we finally review lightweight methods for DL models.

The general framework of 3DSMDA-Net

Our general framework of the 3DSMDA-Net for WSR is illustrated in Fig. 4. In particular, 3DSMDA-Net uses the classical Resnet18 as its backbone network. The input is an image sequence of 8 frames, while the output is the label distribution of the image at the end of the sequence. The size of each frame is 64*64*1 (width, height, and channel). The input first goes through a classical 3DCNN with a stride of 1 to produce a feature tensor of size 64*64*64*8. Then four 3DCNN with 3DS
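The shape bookkeeping for that first layer can be checked with the standard convolution output-size formula. The stride of 1 is stated above; the 3x3x3 kernel and padding of 1 ("same" padding) are assumed here, since they are what preserve the 64*64 spatial extent and the 8-frame temporal extent:

```python
def conv_out(size, k=3, stride=1, pad=1):
    # Standard convolution output-size formula along one dimension.
    return (size + 2 * pad - k) // stride + 1

# Assumed input clip: 8 frames of 64x64 single-channel images, as in Fig. 4.
t, h, w = 8, 64, 64
# With stride 1 and padding 1, a 3x3x3 convolution keeps every dimension,
# so 64 output channels give the 64*64*64*8 (W, H, C, T) tensor above.
print(conv_out(t), conv_out(h), conv_out(w))  # 8 64 64
```

The same formula applies per dimension to any later strided layers (e.g. stride 2 would halve a 64-pixel extent to 32).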

Datasets

In our experiments, we use two datasets: a self-built one and a public one. We construct a globule image dataset (GID) with three transition types: streaming transfer (ST), projected transfer (PT) and short circuit transfer (SCT). The data statistics of GID are described in Table 1, and examples are shown in Fig. 9. For the public dataset, we use the molten pool image dataset SS304 published in [31] to further verify the performance of the proposed method. The data statistics and samples of

Performance evaluation

The loss and accuracies of the proposed method during training on GID are plotted in Fig. 11. All models are trained for 3 epochs. It can be seen from Fig. 10 that both 3DSMDA-Net and 3DCNN converge to an accuracy near 1 after about 2200 iterations. The 3DS and CNN-LSTM methods converge to 0.985 after about 2600 iterations. The 2DCNN, which uses no time sequence information, converges to only 0.945.

We use the validation set to test the performance of the proposed method. In particular, the performance of the proposed

Conclusion and future work

Inspired by how skilled welders observe a welding process, we introduced time sequence information into WSR in this paper. The use of time sequence information not only indirectly augments the data but also strengthens the basis on which the model judges the current welding status. In terms of accuracy-related metrics, the MDA enhances the spatiotemporal and channel features learned by the model, and 3DSMDA-Net achieves results comparable to 3DCNN on a

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgments

This work was supported by “the Fundamental Research Funds for the Central Universities and Graduate Student Innovation Fund of Donghua University” under Grant No. CUSF-DH-D-2020053.

References (52)

  • R. Lu et al.

    In-situ monitoring of the penetration status of keyhole laser welding by using a support vector machine with interaction time conditioned keyhole behaviors

    Opt Laser Eng

    (2020)
  • H. Chen et al.

    Effects of arc bubble behaviors and characteristics on droplet transfer in underwater wet welding using in-situ imaging method

    Mater Des

    (2019)
  • N. Wang et al.

    A robust weld seam recognition method under heavy noise based on structured-light vision

    Robot Cim-Int Manuf

    (2020)
  • C.V. Dung et al.

    A vision-based method for crack detection in gusset plate welded joints of steel bridges using deep convolutional neural networks

    Automat Constr

    (2019)
  • L. Hong et al.

    Vibration test on welding robot

    Procedia Comput Sci

    (2020)
  • B. Wang et al.

    Intelligent welding system technologies: state-of-the-art review and perspectives

    J Manuf Syst

    (2020)
  • J. Zapata et al.

    An adaptive-network-based fuzzy inference system for classification of welding defects

    NDT&E Int

    (2010)
  • M. Leo et al.

    Automatic visual monitoring of welding procedure in stainless steel kegs

    Opt Laser Eng

    (2018)
  • Z. Zhang et al.

    Weld image deep learning-based on-line defects detection using convolutional neural networks for Al alloy in robotic arc welding

    J Manuf Process

    (2019)
  • Z. Zhang et al.

    Real-time penetration state monitoring using convolutional neural network for laser welding of tailor rolled blanks

    J Manuf Syst

    (2020)
  • Y. Zhang et al.

    Welding defects detection based on deep learning with multiple optical sensors during disk laser welding of thick plates

    J Manuf Syst

    (2019)
  • R. Miao et al.

    Online defect recognition of narrow overlap weld based on two-stage recognition model combining continuous wavelet transform and convolutional neural network

    Comput Ind

    (2019)
  • D. Bacioiu et al.

    Automated defect classification of SS304 TIG welding process using visible spectrum camera and machine learning

    NDT&E Int

    (2019)
  • W. Cai et al.

    Application of sensing techniques and artificial intelligence-based methods to laser welding real-time monitoring: a critical review of recent literature

    J Manuf Syst

    (2020)
  • Q. Wang et al.

    Deep learning-empowered digital twin for visualized weld joint growth monitoring and penetration control

    J Manuf Syst

    (2020)
  • Z. Hu et al.

    3D separable convolutional neural network for dynamic hand gesture recognition

    Neurocomputing

    (2018)