Incorporating domain knowledge into reinforcement learning to expedite welding sequence optimization
Introduction
Gas Metal Arc Welding (GMAW) is the most common technique for joining metal components. It is preferred for its versatility, speed, and relative ease of robotic automation, and it is used extensively in the automotive, shipbuilding, aerospace, construction, and heavy and earth-moving equipment industries (Masubuchi, 1980, Islam et al., 2014). However, welding-induced structural deformation is a serious concern for industry because it incurs various additional costs, such as constraints in the design phase, extra operations, cost of quality, and overall capital expenditure (Goldak and Akhlaghi, 2005). Welding Sequence Optimization (WSO) is a highly cost-effective way to reduce this deformation significantly. The ad hoc industry practice is to select the sequence by experience, sometimes supported by a simplified design of experiments, which typically yields a sequence that produces considerably more structural deformation than the optimal one (Biswas et al., 2011). Finding a better welding sequence experimentally would require innumerable real welding experiments, which is both expensive and time consuming. To alleviate this problem, welding-induced structural deformation is predicted with welding simulation software based on Finite Element Analysis (FEA), where thermo-mechanical models are commonly used and yield reasonable solutions for numerous welding conditions and geometric configurations (Tikhomirov et al., 2005). There are three types of FEA-based models: (a) simplified: fast but less accurate; (b) thermo-mechanical: medium complexity with reasonable accuracy; and (c) thermo-mechanical-metallurgical: highly accurate but computationally very expensive and time consuming (Islam et al., 2014).
Selecting the welding sequence that yields the least deformation is a combinatorial optimization problem, which is NP-hard in general (Papadimitriou and Steiglitz, 1982). WSO can be mapped to the Traveling Salesman (TS) problem, a classic problem in Operations Research (OR): given a list of cities and the distance between each pair of cities, find the shortest possible route that visits each city exactly once and returns to the origin city. Similarly, WSO can be stated as: given a set of welding seams and the possible welding directions for each seam, find the welding sequence that produces the least structural deformation. The best welding sequence can certainly be found by executing a full factorial design of experiments. The total number of welding configurations for a full factorial design is $d^{n}\,n!$, where $d$ and $n$ are the number of welding directions and beads (seams or segments), respectively. This number grows faster than exponentially with the number of welding beads; for example, a complex weldment such as an aero-engine assembly may have 52–64 weld segments (Jackson and Darlington, 2011). Hence, a full factorial design is not feasible for industrial applications, and it is often impractical even with FEA at the early stages of the Product Delivery Process (PDP) (Romero-Hdz et al., 2017). To succeed in the rapidly evolving global manufacturing landscape, the welding industry urgently needs to become more competitive, with quality and efficiency as the main drivers. Mega-trends such as the Internet of Things (IoT), Industry 4.0, and the development and use of advanced materials will therefore be critical to future competitiveness (Lindgren, 2007).
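The full-factorial count is the number of bead orderings, $n!$, times the independent direction choices for every bead, $d^{n}$. A minimal Python sketch (the function name `num_configurations` is ours, not the paper's) reproduces the 10,321,920 configurations reported for the eight-seam, two-direction mounting bracket and shows how quickly the count explodes for larger weldments.

```python
from math import factorial

def num_configurations(n_beads: int, n_directions: int) -> int:
    """Full-factorial count d^n * n!: n! bead orderings times
    d independent direction choices for each of the n beads."""
    return n_directions ** n_beads * factorial(n_beads)

# Mounting bracket case study: 8 seams, 2 welding directions each.
print(num_configurations(8, 2))  # 10321920

# Growth with the number of beads, up to the 52-64 segment range cited for aero-engine assemblies.
for n in (8, 16, 52, 64):
    print(n, f"{num_configurations(n, 2):.3e}")
```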
Process simulation enables the use of Artificial Intelligence (AI) and Machine Learning (ML) techniques, which usually require a large amount of process output data; in addition, “time to market” and “Do It Right The First Time” pressures are pushing the industry to exploit virtual tools. Fig. 1 illustrates the deformation problem and the AI framework, in which a coupled FEA-AI virtual tool, instead of real experiments, controls the amount of deformation so that Geometric Dimensioning and Tolerancing (GD&T) features stay within tolerance and assemblability is ensured.
In this research, we present a novel and efficient Reinforcement Learning (RL) algorithm for Welding Sequence Optimization (WSO) that improves weld quality, using structural deformation to compute the reward function. We used thermo-mechanical FEA modeling to predict welding deformation. RL is closely related to dynamic programming: the agent makes decisions over time to maximize its reward and minimize its penalty. In the welding context, the agent is rewarded if the sequence (action) it takes minimizes the overall structural deformation. The advantage of this approach is that it allows an AI program to learn without a programmer spelling out how the agent should perform the task: the agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences (Sutton and Barto, 1998). Unlike supervised learning, where the feedback provided to the agent is the correct set of actions for performing a task, RL learns without such explicit instruction by using rewards and punishments as signals for positive and negative behavior; that is, the agent receives rewards for performing correctly and penalties for performing incorrectly. And whereas the goal of unsupervised learning is to find similarities and differences between data points, the goal of reinforcement learning is to find a good behavior, i.e., an action for each particular situation that maximizes the long-term (cumulative) reward the agent receives. RL algorithms have been used extensively in fields such as gaming, neuroscience, psychology, economics, communications engineering, power systems engineering, and robotics (Sutton and Barto, 1998).
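The excerpt does not spell out the update rule DKQRL uses, so for reference the standard one-step tabular Q-learning update (Watkins' Q-learning, as presented by Sutton and Barto, 1998) is given below. Here $s_t$ is the current state, $a_t$ the chosen action, $r_{t+1}$ the reward, $\alpha$ the learning rate, and $\gamma$ the discount factor. Reading $s_t$ as the partial welding sequence, $a_t$ as the next seam and direction, and $r_{t+1}$ as a deformation-based reward is our assumption, not the authors' stated definition.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```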
Here, we make the following technical contributions:
(A) Reducing the computational complexity of a combinatorial optimization problem: We incorporate domain knowledge into the Q-learning algorithm to expedite convergence, and we call the result “DKQRL”. The proposed DKQRL algorithm substantially reduces computational cost compared with exhaustive search. We conducted the experiment on a mounting bracket with eight weld seams, each of which can be welded in two directions. In this scenario, the total number of welding configurations for exhaustive search is 10,321,920, whereas DKQRL converges after 40 welding configurations. Since each welding configuration takes 30 min on average to simulate with the FEA software, this reduces the computational time considerably.
(B) Solving the exploration–exploitation dilemma of RL through domain knowledge: An RL algorithm can be accelerated, and brought to convergence, through a suitable choice between exploration and exploitation at each stage. According to welding domain experts, it is advisable to weld the bead near the Center of Mass (CM) first to lessen welding-induced structural deformation. In the first step of the DKQRL algorithm, if the weld seam near the CM causes minimum deformation, we allow more exploration than exploitation throughout the process. In addition, once one bead of each part of the system has been welded, the rigidity of the whole system increases, which allows more exploration, since high rigidity resists structural deformation (Park and An, 2016). Thus, domain knowledge controls the ratio of exploration to exploitation throughout the RL algorithm and hence expedites WSO.
(C) Mapping welding sequence optimization to the traveling salesman problem: We cast WSO as a TS problem. The TS problem consists of visiting each city exactly once with minimum cost; similarly, WSO consists of welding each seam exactly once. As soon as a bead is welded, we remove it from the set of allowable states and remove its welding directions from the set of allowable actions. Mapping WSO to the TS problem facilitates implementing RL for WSO and provides a realistic solution approach for this combinatorial optimization problem.
(D) State-of-the-art performance: We simulated Gas Metal Arc Welding (GMAW) with the well-known welding simulation software Simufact®. Each welding configuration took 30 min on average on a workstation with two Intel® Xeon® processors @2.40 GHz, 48 GB of RAM, and 4 GB of dedicated video memory. The case study is a typical mounting bracket, a component widely used in telescopic jibs (Derlukiewicz and Przybyłek, 2008) and in the automotive industry (Subbiah et al., 2011, Romeo et al., 2016). We validated the simulation results through a real shop-floor welding experiment, and the results showed high agreement between simulation and experiment in terms of structural deformation. The experimental results demonstrated that the best welding sequence reduces structural deformation significantly (by 71%) relative to the worst sequence. The DKQRL-based approach substantially reduces computational time compared with standard RL, Genetic Algorithm (GA), and exhaustive search.
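To put contribution (A) in perspective, the arithmetic below converts the configuration counts quoted above into simulation time. It assumes, as a simplification not stated in the text, that configurations are simulated one after another on a single workstation at the reported 30 min average, with no parallel FEA runs.

```python
MINUTES_PER_CONFIG = 30          # reported average FEA time per welding configuration
EXHAUSTIVE_CONFIGS = 10_321_920  # 2**8 * 8! for 8 seams with 2 directions each
DKQRL_CONFIGS = 40               # configurations evaluated before DKQRL converged

def to_hours(configs: int, minutes_each: int = MINUTES_PER_CONFIG) -> float:
    """Total sequential simulation time in hours (no parallel FEA runs assumed)."""
    return configs * minutes_each / 60

exhaustive_h = to_hours(EXHAUSTIVE_CONFIGS)
dkqrl_h = to_hours(DKQRL_CONFIGS)
print(f"exhaustive: {exhaustive_h:,.0f} h (~{exhaustive_h / (24 * 365):,.0f} years)")
print(f"DKQRL:      {dkqrl_h:,.0f} h")
# exhaustive: 5,160,960 h (~589 years)
# DKQRL:      20 h
```

Under these assumptions, DKQRL needs 258,048 times fewer simulations than exhaustive search, which is the difference between about a day of computing and several centuries.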
The remainder of the paper is organized as follows. Section 2 reviews the literature. The proposed domain-knowledge-driven reinforcement learning algorithm is presented in Section 3. Results are presented in Section 4, and Section 5 concludes the work. References are listed at the end of the paper.
Section snippets
Literature review
The literature review is organized into three parts. First, we summarize state-of-the-art optimization techniques from welding-related fields, such as manufacturing process parameter optimization and mechanical and structural design optimization, that can be used for further research in WSO. Second, we describe Q-learning and RL approaches. Finally, we describe the domain knowledge for WSO.
Methodology
Here we present a novel RL algorithm that incorporates the welding domain knowledge discussed in the previous section to accelerate WSO, resolving the exploration–exploitation dilemma by adapting the ε-greedy algorithm. In this section, we first outline the optimization framework, then describe the implementation of the welding domain knowledge exploited in this study, and finally detail the proposed DKQRL tailored for WSO.
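The ingredients described so far (ε-greedy action selection, TS-style removal of welded seams, and domain knowledge that raises exploration) can be combined into a small tabular Q-learning loop. This is a minimal sketch under stated assumptions, not the authors' DKQRL: the state is taken to be the partial welding sequence, the reward is assumed to be zero until the sequence is complete and then minus its deformation, the ε increments of 0.1 are invented, and `toy_deformation`, `DIST_TO_CM`, and the `part_of` seam-to-part map are hypothetical stand-ins for the FEA (Simufact) simulation and the real bracket geometry.

```python
import random
from typing import Callable, Dict, List, Tuple

Action = Tuple[int, int]    # (seam index, welding direction)
State = Tuple[Action, ...]  # partial welding sequence chosen so far
QTable = Dict[Tuple[State, Action], float]

def allowed_actions(state: State, n_seams: int, n_dirs: int) -> List[Action]:
    """TS-style masking: once a seam is welded, none of its directions is allowed again."""
    welded = {seam for seam, _ in state}
    return [(s, d) for s in range(n_seams) if s not in welded for d in range(n_dirs)]

def make_domain_epsilon(base: float, cm_first_is_best: bool,
                        part_of: Dict[int, int]) -> Callable[[State], float]:
    """Hypothetical epsilon schedule following the two rules stated in the text."""
    parts = set(part_of.values())
    def eps(state: State) -> float:
        e = base
        if cm_first_is_best:                         # CM-first heuristic confirmed
            e += 0.1
        if {part_of[s] for s, _ in state} == parts:  # every part has a bead: stiffer system
            e += 0.1
        return min(e, 1.0)
    return eps

def run_episode(Q: QTable, deformation: Callable[[State], float], n_seams: int,
                n_dirs: int, eps_fn: Callable[[State], float], alpha: float,
                gamma: float, rng: random.Random) -> State:
    """One epsilon-greedy Q-learning episode that builds a full welding sequence."""
    state: State = ()
    while len(state) < n_seams:
        actions = allowed_actions(state, n_seams, n_dirs)
        if rng.random() < eps_fn(state):
            a = rng.choice(actions)                                 # explore
        else:
            a = max(actions, key=lambda b: Q.get((state, b), 0.0))  # exploit
        nxt = state + (a,)
        done = len(nxt) == n_seams
        # Assumed reward: 0 until the sequence is complete, then minus its deformation.
        r = -deformation(nxt) if done else 0.0
        future = 0.0 if done else max(
            Q.get((nxt, b), 0.0) for b in allowed_actions(nxt, n_seams, n_dirs))
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (r + gamma * future - old)
        state = nxt
    return state

# Hypothetical stand-in for the FEA simulation: seams far from the CM cost more
# when welded early, and welding direction 1 adds a small penalty.
DIST_TO_CM = [0.5, 2.0, 1.0, 3.0]

def toy_deformation(seq: State) -> float:
    n = len(seq)
    return sum(DIST_TO_CM[s] * (n - i) + 0.3 * d for i, (s, d) in enumerate(seq))

rng = random.Random(0)
Q: QTable = {}
eps_fn = make_domain_epsilon(0.1, cm_first_is_best=True, part_of={0: 0, 1: 0, 2: 1, 3: 1})
best = min((run_episode(Q, toy_deformation, 4, 2, eps_fn, 0.5, 1.0, rng)
            for _ in range(10_000)), key=toy_deformation)
print(best, toy_deformation(best))
```

Initializing unseen Q-values at zero while all rewards are negative makes untried seam/direction choices look attractive, so the greedy steps also push the agent toward unexplored branches. In this toy setting the best sequence found welds the seams in order of increasing distance to the CM, all in direction 0, which mirrors the CM-first heuristic; with a real FEA reward each episode would cost one simulation, which is why limiting the number of episodes matters.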
Experimental results and discussions
In this section, we first describe the case study. Then, we discuss the FEA-based simulation experiment conducted for welding deformation prediction. Subsequently, we present the FEA results for the best and worst sequences found by the proposed DKQRL algorithm. After that, we demonstrate the effect of the welding sequence in WSO. Next, we present a comparative study among Modified Lowest Cost Search (MLCS) (Romero-Hdz et al., 2016a, Romero-Hdz et al., 2016b), single objective Genetic
Conclusions and future work
Welding sequence optimization has a considerable effect on structural deformation. In this study, the maximum structural deformation is exploited as the Q-function of the RL algorithm for WSO. RL significantly reduces the search space compared with exhaustive search. We incorporated domain knowledge and expedited the RL algorithm for WSO by resolving the exploration–exploitation dilemma. Welding simulation software was used to compute the structural deformation using FEA. The proposed DKQRL algorithm for
CRediT authorship contribution statement
Baidya Nath Saha: Formal analysis. Seiichiro Tsutsumi: Project administration. Riccardo Fincato: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors gratefully acknowledge the support provided by Osaka University through the JWRI in Japan and The National Council of Science and Technology of Mexico through CIDESI and CIMAT.
References (41)
- Chosen aspects of FEM strength analysis of telescopic jib mounted on mobile platform. 8th International Conference on Computer-Aided Engineering. Autom. Constr. (2008).
- Determination of welding sequence: a neural net approach. Eng. Anal. Bound. Elem. (1990).
- Simulation-based numerical optimization of arc welding process for reduced distortion in welded structures. Finite Elem. Anal. Des. (2014).
- Effect of muffler mounting bracket designs on durability. Eng. Fail. Anal. (2011).
- Weld sequence optimization: The use of surrogate models for solving sequential combinatorial problems. Comput. Methods Appl. Mech. Engrg. (2005).
- Partition mutation PSO for welding robot path optimization.
- Welding how-to’s - problems - distortion (2018).
- A study on the effect of welding sequence in fabrication of large stiffened plate panels. J. Marine Sci. Appl. (2011).
- Structural damage detection using modal parameters and particle swarm optimization. Mater. Test. (2012).
- Computational Welding Mechanics.
- Advanced engineering methods for assessing welding distortion in aero-engine assemblies. IOP Conf. Ser.: Mater. Sci. Eng.
- Scheduling for an arc-welding robot considering heat-caused distortion. J. Oper. Res. Soc.
- Robot arc welding task sequencing using genetic algorithms. IIE Trans.
- Computational Welding Mechanics: Thermomechanical and Microstructural Simulations.
- Analysis of Welded Structures: Residual Stresses, Distortion, and Their Consequences. International Series on Materials Science and Technology.
- Optimization of welding route by automatic machine using reinforcement learning method. J. Japan Soc. Naval Archit. Ocean Eng.
- Combinatorial Optimization: Algorithms and Complexity.
- Effect of welding sequence to minimize fillet welding distortion in a ship’s small component fabrication using joint rigidity method. Proc. Inst. Mech. Eng. B.
Cited by (7)
- Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework. Engineering Applications of Artificial Intelligence (2021). Citation excerpt: “The fundamental motivation of applying deep learning and RL to CO lies in the discovery and reasoning of new policies. Compared with traditional algorithms, machine learning can discover the inherent characteristics of the instances to guide future instances by learning and applying the solving experience of existing instances (Romero-Hdz et al., 2020). It also makes it possible for NP-hard problems that were not easy to solve in the past.”
- Automation of load balancing for Gantt planning using reinforcement learning. Engineering Applications of Artificial Intelligence (2021).
- Effect of welding conditions on the deformation of lithium battery pack of aluminum alloys. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering (2024).
- Engineering design optimisation using reinforcement learning with episodic controllers. Cognitive Computation and Systems (2022).
- UAV Networks against Multiple Maneuvering Smart Jamming with Knowledge-Based Reinforcement Learning. IEEE Internet of Things Journal (2021).