Abstract

In the hot strip rolling process, accurate prediction of the bending force can improve the control accuracy of strip flatness and thereby the quality of the strip. In this paper, based on production data of 1300 strips collected from a hot rolling factory, a series of bending force prediction models based on the extreme learning machine (ELM) are proposed. To obtain the optimal model, the parameter settings of the models were investigated, including the number of hidden layer nodes, activation function, population size, crossover probability, and hidden layer structure. Four models are established: a single-hidden layer ELM model, an ELM model optimized by a genetic algorithm (GAELM), an ELM model optimized by hybrid simulated annealing (SA) and GA (SGELM), and a two-hidden layer ELM model optimized by SA and GA (SGITELM). The prediction performance is evaluated by the mean absolute error (MAE), root-mean-squared error (RMSE), and mean absolute percentage error (MAPE). The results show that the SGITELM has the highest prediction accuracy among the four models. The RMSE of the proposed SGITELM is 11.2678 kN, and 98.72% of the predictions have an absolute error of less than 25 kN. This indicates that the proposed SGITELM, with strong learning ability and generalization performance, can be well applied to hot rolling production.

1. Introduction

In the hot strip rolling (HSR) process, the product quality of the strip is mainly determined by dimensional accuracy, mechanical properties, and surface properties. The dimensional accuracy of the strip is described by two important indicators: strip thickness and surface profile [1]. The surface profile of the strip is defined as the difference in thickness between the center and a point 40 mm from the edge of the strip, in other words, the difference in thickness across the width of the strip [2]. Many factors affect the surface profile of the strip, and they are mainly related to the roller, the strip, and the rolling conditions in the HSR process [3].

Hydraulic roll bending control is one of the main methods for hot strip profile control. By applying a certain bending force on the work roll, the strip profile can be improved. The schematic diagram of the hydraulic roll bending technique is shown in Figure 1. The hydraulic roll gap control device can be used as the actuator of the leveling operation. How to set an appropriate bending force, which affects the walking behavior of the strip head and tail ends as well as the strip profile, is an urgent problem to be solved. The prediction accuracy of the roll bending force therefore directly affects the strip profile control accuracy, and a high prediction accuracy benefits the closed-loop feedback control of the bending force. In the actual production process, the bending force is calculated according to the requirements of temperature, thickness, width, rolling force, material, roll thermal expansion, roll wear, and target profile [4].

Generally, the initial value setting of bending force is very complicated and is a multivariable optimization problem. At the same time, the adjustment of bending force needs to be calculated according to the strip profile at the exit. Some rolling parameters related to the bending force model have the characteristics of nonlinearity, strong coupling, large detection error, etc. Therefore, the mathematical model established by the traditional theory has the disadvantages of slow response speed and low control accuracy in production practice. All these problems seriously restrict the further improvement of strip profile control accuracy. With the continuous improvement of rolling speed, the rolling process control system should be improved to adapt to the changes of the rolling process and the improvement of product requirements. Therefore, the high-precision prediction method of rolling process based on industrial big data has attracted attention [5].

The artificial neural network (ANN) is the earliest and most widely used data-driven method. ANN has been widely used in the field of metallurgy and materials because of its ability of parallel information processing, function approximation, self-learning, self-adaptation, and fault tolerance [6–8]. In the HSR process, ANN has been proposed to predict the roll force [9–12], the strip profile and flatness [2, 13–15], and mechanical properties [16–18]. With the continuous development of ANN prediction in HSR, more applications have been explored. Wang et al. [4] were the first to establish an ANN model to predict the bending force of the strip. To improve the performance of the model, it was also optimized with a genetic algorithm (GA), and a GA-ANN model with good prediction performance was obtained.

Although the ANN method improves the prediction performance to a certain extent, some problems remain, such as the slow learning speed of the training algorithm and the difficulty of adjusting numerous parameters. Therefore, new machine learning algorithms have been applied to various prediction problems in the rolling field. Huang et al. [19] proposed a learning algorithm called the extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses the input weights and determines the output weights of SLFNs by a least-squares solution. Compared with the traditional gradient-based approaches, the ELM has a significantly faster learning speed and presents better generalization performance. ELM has been widely used in metallurgy and metal processing fields, such as the mechanical properties of hot rolling products [20], rolling force prediction in HSR [21], silicon content prediction [22], alumina concentration detection [23], gas utilization ratio prediction in blast furnace [24], and tool fault diagnosis in numerical control machines [25].

In this paper, ELM is applied for the first time to predict the bending force of the strip in HSR. Furthermore, three improved ELM-based models are developed to realize high-precision and high-stability prediction of the bending force. Section 2 introduces the theory and optimization strategies of the ELM model. Section 3 presents the experimental data, prediction results, and discussion. Finally, the conclusion is drawn in Section 4.

2. ELM-Based Methods

2.1. ELM

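The basic theory of the ELM model states that, for N arbitrary distinct samples $(x_j, t_j)$, $j = 1, \ldots, N$, where $x_j \in \mathbb{R}^n$ and $t_j \in \mathbb{R}^m$ are the model input and the target data, standard SLFNs with $L$ hidden layer nodes and an activation function $g(x)$ are mathematically described as follows:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N,$$

where $w_i$ is the weight vector connecting the ith hidden node and the input nodes, $\beta_i$ is the weight vector connecting the ith hidden node and the output nodes, $o_j$ is the model output, and $b_i$ is the bias of the ith hidden node. $w_i \cdot x_j$ denotes the inner product of $w_i$ and $x_j$. The $w_i$ and $b_i$ parameters can be randomly assigned if the activation function is infinitely differentiable, so only the $\beta_i$ parameters need to be optimized when minimizing the mean squared error between the model output and the target data. In detail, saying that the standard SLFNs with $L$ hidden nodes and activation function $g(x)$ can approximate these N samples with zero error means that $\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$. Therefore, in the ELM approach, training an SLFN is equivalent to simply finding the least-squares solution of the linear system, which can be written compactly as follows:

$$H\beta = T,$$

where the hidden layer output matrix H is

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L},$$

and the output weight matrix $\beta$ and the target data matrix T are

$$\beta = \begin{bmatrix} \beta_1^{\top} \\ \vdots \\ \beta_L^{\top} \end{bmatrix}_{L \times m}, \qquad T = \begin{bmatrix} t_1^{\top} \\ \vdots \\ t_N^{\top} \end{bmatrix}_{N \times m}.$$

Algebraically, the linear system for $\beta$ is solved via the Moore–Penrose generalized inverse $H^{\dagger}$, that is, $\hat{\beta} = H^{\dagger} T$.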

The principle that distinguishes ELM from the traditional neural network methodology is that the parameters of the feedforward network (input weights and hidden layer biases) do not need to be tuned. Tamura and Tateishi [26] and Huang et al. [27] showed that SLFNs with randomly chosen input weights efficiently learn distinct training examples with minimum error. After the input weights and the hidden layer biases are randomly chosen, the SLFN can be considered simply as a linear system. The output weights, which link the hidden layer to the output layer of this linear system, can then be determined analytically through a simple generalized inverse operation on the hidden layer output matrix. This simplified approach makes the ELM model many times faster to train than ANN [27]. Figure 2 shows the basic schematic topological structure of an ELM network.
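The training procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `elm_fit` and `elm_predict`, the uniform range of the random weights, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """Train a single-hidden layer ELM; returns (W, b, beta)."""
    n_features = X.shape[1]
    # Input weights and biases are drawn at random and never tuned.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)            # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T      # least-squares output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic stand-in data (not the rolling data used in the paper).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
T = np.sin(3.0 * X[:, :1]) + X[:, 1:2] * X[:, 2:3]

W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
pred = elm_predict(X, W, b, beta)
rmse = float(np.sqrt(np.mean((pred - T) ** 2)))
```

The only fitted quantity is `beta`; no iterative gradient step appears anywhere, which is where the speed advantage over gradient-trained networks comes from.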

2.2. ELM Optimization by GA (GAELM)

Because the weights and biases are randomly initialized in the ELM algorithm, the training process does not necessarily lead to the optimal state of the network. Therefore, in this study, we introduce GA to optimize the weights and biases of the ELM network. GA is a parallel random search optimization method based on the natural genetic mechanism and natural selection theory in the biological world, and it is very suitable for complex nonlinear optimization problems [28]. Starting from a random initial set of solutions, GA applies the selection, crossover, and mutation operators and iteratively generates new solutions. After a certain number of generations, a near-global optimal solution is obtained.

The selection operator picks the individuals with strong vitality from the population. The roulette wheel selection method is adopted, and its formula is as follows:

$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j},$$

where $p_i$ is the selection probability of the ith individual, $f_i$ is the fitness value of the ith individual, and N is the population size.
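Roulette wheel selection can be illustrated with the standard library alone. The sketch below assumes that fitness values are positive and that a larger value is better; the function name `roulette_select` and the toy fitness list are hypothetical.

```python
import random

def roulette_select(fitness, rng):
    """Return the index of one individual, drawn with probability f_i / sum(f)."""
    total = sum(fitness)
    threshold = rng.random() * total
    cumulative = 0.0
    for i, f in enumerate(fitness):
        cumulative += f
        if threshold < cumulative:
            return i
    return len(fitness) - 1  # guard against floating-point round-off

rng = random.Random(42)
fitness = [1.0, 3.0, 6.0]          # toy fitness values
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_select(fitness, rng)] += 1
```

Over many draws the selection counts approach the proportions 0.1, 0.3, and 0.6, which is the behavior that lets fitter individuals reproduce more often.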

The crossover operation in GA exchanges genes between a pair of chromosomes (individuals) in some way. The crossover of the kth chromosome $a_k$ and the lth chromosome $a_l$ at position j is as follows:

$$a_{kj} = a_{kj}(1 - b) + a_{lj} b, \qquad a_{lj} = a_{lj}(1 - b) + a_{kj} b,$$

where $b$ is a random number between 0 and 1. The crossover operation is shown in Figure 3.
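One common way to cross two real-coded chromosomes at a single position is arithmetic (blend) crossover, sketched below. Using this particular operator is an assumption made for illustration; the function name `arithmetic_crossover` and the example chromosomes are hypothetical.

```python
import random

def arithmetic_crossover(parent_k, parent_l, j, b):
    """Blend gene j of two chromosomes with mixing factor b in [0, 1]."""
    child_k = list(parent_k)
    child_l = list(parent_l)
    child_k[j] = parent_k[j] * (1.0 - b) + parent_l[j] * b
    child_l[j] = parent_l[j] * (1.0 - b) + parent_k[j] * b
    return child_k, child_l

rng = random.Random(0)
pk = [0.2, -0.5, 0.9]
pl = [0.7, 0.1, -0.3]
j = rng.randrange(len(pk))   # crossover position
b = rng.random()             # random mixing factor
ck, cl = arithmetic_crossover(pk, pl, j, b)
```

The two children keep the sum of the parents' genes at position j, and every other position is copied unchanged.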

The mutation operation is achieved by flipping randomly selected bits (see Figure 4), and the mutation probability is usually small. The selection of an individual in a population is carried out by evaluating its fitness, and it can remain in the new generation if a certain fitness threshold is reached. Individuals with higher fitness are more likely to reproduce. The mutation operation is as follows:

$$a_{ij} = \begin{cases} a_{ij} + (a_{ij} - a_{\max}) f(g), & r > 0.5, \\ a_{ij} + (a_{\min} - a_{ij}) f(g), & r \le 0.5, \end{cases} \qquad f(g) = r_2 \left(1 - \frac{g}{G_{\max}}\right)^{2},$$

where $a_{\max}$ is the upper bound of $a_{ij}$; $a_{\min}$ is the lower bound of $a_{ij}$; $g$ is the current iteration number; $G_{\max}$ is the maximum number of generations; and $r$ and $r_2$ are random numbers between 0 and 1. The overall algorithm process for optimizing ELM by using GA is shown in Figure 5.

2.3. GA with Simulated Annealing (SAGA)

In the early stage of traditional GA, the differences between individuals are large. When the classic roulette method is used, the number of new individuals is proportional to the fitness of the original individuals, so the offspring of a few fit individuals easily flood the whole population, which causes premature convergence. In the later stage, the fitness values tend to be consistent, and the superior individuals have no obvious advantage in producing new individuals, which stalls the evolution of the whole population [29]. Therefore, it is necessary to properly stretch the fitness. The simulated annealing (SA) algorithm proposed by Metropolis can realize the stretching of the fitness function [30]. In this study, SA was used to optimize the selection process of GA. The core of the SA algorithm is the Metropolis criterion, that is, the probability with which SA accepts a new solution. For an optimization problem that minimizes the objective function, the probability that SA accepts the new solution is as follows:

$$P = \begin{cases} 1, & f(x') < f(x), \\ \exp\left(-\dfrac{f(x') - f(x)}{T}\right), & f(x') \ge f(x), \end{cases}$$

where $x$ represents the current solution; $x'$ represents the new solution; $f(x)$ represents the objective function value of the current solution; $f(x')$ represents the objective function value of the new solution; and $T$ represents the temperature.
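The Metropolis acceptance rule for a minimization problem can be written as a short function. The sketch below uses only the standard library; the name `metropolis_accept` and the example values are hypothetical, and how the rule is wired into the GA selection step is not shown.

```python
import math
import random

def metropolis_accept(f_current, f_new, temperature, rng):
    """Accept a better solution always, a worse one with probability exp(-delta / T)."""
    if f_new < f_current:
        return True
    p = math.exp(-(f_new - f_current) / temperature)
    return rng.random() < p

rng = random.Random(7)
# At high temperature a worse solution is accepted almost every time ...
hot = sum(metropolis_accept(1.0, 2.0, 1000.0, rng) for _ in range(1000))
# ... while at low temperature it is practically never accepted.
cold = sum(metropolis_accept(1.0, 2.0, 0.001, rng) for _ in range(1000))
```

Lowering the temperature over the generations therefore shifts the search from exploration toward pure improvement.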

2.4. An Improved Multilayer ELM

As early as 1997, Tamura and Tateishi showed that, as the number of network layers increases, the prediction accuracy improves and the number of hidden layer nodes required decreases greatly [26]. Qu et al. [31] proposed a two-hidden layer ELM (TELM) neural network. The TELM is composed of an input layer, two hidden layers, and an output layer, and the neurons of adjacent layers are fully connected. The TELM retains some of the advantages of the ELM, such as strong generalization ability, fast operation speed, and a low risk of overfitting. The principle of the TELM is as follows:

Randomly generate the first hidden layer input weight matrix $W$ and biases $B$. The first hidden layer input parameter matrix is defined as $W_{IE} = [B \;\; W]$, and the augmented matrix of the input matrix is defined as $X_E = [1 \;\; X]^{\top}$. Calculate the output matrix of the first hidden layer as

$$H = g(W_{IE} X_E).$$

The output weight matrix between the first hidden layer and the final output layer is calculated according to the traditional ELM as

$$\beta = H^{\dagger} T.$$

The expected output matrix of the second hidden layer is calculated as

$$H_1 = T \beta^{\dagger}.$$

The augmented matrix of the first hidden layer output matrix, $H_E = [1 \;\; H]^{\top}$, is constructed and the second hidden layer input parameter matrix is calculated as

$$W_{HE} = g^{-1}(H_1) H_E^{\dagger}.$$

The output matrix of the second hidden layer is calculated as

$$H_2 = g(W_{HE} H_E).$$

The output weight matrix between the second hidden layer and the final output layer is calculated as

$$\beta_{new} = H_2^{\dagger} T.$$

The final output of the network is

$$f(x) = H_2 \beta_{new},$$

where the superscript $\dagger$ represents the Moore–Penrose generalized inverse operation and $g^{-1}$ represents the inverse function of $g$. The basic schematic topological structure of a TELM network is shown in Figure 6.
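The TELM steps listed above can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not the authors' code: samples are stored in rows, the sigmoid is assumed as activation, the expected second-layer output is clipped so that the inverse sigmoid stays finite, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inv_sigmoid(h, eps=1e-6):
    h = np.clip(h, eps, 1.0 - eps)     # keep the logit finite (an added safeguard)
    return np.log(h / (1.0 - h))

def add_bias(A):
    return np.hstack([np.ones((A.shape[0], 1)), A])   # prepend a column of ones

def telm_fit(X, T, n_hidden, rng):
    XE = add_bias(X)                                          # augmented input
    W_IE = rng.uniform(-1.0, 1.0, size=(XE.shape[1], n_hidden))  # random biases and weights
    H = sigmoid(XE @ W_IE)                                    # first hidden layer output
    beta = np.linalg.pinv(H) @ T                              # ELM output weights
    H1 = T @ np.linalg.pinv(beta)                             # expected second-layer output
    HE = add_bias(H)                                          # augmented first-layer output
    W_HE = np.linalg.pinv(HE) @ inv_sigmoid(H1)               # second-layer parameters
    H2 = sigmoid(HE @ W_HE)                                   # actual second-layer output
    beta_new = np.linalg.pinv(H2) @ T                         # final output weights
    return W_IE, W_HE, beta_new

def telm_predict(X, W_IE, W_HE, beta_new):
    H = sigmoid(add_bias(X) @ W_IE)
    H2 = sigmoid(add_bias(H) @ W_HE)
    return H2 @ beta_new

# Synthetic stand-in data (not the rolling data used in the paper).
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(150, 4))
T = np.sin(3.0 * X[:, :1]) + X[:, 1:2] * X[:, 2:3]
W_IE, W_HE, beta_new = telm_fit(X, T, n_hidden=20, rng=rng)
pred = telm_predict(X, W_IE, W_HE, beta_new)
```

Only the first-layer parameters are random; the second-layer parameters are obtained analytically, so the whole fit still consists of a handful of pseudoinverse computations.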

To get a more stable prediction output and better generalization performance, we calculate the output weight matrix of TELM by considering the following three cases:

If the number of training samples is more than the number of hidden layer nodes, then

If the number of training samples is less than the number of hidden layer nodes, then

If the number of training samples is equal to the number of hidden layer nodes, thenwhere is a uniformly distributed random number between 0 and 1 [32]. And this method is named as improved TELM (ITELM).

Building on this, we further optimized the ITELM network by using the SAGA algorithm and call the result the SGITELM algorithm. The flow chart of the SGITELM algorithm is shown in Figure 7.

3. Experimental Results and Analysis

3.1. Data and Processing

In this paper, the final stand rolling data of a 1580 mm HSR process in a steel factory are collected for experiments. The input variables used for the proposed prediction model of bending force are entrance temperature (°C), entrance thickness (mm), exit thickness (mm), strip width (mm), rolling force (kN), rolling speed (m/s), roll shifting (mm), yield strength (MPa), and target profile (μm). The output variable of the model is the bending force (kN). Data from a total of 1300 strips are employed in the experiments, and the dataset is divided into two subsets: a training set (70%) and a testing set (30%). The training dataset is used to determine the model structure and select the training parameters. The testing dataset is used as an unseen validation dataset to verify the model performance. The fractal dimension visualization diagram of the collected dataset is shown in Figure 8. Clearly, the input data vary considerably in different dimensions. Table 1 shows the data distributions for each input variable. To eliminate the scale differences between variables of different dimensions, to avoid the increase in prediction error caused by large differences between input and output data, and to update the weights and biases conveniently during modeling, it is necessary to scale the data to a small interval in a certain proportion. Normalization is therefore required before the data enter the model [33]. The following formula is used to normalize the data:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x^{*}$, $x$, $x_{\max}$, and $x_{\min}$ are the normalized data, original data, maximal data, and minimal data, respectively.
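A 70/30 split followed by min-max normalization can be sketched as below. The data are randomly generated placeholders with three columns rather than the nine real input variables, and taking the minimum and maximum from the training subset only is a design choice of this sketch, not something stated in the text.

```python
import numpy as np

def minmax_fit(X):
    """Column-wise minimum and maximum."""
    return X.min(axis=0), X.max(axis=0)

def minmax_transform(X, x_min, x_max):
    return (X - x_min) / (x_max - x_min)

rng = np.random.default_rng(1)
# Hypothetical stand-in for 1300 samples; the column ranges are made up.
data = rng.uniform([800.0, 20.0, 2.0], [1100.0, 45.0, 12.0], size=(1300, 3))

idx = rng.permutation(len(data))
n_train = len(data) * 7 // 10            # 70% of the samples, integer arithmetic
train, test = data[idx[:n_train]], data[idx[n_train:]]

x_min, x_max = minmax_fit(train)
train_n = minmax_transform(train, x_min, x_max)
test_n = minmax_transform(test, x_min, x_max)
```

With this choice the training columns span exactly [0, 1], while a few testing values may fall slightly outside that interval.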

3.2. Evaluation Criteria

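Complete assessment of model performance is carried out by calculating the mean absolute error (MAE), root-mean-squared error (RMSE), and mean absolute percentage error (MAPE) on the testing dataset. The formulas of the three evaluation criteria are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}},$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%,$$

where n denotes the number of samples and $y_i$ and $\hat{y}_i$ are the measured value and the predicted value of the ith sample.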
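The three criteria are simple to compute directly. The sketch below uses the standard library only; the measured and predicted values are made-up numbers chosen so the arithmetic is easy to follow.

```python
import math

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error in percent; measured values must be nonzero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

measured = [800.0, 1000.0, 1250.0]    # hypothetical bending forces, kN
predicted = [790.0, 1020.0, 1250.0]

mae_value = mae(measured, predicted)     # (10 + 20 + 0) / 3
rmse_value = rmse(measured, predicted)   # sqrt((100 + 400 + 0) / 3)
mape_value = mape(measured, predicted)   # 100 * (0.0125 + 0.02 + 0) / 3
```

RMSE weights the 20 kN miss more heavily than MAE does, which is why the two criteria can differ noticeably when a few large errors are present.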

3.3. Determining the Best Configuration for ELM

ELM can fit complex nonlinear functions well, mainly because an activation function is used in the hidden layer. The activation function allows the model to learn complex, arbitrary mappings that represent the nonlinear relationships between input and output. This paper tests the impact of the commonly used activation functions “Radbas,” “sin,” “sigmoid,” “Hardlim,” and “Tribas” on the performance of the prediction model. The formulas for these activation functions are as follows:

$$\text{Radbas:}\quad g(x) = e^{-x^{2}},$$

$$\text{sin:}\quad g(x) = \sin(x),$$

$$\text{sigmoid:}\quad g(x) = \frac{1}{1 + e^{-x}},$$

$$\text{Hardlim:}\quad g(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

$$\text{Tribas:}\quad g(x) = \begin{cases} 1 - |x|, & -1 \le x \le 1, \\ 0, & \text{otherwise}. \end{cases}$$
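For reference, the five activation functions can be written as scalar Python functions. The definitions below follow the usual conventions for the transfer functions of the same names in common neural network toolboxes, which is an assumption; the dictionary `ACTIVATIONS` is a hypothetical helper for looping over them in a comparison experiment.

```python
import math

def radbas(x):
    return math.exp(-x * x)              # radial basis

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))    # logistic sigmoid

def hardlim(x):
    return 1.0 if x >= 0 else 0.0        # hard limit (step)

def tribas(x):
    return max(0.0, 1.0 - abs(x))        # triangular basis

ACTIVATIONS = {
    "Radbas": radbas,
    "sin": math.sin,
    "sigmoid": sigmoid,
    "Hardlim": hardlim,
    "Tribas": tribas,
}

# Evaluate every activation at the same point for a quick comparison.
values_at_quarter = {name: g(0.25) for name, g in ACTIVATIONS.items()}
```

Note that “Hardlim” is not differentiable and “Tribas” is only piecewise differentiable, so they fall outside the infinitely differentiable case assumed by the basic ELM theory.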

Besides the activation function, the number of hidden layer nodes also plays a crucial role in the accuracy and generalization ability of the model. If there are too many hidden layer nodes, some node units inevitably become invalid or redundant, which greatly harms the generalization ability of the model. If there are too few nodes, the accuracy of the model suffers. The number of nodes must therefore be controlled reasonably to limit redundant nodes while ensuring the prediction accuracy of the model. Table 2 lists the RMSE of the ELM algorithm for each activation function and number of hidden layer nodes. Table 2 shows that, for the same number of hidden layer nodes, the RMSE corresponding to the “sigmoid” activation function is always the smallest. Table 2 also shows that, when “sigmoid” is used as the activation function, the RMSE first decreases and then increases as the number of hidden layer nodes grows, which is consistent with the description above. Although the RMSE values of the activation functions “Radbas,” “Hardlim,” and “Tribas” are still decreasing as the number of nodes increases, they are always larger than the RMSE of the “sigmoid” activation function with 90 hidden layer nodes. For these three activation functions, a sufficiently large number of hidden layer nodes may yield a relatively small RMSE, but more hidden layer nodes mean a more complex model. Therefore, “sigmoid” is selected as the activation function and 90 as the number of hidden layer nodes.

3.4. ELM Optimized by SAGA

Among the SAGA parameters that must be determined, population size and crossover probability are discussed in this section. A large population size may lead to slow convergence, while a small population size may trap the search at a local optimum. The crossover probability determines whether two individuals are crossed or not. As shown in Table 3, a population size of 10 yields the lowest RMSE of 12.3072. The effect of the crossover probability from 0.4 to 0.9 at intervals of 0.1 is listed in Table 4, and the optimal crossover probability is 0.7. Based on the results of these experiments, an ELM model optimized by SAGA (SGELM) is finally established, and its parameter settings are listed in Table 5.

Figure 9 shows the optimization procedures of GAELM and SGELM. The fitness curve of GAELM has completely converged after the 60th iteration, while the fitness curve of SGELM only starts to converge from the 86th iteration. However, the final fitness value of SGELM is better than that of GAELM. This indicates that, although SGELM converges more slowly, its solution quality is better than that of GAELM.

3.5. Determining the Best Configuration for TELM

The number of nodes in each hidden layer plays an important role in the capacity of TELM, and there is no accepted theory to determine it. In this study, “sigmoid” is first fixed as the activation function, and the number of nodes in each of the two hidden layers is set to be the same [32]. Based on the results shown in Table 6, a TELM with the hidden layer structure 40-40 has the lowest RMSE of 12.9550.

3.6. Comparison with Different ELM-Based Methods

The prediction results of ELM, GAELM, SGELM, and SGITELM and the real bending force values are shown in Figure 10. For better visualization, only the predicted values for 40 samples in the testing dataset are shown. As can be seen from Figure 10, the bending force values predicted by the four ELM-based models are relatively close to the measured bending force values. This shows that the four ELM-based models can be applied to the prediction of bending force in HSR. In addition, among the four models, ELM has relatively poor prediction performance, because its predicted values are farthest from the measured values.

The prediction performance of the ELM, GAELM, SGELM, and SGITELM is represented by the scatter plots shown in Figure 11. The color scale grades the absolute error: as the color goes from red to blue, the absolute error increases from 0 to 25 kN, and the pink spots indicate an error of over 25 kN. For higher production requirements, the absolute error between the predicted and measured bending force is expected to be less than 25 kN. Therefore, the more pink spots a model has, the worse its prediction performance. In Figure 11, the predicted values are distributed evenly and symmetrically on both sides of the diagonal line. Among the models, ELM has the largest number of pink spots and the SGITELM has the fewest; 98.72% of the absolute errors of the SGITELM prediction results are less than 25 kN. This also shows that the prediction performance is improved after the model is optimized.
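The share of predictions within a given absolute error tolerance, such as the 25 kN threshold used here, can be computed as follows. The function name `fraction_within` and the four sample values are hypothetical.

```python
def fraction_within(measured, predicted, tol):
    """Fraction of samples whose absolute error is strictly below tol."""
    hits = sum(1 for m, p in zip(measured, predicted) if abs(m - p) < tol)
    return hits / len(measured)

measured = [900.0, 950.0, 1000.0, 1100.0]    # hypothetical bending forces, kN
predicted = [910.0, 920.0, 1004.0, 1125.0]

# Absolute errors are 10, 30, 4, and 25 kN; only two are strictly below 25 kN.
share = fraction_within(measured, predicted, 25.0)
```

The strict inequality mirrors the wording "less than 25 kN", so a prediction that misses by exactly 25 kN is counted as out of tolerance.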

A reasonable error distribution is of great significance for analyzing the feasibility of a model. Figure 12 shows the histograms and distribution curves of the errors of the ELM, GAELM, SGELM, and SGITELM. The error distribution curves have the bell shape of a normal distribution, which indicates that the prediction errors of all models are approximately normally distributed. SGITELM performs best: its distribution curve is higher and narrower, with the smallest standard deviation σ, which indicates that more predictions with smaller errors are obtained. These results further support the superiority of the prediction performance of the SGITELM algorithm.

Figure 13 shows the boxplot of the relative errors of the proposed methods. In general, the relative error reflects the deviation between the predicted value and the measured value in proportion to the measured value, and thus reflects the reliability of the prediction performance better than the absolute error. The boxplot represents the spread of the relative error by its quartiles. SGITELM has the fewest outliers and the smallest quartiles, which indicates that SGITELM has advantages over the other bending force prediction methods.

To evaluate the prediction accuracy of these ELM-based methods more intuitively, the results of the three evaluation criteria are given in Figure 14. The ranking of the four models is the same under MAE, MAPE, and RMSE: the error values rank as ELM > GAELM > SGELM > SGITELM. The proposed SGITELM has the best prediction performance, and the MAE, MAPE, and RMSE of SGITELM are 9.0893 kN, 1.1433%, and 11.2678 kN, respectively. These results indicate that the proposed SGITELM is more suitable for bending force prediction of the hot strip than the traditional ELM method because of its higher prediction accuracy and better generalization performance.

4. Conclusion

In this paper, four ELM-based methods to predict the bending force were proposed. Data from a total of 1300 strips were collected to train and test the models. The number of hidden layer nodes, the activation function, and the hidden layer structure were tested separately to determine the structure of the ELM. The prediction performance of ELM, GAELM, SGELM, and SGITELM was evaluated, and the prediction accuracy was compared with the three criteria of MAE, MAPE, and RMSE. The prediction accuracy of the ELM-based methods is significantly influenced by the number of hidden nodes, the activation function, the parameter settings of SAGA, and the hidden layer structure of the TELM. All four ELM-based methods work well for bending force prediction in hot strip rolling, and the improved SGITELM, which has the highest prediction accuracy and the best generalization performance, is the recommended method. If ELM-based methods are introduced into other prediction tasks in the hot rolling industry, further production and economic benefits may be obtained.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Key R&D Program of China (no. 2017YFB0304100), the National Natural Science Foundation of China (no. U20A20187), and the Fundamental Research Funds of the Central Universities (nos. N180708009 and N2007006).