Big Data as a Tool for Building a Predictive Model of Mill Roll Wear
Symmetry (IF 2.940), Pub Date: 2021-05-12, DOI: 10.3390/sym13050859
Natalia Vasilyeva, Elmira Fedorova, Alexandr Kolesnikov

Big data analysis is becoming a routine task for companies worldwide, including those in Russia. With advances in technology and falling storage costs, companies can now collect and store large amounts of heterogeneous data. Extracting knowledge and value from such data is a challenge that every company seeking to stay competitive and keep its place in the market will eventually face. This paper considers an approach to studying metallurgical processes through the analysis of a large array of operational control data, using steel rolling production as an example. The aim of the work is to develop a predictive model of rolling mill roll wear from operational control data that record roll loading and unloading times, the rolled assortment, the roll material, and the time each roll spends in operation. The data were first prepared for modeling by removing outliers, uncharacteristic and random measurement results (misses), and data gaps. Correlation analysis showed that the dimensions and grades of the rolled steel sheets, together with the roll material, have the greatest influence on roll wear. Several predictive models of the technological process were then built from the same data. Model adequacy was assessed by the mean square error (MSE), the coefficient of determination (R²), and the Pearson correlation coefficient (R) between the calculated and experimental values of mill roll wear. Adequacy was also assessed by how symmetrically the predicted values are distributed about the straight line Ypredicted = Yactual.
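The adequacy criteria named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the `share_below_diagonal` symmetry measure (the fraction of points whose prediction falls below the line Ypredicted = Yactual), and the sample wear values are all assumptions introduced here. A share near 0.5 would suggest symmetric scatter, while a share near 1.0 would indicate systematic underestimation.

```python
import math

def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

def share_below_diagonal(actual, predicted):
    """Fraction of points lying below the line Ypredicted = Yactual.

    Hypothetical symmetry measure: about 0.5 means balanced scatter,
    values near 1.0 mean the model tends to underestimate.
    """
    below = sum(1 for a, p in zip(actual, predicted) if p < a)
    return below / len(actual)

# Made-up wear values for illustration only (not data from the paper).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.8, 1.9, 2.7, 3.8, 4.6]

print("MSE:", mse(actual, predicted))
print("R2:", r2(actual, predicted))
print("Pearson R:", pearson_r(actual, predicted))
print("Share below diagonal:", share_below_diagonal(actual, predicted))
```

In this made-up example every prediction is slightly too low, so R² and Pearson R are both high while the symmetry measure still flags a one-sided bias. That is why a symmetry check can add information beyond the correlation-based criteria.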
Linear models built with the least squares method and cross-validation proved inadequate for the research object: the coefficient of determination R² did not exceed 0.3. Four multivariate regressions were built on the same operational control database: Linear Regression, Lasso, Ridge, and ElasticNet. These models also proved inadequate, and the symmetry test showed that all of them underestimate the predicted values. Models based on algorithm composition (ensembles), namely random forest and gradient boosting, were then built. Both methods proved adequate for the research object, with R² = 0.798 for the random forest model and R² = 0.847 for the gradient boosting model. The symmetry check about the straight line Ypredicted = Yactual showed that the random forest model tends to underestimate the predicted values (the calculated values lie below the line), whereas the values predicted by the gradient boosting model lie symmetrically about the line. The gradient boosting model is therefore preferred, both for its higher accuracy and for its symmetric predictions. The predictive model of mill roll wear will allow rolls to be used rationally: the existing work rolls can be redistributed between the stands so as to reduce the total roll wear.
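The contrast between a linear fit and gradient boosting can be illustrated with a toy example. The sketch below is a from-scratch, single-feature gradient boosting regressor built from decision stumps with squared-error loss. It is not the authors' implementation or any particular library's API; the data are synthetic, and every name (`fit_line`, `fit_stump`, `fit_boosting`, the learning rate and the number of rounds) is an assumption chosen for illustration. The scores are computed on the training data only, so they show how well each model can fit the relationship, not how well it generalizes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def fit_boosting(xs, ys, n_rounds=200, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(
            lm if x <= t else rm for t, lm, rm in stumps)
    return predict

def r2(actual, predicted):
    """Coefficient of determination."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic nonlinear relationship (illustrative, not roll wear data).
xs = [i / 10 for i in range(40)]
ys = [(x - 2.0) ** 2 for x in xs]

linear = fit_line(xs, ys)
boosted = fit_boosting(xs, ys)

r2_linear = r2(ys, [linear(x) for x in xs])
r2_boost = r2(ys, [boosted(x) for x in xs])
print("R2 linear:", r2_linear)
print("R2 boosting:", r2_boost)
```

On this deliberately nonlinear data the straight line explains almost none of the variance, while the boosted stumps follow the curve closely. A production model would more likely use an established library with several features and held-out validation; the sketch only shows why an ensemble of simple learners can succeed where a linear model cannot.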

Updated: 2021-05-12