A predictive noise correction methodology for manufacturing process datasets
Journal of Big Data (IF 8.6), Pub Date: 2020-10-17, DOI: 10.1186/s40537-020-00367-w
Omogbai Oleghe

In manufacturing processes, datasets intended for data-driven decisions are mostly generated from time-sequenced sensor readings. Industrial sensor systems are prone to transmitting inaccurate readings, which results in noisy datasets. Noisy datasets inhibit machine learning and knowledge discovery. Using a multi-stage, multi-output process dataset as an experimental case, this article reports a methodology for replacing erroneous sensor values with their predicted likely values. In the methodology, invalid values specified by process owners are first converted to missing values. Then, the ReliefF algorithm is used to select the most relevant features to carry forward into prediction modelling, which also boosts the performance of the prediction model. A Random Forest classifier model is built to predict replacement values for the missing values. Finally, the predicted values are inserted into the dataset to fill in the missing entries. Because many attributes have a significant number of erroneous values, invalid values are replaced one attribute at a time. To do this systematically, the process flow direction and the stages of the manufacturing process are exploited to partition the dataset into subsets for model building. The results indicate that the methodology is able to replace erroneous values with likely true values to a very high degree of accuracy. There is a paucity of this type of methodology for dealing with invalid entries in process datasets. The methodology is useful for both missing and invalid value correction in process datasets. In the future, the plan is to inject the prediction models into streaming data to simultaneously enable erroneous value correction and predictive process monitoring in real time.
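The four steps in the abstract (invalid to missing, feature selection, Random Forest prediction, fill-in) can be sketched for a single attribute as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `relief_scores` is a simplified Relief-style weighting standing in for ReliefF, the function names and parameters (`impute_attribute`, `invalid_mask`, `n_features`) are hypothetical, the target attribute is assumed to hold integer class codes, and the predictor columns are assumed to be complete.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def relief_scores(X, y, n_neighbors=5):
    """Simplified Relief-style feature weights (a stand-in for ReliefF)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xs = (X - X.min(axis=0)) / span            # scale each feature to [0, 1]
    weights = np.zeros(X.shape[1])
    for i in range(len(Xs)):
        diff = np.abs(Xs - Xs[i])              # per-feature distance to row i
        dist = diff.sum(axis=1)
        same = np.where(y == y[i])[0]
        same = same[same != i]                 # exclude the row itself
        other = np.where(y != y[i])[0]
        hits = same[np.argsort(dist[same])[:n_neighbors]]
        misses = other[np.argsort(dist[other])[:n_neighbors]]
        # Good features differ little from same-class neighbours (hits)
        # and differ a lot from other-class neighbours (misses).
        if len(hits):
            weights -= diff[hits].mean(axis=0)
        if len(misses):
            weights += diff[misses].mean(axis=0)
    return weights / len(Xs)


def impute_attribute(X, target, invalid_mask, n_features=3, seed=0):
    """Replace flagged entries of column `target` with predicted values."""
    X = np.asarray(X, dtype=float).copy()
    # Step 1: convert values flagged as invalid into missing values.
    X[invalid_mask, target] = np.nan
    missing = np.isnan(X[:, target])
    predictors = np.delete(np.arange(X.shape[1]), target)
    X_known = X[~missing][:, predictors]
    y_known = X[~missing, target].astype(int)  # assumes integer class codes
    # Step 2: keep the highest-scoring predictor columns.
    order = np.argsort(relief_scores(X_known, y_known))[::-1]
    top = predictors[order[:n_features]]
    # Step 3: fit a Random Forest classifier on rows with a valid target.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X[~missing][:, top], y_known)
    # Step 4: insert the predictions into the missing entries.
    if missing.any():
        X[missing, target] = model.predict(X[missing][:, top])
    return X
```

To correct several attributes, a function like this would be called once per attribute, with the columns passed in restricted to the subset of the dataset relevant to that attribute's process stage.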




Updated: 2020-10-17