Machine learning methods to predict solubilities of rock samples
Journal of Chemometrics (IF 1.9), Pub Date: 2020-02-01, DOI: 10.1002/cem.3198
Pál Péter Hanzelik¹, Szilveszter Gergely², Csaba Gáspár³,⁴, László Győry¹

Interest in applying chemometric and data science methods to laboratory techniques has grown rapidly over the last 10 years, because these methods are cheaper and faster than traditional analytical methods of material testing. This study uses 888 rock samples collected from the exploration and production (E&P) sector of the oil industry. Solubility predictions based on the Fourier-transform infrared (FT-IR) spectra of these samples were developed and investigated with nine methods, both linear and non-linear. Two of them, Partial Least Squares Regression (PLSR) and Support Vector Regression (SVR), are available in a commercial software package; the other seven, Extreme Gradient Boosting (XGBoost), Ridge Regression (RR), k-nearest neighbours (k-NN), Decision Tree (DT), Multilayer Perceptron (MLP), a separately coded Support Vector Regression (SVR) and an Artificial Neural Network (ANN) built with TensorFlow (TF), were coded by the authors using either commercial applications or open-source libraries. The workflow starts with spectral pre-processing by standard normal variate (SNV) transformation, baseline correction and feature selection, which together create the feature set used by all machine learning (ML) applications. Prediction accuracy was evaluated for each method with mean squared error (MSE) as the performance metric. Comparing predicted values with the measured values of the test samples showed that mineral solubility in acids can be predicted within the uncertainty of real laboratory measurements; the approach can therefore shorten the response time of these investigations and reduce risk in industrial applications. Where unknown samples had features outside the range covered by the training samples, the limits of prediction accuracy became clear. We have also identified limitations in the methodology and planned steps to further improve prediction capability.
The identified constraint on the number of available samples further emphasizes the need for database-building efforts, so that the full potential of big data and machine learning can be realized.
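The abstract names the pre-processing steps (SNV, baseline correction, feature selection) and the evaluation metric (mean squared error) without giving their settings. The following is a minimal, dependency-free Python sketch of such a pipeline, not the authors' implementation. SNV and MSE follow their standard definitions; the two-point linear baseline, the variance-ranking feature selection, the step order and all function names (`linear_baseline_correct`, `snv`, `select_top_variance`, `preprocess`, `mean_squared_error`) are hypothetical choices made for illustration, and the numbers are toy values rather than data from the study.

```python
from statistics import mean, pstdev, pvariance


def linear_baseline_correct(spectrum):
    """Subtract a straight line drawn through the first and last points.

    Hypothetical two-point baseline; the paper does not state its baseline method.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points for a linear baseline")
    first, last = spectrum[0], spectrum[-1]
    slope = (last - first) / (n - 1)
    return [y - (first + slope * i) for i, y in enumerate(spectrum)]


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean, scale by its own SD."""
    mu = mean(spectrum)
    sd = pstdev(spectrum)  # population SD; some tools use the sample SD instead
    if sd == 0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(x - mu) / sd for x in spectrum]


def select_top_variance(spectra, k):
    """Keep the k wavenumber columns with the largest variance across samples.

    Hypothetical feature-selection rule, used only for illustration.
    Returns the kept column indices (in original order) and the reduced spectra.
    """
    n_features = len(spectra[0])
    variances = [pvariance([s[j] for s in spectra]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[s[j] for j in keep] for s in spectra]


def preprocess(spectra, k):
    """Baseline correction, then SNV, then feature selection (the order is an assumption)."""
    corrected = [snv(linear_baseline_correct(s)) for s in spectra]
    return select_top_variance(corrected, k)


def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - y_hat_i)^2), the metric named in the abstract."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy absorbance values, not data from the study.
    spectra = [
        [1.0, 2.0, 4.0, 3.0],
        [2.0, 2.5, 5.0, 4.0],
        [0.5, 1.0, 3.5, 2.0],
    ]
    keep, features = preprocess(spectra, k=2)
    print("kept columns:", keep)
    # Toy measured vs. predicted solubilities (percent), not results from the study.
    print("MSE:", mean_squared_error([12.0, 30.5, 8.0], [11.0, 31.5, 8.5]))
```

SNV acts on each spectrum independently, but a data-dependent selection rule like the variance ranking above would normally be fitted on the training spectra only and then applied unchanged to the test spectra, so that the reported MSE reflects performance on unseen samples.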

Updated: 2020-02-01