Improved well log classification using semisupervised Gaussian mixture models and a new hyper-parameter selection strategy
Computers & Geosciences (IF 4.4), Pub Date: 2020-07-01, DOI: 10.1016/j.cageo.2020.104501
Michael W. Dunham, Alison Malcolm, J. Kim Welford

Abstract

Well log classification, the process of mapping well log measurements to lithofacies identified from core samples, is a common procedure in the oil and gas industry. Manually assigning lithofacies to wireline log measurements where no core is available can be time-consuming and can also introduce bias. Supervised machine learning algorithms are commonly used to automate this process, but they are prone to overfitting when training data are scarce, which is common in well log classification problems. Semisupervised machine learning algorithms are designed for classification problems with minimal training data, and we adopt a semisupervised Gaussian mixture model (ssGMM) to address this problem. The dataset we consider is from a machine learning competition held in 2016, and we simulate a semisupervised scenario by assuming that only one of the ten wells is labeled. We apply ssGMM to this well log dataset and compare its performance with that of XGBoost, the supervised method that won this competition. To try to improve the performance of both ssGMM and XGBoost, we also introduce a new hyper-parameter selection strategy that uses both the mean and the standard deviation of the cross-validation scores, in contrast to the default procedure, which uses only the mean. Our results indicate that ssGMM slightly outperforms XGBoost in our semisupervised setting, which supports the suggestion that semisupervised algorithms are more appropriate when training data are limited. We also show that our new hyper-parameter selection strategy chooses ssGMM hyper-parameters that perform better on the testing data, whereas the results for XGBoost are mixed.
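
The abstract does not spell out the ssGMM formulation. As a rough illustration only, the sketch below fits a textbook semisupervised Gaussian mixture by expectation-maximization (EM), assuming one Gaussian component per lithofacies class: labeled samples keep fixed one-hot class responsibilities, unlabeled samples receive soft responsibilities from the current model, and each M-step re-estimates weights, means, and covariances from both. The function names (`ssgmm_fit`, `ssgmm_predict`), the covariance regularizer `reg`, and the fixed iteration count are hypothetical choices, and the paper's ssGMM may differ (for example in how unlabeled data are weighted). In the paper's scenario, `X_lab`/`y_lab` would come from the single labeled well and `X_unl` from the remaining wells.

```python
import numpy as np
from scipy.stats import multivariate_normal


def _m_step(X, R, reg):
    """M-step: re-estimate mixture weights, means and covariances
    from responsibilities R (n_samples x n_classes)."""
    nk = R.sum(axis=0)                       # effective sample count per class
    weights = nk / nk.sum()
    means = (R.T @ X) / nk[:, None]
    covs = []
    for k in range(R.shape[1]):
        diff = X - means[k]
        cov = (R[:, k, None] * diff).T @ diff / nk[k]
        covs.append(cov + reg * np.eye(X.shape[1]))  # small ridge keeps it invertible
    return weights, means, np.array(covs)


def _responsibilities(X, weights, means, covs):
    """E-step: posterior class probabilities, computed in log space."""
    log_p = np.column_stack([
        np.log(w) + multivariate_normal.logpdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    R = np.exp(log_p)
    return R / R.sum(axis=1, keepdims=True)


def ssgmm_fit(X_lab, y_lab, X_unl, n_classes, n_iter=100, reg=1e-6):
    """Hypothetical semisupervised GMM with one Gaussian per class.
    Assumes every class has at least a few labeled samples."""
    R_lab = np.eye(n_classes)[y_lab]         # fixed one-hot responsibilities
    params = _m_step(X_lab, R_lab, reg)      # initialise from labeled data only
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = _responsibilities(X_unl, *params)                # E-step: unlabeled only
        params = _m_step(X_all, np.vstack([R_lab, R_unl]), reg)  # M-step: all data
    return params


def ssgmm_predict(X, params):
    """Assign each sample to the class with the highest posterior probability."""
    return _responsibilities(X, *params).argmax(axis=1)
```

On two well-separated synthetic 2-D clusters with only three labeled points per class, this sketch should recover the cluster memberships of the unlabeled points; real well log classes overlap far more, so the result on field data would depend heavily on how well each lithofacies is described by a single Gaussian.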
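
The abstract states only that the new strategy uses the mean and the standard deviation of the cross-validation scores together, without giving the exact rule. Purely as an assumed stand-in, the sketch below contrasts the default rule (highest mean score) with one simple way of combining both statistics: maximize `mean - k * std`, which penalizes candidates whose scores swing widely between folds. The penalty form, the weight `k`, the function names, and the fold scores are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_by_mean(cv_scores):
    """Default rule: the candidate with the highest mean CV score."""
    return max(cv_scores, key=lambda c: np.mean(cv_scores[c]))


def select_by_mean_and_std(cv_scores, k=1.0):
    """Assumed rule: favour a high mean score but penalise fold-to-fold spread
    (np.std here is the population standard deviation, ddof=0)."""
    return max(cv_scores, key=lambda c: np.mean(cv_scores[c]) - k * np.std(cv_scores[c]))


# Made-up per-fold scores for two hyper-parameter candidates.
cv_scores = {
    "candidate_A": [0.70, 0.40, 0.72, 0.42],  # mean 0.56, std ~0.150
    "candidate_B": [0.55, 0.54, 0.56, 0.55],  # mean 0.55, std ~0.007
}

print(select_by_mean(cv_scores))          # candidate_A
print(select_by_mean_and_std(cv_scores))  # candidate_B
```

With scikit-learn, the per-candidate mean and standard deviation are exposed in `GridSearchCV.cv_results_` as `mean_test_score` and `std_test_score`, so a ranking of this kind could be applied to those arrays instead of the default best-mean choice.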

Updated: 2020-07-01