We discuss the effects of data characteristics on point-based and sequence-based lithological classification algorithms.
• To improve the performance of lithology classification, we add a CRF layer to the sequence-based model.
• The proposed SVM-assisted sequence-based algorithm combines the advantages of point-based and sequence-based approaches.
• According to the data characteristics, we construct synthetic records with Gaussian Mixture Models.
Abstract
When available core samples are limited, logging data become important for lithology classification. The logging-data distributions of different lithologies usually overlap, which increases the solution multiplicity at a single spatial point. Modeling lithologic sequences through their vertical spatial relationships can reduce this multiplicity. To improve sequence modeling, we propose a lithological sequence classification algorithm built from bi-directional Gated Recurrent Units and a Conditional Random Field layer (Bi-GRU-CRF), following the previously proposed hybrid framework of Artificial Neural Networks and Hidden Markov Models (ANN-HMM). However, because training data are limited and lithologic sequences differ from one another, the generalization performance of sequence-based algorithms, unlike that of point-based algorithms, can be significantly reduced. To address this problem, we concatenate the probability output vector of a Support Vector Machine (SVM) with the original data as input to Bi-GRU-CRF, and name the overall structure SVM + Bi-GRU-CRF. In cross-validation with field data, whether the sequences of the training and test datasets are similar or dissimilar, SVM + Bi-GRU-CRF generally achieves the best results compared with all point-based and other sequence-based algorithms. Furthermore, the applicable conditions of this algorithm are discussed in three aspects: the relationship between the algorithm and the data, the function of each module, and the influence of step size parameters. This discussion is carried out through three designed experiments with two groups of comparative synthetic datasets generated with Gaussian Mixture Models (GMMs). Finally, a convincing and comprehensive evaluation of SVM + Bi-GRU-CRF is given.
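To make concrete how vertical sequence modeling can reduce point-wise multiplicity, the following is a minimal sketch of Viterbi decoding, the inference step commonly used with a CRF layer. It is not the authors' implementation: the two lithology labels, the point-wise probabilities, and the transition probabilities are hypothetical values chosen for illustration, whereas in a trained model they would be learned from data.

```python
import math


def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][k]: log-score of label k at depth step t (point-wise evidence).
    transitions[j][k]: log-score of moving from label j to label k.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for k in range(n_labels):
            # Best previous label for reaching label k at step t.
            best_j = max(range(n_labels),
                         key=lambda j: score[j] + transitions[j][k])
            ptr.append(best_j)
            new_score.append(score[best_j] + transitions[best_j][k]
                             + emissions[t][k])
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical point-wise class probabilities for five depth steps
# (label 0 and label 1 stand for two lithologies with overlapping logs).
probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.8, 0.2], [0.9, 0.1]]
# Hypothetical transition probabilities: lithology tends to persist vertically.
trans = [[0.9, 0.1], [0.1, 0.9]]

emissions = [[math.log(p) for p in row] for row in probs]
transitions = [[math.log(p) for p in row] for row in trans]

# Point-based decision: each depth step is classified independently.
pointwise = [max(range(2), key=lambda k: row[k]) for row in probs]
# Sequence-based decision: neighbouring steps constrain each other.
decoded = viterbi(emissions, transitions)

print(pointwise)
print(decoded)
```

In this toy case the third depth step is ambiguous on its own, and the point-based decision flips its label, while the sequence-based decision keeps it consistent with its neighbours. The same mechanism can also propagate errors when the assumed transitions do not match the test sequences, which is the generalization risk the abstract describes.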