Abstract

Accurate determination of the water (H2O) content of natural gases, especially in the methane (CH4) phase, is highly important for chemical engineers dealing with natural gas processes, which makes a high-performance predictive model necessary. Because the solubility of methane in aqueous solutions is important to the natural gas industry, two novel models based on the Decision Tree (DT) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) have been employed. A total of 204 real methane solubility points in aqueous solution containing NaCl under different pressure and temperature conditions have been gathered. The predicted solubility values have been compared with the experimental data points both visually and statistically. The R2 values of 1 for the training and testing phases indicate the strong ability of the proposed models to calculate methane solubility in pure water systems.

1. Introduction

In the natural gas industry, precise estimation of the water content in the methane-rich gas phase is of vital importance. An accurate approach is necessary for prediction of the vapor-liquid equilibria (VLE) of methane (CH4) and water (H2O) binary systems. Several equations of state (EOS) are available to estimate the VLE of CH4-H2O systems. Luedecke et al. used the Mansoori expression and the Van der Waals equation as the repulsive and attractive parts, respectively. Their estimates for binary systems were approximately acceptable, but the model was not completely successful in predicting the VLE of ternary systems [1]. Bakker used an EOS for the ternary system of CH4-H2O-NaCl; however, this approach does not perform well near critical points [2]. Chapoy and coworkers then used a static-analytic apparatus to measure the solubility of methane in H2O at 0.1–18 MPa and 275.11–313.11 K. For this system, the binary interaction coefficients were calculated from the measured data to enhance the performance of the Patel–Teja EOS [3]. Mohammadi and coworkers developed a new approach by using mixing rules and the Patel–Teja EOS [4]. After that, Haslam used the Hudson–McCoubrey rule and reported that, in the CH4-H2O system, the square-well potential performed better than the Lennard–Jones potential [5]. Markocic implemented the Redlich–Kwong EOS to evaluate the binary system by using nine sets of data [6]. Li and Nghiem developed an approach to estimate the VLE of CH4-H2O-NaCl and CH4-H2O systems based on Henry’s law and the Peng–Robinson EOS [7]. Carroll and Mather developed a method using Henry’s law and the Peng–Robinson EOS to estimate the solubility of CH4 in aqueous alkanolamine solutions and pure H2O [8]. Wu and Prausnitz applied the Peng–Robinson EOS to determine the Helmholtz energy for CH4-H2O systems [9]. Yarrison applied the Peng–Robinson EOS and a liquid model for prediction of the VLE of CH4-H2O systems. The model estimates had an average absolute relative deviation (AARD) of more than 6% [10]. Abudour et al. investigated the coalbed gas and water system by using the Peng–Robinson EOS in the model; however, its predictions were not accurate [11, 12]. Li combined the Peng–Robinson and Pitzer models for determination of the aqueous phase and gas phase based on 18 adjustable parameters [13]. Then, another method was developed by Zhao for pure water systems in the range of 0.1–150 MPa and 274–573 K. AARD values of 7% and 4% were determined for the liquid and gas phases, respectively [14].

Machine learning has been applied in many different industries [15–18]. Najafzadeh and Azamathulla used the neuro-fuzzy-based group method of data handling (NF-GMDH) to estimate the scour process at pile groups due to waves in terms of the wave characteristics upstream of the pile group, the arrangement of the pile group, pile spacing, geometric properties, and sediment size [19]. Najafzadeh et al. employed NF-GMDH combined with the gravitational search algorithm, genetic algorithm, and particle swarm optimization to determine scour depth [20]. In another study, ANFIS and the Support Vector Machine were used to study the local scour depth in long contractions of a waterway [21]. Saberi-Movahed et al. used the group method of data handling (GMDH) to determine the longitudinal dispersion coefficient (LDC), a critical variable in the investigation of pollution profiles in water pipelines [22]. Nazari et al. used machine learning models to determine energy and exergy efficiencies in terms of productivity, wind velocity, ambient temperature, nanofluid temperature, basin temperature, solar radiation, fan power, and nanoparticle volume fraction [23]. Najafzadeh and Oliveto used machine learning models to study the scouring propagation rate around pipelines in terms of the current angle of attack to the pipeline, the Shields parameter, the ratio of embedment depth to pipeline diameter, and the approaching flow Froude number [24].

The wide application of machine learning approaches shows that they can be employed for complex problems. The Adaptive Neuro-Fuzzy Inference System and the Decision Tree are two simple, user-friendly models that can be used by engineers working in different fields [25]. Basic knowledge of machine learning is sufficient to develop DT and ANFIS algorithms. In the present study, the Adaptive Neuro-Fuzzy Inference System and Decision Tree algorithms have been used to predict the solubility of methane in the pure water system. Furthermore, the CPA-vdW [26], CPA-HV [27], and SRK-HV [28] models have been employed for comparison with the results of the proposed models.

2. Methodology

2.1. Experimental Databank

To prepare and validate the ANFIS and DT algorithms, a comprehensive databank of 470 actual methane solubility points in the pure water system over a wide range of temperature and pressure has been collected from various papers. The details of this databank are reported in [27]. The databank has been divided into 353 and 117 data points for the training and testing sets, respectively.
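The 353/117 division described above can be sketched as follows. This is a minimal illustration only: the text does not state how the points were assigned to each set, so the random shuffle, the fixed seed, and the function name `split_databank` are assumptions, and the databank is represented by placeholder indices rather than the measured values.

```python
import random


def split_databank(points, n_train=353, seed=0):
    """Shuffle a copy of the databank and split it into training and testing sets.

    The random shuffle and the seed are assumptions made for illustration;
    the source only reports the sizes of the two sets.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder indices standing in for the 470 solubility points (not real data).
databank = list(range(470))
train_set, test_set = split_databank(databank)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, which matters when the training and testing statistics are later compared.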

2.2. Adaptive Neuro-Fuzzy Inference System

Fuzzy logic (FL) was first introduced by Zadeh [29]. Its major feature is the capacity to convert linguistic variables into mathematical forms. This approach sometimes fails to achieve appropriate results because of conflicting assessments or insufficient data. To solve this issue, other methods such as the artificial neural network (ANN) can be combined with fuzzy logic for process modeling. Combining FL and ANN produces a new algorithm, namely, the Adaptive Neuro-Fuzzy Inference System (ANFIS). The combined method operates on the basis of membership functions (MFs) and IF-THEN rules. Popular MFs include the Gaussian, triangular, generalized bell-shaped, and trapezoidal types. Two ANFIS structures are described in the literature, the Takagi–Sugeno and Mamdani types [30]. In the present work, the Takagi–Sugeno type has been implemented because of its ability to model the nonlinear relationship between the output and the inputs. The main steps in designing an ANFIS algorithm are shown in Figure 1. The relations used in its different layers are explained as follows [31, 32]:

The linguistic terms are derived from the raw input data in the first layer. Inputs are connected to nodes that define the linguistic terms through the MFs [33–35]. The MF used in this study is the Gaussian type, which is described as follows:

$$O_i^1 = \beta(x) = \exp\left(-\frac{1}{2}\,\frac{(x - z)^2}{\sigma^2}\right),$$

where $O$ stands for the output of the first layer, and $\sigma$ and $z$ represent the variance term and the Gaussian MF center, respectively.

The second layer is the firing strength layer, in which the accuracy and adequacy of the conditions of the previous section are investigated. The firing strength is formulated as

$$O_i^2 = W_i = \beta_{A_i}(x_1)\,\beta_{B_i}(x_2).$$

Here, $\beta$ and $W_i$ stand for the MF and the rule’s firing strength.

After that, the normalization of the rule is performed in the third layer. The following formulation expresses the process of normalization:

$$O_i^3 = \overline{W}_i = \frac{W_i}{\sum_j W_j}.$$

The fourth layer characterizes the linguistic terms of the model’s output. The following expression is used to determine the level to which each rule influences the model’s output:

$$O_i^4 = \overline{W}_i f_i = \overline{W}_i \left(m_i x_1 + n_i x_2 + r_i\right).$$

In this equation, the linear variables $m_i$, $n_i$, and $r_i$ are obtained by optimization of ANFIS.

Finally, the fifth layer sums up the existing rules and changes them to a quantitative form as follows [36, 37]:

$$O^5 = Y = \sum_i \overline{W}_i f_i = \frac{\sum_i W_i f_i}{\sum_i W_i}.$$
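The five layers described above can be sketched as a single forward pass. This is a minimal illustration of a first-order Takagi–Sugeno system with Gaussian MFs, not the authors' implementation: the function names, the dictionary layout of a rule, and all numerical parameter values are hypothetical, and the trained parameters of the study are not reproduced here.

```python
import math


def gaussian_mf(x, center, sigma):
    """Layer 1: Gaussian membership degree of input x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, rules):
    """Evaluate a first-order Takagi-Sugeno ANFIS for one input vector.

    Each rule is a dict (hypothetical layout) holding, per input, the Gaussian
    "centers" and "sigmas", plus the linear consequent "coeffs" and "bias".
    """
    # Layers 1 and 2: firing strength = product of the membership degrees.
    strengths = []
    for rule in rules:
        w = 1.0
        for x, c, s in zip(inputs, rule["centers"], rule["sigmas"]):
            w *= gaussian_mf(x, c, s)
        strengths.append(w)

    # Layer 3: normalize the firing strengths so that they sum to one.
    total = sum(strengths)
    normalized = [w / total for w in strengths]

    # Layer 4: weight each rule's linear consequent by its normalized strength.
    weighted = []
    for w_bar, rule in zip(normalized, rules):
        f = sum(a * x for a, x in zip(rule["coeffs"], inputs)) + rule["bias"]
        weighted.append(w_bar * f)

    # Layer 5: sum the rule contributions into a single crisp output.
    return sum(weighted)


# Two illustrative rules over (temperature in K, pressure in MPa); the values
# below are arbitrary placeholders, not fitted parameters.
example_rules = [
    {"centers": [290.0, 5.0], "sigmas": [20.0, 5.0], "coeffs": [1e-5, 2e-4], "bias": 0.0},
    {"centers": [320.0, 15.0], "sigmas": [20.0, 5.0], "coeffs": [-1e-5, 1e-4], "bias": 0.004},
]
print(anfis_forward([300.0, 10.0], example_rules))
```

Because the normalized strengths sum to one, the output is always a weighted average of the rule consequents, which is why rules with identical consequents return that consequent exactly.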

2.3. Decision Tree

In recent years, the decision tree classifier has become one of the most widely applied machine learning tools [38]. The method builds a tree-like hierarchy in which the terminal nodes stand for decision outputs and the nonterminal nodes represent the attributes [39]. Its major advantage is that the classification has an easy visual representation. Its disadvantages are that it cannot produce multiple outcomes and that it is somewhat susceptible to noise in the data [40]. Many decision trees based on C4.5 [41], ID3 [42], the chi-square automatic interaction detector [43], and the classification and regression tree [44] have been suggested. The J48 decision tree, or C4.5 algorithm, has been applied as the fundamental classifier in ensemble frameworks. Although the C4.5 decision tree is an effective approach for classification, its predictive ability can be enhanced by ensemble approaches [45]. In this study, the ensemble approach known as bagging has been used, which relies on the bootstrap sampling strategy. This strategy samples the training data randomly with replacement to produce multiple training subsets. Each subset is used to generate a decision tree, and the trees are finally aggregated into the final model. This strategy improves the classification performance by reducing the variance of the errors [46]. The scheme of the bagging process is shown in Figure 2.
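The bagging procedure described above (bootstrap sampling, one tree per sample, aggregation) can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in the study: a depth-1 regression tree (a stump) stands in for the full decision tree, aggregation is done by averaging because the target is a continuous value, and the function names, the number of estimators, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold and one mean value per side."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        m_left, m_right = mean(left), mean(right)
        sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, m_left, m_right)
    if best is None:
        # All x values in this sample are identical: predict the overall mean.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, low, high = best
    return lambda x: low if x < threshold else high


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn randomly with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the ensemble by averaging the individual predictions."""
    return mean([m(x) for m in models])


# Toy step-shaped data (illustrative only, not solubility measurements).
toy_x = [float(i) for i in range(10)]
toy_y = [0.0] * 5 + [1.0] * 5
ensemble = bagging_fit(toy_x, toy_y)
print(bagging_predict(ensemble, 0.0), bagging_predict(ensemble, 9.0))
```

For a classification task, the averaging step would be replaced by a majority vote over the individual trees.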

3. Results and Discussion

In this part, the results of the proposed ANFIS and DT models are evaluated in the training and testing stages. Particle swarm optimization (PSO) has been used to train the ANFIS algorithm. The selected cost function is the mean squared error, whose variation with the number of iterations is shown in Figure 3. The number of clusters, population size, and number of iterations in training of the ANFIS model are 6, 65, and 1500, respectively. For DT, the learning rate and number of additive terms are 0.1 and 300, respectively. After optimization of the ANFIS algorithm, a comprehensive statistical comparison has been carried out. Various indexes expressing the quality of the match between predicted and actual methane solubility values have been determined and are reported in Table 1. The R2 values of 1 for both stages and the low error values for the total dataset, including MRE = 4.571, MSE = 4.33045E−08, RMSE = 0.0001, and STD = 0.0002, indicate the high ability of the ANFIS model to estimate methane solubility in pure water systems. The values of MRE = 4.624, MSE = 1.99892E−08, RMSE = 0.0001, and STD = 0.0001 in the testing phase confirm the performance of the developed ANFIS model for prediction of unseen conditions.
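The role of PSO as the optimizer of a mean squared error cost can be sketched as follows. This is a generic, simplified PSO under stated assumptions: a toy linear model stands in for the ANFIS parameters, the inertia and acceleration coefficients are common textbook values rather than settings from the study, the iteration count is reduced for brevity, and only the population size of 65 is taken from the text.

```python
import random


def mse(params, data):
    """Mean squared error of a toy linear model y = a*x + b (stand-in for ANFIS)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def pso(cost, dim, n_particles=65, n_iter=100, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimize `cost` with a basic global-best particle swarm (assumed settings)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    history = [g_cost]  # best cost after each iteration, as in a convergence plot

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
        history.append(g_cost)
    return g_pos, g_cost, history


# Synthetic data generated from y = 2x + 1 (illustrative only).
toy_data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(11)]
best_params, best_value, cost_history = pso(lambda p: mse(p, toy_data), dim=2)
print(best_params, best_value)
```

The `history` list plays the role of the cost-versus-iteration curve: because the global best is only replaced by a lower cost, the recorded values never increase.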

Visual comparison of the predicted and experimental methane solubility values is a necessary part of model evaluation. To that end, the model outputs and the actual methane solubility points are shown together in Figure 4. In addition, the excellent agreement between the forecasted and actual methane solubility values is shown by the cross plot in Figure 5. As can be seen, the methane solubility points are located on the bisector line for both phases. The relative deviations between the forecasted and experimental methane solubility points have also been determined and are shown in Figure 6. The relative deviation points are very close to zero.
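The statistical indexes and relative deviations used in this evaluation can be computed along the following lines. The exact definitions used in the study are not given in the text, so the formulas below are assumptions based on common usage: MRE is taken as the mean absolute relative error in percent, and STD as the sample standard deviation of the prediction errors. The function names and the sample numbers are illustrative only.

```python
import math


def relative_deviations(predicted, experimental):
    """Relative deviation of each prediction from its measured value, in percent."""
    return [100.0 * (p - e) / e for p, e in zip(predicted, experimental)]


def fit_metrics(predicted, experimental):
    """Return R2, MRE (%), MSE, RMSE, and STD under the assumed definitions."""
    n = len(experimental)
    errors = [p - e for p, e in zip(predicted, experimental)]
    ss_res = sum(err ** 2 for err in errors)
    mean_exp = sum(experimental) / n
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    mean_err = sum(errors) / n
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MRE": sum(abs(d) for d in relative_deviations(predicted, experimental)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "STD": math.sqrt(sum((err - mean_err) ** 2 for err in errors) / (n - 1)),
    }


# Illustrative numbers only, not measured solubility data.
measured = [1.0, 2.0, 4.0]
modelled = [1.1, 1.9, 4.0]
print(fit_metrics(modelled, measured))
print(relative_deviations(modelled, measured))
```

Points lying on the bisector of a cross plot correspond to zero relative deviation, so the two views of the fit carry the same information at different levels of detail.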

In the present work, three other models are taken from the literature for comparison with the ANFIS and DT algorithms in predicting the solubility of methane in pure water systems. As shown in Figure 7, the CPA-HV, CPA-vdW, SRK-HV, DT, and ANFIS models have been employed to predict methane solubility at different temperatures and pressures. This figure shows that the ANFIS algorithm has the highest accuracy among these models. Moreover, the accuracy of the four other models is affected by pressure and temperature, while the accuracy of the developed ANFIS algorithm remains high over the full range of investigated conditions.

4. Conclusions

The main aim of the present work is the development of innovative and accurate methods for estimating the solubility of methane in pure water systems over extensive ranges of pressure and temperature. These methods have been constructed based on the ANFIS and DT algorithms by using 470 methane solubility points. This databank has been used to determine the optimum parameters of the ANFIS and DT algorithms in the training step and to evaluate the performance of the suggested algorithms on unseen methane solubility points. For the ANFIS model, the most accurate method, the determined R2, RMSE, MSE, MRE, and STD are 1, 0.0001, 4.33045E−08, 4.571, and 0.0002, respectively. In addition, the results of three other models from the literature have been compared with the ANFIS algorithm. This comparison shows that the ANFIS model is the best of the compared tools for estimating methane solubility in aqueous systems. Given these results, the present algorithms are useful tools for chemical engineers in the natural gas industry.

Data Availability

Data are included within the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grant no. 61801301), General Support Projects of Shenzhen Colleges and Universities (Grant no. SZWD2021002), and the Natural Science Foundation of Top Talent of SZTU (Grant no. 2019203).