Abstract

The present work introduces a quantitative structure-property relationship (QSPR)-based stochastic gradient boosting (SGB) decision tree framework for simulating and capturing the thermal decomposition kinetics of biomass, considering the effective parameters of the ultimate analysis (carbon, hydrogen, oxygen, nitrogen, and sulfur content) and the process heating rate. Using a total of 149 pyrolysis kinetics data points, this study developed an intelligent model and subjected it to training and testing phases. The proposed model was validated using error, sensitivity, regression, and outlier analyses. The coefficient of determination (R2) and mean relative error (%MRE) were calculated to be 0.993 and 4.354%, respectively, indicating good performance in the estimation of the pyrolysis kinetic parameters. In addition, the sensitivity results indicated that the process heating rate has the strongest effect on the model output, with a relevancy factor of 0.43. Finally, the proposed model showed superior performance compared with earlier frameworks.

1. Introduction

The thermochemical pyrolysis procedure is usually performed in an oxygen-free setting at 300 to 700 degrees Celsius [1, 2]. Modeling biomass thermal breakdown kinetics is a challenging but critical part of pyrolysis technology [3–5].

Many semiempirical, empirical, and theoretical methods are used to model the kinetics of biomass thermal breakdown [6–8]. For all their ability to describe thermochemical breakdown, these models fail to account for the interactions between the biomass mixture and construction variables and are linked to only a few key variables (such as pyrolysis duration and heating rate) [9–11]. Researchers therefore need a comprehensive approach that can incorporate a wider variety of factors [12, 13].

The relationship between biomass thermal breakdown kinetics and the related process factors is complex and nonlinear, making it difficult to develop a general mathematical model [14, 15]. In recent years, new data-analysis methods have been widely applied to complex problems in various sciences [16–21]. Computational intelligence methods, such as the artificial neural network (ANN) methodology and its extensions, can successfully address this problem [22–27]. The pyrolysis kinetics (mass loss) of different biomass feedstocks have been modeled using experimental data in many case-specific neural models described in various studies [28, 29]. For example, using an ANN built on the heating rate and the reaction temperature, Çepeliogullar and colleagues [30] effectively modeled the mass loss of refuse-derived fuel throughout the pyrolysis process.

Ahmad and colleagues were able to accurately estimate the mass loss of Typha latifolia during pyrolysis using an ANN approach based on the heating rate and the reaction temperature [31]. Rasool and colleagues [32] created an ANN model based on the reaction temperature and heating rate to accurately predict walnut shell mass loss throughout the pyrolysis process. Naqvi and colleagues accurately predicted the mass degradation of rice husks and wastewater sludge during pyrolysis, employing an ANN approach based on temperature and blend composition [33].

In the presence of a microalgae ash catalyst, Bong and colleagues [34] utilized an ANN model based on the heating rate and reaction temperature to accurately estimate the blend mass loss of peanut husk and Chlorella vulgaris throughout the pyrolysis process. Bi and coworkers reported that, using an ANN design based on temperature, heating rate, and blending ratio, they could accurately predict the residual mass of a coal gangue and peanut husk blend after pyrolysis [35, 36]. To forecast the remaining mass percentage of a wastewater sludge and peanut husk blend during pyrolysis, Bi and colleagues [36] utilized an ANN model based on the heating rate, blending ratio, and temperature.

Despite the promising findings described above, no general neural method has been established for different biomass feedstocks and reaction conditions. In other words, without retraining, a neural network developed for a particular biomass under selected reaction conditions cannot be applied to a different biomass. As a result, an approach that is more general with respect to biomass feedstock and reaction conditions needs to be developed.

Sunphorka and coworkers [22] thus attempted to construct a generic neural framework based on biomass compositional analysis that could forecast the pyrolysis kinetic parameters. The analytical model was then used to estimate biomass thermal breakdown kinetics from the obtained reaction kinetic parameters. They concluded that the novel method could correctly forecast biofuel pyrolysis kinetics. Additionally, Aghbashlo and colleagues [37] improved the neural models of Sunphorka and coworkers [22] by utilizing a neurofuzzy approach refined with a particle swarm optimization technique. They included the process heating rate as an additional input variable alongside cellulose, hemicellulose, and lignin. The proposed intelligent hybrid model was able to accurately predict the thermal breakdown kinetic parameters and, as a result, the thermal breakdown kinetics of lignocellulosic biomass.

These studies suggest that, by learning from experimental data, computational intelligence methods can generalize biomass thermal breakdown kinetics. Despite the encouraging findings, both intelligent techniques were built on the compositional analysis of lignocellulosic materials (i.e., cellulose, hemicellulose, and lignin contents). Empirical kinetic, sulfuric acid hydrolysis, and near-infrared spectroscopy techniques can be used to analyze the composition of lignocellulosic biomass [38]. Although the near-infrared spectroscopic technique is quick and efficient, retrieving meaningful data from it requires statistical techniques. Such statistical methods include partial least squares discriminant analysis, principal component analysis, hierarchical cluster analysis, soft independent modeling of class analogy, k-nearest neighbors, and support vector machines. The sulfuric acid hydrolysis technique is accurate but is a lengthy and labor-intensive procedure [38].

Overall, it is not easy to perform a compositional analysis of biomass using the time-consuming, expensive, and labor-intensive methods described above. Moreover, the monomer concentrations and chemical linkages of the major components (especially lignin) vary significantly among lignocellulosic biomasses [39]. Genetic variations, regional circumstances, climatic conditions, management methods, and harvest season contribute to this diversity. In addition, compositional analysis-based algorithms cannot estimate the thermal decomposition kinetics of nonlignocellulosic materials, such as fats and proteins. These problems can be resolved using the widely accepted ultimate analysis technique, which measures the carbon, nitrogen, oxygen, hydrogen, and sulfur content of lignocellulosic biomass. However, because of the wide variety of lignocellulosic materials, it is challenging to create a comprehensive mathematical model that links the kinetics of biomass breakdown to its ultimate analysis. This complex problem may be solved using intelligent methods such as the ANFIS, ANN, LSSVM, and ELM techniques [40–42].

Consequently, this study sought to develop an intelligent generic framework for describing biomass pyrolysis kinetics, namely, a quantitative structure-property relationship (QSPR)-based stochastic gradient boosting (SGB) decision tree model, based on the ultimate analysis (carbon, hydrogen, oxygen, nitrogen, and sulfur content) and the process heating rate. The kinetic parameters of biomass pyrolysis were generalized from the ultimate analysis data and the heating rate through intelligent modeling. To perform the modeling, 149 experimental data points were collected from previously reported work [4]. We used 75% of these data to build the model and kept the rest for the testing phase. Various statistical measures were used to analyze and evaluate the accuracy of the proposed model in predicting the target parameters.

2. SGB Model

SGB, introduced by Friedman [33], is a novel variant of statistical learning and function approximation that enhances regression trees. The SGB method builds a sequence of simple trees in which each new tree is constructed using the prediction residuals of the preceding tree [43]. Tree complexity is governed by the tree's splits, the simplest tree being a root node split into two child nodes. SGB partitions the data in a stepwise procedure, calculating the differences between the observations and the predictions (residuals) within the partitions [44, 45]. A new partition is then established by fitting a tree to these residuals. Such a partition reduces the residual variance of the data along the tree sequence. Predictions are obtained by accumulating the constructed trees, which diminishes the dependence of SGB on outliers, the specific training dataset, and unbalanced data.

Additionally, an ensemble learning technique (e.g., bagging and boosting) integrates the predictions of various models. Ensemble learning has shown robust performance in data mining and machine learning applications [39]. This approach assumes the following [46, 47]:

\hat{F}(x) = \sum_{k=1}^{K} c_k f_k(x)

where k = 1, 2, \ldots, K indexes the base learners, K is the ensemble size, and f_k(x) stands for the function of input x fitted to the training dataset. The ensemble prediction \hat{F}(x) linearly combines the base learner estimates, where c_k denotes the linear combination parameters. Through boosting, F(x) is estimated based on the following expansion [48]:

F(x) = \sum_{k=1}^{K} \beta_k h(x; a_k)

It should be noted that h(x; a_k), with a = (a_1, a_2, \ldots), is a simple function of x. The expansion coefficients \beta_k and parameters a_k are fitted to the training dataset through a forward-stagewise procedure that starts from an initial estimate F_0(x). Then,

(\beta_k, a_k) = \arg\min_{\beta, a} \sum_{i=1}^{N} L\left(y_i, F_{k-1}(x_i) + \beta\, h(x_i; a)\right), \qquad F_k(x) = F_{k-1}(x) + \beta_k h(x; a_k) \quad (1)

where k = 1, 2, \ldots, K. To solve equation (1) for an arbitrary differentiable loss function L, a gradient boosting-based two-step procedure is used. First, the base learner function h(x; a_k) is fitted to the current pseudoresiduals \tilde{y}_{ik} = -\left[\partial L(y_i, F(x_i)) / \partial F(x_i)\right]_{F = F_{k-1}} through the least-squares criterion

a_k = \arg\min_{a, \rho} \sum_{i=1}^{N} \left[\tilde{y}_{ik} - \rho\, h(x_i; a)\right]^2 \quad (2)

Subsequently, the optimal coefficient \beta_k is obtained as follows:

\beta_k = \arg\min_{\beta} \sum_{i=1}^{N} L\left(y_i, F_{k-1}(x_i) + \beta\, h(x_i; a_k)\right) \quad (3)

Therefore, equation (2) replaces the complex function optimization problem in equation (1) with a least-squares fit, followed by a single-parameter optimization (equation (3)) on the basis of the loss criterion L. In boosting, SGB resolves observations close to the decision boundaries defined by the data model [49, 50]. In particular, observations in a tree that lie close to other classes are more likely to be identified and corrected during boosting [51, 52].
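To make this forward-stagewise procedure concrete, the following Python sketch fits a sequence of shallow regression trees to squared-error residuals, with random subsampling of the training data at each iteration supplying the stochastic element of SGB. This is an illustration only: the paper reports a MATLAB implementation, and the function names, hyperparameter values, and use of scikit-learn trees are assumptions.

```python
# Minimal sketch of stochastic gradient boosting for regression with a
# squared-error loss; the shrinkage factor plays the role of beta_k in the
# expansion above, and the pseudoresiduals equal the ordinary residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sgb_fit(X, y, n_trees=200, learning_rate=0.1, subsample=0.75, max_depth=3, seed=0):
    X, y = np.asarray(X, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    F0 = y.mean()                                   # initial estimate F_0(x)
    F = np.full_like(y, F0)
    trees = []
    for _ in range(n_trees):
        residuals = y - F                           # pseudoresiduals for squared-error loss
        # stochastic element: each tree is fitted to a random subsample of the data
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], residuals[idx])
        F += learning_rate * tree.predict(X)        # F_k(x) = F_{k-1}(x) + beta_k h(x; a_k)
        trees.append(tree)
    return F0, trees

def sgb_predict(X, F0, trees, learning_rate=0.1):
    # accumulate the constructed trees (use the same learning_rate as in fitting)
    X = np.asarray(X, float)
    F = np.full(X.shape[0], F0, dtype=float)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return F
```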

3. Sensitivity Analysis

To explore the effects of the inputs on the target, a sensitivity analysis of the input parameters was carried out. The effect of each parameter was quantified by the relevancy factor as follows [53, 54]:

r = \frac{\sum_{i=1}^{n} (X_{k,i} - \bar{X}_k)(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_{k,i} - \bar{X}_k)^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where n denotes the total number of data points, X_{k,i} represents the i-th value of input k, and Y_i stands for the i-th output. Also, \bar{X}_k is the average of input k, while \bar{Y} is the average of the output. The relevancy factor varies from −1 to 1; the parameter with the larger absolute value has the stronger influence on the target [55]. A positive (negative) relevancy factor stands for a direct (inverse) influence; that is, a rise in a parameter raises (diminishes) the target under a positive (negative) r-value.
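The relevancy factor defined above is a Pearson-type correlation between each input and the output. A small Python sketch of the calculation is given below; the variable and column names are illustrative, not taken from the paper.

```python
# Sketch of the relevancy-factor calculation: r lies in [-1, 1], where the
# sign gives the direction of the effect and the magnitude gives its strength.
import numpy as np

def relevancy_factor(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return num / den

# Hypothetical usage: one r value per input column of a design matrix X
# input_names = ["C", "H", "O", "N", "S", "heating_rate"]
# r_values = {name: relevancy_factor(X[:, j], y) for j, name in enumerate(input_names)}
```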

The present work examined the inputs directly affecting the output, and Figure 1 depicts the results of the sensitivity analysis. According to Figure 1, the process heating rate was found to have the largest absolute relevancy factor (0.43) and thus the strongest influence on the model output.

4. Preanalysis Phase

The SGB-derived values were estimated and validated by five statistical measures. The developed model was implemented in MATLAB. About three-fourths of the collected data were exploited in the training phase, while nearly one-fourth was utilized as the testing dataset. Moreover, the data were normalized as follows:

D_k = 2\,\frac{x_k - x_{\min}}{x_{\max} - x_{\min}} - 1

where x_k represents parameter k, and x_{\min} and x_{\max} denote its minimum and maximum values. Accordingly, D_k was expectedly found to have an absolute value not exceeding 1. The output value is estimated by introducing the other variables as inputs to the SGB model.
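A short Python sketch of this preanalysis step is shown below: a 75/25 split of the data followed by scaling of each variable to the interval [−1, 1]. The scaling form is assumed from the stated bound on D_k rather than quoted from the paper, and the function names are illustrative.

```python
# Illustrative preanalysis: random 75/25 train/test split and normalization
# of each variable so that |D_k| <= 1 (the exact scaling used in the paper
# may differ).
import numpy as np

def normalize(x):
    x = np.asarray(x, float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # maps to [-1, 1]

def train_test_split(X, y, train_fraction=0.75, seed=0):
    X, y = np.asarray(X, float), np.asarray(y, float)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(train_fraction * len(y))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]
```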

5. Outlier Detection

Differing from the bulk of the data, outliers may exist in large experimental datasets and impact the reliability and accuracy of empirical frameworks. As a result, outlier data must be identified when developing a model, particularly in the training phase. Failure to consider particular unexplained influences would diminish model performance. Such performance declines can be evaluated by accurately examining outliers [56]. This involves calculating the model deviations from the associated empirical data; these deviations are expressed as standardized cross-validated residuals and are evaluated together with the hat matrix. Previous studies provide more detailed descriptions in this respect. The present work identified outliers by the leverage approach, calculating the hat matrix as follows [57]:

H = X \left(X^{T} X\right)^{-1} X^{T}

where X is an N × p matrix, in which N is the total number of data points and p is the total number of inputs; T denotes the transpose operator, and −1 stands for the inverse operator. Also, the warning leverage was defined as follows:

H^{*} = \frac{3(p + 1)}{N}

The present work adopted a rectangular region bounded by standardized residuals of R = ±3 and leverages of 0 ≤ H ≤ H* as the feasible area. According to Figure 2, a total of 4 outliers were identified among the data points.
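The following Python sketch illustrates this leverage-based screen: points with standardized residuals outside ±3 or leverage above the warning value H* fall outside the feasible region. It is a generic implementation of the standard approach, not the authors' code, and uses a simple standardization of the residuals for illustration.

```python
# Leverage (hat-matrix) outlier screen: flag points with |standardized
# residual| > 3 or leverage above the warning value H* = 3(p + 1)/N.
import numpy as np

def leverage_outliers(X, residuals):
    X = np.asarray(X, float)
    residuals = np.asarray(residuals, float)
    N, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T                          # hat matrix
    h = np.diag(H)                                                # leverage of each point
    h_star = 3.0 * (p + 1) / N                                    # warning leverage
    r = (residuals - residuals.mean()) / residuals.std(ddof=1)    # standardized residuals
    outliers = (np.abs(r) > 3) | (h > h_star)
    return h, h_star, r, outliers
```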

6. Model Development and Verification Methodology

It is necessary to validate the model and measure its accuracy, particularly when the parameter ranges are extended. The proposed model was validated using the mean relative error (MRE), root-mean-square error (RMSE), standard deviation (STD), mean squared error (MSE), and the coefficient of determination (R2), all computed from the deviations between the predicted and experimental values; a sketch of these measures is given below.
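The sketch below computes these validation measures using their common definitions; the paper's exact formulas (for example, for STD) may differ slightly, so this should be read as an illustration under stated assumptions.

```python
# Common definitions of the validation measures listed above.
import numpy as np

def validation_metrics(y_exp, y_pred):
    y_exp, y_pred = np.asarray(y_exp, float), np.asarray(y_pred, float)
    err = y_exp - y_pred
    mre = 100.0 * np.mean(np.abs(err / y_exp))                          # %MRE
    mse = np.mean(err ** 2)                                             # MSE
    rmse = np.sqrt(mse)                                                 # RMSE
    std = np.std(err, ddof=1)                                           # STD of the errors
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_exp - y_exp.mean()) ** 2)   # R^2
    return {"MRE%": mre, "MSE": mse, "RMSE": rmse, "STD": std, "R2": r2}
```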

7. Results and Discussion

Figure 3 compares the predicted and experimental output values for the training and testing datasets. As can be seen, the developed model showed good predictive performance. Model validity was further evaluated using several graphical and statistical measures.

Figure 4 plots the regression results. As can be seen, the proposed model showed good predictive power for the output data based on the significant data density around line Y = X.

Figure 5 plots the output versus relative error for both datasets. According to Figure 5, the errors were found to have a distribution concentrated around zero deviation. The MRE was found to be smaller than 50%, suggesting good predictive performance.

Table 1 reports the performance evaluation measures. As can be seen, small STD, RMSE, and AARD values were obtained, together with an R2 close to unity. This is suggestive of high predictive accuracy in the output estimates.

8. Conclusions

The present study provided new insights into the estimation of the thermal decomposition kinetics of biomass. A model based on SGB decision trees was introduced and validated using training and testing data. Furthermore, the proposed prediction approach provides an efficient and effective estimation framework. The MRE was calculated to be 4.354% for this model. The proposed approach was found to outperform earlier works in terms of accuracy, generalizability, and validity.

Data Availability

The data used in this study are available from the previously reported works cited in the text.

Conflicts of Interest

The authors declare that they have no conflicts of interest.