Abstract

The consumption of fossil fuels has increased steeply in recent decades, despite the air pollution, environmental deterioration, and health problems it causes and the limited resources available. Biofuel can replace fossil fuel because of its environmental benefits and its availability for producing various forms of energy, such as electricity, power, and heating, or for supplying transportation fuels. Biodiesel production is an intricate process that requires identifying unknown nonlinear relationships between the system input and output data; therefore, accurate and fast modeling tools such as machine learning (ML) or artificial intelligence (AI) are necessary to design, handle, control, optimize, and monitor the system. Among the biodiesel production modeling methods, machine learning, inspired by the brain’s capability for autolearning and self-improvement, tends to provide the most accurate predictions; it is therefore beneficial for modeling (trans)esterification processes and physicochemical properties and for monitoring biodiesel systems in real time. Machine learning applications in the production phase include estimating and optimizing quality, quantity, and process conditions. In the consumption phase, emissions composition and temperature estimation and motor performance analysis are investigated. Fatty acid methyl ester (FAME) is the usual output parameter, and the input parameters include oil and catalyst type, methanol-to-oil ratio, catalyst concentration, reaction time, domain, and frequency. This paper reviews and discusses the advantages, disadvantages, and applications of various ML technologies in biodiesel production, focusing mainly on articles published from 2010 to 2021, to support decision-making and the optimization, modeling, control, monitoring, and forecasting of biodiesel production.

1. Introduction

Fossil fuel, the most widely used fuel and an essential factor in the economic and political development of both established and developing countries, has been a common industrial energy source for several decades because of its favorable combination of properties, such as easy transportability, versatility, accessibility, and price [1–3]. Although many undiscovered oil reserves remain in geological structures, and rich unconventional reservoirs such as tar sands, heavy oil, and oil shale may prove commercially viable, these resources are nonrenewable and limited. World energy demand is projected to grow by 56% between 2010 and 2040; hence, there is a dire need for a sustainable alternative energy resource [4–6]. In addition to resource limitations, fossil fuel consumption for economic and industrial activities causes worldwide challenges such as air pollution, global warming, environmental deterioration, health problems, climate change, and greenhouse gas (GHG) emissions [7]. The energy crisis, driven by high dependence on fossil fuels, increasing resource fluctuation, and environmental challenges, has intensified concern about resource depletion and is leading the world towards eco-friendly energy resources that can assure a sustainable energy supply and meet escalating energy requirements from renewable sources [8–12]. Fossil fuel production will not stop suddenly and will remain a universal energy resource, but scientists are trying to obtain energy with a low carbon footprint [13]. Biofuels, hydrogen, compressed natural gas, liquefied petroleum gas, and alcohol have enough potential to become alternative energy sources [14–16].

A study of renewables carried out to choose the best alternative energy found that bioenergy, today’s largest renewable energy resource, has great potential for addressing climate change and global energy issues [17]. Biofuel, which includes biodiesel, bioethanol, and biogas obtained from biomass resources, can replace fossil fuels because it combines enhanced energy security, environmental benefits, availability, renewability, and sustainability, and it can produce various forms of energy such as electricity, power, and heating or supply transportation fuels [6, 16, 18–22]. Figure 1 illustrates the research trend in the biofuels field. The number of published documents increased sharply from 2002 to 2020. Since 2016 the growth rate has slowed; however, the number of articles is still rising.

Among all sustainable alternatives to fossil fuel, biodiesel is a suitable choice for diesel engines because of its lower engine emissions (41% less greenhouse gas emission), advantageous physical and chemical properties, and the absence of any need for significant engine modifications [23–26]. Biodiesel and petrodiesel are miscible in any ratio, which has led to the use of blends rather than pure biodiesel, not only in developed countries such as the United States, France, Italy, and Germany but also in developing countries such as Malaysia, Brazil, Indonesia, and Argentina [7, 27–29]. Biodiesel production capacity shows an attractive growing trend, and the automotive biofuels market is growing dramatically; this has engaged many scientists and researchers in satisfying ever-rising energy demand by producing alternative fuels [25, 30]. As shown in Figure 2, the share of renewable energy in power generation is expected to increase by 23% by 2030.

The challenge is to identify the relationship between the outputs of the biofuel production process and the process parameters and then to maintain the influential parameters within an optimum range to ensure high quality and productivity [13, 32]. Various raw-material parameters and reaction conditions associated with transesterification, such as temperature, oil and catalyst type, reaction duration, oil-to-alcohol molar ratio, and catalyst concentration, can affect productivity and the response features of the production process, which are estimated through physical experiments [13, 33–37]. Although experiments are necessary, they do not predict the effects of these factors well, because the relations between responses and parameters are nonlinear and the process parameters are numerous; therefore, highly accurate modeling methods such as machine learning-based prediction and artificial intelligence (AI) techniques are beneficial for overcoming the limitations of experimental methods and the challenges of traditional computing techniques [13, 38–40].

These techniques provide mathematical models or independent modeling approaches suited to the nature of the process, which saves time and money and, furthermore, makes it possible to study a wide range of physical and chemical process parameters separately and to generate experimentally inaccessible details [10, 12, 41–44].

2. An Introduction to AI and ML

AI is the ability of machines to simulate human brain activities, applied through different computer science techniques such as heuristic algorithms, machine learning, and fuzzy logic [45–47]. It is chiefly employed to predict biomass and biofuel properties, the performance of bioenergy end-use systems, and conversion process performance, and for supply chain modeling and optimization. Recommended optimization methods are response surface methodology (RSM), the genetic algorithm, and the Taguchi method, while artificial neural network (ANN), regression, and analytical methods are the trending modeling methods in internal combustion engine research [48–51].

ML algorithms, which have evolved to include deep learning, reinforcement learning, transfer learning, and extreme learning, are utilized in industrial processes to optimize, monitor, and control systems, forecast maintenance, diagnose faults, and flag process attacks [2, 52–56]. Linear regression, Principal Component Analysis (PCA), Decision Trees (DT), Genetic Algorithms (GA), K-nearest Neighbor Classifier (KNN), Random Forest regression (RF), Artificial Neural Networks (ANN), and Support Vector Machines (SVM) are some powerful machine learning algorithms [57]. Machine learning refers to a programmed process that uses consecutive iterations over external input variables to gradually update its problem-solving capability and improve itself in order to solve the complicated questions under study [57, 58].

AI applications to bioenergy systems are limited; however, studies indicate great potential for addressing the obstacles to bioenergy development. Former reviews have focused separately on either a single AI approach or one part of bioenergy systems [2, 44, 48, 49, 59]. Because of the wide variety of AI techniques, conversion technologies, bioenergy products, biomass types, and supply chain designs, a comprehensive review of AI applications from biomass agriculture through to the consumption phase is necessary. This review intends to recommend a combination of advanced statistical methods and currently popular machine learning algorithms for obtaining pragmatic overall models that agree with experimental evidence.

3. An Introduction to Biodiesel Production

Biodiesel is a clean, aromatic, biodegradable fatty acid methyl ester derived from waste oils, edible and nonedible vegetable oils, and animal fat (e.g., chicken and mutton tallow). As an alternative fuel for diesel engines that reduces engine emissions, it is becoming a global mainstream transportation fuel [34, 45, 51, 60–62]. In addition to its use as a transport fuel, biodiesel has other potential uses, such as heating oil, plasticizers, power production, high-boiling absorbents for cleaning gaseous industrial emissions, lubricants, and various solvent applications. Biodiesel has properties similar to those of diesel fuel, for instance, cetane number, viscosity, energy content, and phase variations. Biofuels can provide a new market for agricultural products and help revitalize rural areas [63].

3.1. Advantages

(i) Sulfur-free
(ii) Releases fewer emissions
(iii) Favorable physicochemical properties such as density, cetane number, flash point, viscosity, and lubricity
(iv) More complete combustion because it is highly oxygenated
(v) Promotes energy self-sufficiency [42]

3.2. Disadvantages

(i) Lower energy content
(ii) Releases more nitrogen oxides
(iii) Higher maintenance cost
(iv) High cost of establishment
(v) Separation and purification stage required for the product
(vi) Undesirable side reactions [51, 64]

Easy production from available renewable feedstock makes biodiesel more attractive. Nonedible tree seed oil resources are widely available, even on land unsuitable for food crops. Pure biodiesel, or a blend of commercial diesel and biofuel, can be used in unmodified diesel engines, with environmental sustainability advantages [51, 65]. Several countries mandate adding biodiesel to all diesel fuels to encourage its use [63, 66].

The most common reaction in the biodiesel production process is transesterification, which uses heterogeneous or homogeneous acid or base catalysts so that the reaction can proceed under mild conditions. Sodium hydroxide and potassium hydroxide (NaOH, KOH) are common alkaline catalysts that can provide a higher biodiesel yield [67–70]. Biodiesel is produced by the transesterification reaction between an oil (e.g., canola oil, Simarouba glauca oil, soybean oil, sunflower seed oil, Thevetia peruviana seed oil, or palm oil) and an alcohol (e.g., methanol or ethanol) [62, 71–76]. The production process is costly and energy-intensive, mainly because of product purification and separation, and it requires a lengthy pretreatment step to reduce water and free fatty acids [2]. Undesired side reactions lower the esterification efficiency. Figure 3 illustrates the transesterification reaction for biodiesel production together with its input and output variables.
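For orientation, the overall stoichiometry of methanol-based transesterification can be written as follows. This is the standard textbook form rather than an equation taken from the cited studies, and R stands for a generic fatty acid chain (the three chains of a real triglyceride may differ).

```latex
\[
\underbrace{\mathrm{C_3H_5(OOCR)_3}}_{\text{triglyceride}}
+ 3\,\mathrm{CH_3OH}
\;\rightleftharpoons\;
3\,\underbrace{\mathrm{RCOOCH_3}}_{\text{FAME}}
+ \underbrace{\mathrm{C_3H_5(OH)_3}}_{\text{glycerol}}
\]
```

The stoichiometric methanol-to-oil molar ratio is therefore 3 : 1. Because the reaction is reversible, excess alcohol is commonly used to shift the equilibrium towards the ester, which is why the molar ratio appears as a tunable input variable in the models reviewed here.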

Various transesterification-associated parameters and reaction conditions, such as temperature, oil and catalyst type, reaction duration, oil-to-alcohol molar ratio, and catalyst concentration, significantly affect productivity and the response features of the production process [37, 78]. Statistical tools and many physical experiments are necessary to predict the reaction responses and the interactions among parameters in order to optimize transesterification [36, 42].

4. ML Methods Application in Biodiesel Life Cycle

Producing biodiesel from renewables includes the following steps: extracting oil, pretreating feedstock, running the transesterification reaction, separating products, recovering unreacted alcohol, neutralizing glycerin, and washing and purifying the biodiesel [70, 79]. In this section, we categorize and review ML applications in five crucial stages of the biodiesel life cycle: soil, feedstock, production, consumption, and emissions [57, 80].

ML technology can be beneficial in all five stages to enhance the quality of estimations. Many reviews already cover applications of machine learning in modeling biodiesel-fueled engines and combustion; therefore, this study focuses mainly on the first three stages. Figure 4 shows an overview of the biodiesel production trend, inspired by Aghbashlo et al. [79] and Ahmad et al. [57].

4.1. ML Applications in Soil Stage

Numerous ML studies at the plot and tree scale have been reported for the soil stage of the biofuel life cycle. The most common ML methods in the soil stage are Random Forest (RF), Gaussian Process Model (GPM), and Support Vector Machines (SVM).

Sorghum is a useful crop for producing health-promoting food from its seeds and fodder and biofuels from its aboveground biomass [81]. To predict future trends in Sorghum bicolor yield, Huntington et al. [82] used the RF approach under four greenhouse gas (GHG) emission scenarios and two different watering regimes. The most valuable predictors of sorghum productivity were vapor pressure deficit, time, and irrigation practices. The RF model obtained reasonable prediction accuracy when the data samples were trained and classified separately by year and country. Habyarimana et al. [81] used satellite imaging of sorghum fields to predict sorghum biomass yield with various ML methods: SVM with a radial basis kernel (SVM-R), SVM with a nonlinear kernel (SVM-G), PCA discriminant analysis (PCA-DA), PLS discriminant analysis (PLS-DA), SVM with a linear classifier (SVM), SVM with a polynomial basis kernel (SVM-P), a simple linear model, RF, ANN, the eXtreme Gradient Boosting xgbLinear method (GBL), the eXtreme Gradient Boosting xgbDART method (GBD), and the eXtreme Gradient Boosting xgbTree method (GBT). Of these, the xgbTree method gave the best results.

Gleason et al. [83] compared Linear Mixed-effects Regression (LME), Cubist, Support Vector Regression (SVR), and Random Forest (RF) methods for predicting biomass in a moderately dense forest with 40 to 60% canopy closure, where SVR produced the most accurate biomass model. Lee et al. [84] conducted a study based on four emissions scenarios, using a Boosted Regression Tree (BRT) model to estimate the environmental impacts of corn production from 2022 to 2100; the BRT model achieved a correlation coefficient of 0.82 for eutrophication impacts and 0.78 for global warming. Yang et al. [85] applied the Gaussian Process Model (GPM), a Bayesian inference method, in a two-stage machine learning procedure to achieve more accurate estimations: first the GPM downscaled crop yield, and then an RF model estimated the yield. Soil characteristics, solar radiation, average precipitation, wind speed, and temperature are the usual input parameters, and the output parameters are future life cycle environmental impact and biomass yield. Table 1 summarizes the soil phase studies [81–85] and identifies the effective method in each one.

4.2. ML Applications in Feedstock

According to studies of machine learning applications in the feedstock phase, ANN, multiple linear regression, statistical regression, and multiple nonlinear regression models are the most popular methods. Blend composition, temperature, mixing speed, and mixing time are typical input variables, and the output variables are viscosity, flash point, oxidation stability, density, methane fraction, higher heating value, and cetane number. Mairizal et al. [86] examined biodiesels produced from various resources, such as walnut oil, sunflower oil, peanut oil, rapeseed oil, hydrogenated coconut oil, hydrogenated copra oil, and beef tallow, to predict higher heating value, viscosity, flash point, oxidative stability, and density using multiple linear regression. The model inputs were the polyunsaturated fatty acid content, iodine value, and saponification value of the feedstock. The results showed that prediction performance improves when PU/MU (the balance of poly- and monounsaturated fatty acids) is added as an independent parameter. In another study on biodiesels produced from various fatty acids, the ANN method was applied to estimate cetane number, density, kinematic viscosity, and flash point [87]. The average absolute deviation and the estimation accuracy of the model were, respectively, as follows: cetane number (1.637%; 96.6%), flash point (0.997%; 99.07%), kinematic viscosity (1.638%; 95.80%), and density (0.101%; 99.40%). Tchameni et al. [88] used ANN and multiple nonlinear regression (MNLR) to forecast the rheological properties of waste vegetable oil. The ANN model outperformed the MNLR method. Single and multiple linear regressions used to estimate methane yield from the structural components of biomass revealed a considerable correlation between the methane potential of biomass and its chemical composition. Table 2 summarizes the feedstock phase studies [86–88], classifying the efficient method and the purpose of each study.
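The multiple linear regression and average absolute deviation (AAD) used in the feedstock studies can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the procedure of any cited study: the two predictors (iodine value and saponification value), the synthetic density values, and all function names are assumptions made for the example.

```python
# Minimal multiple linear regression via the normal equations (pure Python).
# All numbers are made-up illustration values, not data from the cited studies.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(features, targets):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    rows = [[1.0] + list(x) for x in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate the fitted linear model at one feature vector."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


def average_absolute_deviation(actual, predicted):
    """AAD in percent: mean of |actual - predicted| / actual, times 100."""
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical feedstock descriptors: (iodine value, saponification value).
samples = [(100.0, 190.0), (120.0, 195.0), (80.0, 200.0),
           (130.0, 185.0), (90.0, 205.0)]
# Synthetic "density" generated from a known linear rule so the fit is checkable.
densities = [800.0 + 0.5 * iv + 0.2 * sv for iv, sv in samples]

coeffs = fit_mlr(samples, densities)
fitted = [predict(coeffs, s) for s in samples]
aad = average_absolute_deviation(densities, fitted)
```

Because the synthetic targets follow an exact linear rule, the fit should recover that rule and give an AAD close to zero; with real property data the AAD quantifies how far a linear model falls short.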

4.3. ML Applications in Production

In the production stage, the choice of a proper ML method depends on the type of biofuel produced (e.g., biodiesel, biogas, or biohydrogen). Based on the studies, machine learning applications in biodiesel research can be organized into four groups: estimating quality, estimating yield, optimizing both quality and yield, and estimating and optimizing process conditions and efficiency [57].

4.3.1. Quality Prediction

The prevailing ML method for quality prediction is ANN supported by regression models, with reaction temperature, reaction time, calcination temperature, pressure, and flow rate as input variables and FAME (fatty acid methyl ester) content, viscosity, composition, quantity, cetane number, and density as output variables.

Soltani et al. [89] used an artificial neural network (ANN) to model the effects of various reaction parameters, i.e., calcination temperature, metal ratio, reaction time, and reaction temperature, in the esterification of palm fatty acid distillate (PFAD) to esters using a sulfonated mesoporous zinc oxide (SO3H–ZnO) catalyst. The optimum conditions assessed for predicting a 56.41 nm SO3H–ZnO nanocrystalline catalyst size were a reaction temperature of 160°C, a calcination temperature of 700°C, and a Zn concentration of 0.004 mole, with a reaction time of 18 min. Zinc concentration and reaction time were identified as the most and least influential parameters, respectively.

Ahmad et al. [90] used an ensemble learning method, Least Squares Boosting (LSBoost), integrated with the polynomial chaos expansion (PCE) method to predict the quantity, quality, flow rate, composition, and cetane number of fatty acid methyl esters (FAME) in a vegetable oil-based biodiesel production process. The predicted values, evaluated with the mean absolute deviation percent (MADP) under 1% uncertainty in all process parameters, demonstrated the high accuracy of the proposed model in predicting outcomes and quantifying the effect of uncertainty on the process. In another study of biodiesel production from vegetable oil, the PCA method was applied to estimate relative density, viscosity, and the percentage of vegetable oil converted to methyl esters. PCA is an effective technique for discriminating between pure biodiesel, pure diesel, waste oil, and their mixtures.

Sarve et al. [91] used an artificial neural network (ANN) and response surface methodology (RSM) based on a central composite design (CCD) to predict the fatty acid methyl ester (FAME) content in biodiesel produced from sesame oil with barium hydroxide as a basic catalyst. The optimum combination of conditions was a methanol-to-oil molar ratio of 6.69 : 1, a reaction time of 40.30 min, a catalyst concentration of 1.79 wt.%, and a temperature of 31.92°C, which resulted in a FAME content of 98.6%. The study revealed that catalyst concentration has the main influence on the FAME content of the final product. ANN predicted the FAME content better than RSM, as shown by its better correlation coefficient (R2), root mean square error (RMSE), standard error of prediction (SEP), and relative percent deviation (RPD) values.
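The goodness-of-fit statistics used to compare ANN and RSM models can be computed as in the sketch below. The definitions of SEP (RMSE relative to the mean experimental value) and RPD (mean absolute relative deviation) are common conventions assumed here and may differ in detail from those of the cited studies; the FAME values and the labels "model A" and "model B" are hypothetical.

```python
import math


def r_squared(experimental, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


def rmse(experimental, predicted):
    """Root mean square error, in the units of the response."""
    n = len(experimental)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(experimental, predicted)) / n)


def sep(experimental, predicted):
    """Standard error of prediction in percent (assumed: RMSE / mean * 100)."""
    mean_exp = sum(experimental) / len(experimental)
    return 100.0 * rmse(experimental, predicted) / mean_exp


def rpd(experimental, predicted):
    """Relative percent deviation (assumed: mean |e - p| / e * 100)."""
    n = len(experimental)
    return 100.0 * sum(abs(e - p) / e for e, p in zip(experimental, predicted)) / n


# Hypothetical FAME contents (%) from experiments and from two fitted models.
fame_exp = [90.0, 92.0, 95.0, 98.0]
model_a = [90.5, 91.5, 95.5, 97.5]  # closer fit, e.g. an ANN
model_b = [92.0, 90.0, 97.0, 96.0]  # looser fit, e.g. a quadratic RSM model

report = {
    name: {
        "R2": r_squared(fame_exp, pred),
        "RMSE": rmse(fame_exp, pred),
        "SEP": sep(fame_exp, pred),
        "RPD": rpd(fame_exp, pred),
    }
    for name, pred in (("model A", model_a), ("model B", model_b))
}
```

A model is preferred when its R2 is higher and its RMSE, SEP, and RPD are lower, which is the pattern the reviewed studies report for ANN relative to RSM.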

4.3.2. Yield Estimation

Several studies have concentrated on applying ML methods to predict biodiesel synthesis from nonedible feedstocks such as anaerobic sludge, castor oil, and jatropha-algae oil blends.

Kumar et al. [92] trained an ANN model with the Levenberg–Marquardt (LM) algorithm and a backpropagation learning algorithm to predict biodiesel yield in the transesterification process, using jatropha-algae oil blends as inputs. An R-squared value of 0.9976 against the experimental results confirmed the competency of the ANN technique.

Banerjee et al. [93] used ANN and CCD models to predict the percentage of fatty acid methyl ester in the transesterification of castor oil with methanol using an H2SO4 acid catalyst. The input parameters were temperature, catalyst concentration, and methanol-to-oil molar ratio. The authors also devised a kinetic model and estimated its rate constants from the experimental outputs and the ANN-predicted data. The ANN model predicted the percentage fatty acid methyl ester yield with an 8% deviation.

Kanat et al. [94] used ANN with a multilayer network topology to model and estimate the biodiesel and biogas production rate of an anaerobic thermophilic upflow sludge blanket digester. The trained and tested model was evaluated under both steady and abnormal conditions; a high correlation coefficient showed promising ANN results for online monitoring of thermophilic reactors. In a study of jatropha-algae oil blends, ANN performed better than RSM [95].

A biodiesel synthesis process from waste goat tallow containing a considerable amount of free fatty acids (FFAs) was modeled by RSM and ANN to identify the optimum parametric values that give maximum FA conversion. Under optimal conditions, response surface methodology (RSM) and ANN showed similar predictive performance [96].

In another study, linear regression (LR) and an ANN model based on a Levenberg–Marquardt learning algorithm were developed to predict the transesterification yield of soybean oil-based biodiesel, where the ANN performed better than LR [97]. Various conditions of the transesterification of soybean oil to biodiesel have also been studied to predict biodiesel yield [39]. In that study, a multilayer feedforward neural network was applied alongside kinetic models, and the results showed the superiority, accuracy, and clarity of the ANN model over the kinetic modeling method.

Guo et al. [98] used an adaptive neurofuzzy inference system (ANFIS) method, based on statistical learning theory, to estimate biodiesel production yield as a function of methanol/oil ratio, pressure, reaction time, and temperature in the noncatalytic supercritical methanol (SCM) method. The high R-squared value indicates the suitability of the ANFIS model for predicting biodiesel yield. Mostafa et al. [35] compared the efficiency of ANFIS and response surface methodology (RSM) in modeling the transesterification yield. A Box-Behnken design for RSM and two ANFIS approaches (hybrid and backpropagation optimization methods) were used to investigate the impact of the independent variables on the conversion to fatty acid methyl esters (FAME). The R2 value was 0.9669 for RSM compared with 0.9812 and 0.9808 for the two ANFIS models, indicating the superiority of the ANFIS models over the RSM model for modeling and optimization.

Maran et al. [49] compared the efficiency of an artificial neural network (ANN) and response surface methodology (RSM) in predicting and simulating muskmelon oil-based biodiesel yield. A central composite rotatable design (CCRD) was used to compare the ANN model against the RSM model. The effects of catalyst concentration, reaction time, reaction temperature, and methanol-to-oil molar ratio on FAME conversion were modeled with a Multilayer Perceptron (MLP) neural network and RSM. The R2 value was 0.869 for RSM and 0.991 for the ANN models, showing the superiority of ANN over RSM for modeling and optimizing FAME production.
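For reference, the response surface models compared with ANN and ANFIS in these studies are typically second-order polynomials fitted to a designed set of experiments (for example a CCD, CCRD, or Box-Behnken design). The generic form below is the standard one; the symbols are assumptions introduced here, and the fitted coefficients of the cited studies are not reproduced.

```latex
\[
\hat{y} \;=\; \beta_0
\;+\; \sum_{i=1}^{k} \beta_i x_i
\;+\; \sum_{i=1}^{k} \beta_{ii} x_i^{2}
\;+\; \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j
\]
```

Here \(\hat{y}\) is the predicted response (for example FAME conversion), \(x_1, \dots, x_k\) are the coded process variables (for example catalyst concentration, reaction time, reaction temperature, and methanol-to-oil molar ratio), and the \(\beta\) terms are regression coefficients estimated by least squares. Because the form is fixed at second order, it cannot follow stronger nonlinearities, which is one plausible reason the ANN and ANFIS models report higher R2 values.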

4.3.3. Quality and Yield Estimation

Numerous studies have focused on optimizing biodiesel quality and yield. Bobadilla et al. [77] used a set of Support Vector Machines (based on radial basis function, linear, and polynomial kernels) and linear regression methods to predict and improve biodiesel yield together with particular properties such as turbidity and higher heating value (HHV), with decreased viscosity and density. Applying genetic algorithms to the regression models produced more accurate biodiesel optimization scenarios and identified the best combination of independent and dependent variables.

Cheng et al. [99] developed the GA-ESIM method, a combination of the Evolutionary Support Vector Machine Inference Model (ESIM) and the K-means Chaotic Genetic Algorithm (KCGA), to predict and optimize the properties of biodiesel mixtures precisely. They found GA-ESIM to be more accurate in prediction than ANN-GA, SVM, and other AI-based tools.

Sivamani et al. [100] used ANN-GA-based and RSM models to predict and optimize the biodiesel yield in Simarouba glauca transesterification. They analyzed the oil by gas chromatography-mass spectrometry (GC-MS) to determine the free fatty acid (FFA) level; alcohol ratio, reaction time, and reaction temperature were the input variables.

Ighose et al. [101] used an RSM optimization tool alongside an ANFIS model to predict and optimize the biodiesel yield in the transesterification of Thevetia peruviana seed oil. Adding GA to the ANFIS and RSM models resulted in a higher Thevetia peruviana methyl ester (TPME) yield in less time. The results showed that ANFIS has better prediction capability than the RSM model. Dhingra et al. [102] applied a combination of ANN and GA to polanga oil-based biodiesel production to predict and optimize the reaction variables and maximize the output of the transesterification process. The input variables were the ethanol-to-oil molar ratio, reaction temperature, catalyst concentration, reaction time, and stirring speed. The ANN outputs were combined with GA to optimize the reaction conditions, resulting in a biodiesel yield of 92% by weight.
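The coupling of a trained predictor with a genetic algorithm, as in the ANN-GA and ANFIS-GA studies, can be sketched as follows. This is a minimal illustration under stated assumptions: `surrogate_yield` is a made-up concave function standing in for a trained ANN, the variable bounds are hypothetical, and the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism) are one common choice rather than the configuration of any cited study.

```python
import random

# Hypothetical bounds for three reaction variables (illustrative only).
BOUNDS = [
    (3.0, 9.0),    # alcohol-to-oil molar ratio
    (40.0, 80.0),  # reaction temperature, deg C
    (0.5, 1.5),    # catalyst concentration, wt.%
]


def surrogate_yield(x):
    """Stand-in for a trained ANN: a made-up response peaking at 95 %."""
    penalties = (20.0, 15.0, 10.0)
    total = 95.0
    for value, (lo, hi), weight in zip(x, BOUNDS, penalties):
        u = (value - (lo + hi) / 2.0) / ((hi - lo) / 2.0)  # scale to [-1, 1]
        total -= weight * u * u
    return total


def clip(value, lo, hi):
    """Keep a mutated gene inside its allowed range."""
    return max(lo, min(hi, value))


def run_ga(fitness, bounds, pop_size=40, generations=60, seed=0):
    """Maximise `fitness` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [ind[:] for ind in pop[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Blend (arithmetic) crossover.
            alpha = rng.random()
            child = [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, applied gene by gene with probability 0.2.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < 0.2:
                    child[i] = clip(child[i] + rng.gauss(0.0, 0.1 * (hi - lo)), lo, hi)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best_conditions = run_ga(surrogate_yield, BOUNDS)
best_yield = surrogate_yield(best_conditions)
```

In practice the surrogate would be replaced by the fitted ANN or ANFIS model, so the GA searches the model's input space for the conditions with the highest predicted yield; the result should then be confirmed experimentally, since the optimum is only as reliable as the model.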

4.3.4. Estimation and Optimization of Process Conditions and Efficiency

Karimi et al. [103] implemented a multiobjective analysis using RSM and ANN to estimate FAME content and exergetic efficiency in the transesterification of waste cooking oil (WCO) for biodiesel production. Water content, reaction time, immobilized lipase concentration, and methanol concentration were optimized to achieve a predicted FAME content of 95.7%. The corresponding input values, a catalyst concentration of 35%, a water content of 12%, and a methanol-to-WCO molar ratio of 6.7 over 20 hours, produced 86% FAME content and 80.1% exergy efficiency.

Patle et al. [104] used nondominated sorting GA-II (NSGA-II) multiobjective optimization to simulate and compare the esterification and transesterification of palm waste cooking oil and to optimize heat duty, profit, and organic waste. Profit improved as heat duty increased, but so did the amount of organic waste. Rouchi et al. [105] used Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) for process analysis and for steering the reaction parameters along the desired path. The Multiplicative Scatter Correction preprocessing technique and MCR-ALS were used to evaluate the concentrations, component types, and spectra in biodiesel production from soybean oil. The correlation coefficient and the standard deviation of the residuals demonstrated the suitability of the MCR-ALS method. Shukri et al. [106] used ANN to optimize the performance of a diesel engine fueled with a mixture of palm oil methyl ester and diesel. Both the experimental results and the ANN model showed better engine performance for the 10 percent biodiesel blend (B10) because of its higher heating value and cetane number.
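The trade-off between heat duty, profit, and organic waste is what nondominated sorting is designed to expose. The sketch below shows only the sorting step, in a simple quadratic-time form; it is not the full NSGA-II algorithm (no crowding distance, selection, or variation operators), and the candidate designs are hypothetical values chosen for illustration. Profit is negated so that every objective is minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and strictly better somewhere.

    All objectives are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_sort(points):
    """Return fronts as lists of indices; the first front is the Pareto set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical designs: (heat duty, -profit, organic waste), all to be minimised.
designs = [
    (10.0, -50.0, 5.0),
    (12.0, -60.0, 6.0),
    (11.0, -45.0, 7.0),   # worse than design 0 in every objective
    (15.0, -70.0, 9.0),
    (16.0, -65.0, 10.0),  # worse than design 3 in every objective
]

fronts = non_dominated_sort(designs)
pareto_set = [designs[i] for i in fronts[0]]
```

Designs 0, 1, and 3 form the first front: each gains profit only at the cost of more heat duty and more waste, which mirrors the trade-off reported above. Choosing among them requires a preference that the sorting itself does not supply.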

Aghbashlo et al. [107] developed an ANFIS model integrated with linear interdependent fuzzy multiobjective (ALIFMO) approaches and a nondominated sorting genetic algorithm (NSGA-II) to optimize operating conditions as a function of the inputs. The input parameters were reaction temperature, methanol/oil molar ratio, and residence time. The optimization minimized normalized exergy destruction (NED) and maximized functional exergy efficiency (FEE) and universal exergy efficiency (UEE) while aiming for the best conversion efficiency (CE), defined as a biodiesel content of more than 96.5%. Applied ANFIS models perfectly estimated the FEE, UEE, NED, and CE parameters with an .

Sarve et al. [108] compared ANN and RSM for optimizing biodiesel production with respect to sensitivity analysis, predictive and generalization capability, and parametric effects. A fatty acid ethyl ester (FAEE) content of 97.42% was obtained at the optimized ethanol-to-oil molar ratio, initial CO2 pressure, reaction time, and temperature, with temperature being the most influential parameter. The ANN model performed better than RSM in predicting the FAEE content of mahua oil and in fitting the data.

For a biodiesel production process based on vegetable oil, Nicola et al. [80] employed multiobjective GA optimization to maximize the purity of the important compounds and minimize energy requirements by optimizing the main process parameters. The input parameters of the process model were reflux ratio, mass flow rate of water, water temperature, flash temperature, number of trays, and dryer temperature. Among all the optimized configurations, the authors identified the one that gives the minimum specific energy consumption while meeting the required biodiesel quality standards. Noriega et al. [109] used group interaction parameters (GIP) to predict and validate all the liquid-liquid two-phase equilibria present in the biodiesel production system, which includes glycerol, low molecular weight alcohols, water, fatty acids, and biodiesel. The results demonstrated that the number of carbon atoms, hydroxyl groups, and unsaturated bonds affects the liquid-liquid equilibrium; the most influential parameter was the overall mass fraction of the distributed component, followed by the length of the alcohol chain.

López-Zapata et al. [110] used an Extended Kalman Filter (EKF) and virtual sensors to measure and estimate operating condition variables, control performance, and monitor the reaction. Because only a few variables, such as pH and temperature, can be measured, the performance analysis relied on the concentrations of alcohol, triglycerides (TG), methyl ester, diglycerides (DG), glycerol (GL), and monoglycerides (MG) to evaluate jatropha oil-based biodiesel production. Fahmi and Cremaschi [111] developed an ANN superstructure model to identify the optimum biodiesel production plant and the best operating conditions. The ANN model was an effective alternative to thermodynamic, unit operation, and mixing models, giving a less complicated model of the synthesis process. As mentioned before, Soltani et al. [89] used ANN to model the effects of various reaction parameters with an SO3H–ZnO catalyst. The optimum conditions assessed were a reaction temperature of 160°C, a calcination temperature of 700°C, and a Zn concentration of 0.004 mole, with a reaction time of 18 minutes. Zinc concentration and reaction time were the most and least influential parameters, respectively.
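The idea of a virtual sensor built on an Extended Kalman Filter can be illustrated with a one-state example. This sketch is not the reactor model of López-Zapata et al.: the first-order decay of a concentration, the logarithmic sensor response, the noise variances, and all names are assumptions chosen so that the predict and update steps are easy to follow.

```python
import math

RATE_TIMES_DT = 0.05  # hypothetical rate constant multiplied by the time step


def decay(x):
    """Assumed process model: first-order consumption over one time step."""
    return x * math.exp(-RATE_TIMES_DT)


def decay_jacobian(x):
    """Derivative of the process model with respect to the state."""
    return math.exp(-RATE_TIMES_DT)


def sensor(x):
    """Assumed nonlinear measurement: an indirect signal that grows with x."""
    return math.log(1.0 + x)


def sensor_jacobian(x):
    """Derivative of the measurement model with respect to the state."""
    return 1.0 / (1.0 + x)


def ekf_step(x_est, p_est, z, q, r):
    """One scalar EKF cycle: predict with the model, then correct with z."""
    # Predict: propagate the estimate and its variance through the model.
    x_pred = decay(x_est)
    f = decay_jacobian(x_est)
    p_pred = f * p_est * f + q
    # Update: linearise the sensor around the prediction and blend in z.
    h = sensor_jacobian(x_pred)
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - sensor(x_pred))
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Simulated run: the true concentration starts at 1.0, the filter guesses 0.5.
truth, estimate, variance = 1.0, 0.5, 1.0
initial_error = abs(truth - estimate)
for _ in range(30):
    truth = decay(truth)
    measurement = sensor(truth)  # noise-free here to keep the run reproducible
    estimate, variance = ekf_step(estimate, variance, measurement, q=1e-6, r=1e-4)
final_error = abs(truth - estimate)
```

The filter starts from a poor guess and converges to the simulated concentration using only the indirect measurement, which is the sense in which an EKF acts as a virtual sensor for unmeasured species. A real reactor would need a multi-state kinetic model and measured noise statistics.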

5. Conclusions

According to the machine learning applications reviewed in this study, the most common ML methods in the soil stage are Random Forest, Gaussian Process Model, and Support Vector Machines. In the feedstock phase studies, ANN, multiple linear regression, statistical regression, and multiple nonlinear regression models are the most popular methods. Blend composition, temperature, mixing speed, and mixing time are typical input variables, and the output variables are viscosity, flash point, oxidation stability, density, methane fraction, higher heating value, and cetane number. The prevailing ML method for quality prediction is ANN accompanied by regression models, with reaction temperature, reaction time, calcination temperature, pressure, and flow rate as input variables and FAME content, viscosity, composition, quantity, cetane number, and density as output variables. The prevailing ML method for yield estimation is ANN accompanied by ANFIS, with methanol-to-oil molar ratio, reaction time, catalyst concentration, total volatile fatty acid of the effluent, and temperature as input variables, while % FAME yield, biogas production rate, biodiesel yield, and biodiesel production are the usual output variables. The prevailing ML method for optimizing yield and quality is ANN accompanied by GA-based ANFIS and SVM. The five most frequently used input variables are methanol-to-oil molar ratio, stirring speed, catalyst concentration, reaction time, and reaction temperature. The most common output variables are FAME yield, biodiesel yield, higher heating value, density, and the final acid value of the oil. The dominant ML method for process efficiency and optimization is ANN accompanied by ANFIS. Frequently used input variables are reaction time, concentration, water content, methanol-to-oil molar ratio, and temperature, while conversion efficiency (CE), universal exergy efficiency (UEE), FAME content, biodiesel yield, and functional exergy efficiency (FEE) are the output variables. ANN, ANFIS, ELM, and SVM were the ML methods employed to study consumption, engine performance, and emissions.

Nomenclature

ALIFMO:Artificial linear interdependent fuzzy multiobjective optimization
AI:Artificial intelligence
ANFIS:Adaptive neurofuzzy inference system
ANN:Artificial neural networks
ALS:Alternating least squares
B10:Biodiesel 10 percent blend
BRT:Boosted regression tree
CCD:Central composite design
CE:Conversion efficiency
CN:Cetane number
DA:Discriminant analysis
ELM:Extreme learning machine
FAME:Fatty acid methyl ester
FAs:Fatty acids
FEE:Functional exergy efficiency
FP:Flash point
GA:Genetic algorithm
GBD:eXtreme Gradient Boosting-xgbDART
GBL:eXtreme Gradient Boosting-xgbLinear
GBP:eXtreme Gradient Boosting-xgbtree
GBT:Gene expression programming
GHG:Greenhouse gas
GIP:Group interaction parameters
GPM:Gaussian process model
HC:Hydrocarbon
IAV:Initial acid value of vegetable oil
K-ELM:Kernel-based extreme learning machine
KV:Kinematic viscosity
LLE:Liquid-liquid equilibrium
LME:Linear mixed-effects
LR:Linear regression
LS:Least squares
MAPE:Mean absolute percentage error
MCR:Multivariate curve resolution
ML:Machine learning
MNLR:Multiple nonlinear regression
MO:Mustard oil
MSE:Mean squared error
PU/MU:Poly- and monounsaturated fatty acids balance
NED:Normalized exergy destruction
PAT:Process analytical technologies
PCA:Principal component analysis
PLS:Partial least squares
RB-FNN:Radial basis function neural network
RF:Random forest
RFM:Random forest model
RLS:Recursive least squares
RSM:Response surface methodology
SVM:Support vector machines
SVR:Support vector regression
UEE:Universal exergy efficiency
UHC:Unburned hydrocarbons
VCR:Variable compression ratio.

Data Availability

The data used to support the findings of this study are provided within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Iftikhar Ahmad et al. for the review article “Machine Learning Applications in Biofuels’ Life Cycle: Soil, Feedstock, Production, Consumption, and Emissions,” which served as a guide for writing this article.