1 Introduction

Awareness of fossil fuel depletion dates to the early 1970s, when engineers warned that consuming these fuels faster than they can be extracted would ultimately exhaust them. To prevent this, demand for these fuels should be moderated while alternative forms of energy capable of replacing them are sought. Over the past decade, primary energy consumption (PEC) has risen steadily around the world. The average global PEC growth rate (PECGR) for 2008–2018 was 1.6%, and in 2019 it fell to 1.3%. Among the many reasons for this decline, one of the chief ones is the weak economic growth of nations such as Russia, the USA, and India. The decline in PECGR was observed in many nations, with China an exception. China is the largest single contributor, accounting for about 24.3% of global PEC, followed by the USA, India, and Russia with 16.2%, 5.8%, and 5.1%, respectively. Although the global PECGR decreased in 2019, the PEC of China and India increased from 135.33 and 33.30 exajoules (EJ) in 2018 to 141.70 and 34.06 EJ, respectively [1].

To control the surge in energy consumption, emerging countries such as India and China have started to use the available fuels more efficiently. The oil consumption of China and India has increased by 5% and 2.9%, respectively. Oil production is falling sharply while oil consumption keeps rising [1,2,3]. For the ongoing COVID-19 pandemic, the Global Energy Review suggests that economies around the globe can expect a 20–40% decline in economic output during the lockdown phase. In India, the national lockdown reduced energy demand by 30% [4]. Once the pandemic phase is over, global energy demand is expected to rise again, and the statistics of the past 10 years point to the need for alternative fuels such as biofuels.

In India, energy production from renewable sources increased by 9.8% compared with the previous year, 2018. Global biofuel production increased from 1787 thousand barrels of oil equivalent per day (tboe/d) to 1842 tboe/d, an increase of about 3%. India increased its biofuel production by 24.9% compared with 2018 and stands second after Indonesia (37.5%) among the Asia-Pacific countries [1,2,3]. The demand for bioenergy is expected to increase by up to 11% by 2040 [5]. In India, the contribution of bioenergy to total energy demand is gradually increasing. This increase can be accelerated because India has abundant reserves of biomass, which can serve as the raw material for various forms of biofuel. What is needed is efficient technology to convert this biomass into bioenergy. According to the Food and Agriculture Organization of the United Nations, the total area covered by forests in India has increased in recent times, which indicates that there is no shortage of lignocellulosic raw material in India [6]. Plant biomass is a reliable source of sugars, which yield biofuel when fermented. Lignocellulose is therefore a very significant source for the production of various biofuels [7].
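The year-on-year figures quoted in this introduction can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `percent_change` is our own, and the input values are the 2018 and 2019 figures given in the text.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# 2018 -> 2019 values as quoted in the text.
biofuel_growth = percent_change(1787, 1842)        # global biofuel output, tboe/d
china_pec_growth = percent_change(135.33, 141.70)  # primary energy consumption, EJ
india_pec_growth = percent_change(33.30, 34.06)    # primary energy consumption, EJ

print(f"Global biofuel production: {biofuel_growth:.1f}%")
print(f"China PEC: {china_pec_growth:.1f}%")
print(f"India PEC: {india_pec_growth:.1f}%")
```

The biofuel result rounds to the "about 3%" stated above.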

Lignocellulose is chiefly composed of cellulose, hemicellulose, and lignin. Cellulose contributes about 40–50% and hemicellulose 25–30% of lignocellulose, and the rest is lignin [8]. Because current approaches for separating lignocellulose are inadequate [9], considerable attention has been given to improving lignocellulose hydrolysis with novel, efficient, and engineered enzymes [10]. Combinations of different glycosyl hydrolases are necessary for the comprehensive breakdown of lignocellulose into a blend of different sugars. In a lignocellulose-degrading habitat, the microbiome produces different mixtures of glycosyl hydrolases that aid the thorough degradation of lignocellulose [11, 12]. The quest for novel and efficient approaches for biofuel applications therefore continues, and one such approach is metagenomics, the study of the genomes of the entire microbiome residing in a given habitat. It helps in understanding the microbial composition of the habitat and offers a way to explore and exploit many novel genes from both uncultivable and cultivable microbes [13]. The increasing demand for robust and efficient lignocellulases and hemicellulases for biofuel production may be met by this approach. This review analyzes most of the reported metagenome-derived cellulases, endoglucanases in particular.

Lignocellulosic biomass, which is available around the globe as agricultural, industrial, and municipal solid waste and as forest residues, is a prospective raw material for bioethanol and other value-added biochemicals [14,15,16]. Production of bioethanol from lignocellulosic biomass consists of three important steps. (i) Pre-treatment reduces the recalcitrance conferred by lignin, allowing enzymatic hydrolysis of the biomass to fermentable sugars. Pre-treatment techniques can be categorized as physical/mechanical, physicochemical, chemical, and biological. Each has its advantages and disadvantages (Table 1), and an appropriate method is chosen based on the composition of the biomass and on economics. (ii) In saccharification and fermentation, the biomass left after pre-treatment (rich in cellulose) is converted to monomeric glucose by cellulase enzymes, and the glucose is then fermented. Bacteria, fungi, and actinomycetes are the major cellulase-producing microorganisms at laboratory scale and for industrial and bioenergy applications (Table 2). (iii) Recovery separates the bioethanol from the raw fermentation broth to obtain high-purity bioethanol. Although several separation methods are available, distillation, alone or in combination with other processes, remains the primary approach for bioethanol purification.
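To give a feel for what the saccharification and fermentation steps can deliver at best, the following sketch estimates the theoretical ethanol yield from the cellulose fraction of a biomass. It is not taken from the cited studies: the stoichiometric factors follow from standard molar masses (one water added per glucose unit on hydrolysis; one glucose giving two ethanol and two CO2), and the function name, the 45% cellulose content, and the efficiency parameters are illustrative assumptions.

```python
# Molar masses in g/mol (standard values).
ANHYDROGLUCOSE = 162.14   # glucose unit inside the cellulose chain
GLUCOSE = 180.16
ETHANOL = 46.07

# Mass factors derived from stoichiometry.
GLUCAN_TO_GLUCOSE = GLUCOSE / ANHYDROGLUCOSE   # hydrolysis adds one water per unit
GLUCOSE_TO_ETHANOL = 2 * ETHANOL / GLUCOSE     # C6H12O6 -> 2 C2H5OH + 2 CO2


def theoretical_ethanol_kg(biomass_kg: float,
                           cellulose_fraction: float,
                           hydrolysis_yield: float = 1.0,
                           fermentation_yield: float = 1.0) -> float:
    """Ethanol (kg) obtainable from the cellulose in `biomass_kg` of dry biomass.

    The two yield arguments (0-1) are hypothetical process efficiencies;
    the defaults of 1.0 give the stoichiometric upper bound.
    """
    glucose_kg = biomass_kg * cellulose_fraction * GLUCAN_TO_GLUCOSE * hydrolysis_yield
    return glucose_kg * GLUCOSE_TO_ETHANOL * fermentation_yield


# One tonne of dry biomass with 45% cellulose (mid-point of the 40-50% range).
print(round(theoretical_ethanol_kg(1000, 0.45), 1))
```

Hemicellulose-derived sugars are ignored here, so the figure is an upper bound for the glucose route only.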

Table 1 Advantages and disadvantages of different pre-treatment methods of lignocellulosic biomass (source: [17])
Table 2 Cellulases available in the market (all the price details are available in the website links except PCT1518*, the price of it was obtained from a local vendor)

In the conventional method of producing bio-alcohol, saccharification and fermentation are performed as distinct processes, each under its own optimum parameters. This is referred to as separate hydrolysis and fermentation (SHF) [18]. The chief limitation of SHF is feedback inhibition of the cellulase enzymes by the sugars liberated during hydrolysis of the substrate [19,20,21]. To overcome this, simultaneous saccharification and fermentation (SSF) was recommended, which improved enzyme utilization and the efficiency of the process [22,23,24,25]. The major drawback of SSF is the incompatible temperatures of hydrolysis (45–60 °C) and fermentation (30 °C) [22,23,24, 26]. To alleviate this, non-isothermal simultaneous saccharification and fermentation (NSSF) has been proposed, in which partial enzymatic hydrolysis is carried out at its optimum temperature and, as soon as the culture medium is inoculated, the temperature is set to the optimum for microbial growth [20, 27]. In simultaneous saccharification, filtration, and fermentation (SSFF), membranes are used to obtain a clear, sugar-rich filtrate from the hydrolysis liquid. The filtrate contains hydrolyzed sugars along with partially hydrolyzed lignocellulosic biomass. After glucose, xylose is the next most abundant saccharide in many lignocellulosic materials, so the simultaneous saccharification and co-fermentation (SSCF) process is well suited to using the xylose fraction of the filtrate efficiently. In this process, wild-type or engineered microorganisms that utilize both xylose and glucose are employed for ethanol production [28,29,30,31]. However, it is essential to look for new alternatives to the SSCF process, and one such alternative is consolidated bioprocessing (CBP) (Fig. 1).
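The feedback inhibition that limits SHF can be illustrated with a simple rate expression. The sketch below assumes competitive product inhibition in a Michaelis-Menten framework, which is one common simplification and not necessarily the kinetics reported in the cited studies; all parameter values (`vmax`, `km`, `ki`) are hypothetical and chosen only to show the trend.

```python
def hydrolysis_rate(substrate: float, product: float,
                    vmax: float = 1.0, km: float = 5.0, ki: float = 2.0) -> float:
    """Rate of hydrolysis under competitive product inhibition.

    v = vmax * S / (km * (1 + P / ki) + S)
    Units are arbitrary; vmax, km and ki are illustrative values.
    """
    return vmax * substrate / (km * (1.0 + product / ki) + substrate)


# SHF-like situation: sugars accumulate because nothing consumes them.
for sugar in (0.0, 2.0, 10.0):
    print(f"product = {sugar:4.1f}  rate = {hydrolysis_rate(5.0, sugar):.3f}")

# SSF-like situation: fermentation keeps the sugar level low,
# so the rate stays close to the uninhibited value.
print(f"product =  0.1  rate = {hydrolysis_rate(5.0, 0.1):.3f}")
```

In this toy model the rate falls as the product concentration rises, which is the qualitative reason why removing sugars as they form (SSF, SSCF, CBP) relieves the inhibition.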

Fig. 1 Various strategies of bioprocessing for converting lignocellulosic biomass to biofuel (CBP, consolidated bioprocessing; SSCF, simultaneous saccharification and co-fermentation; SSF, simultaneous saccharification and fermentation; SHCF, separate hydrolysis and co-fermentation; SHF, separate hydrolysis and fermentation)

Consolidated bioprocessing (CBP) has been designed to avoid the setbacks and expense of conventional biofuel production from lignocellulosic biomass. It uses either a pure culture or a consortium, depending on the desired output and the process parameters. CBP aims to combine enzyme production, hydrolysis, and fermentation into a single step and also to integrate pentose sugar utilization into that step. This is expected to improve process efficiency by eliminating the dependence on exogenously supplemented hydrolytic enzymes and by decreasing the feedback inhibition of cellulases by sugars [32, 33]. It also reduces the number of unit operations in the overall process, thus decreasing overall capital costs [32]. Further advances in CBP may make the pre-treatment step unnecessary by producing biofuel directly from raw biomass [18].

2 Structure of lignocellulose

The major constituents of plant biomass mainly include cellulose, hemicellulose, and lignin. Apart from these, there are also minor volumes of ash, extractives, protein, and pectin. Yet, the configuration of these components differs among different species of plants and can also vary in the same plant concerning its stage of development, age, and other conditions. The degree of association of polymers with each other in the heteromatrix depends on the source of the biomass, its species, and its type [34,35,36]. To determine the most appropriate energy conversion path for specific biomass, the most important factor that has to be considered is the relative composition of the three main polymers in the lignocellulosic biomass [37]. To liberate sugars for fermentation, lignocellulosic biomass needs to undergo an aggressive pre-treatment process, to produce a substrate that is further broken down either using commercially available cellulolytic enzymes or employing microbes that are capable of producing such enzymes [38].

The rigidity of the plant cell wall is due to structural polymers such as lignin and cellulose. Plants are not the only source of cellulose; a few microbes can also produce it, for example bacteria (Gluconacetobacter xylinum) [39] and algae (Cladophora green algae) [40]. Cellulose is an unbranched homopolymer of β-d-glucopyranose units connected by β-(1,4)-glycosidic linkages, with cellobiose as the repeating unit of the polymer chain [34, 41]. Twenty to three hundred cellulose chains are held together by hydrogen bonds and van der Waals interactions to form a microfibril, which is sheathed by hemicellulose and lignin; microfibrils are in turn bundled into cellulose fibers. d-Glucose can be liberated from cellulose by treatment with acid or with enzymes that break the β-(1,4)-glycosidic bonds. In any given biomass, cellulose occurs in two forms, amorphous and crystalline. Most cellulose is crystalline, and only a small proportion is amorphous. The disordered chains of amorphous cellulose are easier to degrade enzymatically [42]. The cellulose elementary fibril (CEF) is a thin fibrillar product of cellulose synthase, and a bundle of CEFs is often referred to as a macrofibril. A macrofibril should not be confused with a microfibril; a microfibril may correspond to a minor part of a macrofibril or sometimes to a CEF, as shown in Fig. 2 [43].
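Because cellulose is a linear chain of glucose units joined by β-(1,4)-glycosidic bonds, its molar mass and the number of bonds to be cleaved follow directly from the degree of polymerization (DP). The sketch below uses standard molar masses; the function names and the example DP of 10,000 are illustrative assumptions rather than values from the text.

```python
ANHYDROGLUCOSE_G_MOL = 162.14  # glucose unit after loss of one water on linking
WATER_G_MOL = 18.02            # the chain ends together account for one water


def cellulose_molar_mass(dp: int) -> float:
    """Approximate molar mass (g/mol) of a cellulose chain with `dp` glucose units."""
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    return dp * ANHYDROGLUCOSE_G_MOL + WATER_G_MOL


def glycosidic_bonds(dp: int) -> int:
    """Number of beta-(1,4) bonds in a linear chain of `dp` units."""
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    return dp - 1


print(cellulose_molar_mass(2))      # cellobiose, the repeating unit
print(glycosidic_bonds(10_000))     # bonds in a hypothetical chain of 10,000 units
```

Setting `dp` to 1 or 2 recovers the molar masses of glucose and cellobiose, which is a quick sanity check on the constants.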

Fig. 2 Structural architecture of lignocellulosic biomass

The second most abundantly found polymer in lignocellulosic biomass, making up about 20–50% of it, is hemicellulose. It has a backbone that can be either a heteropolymer or a homopolymer. Unlike cellulose, which is a chemically homogeneous polymer, hemicellulose has short branches containing different sugars such as pentoses, hexoses, and uronic acids. The branches are connected by β-(1,4)-glycosidic linkages, and sometimes also by β-(1,3)-glycosidic linkages [34, 44]. Hemicelluloses have a special virtue of being the most heat-labile and chemically sensitive among all the major components of lignocellulosic biomass [45, 46]. As hemicellulose coats the fibrils of cellulose in the cell wall of plants, it has been proposed that to improve the digestibility of cellulose, it is required to get rid of 50% or more of the hemicellulose present [46]. Therefore, severity conditions during the pre-treatment process are generally compromised to maximize sugar recovery. The fraction of hemicellulose acquired after pre-treatment can be obtained in solid state, or a mixture of solid and liquid states, depending on the chosen method of pre-treatment [36].

Lignin is the next most abundant polymer in lignocellulosic biomass and is found primarily in the plant cell wall. Its composition is very complex, with many cross-linked polymers, which provides mechanical support and structural integrity against microbial attack and oxidative stress. The chief components of this polymer are sinapyl alcohol (syringyl alcohol), coniferyl alcohol (guaiacyl propanol), and coumaryl alcohol (p-hydroxyphenyl propanol), linked by different ether bonds [45]. Lignin glues together the different constituents of lignocellulosic biomass [46]. Its intimate association with cellulose microfibrils is the main hindrance to the microbial and enzymatic degradation of biomass [47]. In 2000, Chang and Holtzapple showed that the digestibility of biomass increased with increasing lignin removal [48]. Delignification causes swelling of the biomass and disrupts the lignin structure. It also increases the internal surface area and makes cellulose more accessible to cellulolytic enzymes. During pre-treatment, lignin liquefies and then solidifies on cooling, which allows it to be precipitated out [33, 46, 49].

3 Enzymes involved in hydrolysis

Cellulosic biomass is very difficult to hydrolyze completely because of its microfibrils and high degree of self-association. As part of the ecological carbon cycle, cellulose degradation by microbial cellulosomes and cellulases returns carbon from various fixed biomasses to environmentally available CO2 [50]. Microorganisms degrade cellulose enzymatically to fulfill their nutritional requirements. A group of cellulase enzymes, which are members of the glycosyl hydrolase (GHase) family (EC 3.2.1), is responsible for the complete hydrolysis of cellulose [51,52,53]. These enzymes cleave not only the glycosidic bonds between carbohydrates but also those between a carbohydrate and a non-carbohydrate compound. Under the International Union of Biochemistry and Molecular Biology (IUBMB) nomenclature, glycosidases (glycosyl hydrolases) were classified according to molecular mechanism and substrate specificity. However, the great diversity of polysaccharides in nature makes such a classification difficult to apply [54]. To overcome this problem, it was proposed that glycosyl hydrolases be classified by amino acid sequence similarity [55]. This classification accounts for structural and functional relationships, mechanistic models, and substrate specificity [56].

The mechanisms of glycosyl hydrolase catalysis fall broadly into two types [57]: inverting and retaining. In the inverting mechanism, the configuration at the anomeric carbon is inverted (for example, from β to α) through a single displacement, whereas in the retaining mechanism, the configuration at the anomeric carbon is retained through a double displacement (Fig. 3) [59]. In both mechanisms, the position of the proton donor does not change, and it remains within hydrogen-bonding distance. In the inverting mechanism, the catalytic base is placed farther from the anomeric position to accommodate a water molecule between the sugar and the base. As the retaining mechanism does not require a water molecule in that position, the base remains in the vicinity of the sugar's anomeric position [51]. Intermediate states such as epoxides [60] and oxacarbenium-ion-like states [61] are observed in both mechanisms.

Fig. 3 Mechanisms of glycosidic bond cleavage (adapted from [58])

For instance, the endoglucanase from the termite Nasutitermes takasagoensis, which belongs to glycosyl hydrolase family 9, possesses three conserved catalytic residues, two aspartic acids and one glutamic acid. The aspartic acid residues act as a base, deprotonating a water molecule to create a nucleophile that attacks the anomeric carbon. This breaks the glycosidic linkage and inverts the anomeric configuration. The glutamate residue acts as the proton donor (acid) and protonates the oxygen of the scissile glycosidic bond [58]. Most endoglucanases of the GH-5 family have a conserved dyad of glutamic acid residues that take part in catalysis. Tripti et al. reported that the enzyme Cel-1, isolated from buffalo rumen, possesses a conserved dyad of glutamic acid residues (E314 and E179) in its active site [62]. Many GH families have been observed to lack the catalytic proton donor/acceptor and/or the nucleophile [63, 64]. Alternative catalytic mechanisms therefore also exist, such as substrate-assisted catalysis, proton-transfer networks, non-carboxylate residues, and an exogenous base/nucleophile [65].

Cellulase enzymes can be classified into three types: endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91 or 3.2.1.176), and β-glucosidases (EC 3.2.1.21) [11].
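For readers who annotate sequence data, this three-way classification can be kept as a small lookup table. The sketch below is only an illustration: the EC numbers are those listed above, the role descriptions paraphrase the subsections that follow, and the names `CELLULASE_CLASSES` and `class_for_ec` are our own.

```python
# Cellulase classes and their EC numbers as listed in the text.
CELLULASE_CLASSES = {
    "endoglucanase": {
        "ec_numbers": ("3.2.1.4",),
        "role": "cuts the cellulose chain at random internal sites, creating free ends",
    },
    "cellobiohydrolase": {
        "ec_numbers": ("3.2.1.91", "3.2.1.176"),
        "role": "acts on the reducing or non-reducing chain ends",
    },
    "beta-glucosidase": {
        "ec_numbers": ("3.2.1.21",),
        "role": "hydrolyzes cellobiose to glucose",
    },
}


def class_for_ec(ec_number: str):
    """Return the cellulase class for an EC number, or None if it is not listed."""
    for name, info in CELLULASE_CLASSES.items():
        if ec_number in info["ec_numbers"]:
            return name
    return None


print(class_for_ec("3.2.1.176"))
print(class_for_ec("3.2.1.8"))  # not one of the three cellulase classes listed here
```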

3.1 Endoglucanase

Endo-1,4-β-d-glucanases attack the cellulose chain randomly, thus leading to the formation of more free ends, which serve as a substrate for exoglucanases. Thus, the two enzymes work together to accelerate the hydrolysis process [66].

3.2 Cellobiohydrolase

The tunnel-like active site of cellobiohydrolase (β-1,4-exoglucanase) binds the reducing or non-reducing ends of microfibrils, allowing the enzyme to cut the cellulose polymer into shorter chains and release either glucose or cellobiose as the product [67].

3.3 β-Glucosidase

The β-glucosidase (β-d-glucoside gluco-hydrolase) enzyme facilitates the transfer of a glycosyl moiety between nucleophilic oxygens. It performs the essential task of hydrolyzing cellobiose, the step that determines the overall rate of reaction. The product of cellobiose hydrolysis is glucose, and elevated glucose inhibits β-glucosidase by feedback regulation [68, 69]. This enzyme can thus regulate the entire process of cellulose hydrolysis [70, 71]. Apart from hydrolyzing glycosyl bonds, it can also catalyze the reverse reaction under certain defined conditions [72]. The mode of action of all three enzymes (endoglucanase, exoglucanase, β-glucosidase) and their products is shown schematically in Fig. 4.

Fig. 4 Schematic representation of the mode of action for all three enzymes (endoglucanase, exoglucanase, β-glucosidase) and their products

4 Carbohydrate-binding modules

Carbohydrate-binding modules (CBMs) were originally referred to as cellulose-binding domains (CBDs) because the first domains discovered were all associated with cellulose binding. Later, when many other carbohydrate-binding moieties were discovered, the group was renamed CBMs [73]. A CBM is a polypeptide segment adjoining some carbohydrate-active enzymes that is capable of folding independently. It binds to carbohydrates but does not modify their chemical structure. CBMs perform two important functions: they direct the appropriate carbohydrate to the catalytic module, and they hold the catalytic module close to the substrate to facilitate binding (Fig. 5) [74]. Aromatic amino acids in the ligand-binding sites, especially tyrosine and tryptophan, play a vital role in substrate binding [75]. To date, about 180,143 modules have been classified into 85 families, and 559 modules are yet to be classified (CAZy database).

Fig. 5 Schematic representation of the carbohydrate-binding module

All CBMs have been grouped into three categories based on the topography of the ligand-binding site. Type A CBMs have a hydrophobic, planar surface as their salient feature, which enables them to bind polysaccharides such as chitin and cellulose. Examples of type A include CBM families 1, 2, 3, 5, 63, 64, and 79. Although substrate binding in all three types (A, B, and C) relies on the same aromatic amino acids, the topology of type B CBMs is different. In type B, the substrate can bind to two different sites of the same protein, a variable loop site (VLS) and a concave face site (CFS). The VLS is present at the tail of the protein, whereas the CFS is present on its concave face. Type B can bind an extensive range of glycans (such as galactans, xylans, and mannans) but cannot bind substrates like cellulose. Examples of type B include CBM families 4, 16, 22, 31, 48, 58, 61, 75, and 80. The topology of type C is entirely different from the other two, with a pocket-like binding site that allows only small saccharides (mono-, di-, or tri-saccharides) to interact with it. Examples of type C include CBM families 13, 14, 32, 42, 62, 66, and 71 [75].
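The family examples given above can be turned into a small lookup that returns the binding-site type for a CBM family number. This is only a convenience sketch covering the example families named in the text, not the full CAZy classification; the names `CBM_TYPE_EXAMPLES` and `cbm_type` are our own.

```python
# Example CBM families for each binding-site type, as listed in the text.
CBM_TYPE_EXAMPLES = {
    "A": {1, 2, 3, 5, 63, 64, 79},              # planar, hydrophobic surface
    "B": {4, 16, 22, 31, 48, 58, 61, 75, 80},   # binds glycan chains
    "C": {13, 14, 32, 42, 62, 66, 71},          # pocket for small saccharides
}


def cbm_type(family: int):
    """Return 'A', 'B' or 'C' for a listed CBM family, or None if it is not listed."""
    for type_label, families in CBM_TYPE_EXAMPLES.items():
        if family in families:
            return type_label
    return None


for fam in (3, 22, 13, 99):
    print(f"CBM{fam}: type {cbm_type(fam)}")
```

Families not named in the text return `None` rather than a guess.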

5 Culture-independent as well as culture-dependent approaches

Microbes possess huge metabolic diversity, which helps them adapt to varied climatic conditions and widens the range of environments they can colonize [76]. However, about 99% of the bacteria in any given terrestrial habitat cannot be cultivated. The cultivable percentage is even lower in marine habitats, at only about 0.001 to 0.1% [77]. Hence, culture-based approaches can explore less than 1% of the microbes in any habitat [78]. The microbial enzymes derived from pure cultures that are currently used for biofuel production therefore do not reflect the full biocatalytic potential of microorganisms. With this in mind, metagenomics-based and other culture-independent methods have been established in recent years to survey the biodiversity of varied environments. Metagenomics not only profiles the diversity of microbial communities but also allows the search for novel genes and proteins/enzymes of industrial and biotechnological importance (such as lignocellulose-degrading enzymes), overcoming the uncultivable nature of the microbes [78]. In a study by Fang Z et al., a β-glucosidase (Bgl1A) was isolated from a marine microbial metagenomic library. Bgl1A was very stable even under highly saline conditions, and its activity decreased only slowly as the glucose concentration was increased. A huge number of enzymes like Bgl1A, with high potential, remain to be explored in varied extreme environments [79]. Omics-based methods are also now being used to analyze microbial metabolic diversity. The approaches for both cultivable and uncultivable microbes are shown as a flow chart in Fig. 6.

Fig. 6 Schematic representation of culture-dependent and culture-independent techniques (adapted from [78])

6 Exploring lignocellulosic biomass with the help of functional and structural metagenomics

Metagenomics has two major objectives: the first is to determine the taxonomic composition of the entire microbiome (structural), and the second is to characterize its total genomes and associated genes (functional). Structural metagenomics identifies the major genera and species inhabiting a particular ecosystem in order to understand their roles in environmental interactions, evolution, and biogeochemical cycles, whereas functional metagenomics studies the genomic diversity of an environmental sample with the aim of isolating new genes and pathways that encode functional enzymes or synthesize new biomolecules [80]. The steps involved in both are shown as a flow chart in Fig. 7a, b. Functional metagenomics has been used successfully to isolate and identify new protein families, especially lignocellulolytic enzymes such as cellulases, esterases, lipases, and xylanases [82,83,84].

Fig. 7 a Schematic representation of steps involved in structural and functional metagenomics. b Process flowsheet of steps involved in metagenomic library construction (adapted from [81])

In metagenomics, two methods are most commonly used to understand a given microbiome: amplicon-based metagenomics and shotgun metagenomics. Amplicon-based metagenomics uses the 16S ribosomal RNA gene for bacterial identification and the internal transcribed spacer (ITS) and 18S regions for fungal and other eukaryotic identification. The 16S/18S rRNA genes contain both hypervariable and conserved sequences. 16S sequencing targets the hypervariable V1–V9 regions of the 16S ribosomal RNA gene to identify the diverse bacterial communities in the microbiome. During analysis of 16S rDNA sequencing data, sequences with greater than 97% identity are grouped into an operational taxonomic unit (OTU), and each OTU is treated as a taxon [85]. The major drawback of this method is that two organisms with identical 16S rRNA sequences are counted as one species even if they belong to different species. It therefore cannot differentiate closely related species such as Shigella flexneri and Escherichia coli (E. coli) [86]. A great amount of sequence variation is observed in the ITS region located between the 5.8S and 18S rRNA genes. ITS sequences help in analyzing the fungal diversity of the sample, whereas 18S rRNA sequences help in studying fungal taxonomy [87]. Fungal taxonomic analysis depends on ribosomal genes such as the large subunit (LSU, 28S), small subunit (SSU, 18S), and 5.8S rRNA genes. Strictly speaking, 16S/18S rRNA sequencing does not justify the name metagenomics, because it gives information about specific communities within the microbiome and not about the entire microbiome of the habitat [88]. To analyze all the microbes in a habitat, an untargeted shotgun sequencing approach has been designed. Shotgun metagenomics involves sample collection, processing, and sequencing; quality filtering of the sequencing reads; assembly of the reads; binning of contigs; and analysis of the resulting data [88, 89].
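The grouping of 16S sequences into OTUs at 97% identity can be sketched as a greedy, centroid-based clustering. The code below is a deliberately simplified illustration and not the algorithm of any particular tool: it assumes pre-aligned reads of equal length, measures identity as the fraction of matching positions, and uses hypothetical function names (`identity`, `cluster_otus`).

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length, aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def cluster_otus(sequences, threshold: float = 0.97):
    """Greedy clustering: each sequence joins the first OTU whose centroid
    it matches with identity greater than `threshold`, else founds a new OTU."""
    centroids = []   # first sequence of each OTU
    clusters = []    # member sequences of each OTU
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) > threshold:
                clusters[index].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Toy 100-base reads: one reference, one 99%-identical and one 95%-identical variant.
reference = "ACGT" * 25
near_variant = "C" + reference[1:]
far_variant = "".join("T" if i in (0, 4, 8, 12, 16) else base
                      for i, base in enumerate(reference))

otus = cluster_otus([reference, near_variant, far_variant])
print([len(otu) for otu in otus])
```

The sketch also makes the drawback noted above concrete: two different species whose marker sequences differ by less than 3% fall into the same OTU.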

Before a shotgun study, the microbial diversity of the habitat should be estimated to identify the relative abundance of species. This can be accomplished by 16S/18S rRNA metagenomic analysis of the habitat. More species are expected in a soil sample than in a gut sample, so more sequencing data must be generated for soil than for gut. Unless the sequencing depth is increased, rare taxa are unlikely to be identified. The more species a habitat contains, the greater the sequencing depth needed for good insight into it [89]. For instance, shotgun metagenomic analysis of a bacterial consortium enriched on carboxymethylcellulose as the sole carbon source allowed six complete genomes to be reconstructed, four of which were novel. Three of the genomes were Bacillus thermozeamaize, Geobacillus thermoglucosidasius, and Caldibacillus debilis. CAZy analysis revealed several genes associated with the degradation of lignocellulosic material. Of all the genomes, that of Bacillus thermozeamaize had the most abundant glycosyl hydrolases (GHs) [90].
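The link between sequencing depth and the detection of rare taxa can be made concrete with a simple sampling model. The sketch below assumes that reads are drawn independently and in proportion to relative abundance, which ignores extraction and sequencing biases; the function names and the example abundances are illustrative assumptions.

```python
import math


def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Probability that at least one of `n_reads` comes from a taxon
    present at `rel_abundance` (fraction of all reads, between 0 and 1)."""
    return 1.0 - (1.0 - rel_abundance) ** n_reads


def reads_needed(rel_abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of reads giving at least `confidence` probability
    of sampling the taxon at least once."""
    if not 0.0 < rel_abundance < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("rel_abundance and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-rel_abundance))


# A taxon at 0.01% of reads versus one at 0.0001% of reads (hypothetical values).
for abundance in (1e-4, 1e-6):
    print(f"abundance {abundance:g}: {reads_needed(abundance)} reads for 95% detection")
```

Under this model the required depth scales roughly with the inverse of the abundance of the rarest taxon of interest, which is why a species-rich soil sample needs far more data than a gut sample.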

The metagenomic approach has been used to study diverse environments, but out of these, only a few are lignocellulosic-rich ecosystems [78, 91,92,93]. Although the chemical composition and the structural complexity of lignocellulosic biomass may make it difficult for microorganisms to colonize these environments and also hinder the primary and essential high-quality DNA extraction step, the best option to study lignocellulosic microbes and investigate the catabolic potential of related non-cultivable organisms is by using lignocellulosic materials. Therefore, environments that contain sugarcane bagasse, wheat straw, corn stover, rice straw, etc. are ideal for identifying new lignocellulolytic enzymes from uncharacterized microbial populations using functional metagenomics [93].

7 Process of metagenomic library construction

A few technical limitations in the construction and functional screening of metagenomic libraries are specific to lignocellulose-rich ecosystems. High-quality DNA is a critical requirement for library construction, but DNA extracted from lignocellulosic material is often contaminated by acids, furan derivatives, and phenolic compounds [94]. These contaminants can denature nucleic acids, interfere with DNA transformation, and inhibit several enzymes required for library preparation [95]. The quality of DNA extracted from plant biomass can also be compromised by fertilizers, preservatives, and stabilizers from industrial processes. Because microbial colonization of lignocellulosic biomass is limited, the yield of metagenomic DNA (mgDNA) from this material is low, and the contaminants mentioned above decrease it further. Variations in sample granularity have also been observed to affect yields [91, 94]. For these reasons, no standardized or optimized protocol yet exists for extracting mgDNA from lignocellulose-rich samples, and only amended versions of existing procedures are in use [91, 93].

The problem of low mgDNA yields due to the low microbial load of lignocellulose-rich materials can be addressed with enrichment strategies. For cellulolytic organisms, the sample is supplemented with additional cellulose so that it becomes enriched in cellulolytic organisms only. The population of the target organisms then increases, which eventually increases the yield of mgDNA [96]. If the target organisms are prokaryotes, they need to be separated from eukaryotes to avoid DNA contamination, and vice versa. Owing to their small size, prokaryotes can be separated with size-selective filters and centrifugation to avoid contamination by eukaryotic genomes [97, 98]. Enrichment, however, modifies the natural biomass ecosystem, so structural studies of enriched samples lose their relevance. Other strategies address the extraction step itself. One is to use silica gel during mgDNA extraction after cell lysis, which reduces shearing of the mgDNA and increases both its yield and its quality. Another is to separate the large particulate matter from the sample before mgDNA extraction by gentle centrifugation at 3000 rpm (645 × g) to obtain a translucent pellet containing the microbes, and then to lyse the microbes with lysis buffer (1% cetyltrimethylammonium bromide (CTAB), 100 mM ethylenediaminetetraacetic acid (EDTA), 1.5 M NaCl, 100 mM Tris-HCl) and proteinase. These strategies increase the chances of isolating novel genes encoding cellulases, xylanases, and lipases/esterases.
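Since the centrifugation step is quoted both in rpm and as a multiple of gravity, it may help to show how the two are related. The sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × N^2 (r in cm, N in rpm). The rotor radius is not given in the text; the value of about 6.4 cm used here is back-calculated from the quoted pair (3000 rpm, 645 × g) and should be treated as an assumption, as should the function names.

```python
RCF_CONSTANT = 1.118e-5  # converts (cm * rpm^2) to multiples of g


def rcf_from_rpm(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, rotor_radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF with a given rotor radius."""
    return (rcf / (RCF_CONSTANT * rotor_radius_cm)) ** 0.5


ASSUMED_RADIUS_CM = 6.41  # hypothetical; inferred from 3000 rpm = 645 x g

print(round(rcf_from_rpm(3000, ASSUMED_RADIUS_CM)))   # should be close to 645
print(round(rpm_from_rcf(645, 10.0)))                 # same force on a larger rotor
```

Reporting the force in × g, as the text does, lets the step be reproduced on rotors of a different radius.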

Following purification, the mgDNA is size-fractionated and then cloned into plasmids (< 20 kb insert size), cosmids and fosmids (< 40 kb insert size), or bacterial artificial chromosomes (BACs) (> 40 kb insert size), depending on the target of the functional screening of the mgDNA libraries. Generally, genes involved in related metabolic pathways are present in clusters (such as operons or super-operonic clusters) in the microorganism’s genome. These clusters are more frequently observed in prokaryotes than in eukaryotes. Thus, it is preferable to clone the mgDNA into cosmids or fosmids for functional screening [99, 100]. All the steps involved in fosmid library construction are represented as a flow chart in Fig. 7b. On the other hand, when short inserts are cloned into plasmids, large gene clusters cannot be recovered, which reduces the productivity of functional metagenomics [101]. However, plasmids containing promoters on both sides of a multiple cloning site facilitate bidirectional transcription, which can increase the number of positive clones in plasmid-based libraries [102]. Furthermore, since gene expression is greatly host-dependent, broad-host-range systems should be used to maximize the chances of successful expression and detection of target genes [103]. E. coli is frequently used as a host for economical, effective, and high-level production of several heterologous proteins [104]. However, the use of E. coli hosts may have limited the number of lignocellulolytic enzymes isolated from metagenomic libraries [105]. For example, bacterial host systems such as E. coli significantly reduce the chances of identifying and isolating lignocellulolytic enzymes of fungal origin. This is due to a variety of factors, such as differences in codon usage, promoter regulation and activation, and RNA processing and translation, which hinder the functional expression of eukaryotic genes in prokaryotic systems. Furthermore, essential post-translational modifications (such as the glycosylation of eukaryotic cellulases and xylanases that facilitates secretion) are lacking in prokaryotic hosts [105,106,107]. Owing to these limitations, the majority of lignocellulolytic enzymes identified to date are of prokaryotic origin [105]. Therefore, alternative host expression systems should be considered instead of E. coli (e.g., Pseudomonas putida, Burkholderia graminis, Bacillus subtilis, Ralstonia metallidurans, Caulobacter vibrioides, Thermus thermophilus, Sulfolobus solfataricus, and Streptomyces) [108,109,110]. For example, T. thermophilus has been successfully used as a metagenomic library host for the detection of esterases and the recombinant expression of xylanases, giving a higher yield of active clones than E. coli [111, 112]. The ongoing development of eukaryotic host systems will also contribute significantly to metagenomic studies of biomass-degrading enzymes.
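The insert-size ranges quoted for each vector class can be written as a small lookup, which may help when planning size fractionation. This is only a sketch: the function name is hypothetical, and the handling of inserts of exactly 20 kb or 40 kb is an assumption, since the text gives open ranges.

```python
def suggest_vector(insert_kb: float) -> str:
    """Map an insert size (kb) to the vector class quoted in the text.

    Ranges from the text: plasmids < 20 kb, cosmids/fosmids < 40 kb,
    BACs > 40 kb. Boundary handling (exactly 20 or 40 kb) is an assumption.
    """
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")
    if insert_kb < 20:
        return "plasmid"
    if insert_kb <= 40:
        return "cosmid/fosmid"
    return "BAC"


if __name__ == "__main__":
    # Hypothetical fragment sizes after size fractionation.
    for size in (5, 35, 120):
        print(f"{size} kb -> {suggest_vector(size)}")
```

Since the text prefers cosmids or fosmids for recovering gene clusters, such a lookup would usually be applied after deciding that cluster recovery, rather than single-gene screening, is the goal.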

8 Metagenomic 16S rDNA sequencing

Since the potential of 16S rDNA for the phylogenetic analysis of bacteria was recognized in the early 1990s, it has been applied extensively. Metagenomic 16S rDNA sequencing provides details of all the bacterial species present in a habitat. Furthermore, many researchers have used sequence-based metagenomics for the identification of putative genes. A few examples are presented in Table 3.

Table 3 List of a few lignocellulose-rich environments analyzed for their 16S rDNA sequences and the glycosyl hydrolases present in their genomes (∞ means data unavailable)

9 Metagenomic-derived functional cellulases

Several cellulases have been isolated from different microorganisms and analyzed, but the common practice of isolation and characterization by cultivation-based methods leads to a biased selection of microorganisms. Thus, commercial cellulases produced by Trichoderma or Aspergillus strains lack accessory enzyme activities and are unable to perform efficient saccharification of untreated biomass [136]. Developing more efficient cellulases through the metagenomic approach could improve the enzyme cocktails used in lignocellulose conversion. Cellulase enzyme systems required for the industrial conversion of cellulosic biomass should be able to function robustly under harsh physicochemical conditions.

9.1 Endoglucanase

Endoglucanases initiate the cellulose degradation process; thus, they are crucial for the cellulolytic action of microorganisms. The complete genes found in environmental samples such as insect gut, soil, spill water, and feces belong to the GH5, GH9, GH12, GH44, and GH45 families of glycosyl hydrolases. These genes may originate from a varied group of microbes such as Bacillus, Cellulomonas, Cellvibrio, Clostridium, Vibrio, and other unknown microbes.

Endoglucanases exhibit differences in the amino acids located at positions 244 to 1005 in the protein sequence. The endoglucanases were also characterized after heterologous expression and purification. Kinetic analysis and structure-function studies revealed that the functional properties of an endoglucanase depend mainly on the environment from which its gene was isolated. For example, the majority of cellulases from a biogas digester [137], sugarcane bagasse compost [138], or rice straw compost [139] were found to be thermophilic. Similarly, cellulases from a soda lake [140] or mangrove soil [141] could withstand high salt concentrations. However, a few enzymes do not reflect the environment from which they were isolated; for instance, the endoglucanase isolated from compost soil [142] remains active at a low temperature range of around 10–40 °C and has optimum activity at 25 °C. In 2016, Maruthamuthu et al. performed functional metagenomics-based screening for hemicellulases and cellulases in two wheat straw-degrading microbial consortia using a multi-substrate approach, revealing new thermo-alkaliphilic enzymes [143]. In the same year, Cheng et al. isolated and characterized a nonspecific endoglucanase from a goat rumen metagenomic library [144].

There are about 165 glycosyl hydrolase families (GH-1 to GH-165). Some families, such as GH-5, GH-13, GH-30, and GH-43, have 56, 42, 9, and 37 subfamilies, respectively. To date, about 664,285 modules have been classified into the 165 families and 10,520 modules are yet to be classified (CAZy database). Owing to continuous efforts over the past few decades to identify novel GHs, a great number of GHs have been discovered and characterized. A large portion of the GHs grouped under the GH-5 family are cellulose-degrading, whereas those under GH-93 are arabinan-degrading [145]. From Table 4, about 50% of the listed endoglucanases belong to the GH-5 family, and the rest are distributed among other GH families such as GH-6, GH-8, GH-9, GH-16, GH-44, GH-45, and GH-74.
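The module counts quoted from the CAZy database can be put in proportion with a few lines of arithmetic. The sketch below uses only the figures stated in the paragraph; the variable names are illustrative, and the counts are a snapshot that will have changed since.

```python
# Figures as quoted in the text (CAZy snapshot at the time of writing).
classified_modules = 664_285
unclassified_modules = 10_520

# Subfamily counts quoted for four GH families.
subfamilies = {"GH-5": 56, "GH-13": 42, "GH-30": 9, "GH-43": 37}

total_modules = classified_modules + unclassified_modules
unclassified_share = unclassified_modules / total_modules

if __name__ == "__main__":
    print(f"Total GH modules: {total_modules:,}")
    print(f"Unclassified share: {unclassified_share:.2%}")
    richest = max(subfamilies, key=subfamilies.get)
    print(f"Family with most subfamilies (of those quoted): {richest}")
```

On these figures, only about 1.6% of the known modules remain unassigned to a family.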

Table 4 List of functional metagenome-derived endoglucanases (* endo/exoglucanase)

Out of all the metagenome-derived endoglucanases considered, only three had carbohydrate-binding modules (CBMs). Because all of these enzymes have demonstrated cellulose-degrading capability, their CBMs can be grouped under type A. In all three, the CBM was located at the C-terminus only. A few more details of the metagenome-derived endoglucanases associated with CBMs are presented in Table 5.

Table 5 Details of metagenome-derived endoglucanases associated with CBMs

As the majority of the enzymes derived through metagenomics originate from uncultivable bacteria, their characteristics may differ from those of conventional enzymes. Their optimum pH and temperature depend on the habitat from which they were isolated. Characterizing enzyme properties such as co-factor requirements (different metal ions) and the effects of different detergents and organic solvents would not only help to enhance the enzyme’s activity but also support further steps such as bioprocessing and fermentation at industrial scale.

9.2 The pH and temperature optima of metagenome-derived endoglucanases

As discussed earlier, the major drawback of second-generation biofuels is the lack of efficient, thermostable enzymes to liberate fermentable sugars from biomass. So, industries require robust, acid/base-tolerant, thermostable cellulases to increase biofuel production. Generally, every enzyme has an optimum temperature and pH, at which it attains maximum activity. The habitat of the organism and its intracellular environment decide the optimum temperature and pH of the enzymes in that organism. pH and temperature play a major role in both the folding and unfolding of a protein. Table 6 presents the information about the optimum pH and temperature for various endoglucanases reported in the literature so far.

Table 6 Optimum pH and temperature for various endoglucanases reported so far

Out of the 44 endoglucanases listed in Table 6, 28 had temperature optima in the range of 20–50 °C and 9 in the range of 50–60 °C. Only 7 had optima in the range of 60–90 °C; these thermostable enzymes are of industrial importance. About 40% of the listed endoglucanases had optima between 45 and 50 °C, and 85% are active above 45 °C. All the animal gut-derived endoglucanases had optima in the range of 45 to 55 °C, except PersiCel4, whose optimum was 85 °C. All the listed endoglucanases have pH optima in the range of 4 to 8.5. About 70% (31 enzymes) of the listed endoglucanases are acidic, 20% (9 enzymes) are neutral, and 10% (4 enzymes) are basic. The majority of the animal gut-derived endoglucanases were observed to be acidic, which might be due to the acidic environment of the gut [181].
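The shares quoted for Table 6 can be re-derived from the counts given in the paragraph. The following sketch does so and checks that each breakdown sums to the 44 listed enzymes; the helper name and bin labels are illustrative, and only counts stated in the text are used.

```python
TOTAL_ENZYMES = 44  # endoglucanases listed in Table 6, as stated in the text

# Counts quoted in the text for temperature optima and pH classes.
temperature_bins = {"20-50 °C": 28, "50-60 °C": 9, "60-90 °C": 7}
ph_classes = {"acidic": 31, "neutral": 9, "basic": 4}


def shares(counts: dict, total: int) -> dict:
    """Return each count as a percentage of the total, rounded to 0.1."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not add up to the stated total")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    for name, counts in (("Temperature", temperature_bins), ("pH", ph_classes)):
        print(name, shares(counts, TOTAL_ENZYMES))
```

Both breakdowns add up to 44, and the pH shares (about 70.5%, 20.5%, and 9.1%) agree with the rounded 70/20/10 split given in the text.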

9.3 Metal ions and their effect

Metal ions associate with or dissociate from proteins to activate or inactivate them by interacting with amino groups, carboxylic groups, and other functional groups such as the sulfhydryl groups of amino acids [182]. They can also act as electron donors and acceptors. However, the effect of various divalent ions on cellulases varies with the structural lineage of the protein. For an ion, the two properties that determine its potential are ionic radius and charge [183]. Ionic radius and the potential to attract charged amino acids are inversely proportional: the smaller the ionic radius, the greater the potential to attract charged amino acids, which plays a key role in disturbing the residues of the catalytic site [182]. Water also plays a crucial role in metal ion-macromolecule interactions. The extent to which an ion interacts with water, by sharing electrons with adjacent molecules of bulk water, determines its soft or hard nature [184]. Metal ions generally used for cellulase characterization are of different valences, such as mono-, di-, and trivalent (e.g., Ca2+, Co2+, Cu2+, Fe3+, Fe2+, Hg2+, Mn2+, Mg2+, Ni2+, and Zn2+), and their effects are reported in Table 7.
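One common way to express the dependence on charge and radius described above is the ionic potential, the ratio of an ion's charge to its radius. The text does not name this measure, so the symbol and the choice of formula here are assumptions made for illustration:

```latex
\phi = \frac{z}{r}, \qquad \text{so that for a fixed charge } z: \quad \phi \propto \frac{1}{r}
```

Here z is the charge of the ion and r its ionic radius. For two ions of equal charge, the one with the smaller radius has the larger ionic potential, which matches the inverse relationship stated in the text.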

Table 7 Effect of metal ions on metagenome-derived endoglucanases (¥ means enhancing effect, € means diminishing effect, ∞ means data unavailable, and £ means inhibited completely)

Out of many metal ions, Co2+ is considered an activator of cellulases [185]. The tabulated data (Table 7) show that the catalytic ability of many metagenome-derived cellulases is enhanced in the presence of Co2+, Ca2+, and Mn2+ ions. Compared with Co2+, Ca2+ and Mn2+ enhance a larger number of the listed endoglucanases. The literature suggests that Cu2+ and Fe2+ are consistently inhibitory towards cellulases [186], although the inhibitory effect of Fe2+ was observed to be weaker on metagenome-derived endoglucanases. Cu2+, Hg2+, Ag+, and Zn2+ appear to be potent inhibitors. Although little literature is available on the effect of Hg2+ and Ag+ ions on metagenome-derived endoglucanases, the available studies suggest that both ions almost completely inhibit the enzyme’s activity. According to Ani Tejirian et al. [186], Hg2+ is a highly potent inhibitor of cellulases, possibly because Hg2+ interacts with the sulfur in amino acids such as methionine and cysteine. The same study reported that the inhibitory effect of Fe3+ was much stronger than that of Fe2+ [186]. Similar results were observed by Juan Liu et al. for the metagenome-derived endoglucanase Cel5G [149]. Even if Fe2+ is supplied in the reaction, it spontaneously oxidizes to Fe3+ in the presence of water. Carbohydrates with hemiacetal reducing ends are prone to oxidation in the presence of Fe3+. This oxidation can reduce the amount of cellulase-degradable cellulose available to the enzyme, thereby lowering the product yield. Hence, comprehensive knowledge of the metal ions that enhance or inhibit the enzyme’s activity is essential when scaling up the process to the industrial level.
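Statements such as "Ca2+ and Mn2+ enhance a larger number of the listed endoglucanases" amount to counting legend symbols per ion in Table 7. A minimal sketch of such a tally is given below, using the legend from the table caption. The enzyme names and the symbols assigned to them are placeholders invented for illustration and are not values from Table 7.

```python
from collections import Counter

# Legend as given in the caption of Table 7.
LEGEND = {
    "¥": "enhanced",
    "€": "diminished",
    "£": "completely inhibited",
    "∞": "no data",
}

# Placeholder rows: hypothetical enzymes and symbols, NOT data from Table 7.
example_rows = {
    "enzyme_A": {"Co2+": "¥", "Cu2+": "€", "Hg2+": "£"},
    "enzyme_B": {"Co2+": "¥", "Cu2+": "£", "Hg2+": "∞"},
    "enzyme_C": {"Co2+": "€", "Cu2+": "€", "Hg2+": "£"},
}


def tally_effects(rows: dict) -> dict:
    """Count, for each ion, how many enzymes show each effect."""
    tallies = {}
    for effects in rows.values():
        for ion, symbol in effects.items():
            tallies.setdefault(ion, Counter())[LEGEND[symbol]] += 1
    return tallies


if __name__ == "__main__":
    for ion, counts in tally_effects(example_rows).items():
        print(ion, dict(counts))
```

Keeping "no data" as its own category avoids reading a missing measurement as an absence of effect, which matters because several ions (for example Hg2+ and Ag+) were tested on only a few enzymes.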

9.4 Detergents and their effect

In recent times, cellulases have been employed in various fields, one of which is the laundry detergent industry. Many detergent manufacturers now supplement detergents with compatible enzymes to ensure easy stain removal without loss of fabric smoothness. For this application, the compatibility of cellulases with different types of detergents is analyzed. Detergents are amphipathic compounds with a well-defined structure consisting of a polar head group attached to a hydrophobic tail. They can be grouped into four major classes: ionic detergents, non-ionic detergents, bile salts, and zwitterionic detergents.

Ionic detergents have a charged head group, which may be anionic or cationic, and a hydrophobic tail that may be a hydrocarbon chain or sometimes a steroidal backbone, e.g., SDS (sodium dodecyl sulfate) and CTAB (cetyltrimethylammonium bromide). SDS inhibited the majority of the listed endoglucanases (Table 8), and CTAB also drastically reduced their activity, suggesting that these endoglucanases are incompatible with ionic detergents (SDS, CTAB). Non-ionic detergents have uncharged hydrophilic head groups. They are mild and, unlike ionic detergents, do not denature proteins because they do not act on protein-protein interactions. Examples are Triton X-100, Tween 20, Tween 40, and Tween 80. In two cases, reported by Chang-Muk Lee et al. [157] and Puneet Gupta et al. [167], endoglucanase activity was enhanced in the presence of Tween (20, 40, 80) (Table 8). Most of the listed endoglucanases showed reduced activity in the presence of Triton X-100, with one exception reported by Marjolaine Martin et al., in which Triton X-100 enhanced the endoglucanase activity of Cel 5.1_3 [156]. These reports suggest that the above-stated enzymes are compatible with Tween and Triton X-100. Bile salts have a rigid steroidal backbone and therefore an indistinct head group; an example is sodium cholate. Zwitterionic detergents combine the properties of ionic and non-ionic detergents; an example is CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate) [187]. None of the studies used bile salts or zwitterionic detergents to characterize their enzymes.

Table 8 Effect of detergents on metagenome-derived endoglucanases (¥ means enhancing effect, € means diminishing effect, ∞ means data unavailable, and £ means inhibited completely)

9.5 Organic solvents

In the available data, only a few endoglucanases were studied for the effect of organic solvents on their activity. Isopropanol had a deteriorating effect on all the reported metagenome-derived endoglucanases (Table 9). In contrast, Xinxin Hu et al., in a study on trypsin, reported an enhancement of activity due to isopropanol. In their study, the hydrophobic portion of trypsin was observed to form a hydrogen bond with the isopropanol molecule; the amino acid of the hydrophobic region predicted to take part in this bond is the serine at position 214. The bond changed the secondary structure of the protein, which in turn changed the tertiary structure and the overall symmetry of the amino acids. This effect loosened the basic structure of the protein and exposed the active sites. The resulting increase in the availability of active sites might increase or decrease product formation [188].

Table 9 Effect of organic solvents on metagenome-derived endoglucanases (¥ means enhancing effect, € means diminishing effect, ∞ means data unavailable, and £ means inhibited completely)

In most cases, dimethyl sulfoxide (DMSO) partially reduced the enzyme’s activity, and in some cases it showed no effect, whereas Cel 5.1_3 reported by Marjolaine Martin et al. was completely inhibited [156]. DMSO can alter protein characteristics, which might lead to either degradation or aggregation. At low concentrations, it might not affect the activity of the enzyme but can influence its binding properties [189]. Conversely, at higher concentrations, DMSO unfolds proteins [190]. Surprisingly, the activity of the En1 enzyme reported by Xing Yan et al. was enhanced in the presence of DMSO [155]. Ethanol had a deteriorating effect on many of the enzymes listed in Table 9, and in a few cases, such as Umcel9y-1 reported by Yu Zhou et al. [160] and Cel 5.1_3 reported by Marjolaine Martin et al. [156], the enzyme was completely inhibited. Many studies have reported that protein denaturation by ethanol peaks at ethanol concentrations between 20 and 50% [191].

Acetone reduced the activity of most of the reported enzymes; surprisingly, it enhanced the activity of Cel5R reported by Narender Kumar et al. [173]. Acetone is generally used for precipitating proteins and sometimes for concentrating them. Deborah M. Simpson et al. reported in 2010 that acetone, which may remain as a residual contaminant in purified proteins, can selectively alter amino acids and thereby cause conformational changes in the protein. Such modification might increase or decrease the activity of the enzyme [192]. Methanol enhanced the activity of Cel5R reported by Narender Kumar et al. [173], but in most of the reported cases it partially reduced the enzyme’s activity, and in the case of Cel5G reported by Juan Liu et al. [149] the activity was completely inhibited. In 2011, Soyoun Hwang et al. reported that methanol expands the structure of the protein, probably by reducing its hydrophobic properties. This expansion might increase access to the active site and thus increase activity, or decrease activity if it disrupts the integrity of the active site [193]. Chloroform is considered a potent protein denaturant. It is regularly used in combination with phenol and isoamyl alcohol during DNA extraction to separate proteins from nucleic acids [194]. Table 9 presents the effects of the organic solvents discussed above on metagenome-derived endoglucanases.

9.6 Other chemical compounds

Glycerol enhanced the activity of Cel01 reported by Heiko Nacke [151], whereas it had a declining effect on the rest of the reported enzymes. Proteins are generally stabilized in an aqueous environment by co-solvents, one of which is glycerol [195]. Glycerol can shift the native protein to a more compact structure. It also prevents the aggregation of proteins during protein refolding [195].

EDTA had a deteriorating effect on most of the reported enzymes, and in the case of Umcel9y-1 reported by Yu Zhou et al. [160] it completely inhibited the enzyme activity. A few enzymes, such as C67-1, Cel5G, CS10, and PHS, remained unaffected in the presence of EDTA [147, 149, 157, 167]. EDTA has long been used as a protease inhibitor and metal-chelating agent; it can reduce enzyme activity by chelating the metal ions that act as co-factors for many enzymes. However, Gajendra S. Naika et al. explained an enhancing effect of EDTA on endoglucanases. According to them, EDTA can partially open the protein molecule to create a transitional state, which might improve the enzyme-substrate interaction. Thus, EDTA exposes a portion of the protein, giving a structure better suited to interaction with the substrate [196]. Among the listed endoglucanases, only two enzymes (Umcel9y-1 and CS10) were evaluated for their activity in the presence of dimethylformamide (DMF), and both showed a deteriorating effect [157, 160]. In small concentrations, DMF can easily remove the interacting water molecules from the protein’s surface and compete intensely for hydrogen bonds. This process denatures the protein structure, leading to protein unfolding [197]. Table 10 presents the effects of a few such chemical compounds on metagenome-derived endoglucanases.

Table 10 Effect of chemical compounds on metagenome-derived endoglucanases (¥ means enhancing effect, € means diminishing effect, ∞ means data unavailable, and £ means inhibited completely)

Dithiothreitol (DTT) is characteristically used to reduce disulfide bonds. In addition, at higher concentrations it might create steric hindrance at the ligand-binding site, leading to decreased ligand binding, and may bring about conformational changes that decrease ligand binding further [198]. Surprisingly, three of the reported enzymes (C67-1, nmGH45, p4818Cel5_2A) showed increased activity in the presence of DTT, and one (CelRH5) showed a deteriorating effect [147, 170, 174, 177]. Only one enzyme (PHS) was tested in the presence of polyethylene glycol (PEG), which enhanced its activity [167]. PEG is a hydrophilic, non-ionic polymer capable of precipitating proteins [199]. Only one enzyme (EndoG) was tested in the presence of phenylmethylsulfonyl fluoride (PMSF), which partially reduced its activity [162]. PMSF inhibits many proteases, mainly through its interaction with the serine residue in the active site of the protein [200]. The activity of nmGH45, reported by Junqi Zhao et al. [170], was enhanced in the presence of β-mercaptoethanol. This compound is most commonly used for reducing disulfide bonds, and it can also act as a chelating agent [201]. Sometimes, it might even activate some enzymes by capturing the metal ions that act as co-factors for inhibitors, thereby preventing the inhibitors from acting on their target proteins [182].

10 Significance of metagenome-derived cellulases in CBP

Stimulating the expression of cellulases and of the secretion systems involved is the most challenging step in designing cellulolytic organisms for use on an industrial scale. Cellulases exist either bound to the cell surface (cellulosomes) or as soluble extracellular enzymes (free cellulases). In 2014, Parisutham et al. reported that each cellulolytic organism has over 400 different cellulose-related enzymes, such as glycoside hydrolases, glycoside esterases, carbohydrate esterases, cellulose-binding enzymes, endoglucanases, exoglucanases, and hemicellulases. Most of these enzymes can be introduced into suitable hosts and utilized in CBP to depolymerize cellulosic material into fermentable sugars and subsequently convert those sugars into the required biofuel [202]. Further, to make the CBP process efficient and cost-effective, metagenomics offers prospects for screening novel cellulase genes from the pool of uncultivable microbes. A robust cellulase with high activity, wide substrate specificity, thermotolerance, and tolerance to chemicals and lignin can be obtained through the metagenomic approach from natural and extreme environmental samples [202].

Metagenome-derived cellulases may be further improved through protein engineering strategies aimed at inhibitor tolerance, thermal and pH stability, multi-functional enzymes, synthetic cellulosomes combining active and robust cellulases, and reduced product inhibition inside the cell [203,204,205,206,207].

11 Metagenome-derived libraries and their improved traits

To grow various recombinant microbes directly on lignocellulosic biomass, support from various cellulases is required. These supporting enzymes can be either free enzymes from extracellular secretions or surface-anchored enzymes such as cellulosomes, which accomplish biomass hydrolysis efficiently [202]. To acquire ideal microbes for CBP, advanced approaches like metagenomics, metatranscriptomics, and protein engineering can be employed, as these approaches have the potential to identify novel cellulases with specificity towards diverse substrates along with efficient activity and chemical and thermal tolerance. All the strategies compiled above aim at a futuristic design involving synthetic microbial consortia, enabling the engineering of robust microbial communities for highly efficient industrial cellulose conversion (Table 11).

Table 11 List of metagenomic approach-derived endoglucanases with improved traits

12 Conclusion and future prospects

Despite being rich in lignocellulosic biomass, India still lags in the discovery of industrially viable lignocellulolytic enzymes for the effective production of second-generation biofuels (SGB) at an industrial scale. In the last few years, advanced sequencing strategies like next-generation sequencing (NGS) have emerged that, unlike conventional methods, can generate large amounts of sequence data. With the help of NGS and metagenomics, many unexplored, novel lignocellulases from uncultivable microbes can be identified and exploited to accelerate SGB production in India. In this review, a few metagenomic outcomes were discussed to highlight the importance of the many unexplored habitats for mining novel cellulolytic genes. This review also highlights the potential of different metagenomic approaches, as most of the uncultivable cellulose-degrading microbiome and its efficient enzymes remain unexploited. The degree of success depends on the methodology chosen, as every methodology has its drawbacks. In the future, these drawbacks can be overcome, and new strategies can be developed by combining the various methodologies described in this review.