Introduction

Nocardia asteroides is a gram-positive bacillus that causes nocardiosis, primarily in immunocompromised patients (Wilson 2012). A total of 225 species have been estimated so far within the genus Nocardia (Bennett et al. 2014). It is an opportunistic pathogen responsible for several diseases in humans as well as other vertebrates (Moylett et al. 2003). These bacteria are a predominant cause of pulmonary infection, which presents as acute necrotizing pneumonia and may also involve inflammation of cutaneous and subcutaneous tissue. The brain is recognized as the most favored site of secondary infection, affected in 25% of patients (Ellis and Beaman 2002; Quinones-Hinojosa 2012), and N. asteroides has been reported as a prime cause of serious cerebral abscess (Fleetwood et al. 2000). Nocardiosis is distributed worldwide but is most prevalent in the United States (Patil et al. 2012; Saubolle and Sussland 2003). Infection is influenced by the sex and age of the patient: the male-to-female patient ratio of 3:1 indicates that men are more prone to nocardiosis than women. Recently, around 0.375 Nocardia infections per 100 000 individuals have been reported worldwide (Palmieri et al. 2014), and 500-1000 individuals are affected in the United States every year (Sethy et al. 2016). Numerous efforts have been made to overcome this potentially fatal disease, but reported research data show that it remains prevalent. Nocardiosis is predominant among alcoholic, diabetic, cancer, AIDS and organ transplant patients, so it needs immediate control and monitoring. At present, such patients are recorded quite frequently. The disease is therefore of great concern and requires an immediate remedy by promising, cost-effective means.
Vaccination would be the most reliable option for controlling outbreaks of the disease (Nichol et al. 2003). A vaccine is more effective if it can elicit both B-cell and T-cell dependent immune responses. Vaccine design against any disease can follow one of two approaches: the conventional approach or the reverse vaccinology approach. In the present study, the reverse vaccinology approach was applied to design an epitope-based vaccine against N. asteroides, because the conventional technique has several limitations. It is time-consuming and expensive. Furthermore, it relies on killed, attenuated or inactive pathogens, which can prove deadly if the pathogen reverts to its pathogenic form (Khan et al. 2015). In contrast, reverse vaccinology harnesses immunoinformatics to develop vaccines against pathogenic diseases (Vivona et al. 2008). This approach uses several web portals and computational tools to predict potential vaccine candidates. Fundamentally, reverse vaccinology targets protein-derived epitope sequences for novel vaccine generation, and it provides easy and reliable techniques for identifying B-cell and T-cell epitopes (Srivastava et al. 2019). Outer membrane and secreted proteins of pathogens are ideal for vaccine generation because they can readily interact with the immune system (Gourlay et al. 2013). Because bacterial proteins are foreign to the host body, they are recognized as antigenic determinants. The presence of both B-cell and T-cell epitopes enhances immunogenicity through B-cell and T-cell proliferation. The virulence factor Mce-family protein has been found to be responsible for the virulence and pathogenicity of N. asteroides.
The current research focuses on the computational analysis and identification of potential epitopes within the bacterial Mce-family protein. The identified epitopes could serve as components for future vaccine development against N. asteroides infection. To this end, the target protein sequence was analyzed through several established web-based prediction servers in order to extract functional epitopes for novel vaccine generation against nocardiosis.

Materials and Methods

Retrieval of Amino Acid Sequence

The amino acid sequence of a protein is the preliminary requirement for developing a computer-aided epitope-based vaccine component. The amino acid sequence of the virulence factor Mce-family protein was retrieved from the NCBI protein database (Coordinators 2017). The sequence was downloaded in FASTA format and used for all subsequent computational analyses.
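As a minimal sketch of how a downloaded FASTA record can be read before it is passed to the prediction servers, the following Python function parses FASTA text into a header-to-sequence mapping. The function name `read_fasta` and the demo record are illustrative assumptions; the demo sequence is a placeholder and is not the Mce-family protein sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]        # header line without the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in records.items()}


# Placeholder record (hypothetical, for illustration only).
demo = ">demo_record placeholder sequence\nMKTAYIAK\nQRQISFVK\n"
sequences = read_fasta(demo)
```

Joining the wrapped lines into one string gives a sequence whose length can be checked against the database entry before submission.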

B-Cell Epitope Prediction

B-cell epitope identification is one of the most crucial steps in the development of an epitope-based vaccine (Jamal et al. 2017). Potential B-cell epitopes were predicted using the ABCpred web server (Saha and Raghava 2006). The server uses a recurrent neural network to locate B-cell epitopes in the target protein sequence. Its prediction performance is evaluated with five measures.

These measures are:

$${\text{Q}}_{\text{sens}} \left( {\text{sensitivity}} \right) = \frac{TP}{TP + FN} \times 100\%$$
$${\text{Q}}_{\text{spec}} \left( {\text{specificity}} \right) = \frac{TN}{TN + FP} \times 100\%$$
$${\text{Q}}_{\text{acc}} \left( {\text{accuracy}} \right) = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$
$${\text{Q}}_{\text{ppv}} \left( {\text{positive predictive value}} \right) = \frac{TP}{TP + FP}$$
$${\text{MCC }}\left( {\text{Matthews correlation coefficient}} \right) = \frac{{\left( {TP} \right)\left( {TN} \right) - \left( {FP} \right)\left( {FN} \right)}}{{\sqrt {\left[ {TP + FP} \right] \left[ {TP + FN} \right] \left[ {TN + FP} \right] \left[ {TN + FN} \right]} }}$$

TP, FN, TN and FP denote true positive, false negative, true negative and false positive predictions, respectively (Saha and Raghava 2006). The FASTA sequence of the virulence factor Mce-family protein was submitted to the ABCpred server. For B-cell epitope identification, the threshold score was set to 0.75 and the window length to 20. In addition, the overlapping filter was used to eliminate overlapping epitope predictions.

T-Cell Epitope Prediction Within the B-Cell Epitope

To generate a strong immune response, an antigenic epitope needs to be accessible to both classes of MHC molecules (Naz et al. 2015) as well as to B cells (Barh et al. 2010; Bhattacharya et al. 2019). To predict potent T-cell epitopes, the ProPred (Singh and Raghava 2001) and ProPred-I (Singh and Raghava 2003) servers were used for MHC-II and MHC-I molecules, respectively. ProPred and ProPred-I predict epitopes that can be recognized by 51 MHC-II and 47 MHC-I alleles (Mustafa and Shaban 2006). Both servers apply a matrix-based prediction algorithm (Lafuente and Reche 2009; Lin et al. 2008) to identify potent T-cell epitopes. In the present work, the predicted B-cell epitopes were submitted to ProPred and ProPred-I with default parameters to identify epitopes common to B cells and T cells.
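The selection of common epitopes amounts to a set intersection followed by a containment check. The following sketch assumes the server outputs have already been collected into plain Python lists; the function name `common_epitopes` and all peptide strings are hypothetical placeholders, not predictions for the Mce-family protein.

```python
def common_epitopes(b_cell_epitopes, mhc1_binders, mhc2_binders):
    """Map each 9-mer predicted for both MHC classes to the B-cell
    epitopes (20-mers) that contain it."""
    shared = set(mhc1_binders) & set(mhc2_binders)
    result = {}
    for nonamer in sorted(shared):
        parents = [epitope for epitope in b_cell_epitopes if nonamer in epitope]
        if parents:  # keep only 9-mers located inside a B-cell epitope
            result[nonamer] = parents
    return result


# Placeholder peptides (hypothetical, for illustration only).
b_cell = ["ACDEFGHIKLMNPQRSTVWY", "YWVTSRQPNMLKIHGFEDCA"]
mhc1 = ["DEFGHIKLM", "QPNMLKIHG", "AAAAAAAAA"]
mhc2 = ["DEFGHIKLM", "KLMNPQRST", "QPNMLKIHG"]
common = common_epitopes(b_cell, mhc1, mhc2)
```

Only the 9-mers reported for both MHC classes and embedded in at least one B-cell epitope survive this filter.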

Analysis of Antigenic Property of Epitopes

The predicted epitopes common to B cells and T cells were further investigated for antigenicity with the VaxiJen (v. 2.0) server (Doytchinova and Flower 2007). The server takes a query amino acid sequence and evaluates its antigenic propensity with a reported accuracy of 70–97% (Dimitrov et al. 2016). It offers prediction models for five target groups: bacteria, virus, tumor, parasite and fungus (Zaharieva et al. 2017). The auto cross covariance (ACC) transformation is used to convert the submitted sequence into a uniform-length vector (Doytchinova and Flower 2007).

The ACC terms are computed as:

$$A_{jj} \left( l \right) = \mathop \sum \limits_{i}^{n - l} \frac{{z_{j,i} \times z_{j,i + l} }}{n - l}$$
(1)
$$C_{jk} \left( l \right) = \mathop \sum \limits_{i}^{n - l} \frac{{z_{j,i} \times z_{k,i + l} }}{n - l}$$
(2)

Here j indexes the z-scales (j = 1, 2, 3), n is the number of amino acids in the sequence, i is the amino acid position (i = 1, 2, …, n) and l is the lag (l = 1, 2, …, L). Eq. (2) is used when two different z-scales, j and k, are involved. The query sequences were submitted to the server with bacteria selected as the target organism. To obtain more stringent results, a threshold value of 1.0 was set instead of the default threshold (0.4).
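To illustrate how the ACC transformation turns sequences of different lengths into vectors of equal length, the sketch below computes the auto- and cross-covariance terms for a small descriptor matrix, reading the lag l as the offset between the two paired residues. The function names and the numeric descriptor values are hypothetical; real z-scale values per amino acid would have to be taken from the literature.

```python
def cross_covariance(z, j, k, lag):
    """Covariance between scale j at position i and scale k at position i + lag.

    z is a list of per-residue descriptor rows, one value per z-scale.
    With j == k this is the auto-covariance term of Eq. (1).
    """
    n = len(z)
    return sum(z[i][j] * z[i + lag][k] for i in range(n - lag)) / (n - lag)


def acc_vector(z, max_lag):
    """Concatenate all auto- and cross-covariance terms for lags 1..max_lag."""
    n_scales = len(z[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_scales):
            for k in range(n_scales):
                features.append(cross_covariance(z, j, k, lag))
    return features


# Hypothetical descriptor values for a 4-residue peptide (not real z-scales).
z_demo = [[1, 0, 2], [2, 1, 0], [0, 1, 1], [1, 2, 1]]
features = acc_vector(z_demo, max_lag=2)
```

The vector length depends only on the number of scales and the maximum lag (here 3 × 3 × 2 = 18), not on the peptide length, which is what makes peptides of different sizes comparable.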

Epitope Selection

Epitopes common to B cells and T cells that also show antigenicity are promising candidates for epitope-based vaccine design. Accordingly, the epitopes predicted by the ABCpred, ProPred and ProPred-I servers and classified as antigenic by VaxiJen were selected for vaccine development.

3D Conformation of the Epitopes

The three-dimensional conformation of an epitope plays a significant role in vaccine development and is essential for investigating epitope-antibody docking (Alam et al. 2016). Because conventional modeling servers are not suited to such short amino acid sequences, the DISTILL 2.0 server (Baú et al. 2006) was used instead. The server uses two sets of bidirectional recurrent neural networks (Pollastri et al. 2002), applied as two successive filtering stages. The input to the second-stage network is augmented with first-stage predictions averaged over several adjacent windows. If σj = (αj, βj, γj) denotes the first-stage output at position j, corresponding to the helix, strand and coil predictions, then the second-stage network takes Ij as its input at position j:

$$I_{j} = \left( { \sigma_{j} , \mathop \sum \limits_{{h = k_{ - p} - \omega }}^{{k_{ - p} + \omega }} \sigma_{h} , \ldots , \mathop \sum \limits_{{h = k_{p} - \omega }}^{{k_{p} + \omega }} \sigma_{h} } \right)$$

Here kf = j + f(2ω + 1), 2ω + 1 is the window size (ω = 7) and 2p + 1 is the number of windows (p = 7) taken into account (Pollastri and Mclysaght 2004). The server returns a PDB file of the submitted epitope sequence as its output.
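The construction of the second-stage input can be sketched as follows. This is an illustrative reading of the formula, not the DISTILL implementation: the function name `second_stage_input` is hypothetical, the windows are summed as written in the formula, and positions that fall outside the sequence are assumed to contribute zero.

```python
def second_stage_input(sigma, j, omega=7, p=7):
    """Build I_j from first-stage outputs sigma (one (helix, strand, coil)
    triple per residue) for the residue at 0-based index j."""
    n = len(sigma)
    width = 2 * omega + 1
    features = list(sigma[j])            # the first-stage output itself
    for f in range(-p, p + 1):           # 2p + 1 windows
        k_f = j + f * width              # centre of window f
        window_sum = [0.0, 0.0, 0.0]
        for h in range(k_f - omega, k_f + omega + 1):
            if 0 <= h < n:               # assumption: out-of-range positions add 0
                for c in range(3):
                    window_sum[c] += sigma[h][c]
        features.extend(window_sum)
    return features


# Toy first-stage output: nine residues, all predicted as helix.
sigma_demo = [(1.0, 0.0, 0.0)] * 9
small = second_stage_input(sigma_demo, j=4, omega=1, p=1)
full = second_stage_input(sigma_demo, j=4)
```

With ω = 7 and p = 7 the input has 3 + 3 × 15 = 48 components per residue and covers 15 × 15 = 225 sequence positions, far more than a 9-residue epitope, so most windows are empty for such short peptides.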

Structural Configuration of the Protein

The three-dimensional (3D) architecture of a protein plays an essential role in its function and stability (Roy et al. 2012; Willard et al. 2003), so determining the structure of the target protein is necessary for this study. The structure of the virulence factor Mce-family protein of N. asteroides is not available in the PDB database (Berman et al. 2000). The Phyre2 web server (Kelley et al. 2015) was therefore used to generate a structural model. Phyre2 applies hidden Markov model alignment through HHsearch, an open-source software package, to detect the structure of the target protein (Kumar and Jena 2014; Nema and Pal 2013). It also uses the Poing folding simulation to model the regions of the sequence that could not be aligned (Nema and Pal 2013). The amino acid sequence of the virulence factor Mce-family protein was submitted to the server, which returned a zip file containing the predictions.

Successive Justification of Protein Model

Validation of a predicted protein model is fundamental to establishing its 3D configuration (Laskowski et al. 2006; Rodriguez et al. 1998). Two web servers were used to assess model quality. The PROCHECK web server (Laskowski et al. 1993) analyzes the stereochemical properties of the protein model, with emphasis on the torsion angles around the Cα atoms of the amino acids. A PDB file of the protein model is submitted to the server, which returns a Ramachandran plot of all residues that is itself vital for validating the model (Laskowski et al. 1993). ProSA-web was used to calculate the Z-score of the protein model and to plot it against the Z-scores of all known protein structures (Wiederstein and Sippl 2007). This server also provides an energy plot for assessing model quality; negative energy values of the amino acid residues indicate good model quality (Belkina et al. 2001; Wiederstein and Sippl 2007).

Transmembrane Helix Prediction

Transmembrane helix prediction for the protein was performed with the TMHMM (ver. 2.0) server, which is also distributed as a stand-alone software package (Sonnhammer et al. 1998). The server predicts transmembrane helices using a hidden Markov model with a reported accuracy of 97–98% (Krogh et al. 2001). The FASTA sequence of the protein was submitted to the server, which returned a list of predicted transmembrane regions along with a graphical plot.

Molecular Docking Analysis

Molecular docking between epitope and antibody components is critical for the proper functioning of a vaccine candidate. The PatchDock server was used for molecular docking in the present study (Duhovny et al. 2002). It applies a geometric shape complementarity method to dock the peptide and protein structures (Schneidman-Duhovny et al. 2003). The PDB file or PDB code of the protein and of the epitope is submitted to the server. The docked complexes are ranked by geometric shape complementarity score and reported along with the ACE value, interface area and PDB data (Schneidman-Duhovny et al. 2005). In molecular docking, ACE (atomic contact energy) is the aggregated desolvation energy of atom pairs. For atom sets S1 and S2 and a threshold distance d, the ACE is:

$$E_{ACE} = \sum\limits_{{s \in S_{1} ,t \in S_{2} ,\left\| {s - t} \right\| \le d}} {T\left[ {s,t} \right]}$$

Here ‖s − t‖ is the Euclidean distance between s and t, and T[s,t] is the precomputed score of the atom pair s and t.

T[s,t] is estimated using the following formula:

$$T\left[ {s,t} \right] = - \ln \frac{{N_{s,t} /C_{s,t} }}{{\left( {\frac{{N_{s,0} }}{{C_{s,0} }}} \right) \times \left( {\frac{{N_{t,0} }}{{C_{t,0} }}} \right)}}$$

Here 0 denotes the solvent. Ns,t is the number of s-t contacts and Ns,0 is the number of s-0 contacts observed in known complexes (Nt,0 is defined likewise for atom t). Cs,t and Cs,0 are the possible numbers of s-t and s-0 contacts, respectively (Guo et al. 2012).
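The two formulas can be sketched in a few lines of Python. This is a hedged illustration of the summation and of the pair score, not PatchDock code: the function names, atom types, coordinates and score values below are all hypothetical.

```python
import math


def pair_score(n_st, c_st, n_s0, c_s0, n_t0, c_t0):
    """T[s,t] from observed (N) and possible (C) contact numbers."""
    return -math.log((n_st / c_st) / ((n_s0 / c_s0) * (n_t0 / c_t0)))


def ace_energy(atoms1, atoms2, scores, d):
    """Sum T[s,t] over all atom pairs (s in S1, t in S2) within distance d.

    Each atom is a (type, (x, y, z)) tuple; scores maps (type_s, type_t)
    to a precomputed pair score.
    """
    total = 0.0
    for type_s, pos_s in atoms1:
        for type_t, pos_t in atoms2:
            if math.dist(pos_s, pos_t) <= d:  # Euclidean distance threshold
                total += scores[(type_s, type_t)]
    return total


# Hypothetical atoms and pair scores, for illustration only.
receptor = [("C", (0.0, 0.0, 0.0)), ("N", (10.0, 0.0, 0.0))]
peptide = [("O", (1.0, 0.0, 0.0)), ("C", (0.0, 3.0, 0.0))]
demo_scores = {("C", "O"): -0.5, ("C", "C"): -0.2, ("N", "O"): 0.3, ("N", "C"): 0.1}
energy = ace_energy(receptor, peptide, demo_scores, d=2.0)
```

A pair that is observed more often than expected from its solvent contacts receives a negative score, so a more negative total corresponds to a more favorable interface under this model.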

Results

Retrieval of Sequence

The amino acid sequence of the virulence factor Mce-family protein was retrieved from the NCBI protein database. The protein consists of 493 amino acids, and the sequence was downloaded in FASTA format (GenBank: SFL64340.1) (Benson et al. 2012). This sequence was then processed through several web servers to identify potential epitopes for vaccine development.

B-Cell Epitope Prediction

The ABCpred server predicted 20 linear epitopes (Table 1) within the virulence factor Mce-family protein, each 20 amino acids long (the window length). The sensitivity, specificity and accuracy of the server are 57.14%, 71.57% and 64.26%, respectively, at a window length of 20 (Saha and Raghava 2006). Each predicted epitope is assigned a score and the epitopes are ranked accordingly (Han et al. 2015). A predicted sequence with a higher score has a better chance of being a potent epitope (Jones and Carter 2014). The high threshold score of 0.75 restricts the output to the higher-scoring predictions.

Table 1 Predicted B-cell epitopes of virulence factor Mce-family protein by ABCpred server

T-Cell Epitope Prediction Within the B-Cell Epitope

The identified B-cell epitopes were further screened for prospective T-cell epitopes with ProPred and ProPred-I to improve immunogenicity (Oprea and Antohe 2013). Only the B-cell epitopes recognized by both classes of MHC molecules were considered further. The epitopes common to both MHC classes and B cells are listed in Table 2 and may be regarded as vaccine candidates. The corresponding MHC alleles are listed in the supplementary table. Because the ProPred server predicts epitopes of only nine amino acids, the common epitopes are 9-mer sequences.

Table 2 Common MHC epitopes within the B-cell epitope

Analysis of Antigenic Property of Epitopes

Antigenicity is the prime criterion for a novel epitope from an immunobiological perspective (Chen et al. 2007). The common B-cell and T-cell epitopes were evaluated for antigenic propensity with the VaxiJen server; Table 3 lists the epitopes along with their antigenic scores. A 9-mer epitope with an antigenic score above the threshold value of 1.0 was considered antigenic. On this basis, 10 of the 13 9-mer epitopes failed to show antigenicity. Three 9-mers, VLGSSVQTA, VNIELKPEF and VVPSNLFAV, with antigenic scores of 1.1110, 2.4569 and 1.0810, respectively, were predicted to be antigenic.

Table 3 Vaxijen predicted antigenic and non antigenic common B and T-cell epitopes

Epitope Selection

The three antigenic 9-mer B-cell and T-cell epitopes VLGSSVQTA, VNIELKPEF and VVPSNLFAV (Table 4) are expected to be readily accessible to the immune system (Comerford et al. 1991). These epitopes can be used for future epitope-based vaccine development and may elicit a strong immune response when administered as vaccine components.

Table 4 Selected common B and T-cell antigenic epitope

3D Conformation of the Epitopes

The selected epitopes VLGSSVQTA, VNIELKPEF and VVPSNLFAV were submitted to the DISTILL 2.0 server, which returned five models of each epitope in PDB format. Only the top-ranked model of each epitope was considered. The UCSF Chimera (ver. 1.13.1) program was used to visualize the selected PDB files (Pettersen et al. 2004). The resulting images are presented in Fig. 1a, b and c for the VLGSSVQTA, VNIELKPEF and VVPSNLFAV epitopes, respectively.

Fig. 1 a 3D structure of VLGSSVQTA epitope. b 3D structure of VNIELKPEF epitope. c 3D structure of VVPSNLFAV epitope

Structural Configuration of the Protein

Because the virulence factor Mce-family protein lacks a PDB entry, its structure was predicted by homology modeling, performed with the hidden Markov model approach of the Phyre2 server. Of the 493 amino acids of the protein, 41%, 11% and 4% form α-helix, β-strand and transmembrane helix, respectively. The PDB data predicted by Phyre2 were processed with UCSF Chimera to generate a structural image. Figure 2 shows the three-dimensional conformation of the virulence factor Mce-family protein of N. asteroides.

Fig. 2 Tertiary structure of the protein virulence factor Mce family protein

Rational Justification of Protein Model

The model of the virulence factor Mce-family protein was validated using the ProSA and PROCHECK web servers. In the Ramachandran plot generated by PROCHECK (Fig. 3), 79.3% of residues lie in the most favoured regions and only 1.7% lie in disallowed regions (Table 5). The presence of only two non-glycine, non-proline residues in the disallowed regions supports the model quality. The Z-score estimated by ProSA (−2.38) lies within the range of experimentally determined protein structures (Fig. 4) (Sharma and Jaiswal 2009). As shown in Fig. 5, most residues of the model have negative energy values and only a few have positive values, which supports the model (Bodade et al. 2010).

Fig. 3 All atom presentation of virulence factor Mce family protein model in Ramachandran plot by PROCHECK server

Table 5 Plot statistics of all atoms of virulence factor Mce family protein

Fig. 4 Predicted model (black dot) of virulence factor Mce family protein within the Z-score range of ProSA plot

Fig. 5 Energy plot of amino acids of virulence factor Mce family protein by ProSA prediction server

Transmembrane Helix Prediction

The TMHMM server (ver. 2.0) predicted one transmembrane helix in the virulence factor Mce-family protein, comprising 20 amino acids. Of the 493 amino acids, four are predicted to lie on the inside of the membrane, 469 on the outside, and the remaining 20 form the transmembrane helix. The graphical output of the transmembrane helix prediction is shown in Fig. 6. The transmembrane nature of a protein is crucial for its antigenicity and accessibility to the immune system (Sánchez-Martínez et al. 2006; Xu et al. 1991).
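As a quick consistency check on these counts, the short sketch below verifies that the three reported residue classes add up to the full sequence length and expresses the helix as a share of the protein. The variable names are illustrative; the numbers are those reported above.

```python
total_residues = 493
inside, outside, helix = 4, 469, 20

# The three topological classes should account for every residue.
accounted = inside + outside + helix
assert accounted == total_residues

# Share of the sequence forming the transmembrane helix, in percent.
helix_percent = 100 * helix / total_residues
outside_percent = 100 * outside / total_residues
print(f"helix: {helix_percent:.1f}%  outside: {outside_percent:.1f}%")
```

The helix share rounds to 4%, in line with the transmembrane fraction reported for the Phyre2 model, while roughly 95% of the residues are predicted on the outer side.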

Fig. 6 Predicted transmembrane helix localization by TMHMM server

Molecular Docking Analysis

PDB files of the VLGSSVQTA, VNIELKPEF and VVPSNLFAV epitopes and of the variable domain of the T cell receptor delta chain (PDB ID: 1TVD) were submitted to the PatchDock server (Li et al. 1998). The server returned 20 docking complexes for each epitope, ranked by geometric shape complementarity score. Only the top-ranked complex of each epitope was selected for analysis. The epitopes, with their scores, interface areas and ACE values, are listed in Table 6. The low ACE values of the docking complexes indicate high reactivity between the epitope and receptor molecules (Ramanathan et al. 2009). The PDB files of the docking complexes were viewed with UCSF Chimera and the PyMOL (v1.7.4) graphics system (DeLano 2002). The docking complexes are presented in Figs. 7a, b, 8a, b and 9a, b along with their molecular surface interactions. Protein-peptide interaction plays an essential role in maintaining cellular activities and their regulation (Guo et al. 2012; Lavi et al. 2013). The docking of the selected epitopes with the T cell receptor suggests that the epitopes will be accessible to the immune system and generate specific immunogenicity.

Table 6 Representation of geometric shape complementarity score, area of interaction and ACE value of the epitope receptor docking complexes

Fig. 7 Docking complex with surface interaction (a) and bond length representation (b) of VLGSSVQTA epitope with 1TVD

Fig. 8 Docking complex with surface interaction (a) and bond length representation (b) of VNIELKPEF epitope with 1TVD

Fig. 9 Docking complex with surface interaction (a) and bond length representation (b) of VVPSNLFAV epitope with 1TVD

Discussion

Rational identification, validation and in silico analysis of epitope components facilitate the successful generation of novel vaccines. Vaccines play a key role not only in recovery from infection but also in controlling future disease outbreaks. Nocardiosis has a severe impact on immunocompromised patients, who are highly susceptible to the disease. Organ transplant patients in particular face a greater risk of nocardiosis, and mortality from Nocardia infection among organ recipients is often high (Husain et al. 2002). Moreover, the causative pathogen is highly resistant to several well-known, commercially available antibiotics (Husain et al. 2002). The survivability, infectivity and antibiotic resistance of this bacterial pathogen are among the greatest concerns for medical biotechnologists. The present computational study focuses on the discovery of an effective vaccine against N. asteroides. The virulence factor Mce-family protein of N. asteroides is responsible for the pathogenesis of the bacterium in other organisms, so this protein served as the target for designing an epitope-based vaccine against nocardiosis. The protein was processed through several bioinformatic tools to identify suitable antigenic epitopes. The transmembrane localization of the protein allows it to interact with the immune system, and the transmembrane helix prediction server TMHMM (ver. 2.0) supports the outer membrane localization of the protein. Both B-cell and T-cell epitopes were considered in order to obtain the maximum immune response through humoral and cell-mediated immunity. The epitopes were identified by sequence-based prediction using several web servers.
After retrieval of the protein sequence of the virulence factor Mce-family protein (GenBank: SFL64340.1) from the NCBI database, it was processed through the ABCpred server to predict B-cell epitopes. The 20 B-cell epitopes above the chosen threshold were then submitted to the ProPred and ProPred-I servers to identify MHC-II and MHC-I binding alleles, respectively (Barh et al. 2010). The common epitopes are only 9-mers because the ProPred prediction module returns only 9-mer epitopes. The epitopes VLGSSVQTA, VNIELKPEF and VVPSNLFAV were selected for vaccine design after validation of their antigenicity with the VaxiJen server. They are highly antigenic (VLGSSVQTA = 1.1110, VNIELKPEF = 2.4569 and VVPSNLFAV = 1.0810), scoring above the customized antigenicity threshold. This study revealed that these three epitopes, recognized by both B cells and MHC molecules, are the most promising vaccine components. The structures of the epitopes were modeled with DISTILL 2.0 for conformational analysis and molecular docking. The low ACE values reflect favorable binding of the epitopes to the T cell receptor delta chain. The structural profile of the virulence factor Mce-family protein was also modeled via the Phyre2 server, which will be useful for therapeutic vaccine design. The epitopes VLGSSVQTA, VNIELKPEF and VVPSNLFAV are therefore promising candidates for designing an epitope-based vaccine against N. asteroides, limiting nocardiosis and its associated casualties.

Conclusions

This immunoinformatic study focused on the virulence factor Mce-family protein of N. asteroides, which is significant for the expression of bacterial pathogenicity. The findings should be valuable for modern therapeutic purposes aimed at resisting nocardiosis outbreaks. The antigenicity of the common B-cell and T-cell epitopes (VLGSSVQTA, VNIELKPEF and VVPSNLFAV) supports their capacity to generate humoral and cell-mediated immunity. The transmembrane localization of the protein favors easy interaction of the targeted epitopes with immune receptors. The structural features of the protein and its epitopes serve as a decisive factor for novel vaccine development. However, the epitopes require substantial in vivo and in vitro validation and refinement before they can yield an effective vaccine component against Nocardia infection. Such computer-aided techniques are also efficient for designing epitope-based vaccines against other diseases in light of immunoinformatics.