Skip to main content

Quantitative structure-activity relationship, molecular docking, drug-likeness, and pharmacokinetic studies of some non-small cell lung cancer therapeutic agents

Abstract

Background

Lung cancer has been reported to be among the leading cancer cases in the world. It was also reported to have caused a lot of death every year and accounted for about one-third of the whole cancer deaths in the globe. The main subset of lung cancers that accounts for about 85% of the problems of lung cancer raised above was non-small cell lung cancer (NSCLC). The most common cause of NSCLCs that mostly affects women and cigarette smokers was recognized to be overexpression of epidermal growth factor receptor tyrosine kinase (EGFR TK).

Results

Five models on thirty five (35) NSCLC therapeutic agents were developed via quantitative structure-activity relationship (QSAR) technique. The best model among them was selected and reported due to its fitness statistically with the following validation parameters: R2 of 0.8764, R2adj of 0.8370, Qcv2 of 0.7655, R2test of 0.7024, and LOF of 0.3312. Molecular docking was used to elucidate the mode of binding interactions between the thirty five (35) NSCLC therapeutic agents and the binding pose of EGFR tyrosine kinase receptor (3IKA) in this research. Compound 29 was recognized to have the most excellent binding affinity of − 8.8 kcal/mol among others. The drug-likeness and pharmacokinetic properties of all the NSCLC therapeutic agents were predicted using SWISSADME, and none among the molecules under investigation violated more than the permissible limit of the conditions stated by Lipinski’s RO5 filters. Five hit compounds were identified using molecular docking virtual screening. The five (5) hit compounds were further screened and identified compound 16 and 27 as excellent among them using their pharmacokinetic profiles and drug-likeness properties.

Conclusion

QSAR technique was used to build five models on thirty five (35) NSCLC therapeutic agents. The best model among them was reported because it is statistically significant with good validation parameters. The molecular docking result has identified five (5) hit compounds. The most common amino acid residues to all hit compounds under investigation were Glu762, Leu718, Lys745, and Val726 which might be responsible for the higher inhibitory activities/binding affinities of the compounds under investigation. Furthermore, these five (5) hit compounds were further subjected to drug-likeness and pharmacokinetic properties prediction to determine which among them have the best pharmacokinetic profile. Compounds 16 and 27 among the hit compounds were observed to have high chance of passive absorption by the gastrointestinal tract while the other three have low tendency of passive absorption. More so, only compounds 16 and 27 have higher bioavailability scores, and none of the two have more than one violation of the RO5 criteria. The cause of efficiency of compounds 16 and 27 might be as a result of good pharmacokinetic profiles and drug-likeness properties possessed by the molecules when compared to other hit compounds.

1 Background

Among the hurdles faced by medicinal chemists was the discovery of inhibitors for mutant-selective kinase and was among the primary interest for epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors [1]. The treatment of EGFR to control non-small cell lung cancers with the T790M resistance mutation prevails as a vital medical necessity [2].

Lung cancer has been reported to be among the leading cancer cases in the world. It was also reported to have caused a lot of death every year and accounted for about one-third of the whole cancer deaths. The main subset of lung cancers that accounted for about 85% of the problems of lung cancer was NSCLC [3]. The most common cause of NSCLCs that mostly affects women and cigarette smokers was recognized to be overexpression of EGFR tyrosine kinase. It was found in about 10–15% and 30–40% of the population of patients in Caucasia and Asia [3].

NSCLC therapeutic agents manifest a very high response rate in patients with stimulating changes of EGFR. NSCLC therapeutic agents are categorized into two different classes: the first class is reversible NSCLC therapeutic agents (first-generation EGFR inhibitors) which include gefitinib and erlotinib. The second class is referred to as irreversible inhibitors and consists of the second and third generation NSCLC therapeutic agents. The second and third generation NSCLC therapeutic agents include afatinib and osimertinib. All these classes of drugs mentioned were designed purposely for the treatment of NSCLC. Most especially, the first-generation reversible NSCLC therapeutic agents were designed to manage EGFRL858R mutations. The second-generation irreversible NSCLC therapeutic agents were designed for the treatment of EGFRT790M mutations. And the third-generation irreversible NSCLC therapeutic agents were designed for the medication of EGFRT790M/L790M double mutations [1, 4,5,6].

QSAR is a computer-aided molecular modeling technique which quantitatively relates experimentally determined biological activities (response variable) of a molecule and its physicochemical properties (molecular descriptors) [7]. In addition, QSAR modeling is used to develop a model which could be used to predict the activities of newly designed small molecules [8]. Molecular docking is an in silico virtual screening method applied in computer-aided drug design used to elucidate how ligand and receptor interact with one another using their individual 3D structures [9]. The drug-likeness and pharmacokinetic properties of a drug give an insight on how the body responds to the administration of this drug. Therefore, there is a need to study the drug-likeness and pharmacokinetic properties of this drug before it reaches the final (clinical) stage [10].

The aim of this work is to develop a model with good predictive power using QSAR modeling technique, to screen and identify hit among the compounds under investigation (by elucidating the mode of binding interactions between the NSCLC therapeutic agents (ligands) and the EGFR tyrosine kinase enzyme) using molecular docking simulation, and also to predict their drug-likeness and pharmacokinetic properties.

2 Method

2.1 Sourcing of dataset

A set of thirty five (35) N-(5-((5-chloro-4-((2-(isopropylsulfonyl) phenyl) amino) pyrimidin-2-yl) amino)-4-methoxy-2-(4-methyl-1, 4-diazepan-1-yl) phenyl) acrylamide derivatives as potent NSCLC therapeutic agents with their inhibitory activities (GI50) in μM, synthesized under the same condition sharing the same assayed procedure with significant variations in their structure and potency, were downloaded from the literature of Chen et al. for this research [11]. The corresponding inhibitory activities (GI50) of these potent NSCLC therapeutic agents were then converted to their pGI50 using Eq. 1 shown below [12]. Table 1 presents the structural formula, GI50, and pGI50 for all the dataset used in this research.

$$ {\mathrm{pGI}}_{50}=-\log\ {\mathrm{GI}}_{50}\times {10}^{-6} $$
(1)
Table 1 Structural formula, GI50, and pGI50 of the data set

2.2 Stable structure generations and structure sketching

Before stable structure generations, the sketching of the 2D structures of the studied NSCLC therapeutic agents must be done, and this was achieved using the Chemdraw software version 12.0.2 [13]. The Spartan 14 software was used to transform the 2D structures of the sketched NSCLC therapeutic agents to 3D structures before energy minimization (it is achieved by direct importation of the 2D structures to the interface of the software). Also, prior to stable structure generations, there is need to remove constrain from the generated 3D structures, and this was achieved via energy minimization [14]. Stable structure generation is a process of determining the optimum structure of a compound, and this was performed by utilizing the Spartan 14 software. The determination of the optimum structure of all the NSCLC therapeutic agents was achieved adopting density functional theory method at B3LYP/6-311G* level of theory [15].

2.3 Independent variable (descriptors) computation, removal of constant/redundant variables, and data separation

For the computation of the independent variables (descriptors), the most stable structures generated were saved in SDF, a file format that is recognized by the software used in computing descriptors (PaDEL descriptor tool kit) [16]. PaDEL descriptor tool kit was used to compute both fragment count descriptors, topological descriptors, and geometrical descriptors [17]. Pre-treatment of data is very vital in QSAR modeling which helps in eliminating constant and redundant descriptors from the data before model development so as to allow GFA select most significant descriptors. In present study, data pre-treatment was performed manually. Another crucial point in QSAR modeling is development of model building (training) and validation (test) sets. As the name implies, model building set is used to develop the model, and the validation set is used in verifying the built model. Data division software retrieved from DTC lab was moreover utilized in splitting the data into model building set which contains 30 molecules and validation set of 5 molecules utilizing Kennard-Stone algorithm in this regard [18].

2.4 Building of the model

In developing the models, the actual pGI50 was used as the response parameter while the descriptors were used as independent parameter. Variable selection is very important in building QSAR models. In view of this, the models were built by adopting multi-linear regression (MLR) analysis using genetic function approximation (GFA) method in which it creates an original population of descriptor sets and determines the most suitable set from it by utilizing evolutionary crossover and mutation speculators which generates a succeeding derivative population of descriptor sets [19]. One of the distinct characteristics of GFA is that it selects highly significant independent variables to generate thousands of models so as to choose the most significant among the generated models [20]. Equation 2 below presents the MLR-GFA equation for the model:

$$ {\mathrm{pGI}}_{50}={A}_1{b}_1+{A}_2{b}_2+\dots \dots +C $$
(2)

where A’s are the descriptors, b’s are the coefficient of the corresponding descriptors, and C is the regression constant.

2.5 Validation of the model built

Validation of QSAR model is of utmost importance. This is why a QSAR model is not considered valid unless it undergoes so many assessment, which if it passes then it can be used. The parameters used in evaluating or validating the quality of a QSAR model were the squared coefficient of correlation for the training set (R2training), adjusted R2 (R2 adj), cross-validation coefficient (Qcv2), and squared coefficient of correlation for the test set (R2 test). The equations for the mentioned assessment parameters are given below [21]:

$$ {{\mathrm{R}}^2}_{\mathrm{training}}=1-\frac{\sum {\left({\boldsymbol{x}}_{\boldsymbol{obs.}}-{\boldsymbol{x}}_{\boldsymbol{pred}.}\right)}^2}{\sum {\left({\boldsymbol{x}}_{\boldsymbol{obs.}}-{\overline{\boldsymbol{x}}}_{\boldsymbol{training}}\right)}^2} $$
(3)

where xobs., xpred., and \( {\overline{\boldsymbol{x}}}_{\mathbf{training}} \) represents the actual, estimated, and mean activities of the model building set. The R2 value was established to rely on the number of descriptors in the model.

Therefore, the R2 value must be adjusted. The adjusted R2 is computed utilizing Eq. 4 below:

$$ {\boldsymbol{R}}_{\mathbf{adj}.}^{\mathbf{2}}=\mathbf{1}-\left(\mathbf{1}-{\boldsymbol{R}}^{\mathbf{2}}\right)\frac{\boldsymbol{a}-\mathbf{1}}{\boldsymbol{a}-\boldsymbol{b}-\mathbf{1}}=\frac{\left(\boldsymbol{a}-\mathbf{1}\right){\boldsymbol{R}}^{\mathbf{2}}-}{\boldsymbol{a}-\boldsymbol{b}+\mathbf{1}} $$
(4)

where b represents the number of descriptors used in the model and a represents the number of compounds in the model building set.

$$ {\boldsymbol{Q}}_{\boldsymbol{CV}}^{\mathbf{2}}=\mathbf{1}-\frac{\sum_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{n}}{\left({\boldsymbol{Y}}_{\mathbf{\exp}.}-{\boldsymbol{Y}}_{\mathbf{pred.}}\right)}^{\mathbf{2}}}{\sum_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{n}}{\left({\boldsymbol{Y}}_{\mathbf{\exp}.}-\overline{\boldsymbol{Y}}\right)}^{\mathbf{2}}}\kern0.5em $$
(5)

where \( {Y}_{\exp .},{Y}_{\mathrm{pred}.},\mathrm{and}\ \overline{Y} \)are trial, foretold, and the mean inhibition activity values of the training set compounds [22].

The generated model can then be validated externally to confirm its predictive power and reliability. It is achieved using the validation set compounds. The external predictive power of the model was estimated using the expression shown below [23]:

$$ {\boldsymbol{R}}_{\mathbf{test}}^{\mathbf{2}}=\mathbf{1}-\frac{\sum {\left({\boldsymbol{x}}_{\mathbf{pred}{.}_{\mathbf{test}}}-{\boldsymbol{x}}_{{\mathbf{\exp}.}_{\mathbf{test}}}\right)}^{\mathbf{2}}}{\sum {\left({\boldsymbol{x}}_{\mathbf{pred}{.}_{\mathbf{test}}}-{\overline{\boldsymbol{x}}}_{\mathbf{Training}}\right)}^{\mathbf{2}}}\kern0.5em $$
(6)

where \( {\boldsymbol{x}}_{\mathbf{pred}{.}_{\mathbf{test}}} \) and \( {\boldsymbol{x}}_{{\mathbf{\exp}.}_{\mathbf{test}}} \) are the estimated and actual activities of the validation set, and \( {\overline{\boldsymbol{x}}}_{\mathbf{Training}} \) is the mean of actual activity of the model building set compounds.

Due to some reasons, the values of these parameters are okay and important but not enough to justify the reliability of a model [24]. In view of this, the model has to be subjected to other test such as applicability domain, variation inflation factor, and mean effect.

The multi-collinearity of all the independent variables in the reported model is ascertain by computing the variation inflation factors (VIF) for each. The VIF help in identifying whether these independent variables correlate with one another or not. There is no correlation between the descriptors if their estimated VIF values are equal to 1; there is high possibility of accepting the model if their estimated VIF values are between 1 and 5; and if their estimated VIF values are greater than 10, then the model is therefore rejected not accepted [25]. The VIF value can be calculated using the equation below:

$$ \mathrm{VIF}=\frac{1}{1-{R}^2} $$
(7)

In order to evaluate the individual contribution and participation of each descriptor to the selected model, the mean effect (ME) of each descriptor is therefore calculated. The equation used in calculating the ME is shown below:

$$ \mathrm{MEj}=\frac{B_j\ {\sum}_{j=1}^{i=n}{d}_{ij}}{\sum_j^m{B}_j\ {\sum}_i^n{d}_{ij}} $$
(8)

where ME represents the mean effect of a descriptor j in a model, the coefficient of the descriptor J is represented by βj in the model and the value of the independent variables for each compound in the training set is dij, n is the number of compounds in the training set, and m is the number of descriptor that appear in the model [26].

2.6 Evaluating of applicability domain

The domain of applicability is studied to ensure the reliability of the prediction of the built MLR model. It is also useful in identifying compounds that are distinct to the training set compounds (influential compounds) or response outliers (compounds with standardized residual outside the square area of the model). The method adopted in this research was the leverage approach which is the plot of the standardized residual against the leverages for both the training and test set compounds. The reported model was subjected to AD using the leverage approach [27].

2.7 Docking simulation

For the docking simulation, the virtual screening software used in this research were AutoDock Vina of Pyrex, Discovery studio, and UCSF Chimera on a Dell computer system Latitude E6520 to screen and identify hit compounds by elucidating the binding mode between the binding pose of the target receptor and the NSCLC therapeutic agents.

2.8 EGFR tyrosine kinase enzyme and ligand preparation for the docking simulation

The EGFR tyrosine kinase enzyme with pdb code: 3IKA in complex with WZ4002 was downloaded from the Protein Data Bank (https://www.rcsb.org) and used as the target receptor for the NSCLC therapeutic agents in this research. Discovery Studio Visualizer version 16.1.0.15350 was adopted in preparing the EGFR tyrosine kinase enzyme for the docking simulation. The preparation process of the target receptor started by adding hydrogen, then followed by the elimination of co-ligands, water molecule, and heteroatoms from the structure of the target receptor and saved in protein data bank file format. The prepared structure of the target receptor is shown in Fig. 1. The NSCLC therapeutic agents were prepared by saving the already determined optimum structures in 2.2 above saved in protein data bank file with the help of the Spartan’14 wave software [14]. The prepared structure of one NSCLC therapeutic agent among the dataset is shown in Fig. 2.

Fig. 1
figure 1

Prepared structure of EGFR tyrosine kinase enzyme

Fig. 2
figure 2

Prepared structure of a NSCLC therapeutic agent

2.9 Execution of the docking simulation

The docking simulation of the NSCLC therapeutic agents into the binding site of the target receptor (Met793, Ala743, Met790, Leu844, Leu844, Leu718, and Val726, these binding sites were determined by visualizing the co-crystalline structure of WZ4002 in the binding site of the enzyme) was carried out using AutoDock Vina of Pyrex software [28]. Re-coupling of the complexes for further investigation was achieved with the help of the UCSF Chimera software [29]. For further investigation of the binding mode interactions of the complexes, a discovery studio visualizer software was used to elucidate the 2D structures of all the reported complexes [30, 31].

2.10 Drug-likeness and pharmacokinetic property prediction

The drug-likeness and pharmacokinetic properties of these NSCLC therapeutic agents were predicted utilizing a free online web tool (SwissADME) (http://www.swissadme.ch/index.php) used in predicting drug-likeness and pharmacokinetic properties of drugs [32]. The input file format for SwissADME is simplified molecular input line entry specification (SMILES) which contains a unit compound by line separated by a space with a title or without a title. The computation can be setup when the molecule is ready by clicking on the “Run” button [32].

Lipinski’s rule of five filter is mostly used as the criterion to ascertain whether a molecule is impermeable or badly absorbed. A molecule is considered to be orally bioavailable if it does not violate more than 2 of the RO5 [33].

3 Result

3.1 QSAR study

The results of the QSAR study are given in Tables 2, 3, 4, and 5 and Figs. 3 and 4.

Table 2 General limit required for the QSAR model assessment
Table 3 Descriptions, full name, and categories of descriptors contained in the reported model
Table 4 The pGI50, predicted pGI50, and the residual values for the studied molecules
Table 5 VIF, ME, and correlation statistical analysis of descriptors of the reported model
Fig. 3
figure 3

Scatter plot of predicted pGI50 versus the actual pGI50 for the reported model

Fig. 4
figure 4

Plot of residuals versus the actual pGI50 for the reported model

The selected and reported model is given by the equation below with the following validation terms: R2 of 0.8764, R2adj of 0.8370, Qcv2 of 0.7655, R2test of 0.7024, and LOF of 0.3312

$$ {\mathbf{pGI}}_{\mathbf{50}}=2.797519677\ast \mathbf{ATSC}\mathbf{8}\mathbf{c}-1.977464485\ast \mathbf{MATS}\mathbf{8}\mathbf{s}-1.229853317\ast \mathbf{GATS}\mathbf{5}\mathbf{p}-0.735278765\ast \mathbf{VR}\mathbf{1}\_\mathbf{Dt}+1.186969524\ast \mathbf{minssCH2}+2.607601502\ast \mathbf{RDF}\mathbf{1}\mathbf{20}\mathbf{m}+0.834211273\ast \mathbf{RDF}\mathbf{1}\mathbf{25}\mathbf{m}+4.685695 $$

3.2 Docking simulation

The results of the docking simulation are presented in Table 6 and Fig. 6.

Table 6 The ligand-receptor, binding affinity, hydrogen bond, bond distance, and other interaction of some selected ligands

3.3 Drug-likeness and pharmacokinetic property prediction

The results of the drug-likeness and pharmacokinetic property prediction are shown in Tables 7 and 8 and Figs. 7 and 8 respectively.

Table 7 Drug-likeness properties
Table 8 Pharmacokinetic properties

4 Discussion

4.1 QSAR studies

Using the model building set, five (5) different models were built using MLR-GFA method. Among these five models, the best model was selected and reported since it has passed the minimum requirements for the evaluation of a valid QSAR model as reported by Veerasamy et al. as presented in Table 2 [23].

The descriptions of the descriptors contained in the reported model are shown in Table 3. The negative coefficients of MATS8s, GATS5p, and VR1_Dt descriptors clearly indicated their negative contribution to the inhibitory activities of the NSCLC therapeutic agents. It means that when the amount of these independent descriptors is reduced in the structures of these NSCLC therapeutic agents under investigation, there might be an improvement in the potency of these NSCLC therapeutic agents toward their target receptor (EGFR tyrosine kinase enzyme) and reverse is the case. On the other side, the positive coefficient of ATSC8c, minssCH2, RDF120m, and RDF125m descriptors in the model gave the positive contributions of these independent descriptors to the inhibitory activities of the NSCLC therapeutic agents under investigation. It means when the amount of these descriptors in the compositions/structures of these NSCLC therapeutic agents are increased, there might be an improvement in the potency of these NSCLC therapeutic agents toward their target receptor and vice versa.

4.1.1 Description of the descriptors that appear in the reported model

ATSC8c is an average centered Broto-Moreau autocorrelation; the ATS descriptor is a graph invariant describing how the property considered is distributed along the topological structure and can be seen as a special case in which other types of descriptors can also be derived from [34]. The recognized spatial autocorrelation on a molecular graph G is given as

$$ {ATS}_k=\frac{1}{2}\cdot \sum \limits_{i=1}^A\sum \limits_{j=1}^A{w}_i\cdot {w}_j\cdot \updelta \left({d}_{ij};k\right)=\frac{1}{2}\cdot \left({\mathrm{w}}^{\mathrm{T}}{\cdot}^k\mathrm{B}\cdot \mathrm{w}\right) $$

MATS8s is a Moran autocorrelation which is applied to a molecular graph. Moran coefficient usually takes value in the interval [− 1, + 1]. Positive autocorrelation corresponds to positive values of the coefficient whereas negative autocorrelation produces negative values [34]. It can be defined as

$$ {I}_k=\frac{\frac{1}{\Delta_k}\cdot \sum \limits_{i=1}^A\sum \limits_{j=1}^A\left({w}_i-\overline{w}\right)\cdot \updelta \left({d}_{ij};k\right)}{\frac{1}{A}\cdot \sum \limits_{i=1}^A{\left({w}_i-\overline{w}\right)}^2} $$

Radial distribution function descriptors (RDF descriptors) were suggested based on a radial arrangement function distinct from that generally used to determine molecular changes I (s) (Hemmer, Steinhauer et al., 1999). The radial distribution function chosen here is that one frequently utilized for the description of the diffraction patterns gotten in powder X-ray diffraction experiments.

Ideally, the radial distribution function of a collection of atoms B may be described as the possible occurrence to obtain an atom in a spherical volume of radius R. The common mode of the radial distribution function is expressed by the equation below

$$ g(R)=f\cdot \sum \limits_{i=1}^{A-1}\sum \limits_{j=i+1}^A{w}_i\cdot {w}_j\cdot {e}^{-\upbeta \cdot {\left(R-{r}_{ij}\right)}^2} $$

Table 4 shows the pGI50, predicted pGI50, and the residual values for all the molecules under investigation. The high predicted power of the reported model was confirmed by the low residual values observed between the experimental and predicted pGI50 in the table (meaning that the reported model was reliable with high predicted power). Furthermore, Fig. 3 presents the plot of the predicted pGI50 versus actual pGI50 for the test and training sets compounds, the distribution of the predicted pGI50 and the actual pGI50 of the test and training set compounds throughout the line reaffirmed the reliability of the model. More so, the R2 values of both the internal validation (0.8175) and that of the plot (0.8764) agreed with one another which further confirmed the stability and reliability of the reported model. On the other hand, Fig. 4 presents the scatter plot of the residuals against actual pGI50 in which the unusual occurrence of these residuals of both sets on the upper and lower sides of zero on the plot confirm that the reported model was free from methodological error (systematic deviations).

The correlation statistical analysis on the independent descriptors in the reported model shown in Table 5 indicated that no relationship exists between the descriptors contained in the reported model. This clearly portrayed the high performance of the descriptors used in developing the reported model.

The variation inflation factor (VIF) values were further used to confirm if there is multi-collinearity problem or not in the descriptors of the training set used in building the model. The VIF of all descriptors in the training set were estimated and realized to be within the acceptable range presented in Table 5 (meaning the values are less than 10 for all the descriptors). This confirms the absence of multi-collinearity problem in the descriptors used in building the reported model.

The mean effect (ME) values for all the descriptors were computed to ascertain the participation and individual contribution of a descriptor in opposition to other ones in the selected model and presented in Table 5. The indicator for either increase or decrease in potency of the molecules is the sign of the coefficient of each descriptor in the model. If a descriptor in the model has a positive coefficient it means that an increase in such descriptor may increase the potency of the molecules. But when a descriptor has negative coefficient, it indicates that an increase in such descriptor may decrease the potency of the molecules. Whereas the coefficient of the descriptors indicate the degree of contribution of each descriptor in the model. It is observed that from the model and ME values (Table 5), ATSC8c descriptor gave the highest positive contribution both in the model and ME analysis with + 2.797519677 and + 0.796318. MATS8s gave the lowest negative contribution in both the model and ME analysis with − 1.977464485 and − 0.43128.

The applicability domain (AD) of the reported model was achieved by the plot of the standardized residuals against leverages of both the test and training sets (Williams’ plot) as shown in Fig. 5. The AD is carried out to identify compounds with standardized residuals greater than three standard deviation unit (outliers) and compounds with leverage values greater than the warning leverage h* (influential) in the data used in building the model. Apart from that, it is also used to ascertain the quality of prediction of a model and prevent the misuse of the results obtain by the model. From the plot, four (4) compounds (influential compounds) of the test set with their leverage value greater than the threshold/warning leverage (h*) of 0.83 were identified. No influential or response outlier was identified from the training set which means the model was valid and void. It is very paramount to decipher that these molecules with leverage value greater than the threshold are not put into consideration when designing new NSCLC therapeutic agents. These molecules might be structurally different from those used to generate the reported model and thus may have different mechanism of action

Fig. 5
figure 5

Williams’ plot of the selected model

4.2 Docking simulation

Molecular docking as in silico virtual screening tool was used to elucidate the nature of binding interactions between the NSCLC therapeutic agents and the binding pose of EGFR tyrosine kinase receptor (pdb code: 3IKA which is selected based on published literature) in this research (Supplementary Table 1). Five hit compounds were identified by the virtual screening technique. Compound 29 was identified to have the highest binding affinity of − 8.8 kcal/mol (Table 6). The compound was seen to have interacted with the binding pose of EGFR tyrosine kinase receptor through hydrophobic bond interaction with Phe795, Gly796, Ala743, Leu844, Leu718, Val726, Ala743, and Lys745 amino acid residues of the EGFR tyrosine kinase receptor. More so, it also interacted with EGFR tyrosine kinase receptor via electrostatic bond interaction with lys745 amino acid residue. Next compound in the trend with higher binding affinity (− 8.7 kcal/mol) was compound 12 as shown in Table 6. The interactions of the compound in the binding pose of the EGFR receptor were through hydrogen bond with UNK1 Arg841, Asp855, Glu762, and Glu762 amino acid residues with bond distances of 2.49587 (Å), 3.7987 (Å), 3.25863 (Å), 3.72606 (Å), and 3.72287 (Å). It also interacted with the active site of the EGFR receptor via hydrophobic interactions with Leu718, Leu718, Lys745, Val726, Leu844, Lys728, and Leu792 amino acid residues. Also, among the compounds with good binding affinity was compound 16. The interaction of compound 16 in the binding pose of the EGFR receptor was through hydrogen bond with Met793, Lys728, Glu762, and Asp855 residues with bond distances of 2.86496 (Å), 2.27045 (Å), 2.54982 (Å), 3.59983 (Å), and 3.60362 (Å) respectively. It also interacted with the binding pose of the EGFR receptor via hydrophobic interaction with Leu718, Leu718, Lys745, Val726, Leu844, Lys728, and Leu792. The rest other two complexes among the reported ones interacted in the binding pose of the EGFR receptor through electrostatic interactions, hydrogen bond interactions, and hydrophobic bond interactions as shown in Table 6. Figure 6 showed the 2D structures of compounds 29, 12, and 16 in complex with the receptor (3IKA). Based on the molecular docking results, the most common amino acid residues to all of the hit compounds under investigation were Glu762, Leu718, Lys745, and Val726 (Table 6), and these important amino acid residues might be responsible for the higher binding affinity of the reported compounds.

Fig. 6
figure 6

2D view of a complex 29, b complex 12, and c complex 16 using Discovery studio visualizer

4.3 Drug-likeness and pharmacokinetic property prediction

The drug-likeness and pharmacokinetic properties of all the NSCLC therapeutic agents were predicted and presented in Supplementary Table 2 and 3. Based on the results of the molecular docking, the drug-likeness properties of the hit compounds ware reported and presented in Tables 7 and 8. From the table, no molecule among these reported ones violated more than the permissible limit of the conditions stated by Lipinski’s rule of five filters (MW˂ 500, HBD ≤ 5, HBA ≤ 10, Log p ≤ 5, and PSA ˂ 140 Å2). As such, these molecules are expected to be very active pharmacologically. The bioavailability radar plot of lipophilicity, size, polarity, solubility, saturation, and flexibility further reaffirmed the drug-likeness properties of all the reported molecules (Fig. 7). The painted pink area shows the range for each property (XLOGP3 between − 0.7 and + 5.0, MW between 150 and 500 g/mol, TPSA between 20 and 130 A2, log S not higher than 6, fraction of carbons in the sp3 hybridization not less than 0.25, and no more than 9 rotatable bonds). Based on the condition mentioned, all the molecules might be orally bioavailable even though they were all too flexible and lipophilic.

Fig. 7
figure 7

The plot of lipophilicity, size, polarity, solubility, saturation, and flexibility of a molecule 29, b molecule 12, and c molecule 16

Table 8 presents the pharmacokinetic properties of the reported molecules. From the table, only molecules 16 and 27 have high probability of passive absorption by the gastrointestinal tract while others have low tendency of passive absorption. None of the reported molecules was found to have high probability of brain penetration. Molecules 12, 27, and 29 were predicted to be actively effluxed by P-gp and the other two were predicted as non-substrate of P-gp. Also, only molecules 16 and 27 have higher bioavailability scores (this confirmed the oral bioavailability and permeability of these molecules among the reported molecules and also they have low toxicity level and good absorption properties). The boiled-egg plot (Fig. 8) of TPSA against WLOGP was used to portray the graphical presentation of the brain penetration and gastrointestinal absorption of the reported molecules. From Fig. 8, it can be clearly seen that all the NSCLC therapeutic agents were outside BBB region (yellow) but some were within the GI absorption region (white color) and some were predicted to be actively effluxed by P-gp (blue in color) and then some were predicted as non-substrate of P-gp (red color).

Fig. 8
figure 8

The plot of WLOGP against TPSA for all the molecules

5 Conclusion

In conclusion, QSAR technique was used to build a model with a very high predictive power on some thirty five (35) NSCLC therapeutic agents. The reported model was found to be statistically fit by passing validation techniques employed on it with the validation parameters: R2 of 0.8764, R2adj of 0.8370, Qcv2 of 0.7655, R2test of 0.7024 and LOF of 0.3312 such as internal and external validations and AD. The molecular docking results showed that the most common amino acid residues to all of the reported complexes were Glu762, Leu718, Lys745, and Val726, and these important amino acid residues might be responsible for the higher inhibitory activities/binding affinity of the reported compounds. More so, the drug-likeness and pharmacokinetic properties of all the NSCLC therapeutic agents were predicted using SwissADME and indicated that molecules 16 and 27 among the hit have high probability of passive absorption by the gastrointestinal tract while the other three have low tendency of passive absorption and also none of the reported molecules was found have high probability of brain penetration. Also, only molecules 16 and 27 have higher bioavailability scores. Based on this finding, it is suggested that when designing new NSCLC therapeutic agents these hit compounds with good binding affinity and pharmacokinetic profile should be considered for structural modifications. And also, in vivo and in vitro assay for the ADME properties should be validated experimentally.

Availability of data and materials

Not applicable

Abbreviations

QSAR:

Quantitative structure-activity relationship

MLR:

Multi-linear regression

GFA:

Genetic function algorithm

DFT:

Density function theory

B3LYP:

Becke’s three-parameter read-Yang-Parr hybrid

PDB:

Protein Data Bank

NSCLC:

Non-small cell lung cancer agents

EGFR:

Epidermal growth factor receptor

VIF:

Variation inflation factor

MF:

Mean effect

ADME:

Absorption, distribution, metabolism, and excretion

References

  1. Song J, Jang S, Lee JW, Jung D, Lee S, Min KH (2019) Click chemistry for improvement in selectivity of quinazoline-based kinase inhibitors for mutant epidermal growth factor receptors. Bioorg Med Chem Lett 29:477–480

    Article  CAS  PubMed  Google Scholar 

  2. Hanan EJ et al (2016) 4-Aminoindazolyl-dihydrofuro [3, 4-d] pyrimidines as non-covalent inhibitors of mutant epidermal growth factor receptor tyrosine kinase. Bioorg Med Chem Lett 26:534–539

    Article  CAS  PubMed  Google Scholar 

  3. Kong L-L, Ma R, Yao M-Y, Yan X-E, Zhu S-J, Zhao P, Yun C-H (2017) Structural pharmacological studies on EGFR T790M/C797S. Biochem Biophys Res Commun 488:266–272

    Article  CAS  PubMed  Google Scholar 

  4. Cross DA et al (2014) AZD9291, an irreversible EGFR TKI, overcomes T790M-mediated resistance to EGFR inhibitors in lung cancer. Cancer Disc 4:1046–1061

    Article  CAS  Google Scholar 

  5. Solca F et al (2012) Target binding properties and cellular activity of afatinib (BIBW 2992), an irreversible ErbB family blocker. J Pharmacol Exp Ther 343:342–350

    Article  CAS  PubMed  Google Scholar 

  6. Tsao M-S et al (2005) Erlotinib in lung cancer—molecular and clinical predictors of outcome. N Engl J Med 353:133–144

    Article  CAS  PubMed  Google Scholar 

  7. Ojha Lokendra K, Rachana S, Rani BM (2013) Modern drug design with advancement in QSAR: a review. Int J Res Biosci 2:1–12

    Google Scholar 

  8. Abdulfatai U, Uba S, Umar BA, Ibrahim MT (2019) Molecular design and docking analysis of the inhibitory activities of some α_substituted acetamido-N-benzylacetamide as anticonvulsant agents SN. Appl Sci 1:499

    Google Scholar 

  9. Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 3:935

    Article  CAS  PubMed  Google Scholar 

  10. Khan MF, Verma G, Akhtar W, Shaquiquzzaman M, Akhter M, Rizvi MA, Alam MM (2016) Pharmacophore modeling, 3D-QSAR, docking study and ADME prediction of acyl 1, 3, 4-thiadiazole amides and sulfonamides as antitubulin agents. Arab J Chem

  11. Chen Y et al (2017) Discovery of N-(5-((5-chloro-4-((2-(isopropylsulfonyl) phenyl) amino) pyrimidin-2-yl) amino)-4-methoxy-2-(4-methyl-1, 4-diazepan-1-yl) phenyl) acrylamide (CHMFL-ALK/EGFR-050) as a potent ALK/EGFR dual kinase inhibitor capable of overcoming a variety of ALK/EGFR associated drug resistant mutants in NSCLC. Eur J Med Chem 139:674–697

    Article  CAS  PubMed  Google Scholar 

  12. Abdullahia, M., Shallangwaa, G. A., Ibrahima, M. T., Bello, A. U., Arthura, D. E., Uzairua, A., Mamzaa, P. (2018) QSAR studies on some C14-urea tetrandrine compounds as potent anti-cancer agents against leukemia cell line (K562) JKS-S

  13. Mills, N. (2006). ChemDraw Ultra 10.0 CambridgeSoft, 100 CambridgePark Drive, Cambridge, MA 02140. www. cambridgesoft. com. Commercial Price: 1910fordownload, 2150 for CD-ROM; Academic Price: 710fordownload, 800 for CD-ROM: ACS Publications.

  14. Ibrahim MT, Uzairu A, Shallangwa GA, Uba S (2019a) QSAR modelling and docking analysis of some thiazole analogues as⍺-glucosidase inhibitors. J Eng Exact Sci 5:0257–0270

    Article  Google Scholar 

  15. Kohn W, Becke AD, Parr RG (1996) Density functional theory of electronic structure. J Phys Chem 100:12974–12980

    Article  CAS  Google Scholar 

  16. Ibrahim MT, Uzairu A, Shallangwa GA, Uba S (2020a) Computer-aided molecular modeling studies of some 2, 3-dihydro-[1, 4] dioxino [2, 3-f] quinazoline derivatives as EGFR WT inhibitors. Beni-Suef Univ J Basic Appl Sci 9:1–10

    Article  Google Scholar 

  17. Yap CW (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32:1466–1474

    Article  CAS  PubMed  Google Scholar 

  18. Kennard RW, Stone LA (1969) Computer aided design of experiments. Technometrics 11:137–148

    Article  Google Scholar 

  19. Ibrahim MT, Uzairu A, Uba S, Shallangwa GA (2020b) Computational modeling of novel quinazoline derivatives as potent epidermal growth factor receptor inhibitors. Heliyon 6:e03289

    Article  PubMed  PubMed Central  Google Scholar 

  20. Adedirin O, Uzairu A, Shallangwa GA, Abechi SE (2018a) Computational studies on α-aminoacetamide derivatives with anticonvulsant activities. Beni-Suef Univ J Basic Appl Sci 7:709–718

    Article  Google Scholar 

  21. Grisoni F, Ballabio D, Todeschini R, Consonni V (2018) Molecular descriptors for structure–activity applications: a hands-on approach. Comput Toxicol:3–53 Springer

  22. Mustapha A, Shallangwa G, Ibrahim MT, Bello AU, Ebuka DA, Uzairu A, Mamza P (2018) QSAR studies on some C14-urea tetrandrine compounds as potent anti-cancer against leukemia cell line (K562). J Turkish Chem Soc Section A Chem 5:1387–1398

    Google Scholar 

  23. Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP, Agrawal RK (2011) Validation of QSAR models-strategies and importance. Int J Drug Design Disc 3:511–519

    Google Scholar 

  24. Tropsha A, Bajorath JR (2015) Computational methods for drug discovery and design. ACS Publications

  25. Beheshti A, Pourbasheer E, Nekoei M, Vahdani S (2016) QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm–multiple linear regressions. J Saudi Chem Soc 20:282–290

    Article  CAS  Google Scholar 

  26. Adedirin O, Uzairu A, Shallangwa GA, Abechi SE (2018b) QSAR and molecular docking based design of some n-benzylacetamide as γ-aminobutyrate-aminotransferase inhibitors. J Eng Exact Sci 4:0065–0084

    Article  Google Scholar 

  27. Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. Mol Inform 22:69–77

    CAS  Google Scholar 

  28. Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31:455–461

    PubMed  PubMed Central  CAS  Google Scholar 

  29. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem 25:1605–1612

    Article  CAS  PubMed  Google Scholar 

  30. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA (2009) Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 5:e1000585

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Rizvi SMD, Shakil S, Haneef M (2013) A simple click by click protocol to perform docking: AutoDock 4.2 made easy for non-bioinformaticians. EXCLI J 12:831

    PubMed  PubMed Central  Google Scholar 

  32. Daina A, Michielin O, Zoete V (2017) SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep 7:42717

    Article  PubMed  PubMed Central  Google Scholar 

  33. Ismail SY, Uzairu A, Sagagi B, Sabiu M (2018) In silico molecular docking and pharmacokinetic study of selected phytochemicals with estrogen and progesterone receptors as anticancer agent for breast cancer. 5:1337–1350

  34. Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics: volume I: alphabetical listing/volume II: appendices, references (Vol. 41). Wiley

Download references

Acknowledgements

The authors acknowledge the technical effort of Ahmadu Bello University, Zaria–Nigeria.

Funding

The authors declare no funding has been received.

Author information

Authors and Affiliations

Authors

Contributions

1. MTI: contributed throughout the research work. 2. AU: gives directives and technical advices. 3. GAS: partakes in technical activities. 4. SU: also partakes in technical activities. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Muhammad Tukur Ibrahim.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare no competing interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Supplementary Table 1.

The Ligand-Receptor, Binding affinity, Hydrogen bond, Bond distance, and other interaction for the remaining compounds. Supplementary Table 2. Drug-likeness properties for the compounds under investigation. Supplementary Table 3. ADME properties for the compounds under investigation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ibrahim, M.T., Uzairu, A., Uba, S. et al. Quantitative structure-activity relationship, molecular docking, drug-likeness, and pharmacokinetic studies of some non-small cell lung cancer therapeutic agents. Beni-Suef Univ J Basic Appl Sci 9, 49 (2020). https://doi.org/10.1186/s43088-020-00077-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s43088-020-00077-5

Keywords