Introduction

The term “protein family” was proposed by Dayhoff in the 1960s to describe proteins with similar structure and/or function that have evolved from a common ancestral protein (Dayhoff 1969). G protein-coupled receptors (GPCRs) are physiologically significant membrane proteins that are associated with different signalling pathways and represent one of the most important target classes for drug discovery. Two important papers on GPCRs, published in Nature and Nature Reviews Cancer, described their molecular signatures (Venkatakrishnan et al. 2013) and mutational landscape (O'Hayre et al. 2013). Several further papers on GPCRs have appeared in Nature and Cell group journals, demonstrating the importance of this family. GPCRs were classified into six families: (i) secretin-receptor family, (ii) rhodopsin family, (iii) metabotropic glutamate receptor family, (iv) fungal pheromone A- and M-factor receptors, (v) cyclic-AMP receptors, and (vi) fungal pheromone P- and α-factor receptors from Dictyostelium (Kolakowski Jr 1994). The secretin receptor family, a well-known group of GPCRs, is abundantly expressed in acinar cells and ductal epithelial cells, with negligible levels in pancreatic vascular structures and islets (Siu et al. 2006; Ulrich et al. 1998b). Members of this family are most abundant in the liver (Alpini et al. 1994), stomach (Bawab et al. 1988; Gespach et al. 1981; Li et al. 1998), brain (Fremeau et al. 1983; Köves et al. 2002; Nozaki et al. 2002; Yung et al. 2001), reproductive system (Chow et al. 2004), intestine (Andersson et al. 2000), heart (Ishihara et al. 1991), and lungs (Christophe et al. 1981). GPCRs have considerable therapeutic potential and are currently used as drug targets for several diseases (Araç and Leon 2020; Purcell and Hall 2018; Vass et al. 2018).

The secretin family, also known as the ‘B family’ (Cardoso et al. 2005; Hamby and Hirst 2008), consists of classical receptors (Cardoso et al. 2005). The B family is grouped into three subfamilies: B1, B2, and B3 (Kolakowski Jr 1994). The B1 subfamily consists of 15 protein members (Kolakowski Jr 1994; Cardoso et al. 2014). In our study, we analysed the evolutionary relatedness of 13 members of the B1 subfamily of the secretin receptor family: calcitonin receptor (CALCR), adenylate cyclase-activating polypeptide 1 (pituitary) receptor (ADCYAP1R1), corticotrophin-releasing hormone receptor 2 (CRHR2), gastric inhibitory polypeptide receptor (GIPR), corticotrophin-releasing hormone receptor 1 (CRHR1), glucagon-like peptide receptor 1 (GLP1R), growth hormone-releasing hormone receptor (GHRHR), glucagon-like peptide receptor 2 (GLP2R), parathyroid hormone receptor type 1 (PTH1R), secretin receptor (SCTR), parathyroid hormone receptor type 2 (PTH2R), and the two vasoactive intestinal peptide receptors, type 1 (VIPR1) and type 2 (VIPR2). However, we did not consider calcitonin receptor-like receptor and glucagon receptor in this study due to inadequate data availability.

The adenylate cyclase-activating polypeptide type I (pituitary) receptor, also known as PAC1, ADCYAP1R1, PAC1R, PACAPR, or PACAPRI, is a protein encoded by the ADCYAP1R1 gene in humans. It binds the pituitary adenylate cyclase-activating peptide (PACAP) (Ogi et al. 1993; Vaudry et al. 2000). Calcitonin receptor (CALCR) is involved in maintaining calcium homeostasis and in regulating osteoclast-mediated bone resorption. Polymorphisms in this gene are associated with the onset of osteoporosis and with variations in bone mineral density (Chambers and Magnus 1982; Dacquin et al. 2004; Davey et al. 2008). The corticotrophin-releasing hormone receptor has two types, CRHR1 and CRHR2, which are encoded by the CRHR1 and CRHR2 genes, respectively. These receptors bind corticotropin-releasing hormone (CRH) (Hauger et al. 2003). CRHR1 binds neuropeptides of the CRH family, which are central regulators of the hypothalamic–pituitary–adrenal pathway. CRHR1 is an essential protein that activates signal transduction pathways and controls various processes, including immune response, stress, obesity, and reproduction, while CRHR2 shows high affinity for CRH and can also bind CRH-related peptides (e.g. urocortin). CRH is produced in the hypothalamus and plays a vital role in coordinating the autonomic, endocrine, and behavioural responses to stress and immune challenge (Aguilera et al. 1986; Grammatopoulos and Chrousos 2002). Gastric inhibitory polypeptide receptor (GIPR) is encoded by the GIPR gene. It is a GPCR for gastric inhibitory polypeptide (GIP), which inhibits gastric acid secretion and gastrin release and stimulates insulin release in the presence of elevated glucose (Stoffel et al. 1995).
Glucagon-like peptide receptors comprise the type 1 receptor (GLP1R) and the type 2 receptor (GLP2R). GLP1R is expressed in pancreatic beta cells; when activated, it stimulates the adenylyl cyclase pathway, which increases insulin synthesis and release (Drucker et al. 1987). As a result, GLP1R is a possible target for the management of diabetes (Holmes 2003). GLP1R is also found in the brain, where it is associated with appetite control (Kinzig et al. 2002). GLP2, the ligand of GLP2R, is derived from the proglucagon peptide, which is encoded by the GCG gene. GLP2/GLP2R signalling stimulates intestinal growth, increases villus height in the small intestine, reduces enterocyte apoptosis, and is associated with increased crypt cell proliferation. Furthermore, GLP2/GLP2R prevents the intestinal hypoplasia that results from total parenteral nutrition. The GHRHR gene encodes the growth hormone-releasing hormone receptor (GHRHR), and the GHRH gene encodes growth hormone-releasing hormone (GHRH). Binding of GHRH to its receptor leads to the synthesis and release of growth hormone. Isolated growth hormone deficiency (IGHD), also known as “Dwarfism of Sindh,” is caused by mutations in the GHRHR gene. This disorder is characterised by short stature (Murray et al. 2000). Parathyroid hormone receptor has two subtypes: parathyroid hormone receptor type 1 (PTH1R) and type 2 (PTH2R). The PTH1R gene encodes type 1. PTH1R is a receptor for two peptides, parathyroid hormone-like hormone (PTHLH) and parathyroid hormone (PTH). Defects in PTH1R are known to cause different diseases such as Blomstrand-type chondrodysplasia (BOCD), Jansen's metaphyseal chondrodysplasia (JMC), and enchondromatosis (Calvi and Schipani 2000). The PTH2R gene encodes the type 2 receptor, which is more selective in ligand recognition and has a tissue-specific distribution. PTH2R is activated by PTH but not by PTHLH.
This protein is predominantly found in the pancreas and brain (Bhattacharya et al. 2011). The secretin receptor (SCTR) is activated by binding secretin, a potent regulator of pancreatic bicarbonate and electrolyte secretion. SCTR, along with secretin, may be associated with autism and pancreatic cancer (Dong and Miller 2002; Felley et al. 1992). The vasoactive intestinal peptide receptors comprise two receptors, type 1 (VIPR1) and type 2 (VIPR2). These receptors are involved in exocrine and endocrine secretion, smooth muscle relaxation, and water and ion flux in lung and intestinal epithelia. They are integral membrane receptors that act through guanine nucleotide-binding proteins to activate adenylate cyclase (Laburthe and Couvineau 2002).

Evolution involves genetic variation among the individuals of a population. These variations arise during the fundamental biological process of DNA replication; DNA is then transcribed to RNA and finally translated into proteins with the help of ribosomes. Researchers therefore analyse the amino acid sequence of a protein to predict its structural and functional units through conserved patterns. These highly conserved regions are directly involved in protein–protein interaction (Branden 1999). Researchers can also gain information on protein folding, function, and transport through protein glycosylation, as these conserved regions are essential for cell–cell interaction and antigenicity (Hamby and Hirst 2008). A protein can be evaluated for evolutionary conservation by comparing the positions of functionally and structurally essential amino acids. Conservation analysis of amino acid positions among family members can therefore reveal the significance of each position for the structure and function of a particular protein (Ashkenazy et al. 2010; Chakraborty and Agoramoorthy 2014). In evolutionary biology, it is essential to know the distribution of a specific protein family across species (Herrada et al. 2011), and several proteins have been analysed in this context to understand their distribution (Atkinson et al. 2011; Peng et al. 2012).

The goal of this in silico analysis is to gain insight into the evolutionary architecture and conservation pattern among different members of the B1-subfamily of the secretin receptor proteins. The aims of this study were (i) to perform multiple sequence alignment and determine alignment scores, (ii) to investigate the phylogenetic relationship among the member proteins, (iii) to understand the conservation patterns of known highly conserved amino acids, and (iv) to depict the distribution of B1-subfamily members across different species.

Materials and Methods

Data Acquisition

There are 15 protein members in the B1-subfamily of the secretin receptor family, and in our study, 13 members were analysed. Due to inadequate data, two members, calcitonin receptor-like receptor and glucagon receptor, were not investigated. All of the data for the genes related to protein receptors of the B1-subfamily of the secretin receptor family, including ADCYAP1R1, CALCR, CRHR1, CRHR2, GLP1R, GLP2R, GIPR, GHRHR, SCTR, PTH1R, PTH2R, VIPR1, and VIPR2, were collected from the NCBI protein database (Wheeler et al. 2007). The sequences of all the proteins were obtained from NCBI in FASTA format and assessed for suitability for the study. We analysed the distribution of the secretin family protein members (Family: 7tm_2 (PF00002)) across different species and kingdoms using the Pfam server, which contains a database of protein families and domains.

Multiple Sequence Alignment (MSA) and Score Determination

Multiple sequence alignment was performed using the CLUSTAL W (version 2.1) server (Chenna et al. 2003) with 14 sequences belonging to nine members of the B1-subfamily, and sequence similarities were observed. The alignment score was calculated and used for further analysis. The alignments were then combined into a single MSA by successive profile alignment using MUSCLE software (Edgar 2004). MUSCLE employs a log-expectation (LE) score function, defined as follows:

$$LE^{xy} = \left( 1 - f_{G}^{x} \right)\left( 1 - f_{G}^{y} \right)\log \sum_{i} \sum_{j} f_{i}^{x} f_{j}^{y} \frac{p_{ij}}{p_{i} p_{j}} \tag{1}$$

which is a modified version of the log-average function:

$$LA^{xy} = \log \sum_{i} \sum_{j} f_{i}^{x} f_{j}^{y} \frac{p_{ij}}{p_{i} p_{j}} \tag{2}$$

where $i$ and $j$ are amino acid types; $p_{i}$ is the background probability of $i$; $p_{ij}$ is the joint probability of $i$ and $j$ being aligned to each other; $f_{i}^{x}$ is the observed frequency of $i$ in column $x$ of the first profile; and $f_{G}^{x}$ is the observed frequency of gaps in column $x$, with the corresponding quantities defined for column $y$ of the second profile. The estimated probability $\alpha_{i}^{x}$ of observing amino acid $i$ at position $x$ can be calculated from $f^{x}$.
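To make Eqs. (1) and (2) concrete, the following is a minimal Python sketch that evaluates both scores for a single pair of profile columns. It is an illustration only and not MUSCLE's implementation: the two-letter alphabet, the `background` and `joint` probability tables, and the function names are assumptions chosen so the arithmetic can be checked by hand.

```python
import math

def log_average(fx, fy, joint, background):
    """LA score (Eq. 2): log of the expected odds ratio p_ij / (p_i * p_j)."""
    total = 0.0
    for i, fxi in fx.items():
        for j, fyj in fy.items():
            total += fxi * fyj * joint[(i, j)] / (background[i] * background[j])
    return math.log(total)

def log_expectation(fx, fy, gap_x, gap_y, joint, background):
    """LE score (Eq. 1): the LA score scaled by the gap-free fraction of each column."""
    return (1.0 - gap_x) * (1.0 - gap_y) * log_average(fx, fy, joint, background)

# Hypothetical two-letter alphabet; these probabilities are illustrative only
# and do not come from a real substitution matrix.
background = {"A": 0.5, "G": 0.5}
joint = {("A", "A"): 0.4, ("A", "G"): 0.1, ("G", "A"): 0.1, ("G", "G"): 0.4}

col_a = {"A": 1.0, "G": 0.0}  # column containing only A
col_g = {"A": 0.0, "G": 1.0}  # column containing only G

le_same = log_expectation(col_a, col_a, 0.0, 0.0, joint, background)
le_diff = log_expectation(col_a, col_g, 0.0, 0.0, joint, background)
le_gapped = log_expectation(col_a, col_a, 0.5, 0.0, joint, background)

print(le_same, le_diff, le_gapped)
```

In this toy setting, matching columns score positively, mismatching columns score negatively, and a column that is half gaps has its score halved by the $(1 - f_{G}^{x})$ factor.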

We then applied the Gblocks server (version 0.91b) to examine the aligned blocks; it selects conserved blocks from a multiple alignment (Aguilera et al. 1986; Castresana 2000). The servers used above rely on either maximum likelihood algorithms (Pupko et al. 2002) or empirical Bayesian algorithms (Mayrose et al. 2004) to study the MSA.

Phylogenetic Tree Building as well as Bootstrap Estimation Study

A phylogenetic tree shows evolutionary relatedness based on evolutionary divergence. We used the sequence alignment results to construct the phylogenetic tree (phylogram) using the Phylogeny.fr server (Dereeper et al. 2008), which combines algorithms from MUSCLE, PhyML, TreeDyn and Gblocks. Tree building used the neighbour-joining algorithm, a bottom-up clustering method. The algorithm takes as input a distance matrix that gives the divergence between every pair of taxa; the matrix has dimension N × N, where N is the number of taxa (leaf nodes). Lastly, we performed bootstrap analysis (Hillis and Bull 1993; Holmes 2003) on the resulting phylogenetic tree to evaluate its robustness.
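The neighbour-joining step described above can be sketched in a few lines of Python. This is a generic textbook illustration of how the first pair of taxa is chosen from an N × N distance matrix, not the code run by Phylogeny.fr; the function name and the five-taxon matrix are assumptions made for the example.

```python
def nj_pair_to_join(dist):
    """Return the pair (i, j) minimising the neighbour-joining criterion
    Q(i, j) = (N - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k),
    together with that minimum Q value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Hypothetical symmetric distance matrix for five taxa (illustrative values).
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_pair_to_join(toy_dist)
print(pair, q)
```

A full neighbour-joining run would merge the selected pair into a new node, recompute the distances to it, and repeat until the tree is resolved; only the selection step is shown here.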

Analysis of Cladogram Using Computational Biology and Unrooted Tree Formation

From the phylogram, we constructed a cladogram, a binary tree, and unrooted trees. The binary tree represents the cladogram in terms of the number of nodes at each level. Two types of unrooted phylogenetic tree were developed: unrooted-alpha and circular alpha. MAFFT (version 6.0) was used to construct the unrooted-alpha phylogenetic tree (Katoh and Toh 2008), and Interactive Tree Of Life (iTOL) was used to create the circular alpha phylogenetic tree (Letunic and Bork 2006).

Conservation Patterns Examination and Computation of Extremely Conserved Amino Acids

The ConSurf server was used to predict the conservation patterns (Ashkenazy et al. 2010; Glaser et al. 2003). The evolutionary conservation of each amino acid position in every B1-subfamily protein was calculated using a Bayesian algorithm. The same tool was used to generate a conservation score for each residue and, finally, to identify the highly conserved amino acids.

Sequence Logos of Conserved Domains

The WebLogo software was used to generate sequence logos, which provide a graphical representation of the patterns within a set of aligned amino acid sequences (Crooks et al. 2004; Schneider and Stephens 1990). The software was used to examine the aligned pattern and the bias in amino acid composition at each position.

Schneider and Stephens (1990) defined the sequence conservation at a particular position of a sequence logo, Rseq, as follows:

$$R_{seq} = S_{\max} - S_{obs} = \log_{2} N - \left( - \sum_{n = 1}^{N} p_{n} \log_{2} p_{n} \right) \tag{3}$$

where Rseq is the difference between the maximum possible entropy and the entropy of the observed symbol distribution; pn is the observed frequency of symbol n at a particular sequence position; and N is the number of distinct symbols for the given sequence type (20 for proteins).
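As a worked illustration of Eq. (3), the short Python sketch below computes Rseq for one alignment column. It is a simplified example rather than WebLogo's code: it assumes the column contains no gaps and applies no small-sample correction, and the function name and example columns are hypothetical.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=20):
    """R_seq = log2(N) - S_obs for one alignment column, in bits.

    Assumes a gap-free column and no small-sample correction.
    """
    counts = Counter(column)
    total = len(column)
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - s_obs

# Hypothetical columns from a protein alignment of four sequences.
r_conserved = information_content("CCCC")  # one residue type only
r_mixed = information_content("CCAA")      # two residue types, equally frequent

print(round(r_conserved, 2), round(r_mixed, 2))
```

A fully conserved protein column reaches the maximum of log2(20), about 4.32 bits, and every additional equally frequent residue type lowers the value, which is why logo heights in protein alignments always fall between 0 and roughly 4.32 bits.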

Distribution of B1-Subfamily Proteins Among Different Species

To understand the distribution of B1-subfamily proteins across species, we used the Pfam database (Punta et al. 2011), a large collection of protein families and their domains. The server provides a built-in search over protein families and domains and reports the distribution of family members among different species and super kingdoms. It presents the result as (i) a “sunburst” tree, (ii) a “species tree,” and (iii) “the phylogenetic tree for family seed alignment.” For the third analysis, the server used FastTree with the neighbour-joining algorithm and a local bootstrap method based on 100 resamples; the bootstrap values are shown next to the tree nodes. FastTree infers approximately maximum-likelihood phylogenetic trees from the family alignment. As a separate entry for the B1-subfamily is not available, we used the secretin family (Family: 7tm_2 (PF00002)) for further analysis. The server also uses multiple sequence alignments and hidden Markov models (HMMs). We generated all three types of trees.

Results and Discussion

The ‘B1 subfamily’ members of the secretin receptor family were retrieved from NCBI, and the information, including accession number, location, and amino acid sequence length, is presented in Table S1. The proteins of each family member and their genes are documented in Table S2. This subfamily contains only 15 protein members, which set the upper limit of our sample size; owing to sequence availability, we analysed 13 proteins (n = 13). Several medically important protein families are small, containing 5 to 15 members, and such families have been sufficient for meaningful analysis. Studies in this direction include those on the insulin receptor substrate family (Chakraborty et al. 2011) and the glucose transporter family (Wood and Trayhurn 2003).

Multiple Sequence Alignment (MSA) and Score Determination

The result of the MSA is documented in Figure S1. Most of the aligned sequences are found between positions 26 to 40, 251 to 299, 350 to 415, 425 to 445, 495 to 515 and 531 to 550. The graphical representation of alignment scores between pairs of proteins is shown in Fig. 1a, b, c. The sequence alignment shows a maximum score of 67 between the sequences of CRHR1 and CRHR2. The minimum alignment score of 21 was observed between the sequences of calcitonin hormone receptor 1 and glucagon-like peptide receptor 2. We then applied Gblocks to create a graphical view of the highly conserved blocks among the B1-subfamily members (Fig. S2). The highly aligned blocks were found between sequence positions 254 to 302, 344 to 372, 426 to 455, 494 to 554 and 539 to 564, respectively. Harmar performed multiple sequence alignment of family-B GPCRs and observed seven aligned blocks. The same study also evaluated the putative hormone-binding domain (PHBD), which is located in the extracellular region of family-B GPCRs (Harmar 2001). In our study of B1 subfamily members, however, we found six aligned blocks in the multiple sequence alignment and five highly aligned blocks (Gblocks).
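For readers who want to reproduce a pairwise score of this kind, the sketch below computes a percent-identity score for two aligned sequences in Python. It assumes the reported alignment score is a percent identity over gap-free positions, which should be checked against the CLUSTAL W output; the function name and the two short sequences are hypothetical and are not taken from the B1-subfamily data.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap positions are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments (not real receptor sequences).
score = percent_identity("MKT-LV", "MRTALV")
print(score)
```

Under this definition, a score of 67 would mean that about two thirds of the compared residues are identical, and a score of 21 that about one fifth are.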

Fig. 1

Alignment scores of protein sequences related to B1-subfamily members. a Alignment score between sequences (the notation Seq (x:y) denotes the alignment score between sequence x and sequence y); b Scatter distribution of scores; c Scores connected by a smoothed line without markers

Phylogenetic Tree Building as well as Bootstrap Estimation Study

The constructed phylogenetic tree shows the relatedness among the sequences of the B1-subfamily members. The resulting phylogram is shown in Fig. 2; its branch lengths are proportional to the inferred amount of evolutionary change. We found that CRHR1 and CRHR2 share a common ancestor and were siblings with 100% bootstrap support; GIPR and GLP1R were siblings with 87% bootstrap support, and VIPR2 and ADCYAP1R1 were siblings with 72% bootstrap support. Two receptors, SCTR and GHRHR, shared a common point of origin with only 29% bootstrap support. Thus, SCTR and GHRHR showed the lowest bootstrap support and CRHR1 and CRHR2 the highest. Graul and Sadee (2001) studied the evolutionary relationships between GPCRs using a clustered database approach, in which BLAST was used to build a database of approximately 1700 GPCRs. However, no report was available on the phylogenetic relationships between B1-subfamily members. Here, the phylogenetic analysis of 13 members of the B1-subfamily shows significant evolutionary relationships between members.
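The bootstrap percentages quoted above can be read as the share of resampled trees in which a given grouping reappears. The Python sketch below illustrates that bookkeeping only; it does not perform the resampling or tree building, and the four replicate trees are invented for the example rather than taken from our analysis.

```python
def bootstrap_support(clade, replicate_clades):
    """Percentage of bootstrap replicates whose tree contains the given clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Each replicate tree is summarised by the set of sister groupings it contains.
# These four replicates are hypothetical and only illustrate the calculation.
replicates = [
    {frozenset({"CRHR1", "CRHR2"}), frozenset({"SCTR", "GHRHR"})},
    {frozenset({"CRHR1", "CRHR2"}), frozenset({"SCTR", "VIPR1"})},
    {frozenset({"CRHR1", "CRHR2"}), frozenset({"GHRHR", "VIPR1"})},
    {frozenset({"CRHR1", "CRHR2"})},
]

support_crhr = bootstrap_support({"CRHR1", "CRHR2"}, replicates)
support_sctr = bootstrap_support({"SCTR", "GHRHR"}, replicates)
print(support_crhr, support_sctr)
```

A grouping recovered in every replicate receives 100% support, whereas one recovered in only a minority of replicates, as with SCTR and GHRHR here, should be interpreted with caution.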

Fig. 2

Phylogenetic tree for B1 subfamily members (constructed using the Phylogeny.fr server). The phylogram shows bootstrap values at the inner nodes

Cladogram Analysis Using Computational Biology And Unrooted Tree Formation

A cladogram uses a cladistic method to relate taxa through their ancestral relationships. Hence, cladograms are more informative than phylograms. Figure 3a shows the cladogram of protein sequences of the B1-subfamily members. Figure 3b shows the cladogram with a matching representation of nodes, and the binary tree equivalent to the cladogram is also shown. In the matching representation of nodes, a total of 24 nodes was observed (Fig. 3c). In general, a cladogram on n species (here, B1-subfamily members) has 2n − 1 edges, and the number of search steps Q(n) for any protein in the cladogram (phylogenetic tree) lies in the range log n ≤ Q(n) ≤ n (where n is the number of nodes in the binary tree) (Deo 2017; Mittal et al. 1994). From an algorithmic point of view, level 0 can hold one node, level 1 can hold two nodes, and so on. Therefore, for a binary tree with levels 0 to p, the total number of nodes must satisfy $2^{0} + 2^{1} + 2^{2} + \cdots + 2^{p} \ge n$. The path length between two leaf nodes establishes their relationship in the binary tree. This binary tree has six levels, and VIPR2 and ADCYAP1R1 are located at leaf nodes. The number of nodes at each level is plotted in Fig. 3d; level 3 and level 5 each contained the maximum of six nodes.
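The level-capacity argument in this paragraph can be checked with a short Python sketch. It computes the smallest level index p for which 2^0 + 2^1 + … + 2^p ≥ n and counts the nodes at each level of a nested-tuple tree. The function names and the three-taxon toy tree are assumptions for illustration; the toy tree is not the topology shown in Fig. 3.

```python
def min_levels(n_nodes):
    """Smallest p such that 2**0 + 2**1 + ... + 2**p >= n_nodes."""
    p, capacity = 0, 1
    while capacity < n_nodes:
        p += 1
        capacity += 2 ** p
    return p

def nodes_per_level(tree, level=0, counts=None):
    """Count nodes at each level of a tree given as nested tuples (leaves are strings)."""
    if counts is None:
        counts = {}
    counts[level] = counts.get(level, 0) + 1
    if isinstance(tree, tuple):
        for child in tree:
            nodes_per_level(child, level + 1, counts)
    return counts

# Hypothetical three-taxon tree used only to demonstrate the counting.
toy_tree = (("CRHR1", "CRHR2"), "SCTR")

levels_needed = min_levels(24)
level_counts = nodes_per_level(toy_tree)
print(levels_needed, level_counts)
```

For 24 nodes the bound gives p = 4 as the minimum depth of a perfectly balanced tree; a six-level tree such as the one in Fig. 3c is deeper than this minimum because real phylogenies are not perfectly balanced.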

Fig. 3

Cladogram (B1 subfamily proteins) and its analysis using computational biology. a Cladogram for tree algorithm analysis b Matching representation of nodes c Representation binary tree equivalent to Cladogram d Number of nodes in each level

Two types of unrooted trees, the unrooted-alpha tree and the circular alpha tree, were drawn to examine the possible relationships between B1-subfamily members. Both trees are shown in Fig. 4. The unrooted-alpha tree contained nine internal nodes and thirteen external nodes, whereas the circular alpha tree contained eleven internal nodes and thirteen external nodes. John et al. (2003) stated that unrooted binary trees are also important for inferring evolutionary trees, where data are available for all leaf-node species and the unrooted binary tree describes the evolution of those species. In this case, however, our tree depicts a branch-decomposition of these proteins, and such unrooted trees can address the natural graph enumeration problem (Balding et al. 2008).
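The node counts reported above can be compared with the standard count for a fully resolved tree. As a brief worked derivation, assume an unrooted binary tree with $n$ leaves (degree 1), $I$ internal nodes (degree 3), and $E$ edges; this assumption of full resolution is ours and is not stated for either figure. Counting edge ends gives $n + 3I = 2E$, and any tree has one edge fewer than nodes, so $E = n + I - 1$. Combining the two:

$$I = n - 2, \qquad E = 2n - 3$$

For $n = 13$ this gives $I = 11$ and $E = 23$, which agrees with the eleven internal nodes of the circular alpha tree. Under the same assumption, nine internal nodes would indicate that the unrooted-alpha tree contains some unresolved (multifurcating) nodes.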

Fig. 4

The unrooted phylogenetic tree of B1 subfamily proteins. a Unrooted-alpha phylogenetic tree b Circular alpha phylogenetic tree

Conservation Patterns Examination and Computation of Extremely Conserved Amino Acids

The conservation patterns, the protein backbone structures with their highly conserved parts, and the number of highly conserved residues are shown in Fig. 5. CRHR2 has the largest number of highly conserved amino acid residues, followed by the PAC1 receptor. VIPR2 contained the lowest number of highly conserved residues, whereas the calcitonin receptor, secretin receptor and VIPR1 lacked highly conserved amino acid residues. Some typical features were observed in B1-subfamily members. The most common semi-conserved structural characteristics of B1-family GPCRs are a comparatively small C-terminus and an elongated N-terminal domain of 100–200 amino acids (George et al. 2002).

Fig. 5

Conservation patterns, backbone structures and number of highly conserved residues of the proteins. a Shows the general conservation patterns with highly conserved amino acids in 3D structure of the B1 subfamily members. Amino acid conservation scores were classified into 9 levels. The color scale for residue conservation is indicated in the figure. b Backbone structures with highly conserved amino acids of B1 subfamily members proteins. c Number of highly conserved residues of the proteins

Moreover, a conserved disulphide bridge was noted between cysteine residues in extracellular loops 1 and 2 (ECL1 and ECL2) (Authier and Desbuquois 2008; Kenakin and Miller 2010; Ulrich et al. 1998b). In particular, conserved regions contribute to biochemical function, for example as binding regions on the exposed surface of proteins (Ghosh et al. 2012; Branden and Tooze 1999; Ulrich et al. 1998a). The highly conserved regions of membrane proteins or viral peptides can be used for vaccine development, and we found several conserved residues within B1-subfamily members in this study.

Sequence Logos of Conserved Residues

Sequence logos of conserved residues are depicted in Fig. 6. Each amino acid is represented by its one-letter symbol, and the height of the symbol indicates the relative frequency of that amino acid at that position in the sequence. We found sequence logos at several continuous positions, including 6 to 21, 27 to 43, 70 to 78, 85 to 94, 142 to 153, 155 to 160, 162 to 181, 205 to 220, 227 to 238, 240 to 248, 252 to 312, 316 to 324, 349 to 423, 425 to 430, 432 to 472, 404 to 430, 434 to 574, 583 to 585, 603 to 605, 609 to 630, 646 to 648, 652 to 654, and 674 to 678. The longest continuous stretch of sequence logos (74 positions) was found between positions 349 and 423. In this study, the maximum logo height recorded was 3.6 bits, and the minimum was 0.18 bits. This enabled us to visually inspect the amino acids that constitute functionally significant regions of the protein. Perrin et al. (2007) found that the human B1 family of GPCRs shows conservation of structurally essential residues, comprising a salt bridge, six cysteine residues, two tryptophan residues, a proline and a glycine. Cysteine residues are required for conserved disulphide bridge formation, and this was confirmed in our study. Moreover, the structure–function relationships proposed by Perrin et al. (2007) identified a structurally significant salt bridge and an extremely conserved aspartic acid residue. These are located in the first extracellular domain (ECD1) of the receptors in the B1 GPCR family and are important for the function and structure of this group of receptors. This receptor class is significant for drug development. Therefore, the protein domains that contain these conserved residues are highly important and can be used for drug development, drug targeting, or drug interaction studies (Giaccia et al. 2003). Protein–protein interactions can also be understood from these highly conserved segments (Li et al. 1998).

Fig. 6

Generated WebLogo for B1 subfamily proteins at different sequence positions. a Positions 1 to 192 b Positions 193 to 384 c Positions 385 to 576 d Positions 577 to 704

Distribution of B1-Subfamily Proteins Among Different Species

A graphical representation of the distribution of secretin family member proteins (which include the B1-subfamily receptor proteins) across species is shown in Fig. 7. The tree is built by considering the taxonomic lineage of each sequence that matches these family members. Three types of trees are depicted: a “sunburst” tree (Fig. 7a), a “species tree” (Fig. 7b), and a “phylogenetic tree for family seed alignment” (Fig. 7c). Two types of “sunburst” trees were developed: one with tree segments weighted by the number of sequences and one with tree segments weighted by the number of species. These “sunburst” trees show each node in the tree as a separate arc, arranged radially with the super kingdoms at the centre and the species arrayed around the outermost ring. The trees indicate that secretin family member proteins are found not only in Homo sapiens but also across several taxa, including Insecta, Diptera, Cyprinidae, and Mammalia. The “species tree” shows the occurrence of this subfamily across different species, including 247 species of Eukaryota, 190 Metazoa and two Choanoflagellida. Harmar concluded that the subfamily was encoded by 15 genes in humans, as well as at least five genes in Drosophila and three in C. elegans (Harmar 2001).

Fig. 7

Graphical representation of the distribution of secretin receptor family proteins across species. a “Sunburst” visualization of the species tree for this family. It shows each node in the tree as a separate arc, arranged radially with the super kingdoms at the centre and the species arrayed around the outermost ring. b The “species tree,” which shows the occurrence of this subfamily across different species. c The “phylogenetic tree for secretin family seed alignment,” which uses the neighbour-joining tree algorithm with a local bootstrap based on 100 resamples (the values are shown next to the tree nodes)

Conclusion

One of the key challenges of biochemical science is to understand the patterns of evolutionary relatedness, conservation, and diversification among members of a protein family. Here, we have demonstrated evolutionary relatedness, conservation patterns, and the distribution across species among the different members of the B1-subfamily of the secretin receptor proteins through in silico analysis. The evolutionary architectures of the protein family members suggest that a similar evolutionary force drives diversification across the universal process of evolution, from macro-evolutionary to micro-evolutionary processes. Ultimately, this process shapes the diversity of life on Earth. Our findings on evolutionary relatedness and the diversification of B1-subfamily members support the universal branching rule (Herrada et al. 2011, 2008), and we assume that this rule applies everywhere from gene families to protein families and, finally, within the continuous process of speciation. Our analysis suggests that evolutionary relatedness arises during the branching process. Evolutionary relatedness and branching appear to be universal features of evolutionary processes; however, the limited sample size restricts our ability to demonstrate this universally. Nevertheless, evolutionary processes drive biological diversification and evolutionary relatedness across protein family members, as well as across the entire history of life.