Introduction

Due to the increasing demand of consumers for healthy products, attention to animal health has become more important (Palii et al. 2020). Mastitis is one of the most common diseases that can affect dairy cattle during dry periods and lactation. It is considered as the most expensive dairy cattle disease in the world, which causes a lot of damage to the dairy industry every year (Halasa et al. 2007; Hamed et al. 2020). In addition to the cost of treatment and veterinary medicine and death from the disease, the influence of mastitis in the dairy cattle economy can include reduced milk production and quality, the indirect impact of the disease on lower reproductive rates, and increased risk of culling in the herd. The use of antibiotics in the treatment of mastitis can also cause side effects of its residues on human health (Barlow 2011; Schukken et al. 2009).

In general, the causative agent of mastitis is divided into two categories: infectious and environmental factors, which are epidemiologically different. The main bacteria that cause infectious mastitis are Staphylococcus aureus, Streptococcus agalactiae, Streptococcus dysagalactiae, and Mycoplasma bovis. In addition, the bacteria that cause peripheral mastitis include Escherichia coli and Streptococcus dysagalactiae. Due to the control of the causative agents of infectious mastitis, the incidence of environmental mastitis in the herd has increased. E. coli is the most important environmental pathogen in dairy cow mastitis, which causes great economic damage to livestock (Phuektes et al. 2001; Zhao and Lacasse 2008).

E. coli infection has two different types of responses. The first response, which is localized, involves inflammation and the cellular immune response and occurs in a quarter infected with a bacterial infection, but the other response, which is systemic, may occur in a quarter around infected tissue and distant organs such as the liver. The systemic response involves the processing and antigen presentation, cytokines, protein degradation, and apoptosis. The interaction effects of these responses indicate disease and infection in the animal (Mitterhuemer et al. 2010).

The goal of the breeding program is to increase resistance to mastitis in dairy herds. To achieve this goal, genes that affect mastitis should be identified. Based on analog biological roles and comparison of transcripts of infected and healthy animals and determinants of gene and network hubs, this project supports proteins and significant modules effective in disease that helps to understand the host response to the pathogen (Gorji et al. 2019).

It will be possible to study the expression of thousands of genes normally in a short time using microarray technology (Blohm and Guiseppi-Elie 2001). In this regard, bioinformatics and computational biology have overcome the challenges associated with the processing of biological information, and by recognizing the difference in the expression of genes and their interactions, a better understanding of cellular mechanisms and disease-causing processes, including mastitis, is obtained. This study aimed to identify key genes and potential pathways associated with mastitis induced by E. coli in dairy cattle by bioinformatics analysis.

Materials and Methods

Data Collection

In this study, the gene expression profile (GSE15022 (which was obtained with the help of platform GPL2112 Affymetrix Bovine Genome Array was used. The data contain ten samples of mammary quarter tissue, including five tissue samples around quarter infected with E. coli, compared to five quarter tissue samples of healthy dairy cows. RNA probes for hybridization of sample arrays were developed by Ambion Kit in 2010 by Mitterhuemer et al. The gene expression data were downloaded from the public database the Gene Expression Omnibus (GEO) database of NCBI (http://www.ncbi.nlm.nih.gov/geo/) (Barrett et al. 2013).

Identification of DEGs

The online statistical tool, GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) (Davis and Meltzer 2007) is an R-based web program that helps users analyze GEO data and identify and visualize significant differences in gene expression. This tool uses a t test to compare two groups (control and treatment) or ANOVA to compare more groups. In this study, GEO2R analysis was used to compare tissue samples around quarter infected with E. coli and quarter tissue of healthy dairy cows. The adjusted P value was applied to correct for the occurrence of false-positive results. The GEO2R inbuilt method, Benjamini and Hochberg (false discovery rate), were used. The cutoff criteria were \(\left|Log FC\right|\)≥ 1 and adjusted p value < 0.05. Finally, DEGs including upregulated and down-regulated genes were obtained.

PPI Network and Module Analysis

To further investigate the association of DEGs at the protein level, the associated PPI networks were plotted using the Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org/) (Szklarczyk et al. 2019). A maximum number of interactors = 0 and a confidence score ≥ 0.4 were chosen as the cutoff criteria. The STRING database is an online database available for evaluating protein networks.

The Molecular Complex Detection (MCODE) v1.6.1 plugin of Cytoscape v3.8.0 software was applied to clarify the biological significance of modules of the PPI network with max. depth = 100, k-core = 2, node score cutoff = 0.2 and degree cutoff = 2. Then, the DAVID tool was used for functional enrichment analysis of each module.

Gene ontology and KEGG Pathway Analysis

For functional annotation of GO and analysis of KEGG pathway enrichment, David (https://david.ncifcrf.gov/) was used (Sherman and Lempicki 2009). This web tool is an online bioinformatics resource for any functional evaluation of high-throughput gene expression profiles. The DEGs were exposed to ClueGO v2.5.7 to obtain complete Gene Ontological terms (GO) (Bindea et al. 2009). The p-values < 0.05 were considered to be significant.

Finding Hub Genes

In the PPI network, the genes that are most closely related to other genes are called gene hubs. Hub genes are conserved in the evolutionary pathway and are known as molecular biomarkers in various diseases. CytoHubba is a simple Cytoscape plugin that ranks the importance of nodes in the PPI network with different algorithms for identifying key biological elements (Chin et al. 2014). In this study, using the degree algorithm, ten genes with a rank higher than 8 were selected as hub genes.

Results

Identification of DEGs

The gene expression profile of GSE15022 which contains 10 samples including five adjacent tissue of quarter infected with E.coli and five tissue of healthy quarter of dairy cattle was assessed using GEO2R. A total of 156 differentially expressed genes were detected which 95 genes were up-regulated and 61 genes were downregulated in adjacent tissue of quarter infected compared with healthy tissue (Table 1). Volcano plot display differentially expressed genes between adjacent tissue of quarter infected and healthy tissue of quarters of dairy cattle (Fig. 1).

Table 1 One hundred and fifty six differentially expressed genes including 95 up-regulated genes and 61 down-regulated genes which were identified between adjacent tissue of quarter infected of infected tissue with E. coli and healthy quarter of dairy cattle
Fig. 1
figure 1

Volcano plot displaying differentially expressed genes between adjacent tissue of quarter infected with E. coli and healthy quarters of dairy cattle. Significant up-regulated genes were showed in red dot (Up) and significant down-regulated genes were showed in blue dot (Down) (adj.pvalue < 0.05)

PPI Network and Sub-modules Analysis

Based on the data in the STRING database, the PPI network consisting of 153 nodes and 168 edges was plotted using Cytoscape software (Fig. 2).

Fig. 2
figure 2

Protein–protein interaction network constructed with the differentially expressed genes

The Cytoscape plugin MCODE displayed the closely interlinked areas from the PPI network in the shape of clusters. The top three clusters that were significant with MCODE scores of 6.5, 3.3, and 3.1 were selected (Table 2).

Table 2 The most interlinked regions are clustered from the DEGs of GSE1502 dataset using MCODE

DAVID Enrichment Analysis

To obtain a better understanding of important differentially expressed genes, the DAVID v6.8 online server was used for GO analysis and KEGG enriched pathways. In gene ontology, three parts of functional groups including biological process (BP), cellular performance (CC), and molecular function (MF) were analyzed. The modified Fisher exact p-value (EASE score) ≤ 0.05 was considered strongly enriched.

It was found that the DEGs are enriched in 20 terms of the biological process which cellular oxidant detoxification (GO: 0098869), oxidation–reduction process (GO: 0055114), and positive regulation of intrinsic apoptotic signaling pathway (GO: 2001244) are the most significant biological process (Table 3). About the cellular component analysis, the DEGs were mostly enriched in extracellular space (GO: 0005615), mitochondrial matrix (GO: 0005759), and extracellular exosome (GO: 0070062) (Table 4). The gene ontology molecular function analysis revealed the involvement of DEGs in oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen (GO: 0016705), monooxygenase activity (GO: 0004497), and iron ion binding (GO: 0005506) (Table 5).

Table 3 Significantly enriched GO terms in biological process of differentially expressed genes (p < 0.05)
Table 4 Significantly enriched GO terms in cellular component of differentially expressed genes (p < 0.05)
Table 5 Significantly enriched GO terms in molecular function of differentially expressed genes (p < 0.05)

The KEGG pathway enrichment analysis revealed the association of the DEGs in ten KEGG pathways. The most significant pathways were Biosynthesis of amino acids (bta01230), p53 signaling pathway (bta04115), Cysteine and methionine metabolism (bta00270), and Metabolic pathways (bta01100) (Table 6).

Table 6 KEGG pathway analysis of differentially expressed genes (p < 0.05)

ClueGO Enrichment Analysis

The Cytoscape plugin ClueGO was applied to investigate the functional enrichment of the DEGs of the dataset. ClueGo assisted to visualize the GO terms of the detected PPI complex network. The BP and MF terms of the GO functional enrichment analysis of the complex PPI network are described in Fig. 3. The statistical options for ClueGO enrichment analysis were set according to a hypergeometric test which is two-sided with p ≤ 0.05, Benjamini–Hochberg correction, and kappa score ≥ 0.4 as a primary criterion. The BP terms for DEGs were mostly enriched in response to a toxic substance (GO:0009636), L-serine metabolic process (GO:0006563), chaperone-mediated protein folding (GO:0061077), cellular response to xenobiotic stimulus (GO:0071466), xenobiotic metabolic process (GO:0006805), and establishment of the skin barrier (GO:0061436), and the MF term for DEGs were mostly enriched in RAGE receptor binding (GO:0050786), sulfur compound biosynthetic process (GO:0044272), oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen (GO:0016702) and extracellular matrix organization (GO:0030198) (Fig. 4).

Fig. 3
figure 3

Gene Ontology enrichment terms was visualized by the ClueGO plugin of Cytoscape. Important BP and MF of the DEGs are presented with the particular gene interactions. Functional nodes and edges are described the connectivity of the GO terms network with a kappa score of 0.4 and p-value ≤ 0.05. This p-value indicates the size of the node. The color code of each node represents the special functional categorizes that they are involved in. The most important functional GO terms are showed by bold fonts and the names of the DEGs involved in each class are presented in red font (Color figure online)

Fig. 4
figure 4

Pathway term enrichment that is visualized by the ClueGo plugin of Cytoscape. The plugin delivers a comprehensive enrichment analysis for DEGs, including KEGG and REACTOME pathways. Functional nodes and edges describe the connectivity of the pathway terms network with a kappa score of 0.4 and p-value ≤ 0.05. This p-value indicates the size of the node. The color code of each node represents the special functional categorizes that they are involved in. The most important functional pathway terms are showed by bold fonts and the names of the DEGs involved in each class are presented in red font (Color figure online)

The KEGG pathway analysis from ClueGO showed that many DEGs were significantly enriched in Glycine, serine and threonine metabolism (KEGG: 00260), Cysteine and methionine metabolism (KEGG: 00270), Glutathione metabolism (KEGG: 00480), Butanoate metabolism (KEGG: 00650), p53 signaling pathway (KEGG: 04115), Ferroptosis (KEGG: 04216) and Arachidonic acid metabolism (KEGG: 00590) (Fig. 4).

Finding Hub Genes

PPI network using cytoHubba plugin showed that 10 genes have the highest interaction compared to other genes identified as hub genes (Table 7). Among them, SOD2, ALDH18A1, and MTHFD2 genes did not belong to any of the three modules.CBS and HGDH were present in Module 3 and the remaining five genes belonged to Module 1.

Table 7 Top 10 in network string-cytoscape input.txt ranked by Degree method

Discussion

Mastitis in dairy cows is of great economic importance because of the damage it does to the dairy industry, its quality, and composition, as well as dairy products. The most significant gene ontologies of DEGs in biological process enrichment were cellular oxidant detoxification and oxidation–reduction process. In addition, seven genes were enriched in immune response, which indicates the development of an immune response against E. coli. This was similar to the results reported by Buitenhuis et al. (2011) that indicated mastitis in E. coli infection significantly increased the expression of upregulated genes in the immune response biological process. Mitterhuemer et al. (2010) reported that the DEGs between udder quarters infected with E. coli, adjacent to infected tissue quarters, and control animals were mainly involved in immune response.

The common biological process terms using ClueGO and DAVID were response to toxic substance, l-serine metabolic process and chaperone-mediated protein folding. Cellular oxidant detoxification was subgroup of response to toxic substance GO.

Abdelmegid et al. (2017) investigated the host defense-related proteins of cows with S. aureus subclinical mastitis. One of the significant GO in BP term was cellular oxidant detoxification. So, this GO might be important in the response to any bacterial infection leading to mastitis (Abdelmegid et al. 2017). Ahlawat et al. (2021) compared gene expression profiling of milk somatic cells of cattle and buffaloes. Cellular oxidant detoxification was a BP term that was significant in both species. As somatic cells are important in mastitis, this GO can be identified as a significant GO in mastitis. Therefore, the DEGs involved in this GO are important too (Ahlawat et al. 2021).

Dai et al. (2018) used RNA- seq transcriptomics analysis of the mammary gland of bovine during the dry period and lactation. One of the significant GO was the oxidation–reduction process which was showed the importance of this GO in lactation (Dai et al. 2018). The oxidation–reduction process is a way to degrade organic contaminants which can be applied for the inactivation of pathogenic contaminants (Ali et al. 2020).

The KEGG pathway analysis of DEGs showed that the most significant pathway was enriched in the biosynthesis of amino acids and the p53 signaling pathway. Most of the genes (24 genes) were involved in the metabolic pathways.

The gut microbiome related to milk protein in buffaloes and Holstein cattle were investigated in a study. The results showed that the biosynthesis of amino acids is a significant pathway that also positively correlated to the genera of Parasutterella, Sutterella, Dorea, and Parabacteroides (Zhang et al. 2017). Another study surveyed the key proteins in Asprin inhibiting effect on Staphylococcus xylosus. S. xylosusis a pathogen that causes infection in cow mastitis. The biosynthesis of amino acids was one of the most significant pathways in the study. Therefore, it can be concluded that the pathway of the biosynthesis of the amino acids is an important pathway in response to pathogen and bacterial infection (Xu et al. 2017).

FU et al. (2019) reported that p53 suppresses the progression of mammary epithelial tumors and their metastasis through the ZEB1/p53 signaling pathway. Other significant pathways are Arachidonic acid metabolism, Cysteine and methionine metabolism, Glycine, serine, and threonine metabolism, Glutathione metabolism, Biosynthesis of antibiotics, Carbon metabolism, and Butanoate metabolism (Fu et al. 2019).

In addition, the results of KEGG pathway analysis using ClueGO showed that many DEGs are involved in the mentioned pathways such as Glycine, serine and threonine metabolism, Cysteine and methionine metabolism, Glutathione metabolism, Butanoate metabolism, p53 signaling pathway, Ferroptosis, Arachidonic acid metabolism.

Among DEGs, 10 hub genes had the largest degree. Superoxide dismutase 2 (SOD2) is a manganese-dependent superoxide dismutase. Becuwe et al. (2014) reported that SOD2 could act as a biomarker in the metastatic progression of breast tumors. In a study, Jensen et al. (2013) investigated mastitis caused by E. coli and S. aureus adjacent to non-infected quarters. The results showed that SOD2 was up-regulated during S. aureus infection and acted as a gene involved in mammary gland response to the infection (Jensen et al. 2013). A bioinformatics analysis investigated the microRNA and putative target genes in bovine mammary tissue infected with Streptococcus uberis. The results showed that SOD2 was among differentially expressed genes due to infection (Naeem et al. 2012). Another study investigated the distinct local and systemic transcriptome responses which were induced by E. coli infection in the mammary gland of cows. Enhanced expression of SOD2 was observed as an indicator of oxidative stress and an active defense reaction in infected and neighboring healthy quarters (Mitterhuemer et al. 2010). A study surveyed differentiating S. aureus from E. coli mastitis after udder infection. SOD2 was one of the DEGs included in the E. coli infected samples (Günther et al. 2017).

Cystathionine beta-synthase (CBS) was involved in the metabolic pathway of glycine, serine, and threonine metabolism, biosynthesis of amino acids, biosynthesis of antibiotics, and cysteine and methionine metabolism and was one of the genes involved in Module 3. The CBS gene encodes the enzyme cystathionine beta-synthase. This enzyme is responsible for using vitamin B6 as a cofactor to make cystatin from serine and homocysteine. Cystathionine is used to make taurine. Taurine plays a variety of roles in the cell, including regulating electrolyte balance, strengthening the immune system, and antioxidant function. Therefore, increased expression of this gene can be one of the signs of the cell's immune response against E. coli infection (Zuhra et al. 2020).

Aldehyde dehydrogenase 18 family, member A1 (ALDH18A1) is a member of the aldehyde dehydrogenase family and encodes a bifunctional NADPH- and ATP-dependent mitochondrial enzyme. ALDH18A1 plays an important role in the biosynthesis of the amino acids of proline, ornithine, and arginine (Holmes 2017). ALDH18A1 gene encodes the enzyme synthetase of P5C, which catalyzes glutamate to P5C and then P5C catalyzed by the enzyme P5C reductase (PYCR) to proline (D'Aniello et al. 2020). Also, Proline is one of the main amino acids of collagen and is needed to stabilize the triple helix of collagen (D'Aniello et al. 2020). The results of a study on the identification of candidate genes affecting mastitis in cows showed that ALDH18A1 was among DEGs in mammary tissue after intramammary infection with E. coli or Streptococcus uberis (Chen et al. 2015). ALDH18A1 is one of the genes in the collagen synthesis pathway (Kocher et al. 2021). Therefore, ALDH18A1 can be considered an important gene in mastitis induced by E. coli.

Collagen, as the most abundant protein in the body, is essential for maintaining the natural structure and strength of connective tissue(Chen et al. 2020). Glycine, proline, and hydroxyproline are the main amino acids that make up the collagen structure. Threonine, serine, and choline are involved, in the synthesis of glycine. Also synthesis proline from arginine and hydroxyproline from the rest of proline in collagen (Li and Wu 2018). Therefore, the amino acid biosynthesis pathway enriched in Kegg is important for collagen synthesis. Collagen, type III, alpha 1 (COL3A1), and Collagen, type I, alpha 2 (COL1A2) genes were identified as other hub genes that encode type 3 and type 1 collagen, respectively. In a study on sheep, Transcriptional Profiles of the mammary gland were investigated during the lactation period. COL3A1 was found to be DEG in this study. This evidence suggests that COL3A1 may play an important role in the sheep mammary epithelial cells cycle and can be considered as a candidate hub gene for future research in cows (Chen et al. 2020). A study evaluating various miRNAs expressed in dairy heifers and dairy cows showed that COL1A2 is one of the genes associated with the cytoskeleton and extracellular matrix and is associated with mammary gland functional activity (Wicik et al. 2016). In a study by Tsutsui et al. (2020) On the pattern of expression of different types of fibril collagen in different periods of pregnancy, lactation, and after breastfeeding, it was stated that type I collagen produces a suitable fibrillar structure during pregnancy, while this structure after pregnancy broken. during calving temporary dense fibrillar network is formed by type III fibrillar collagen. it has been generally reported that these collagens are involved in the structural reconstruction of the mammary glands and their function (Tsutsui et al. 2020). In a study, miRNA and mRNA Expression Profiles in Mammary Glands of Holstein Cows infected by S. aureus were investigated and COL1A2 was one of the most significant DEGs involved in many pathways (Wang et al. 2021).

Younis et al. (2016) reported that Osteoglycin (OGN) which induces bone formation, Lumican (LUM) which is important in the development of tissue-engineered cartilage, and periostin osteoblast-specific factor (POSTN) which is important in extracellular matrix mineralization along with COL1A2 was involved in E. coli-induced defense response in the breast during mastitis induced by E. coli (Younis et al. 2016). Gopinath et al (2022) reported that OGN not only induces the formation of bovine endochondral bone but also plays an important role in collagen fibrillogenesis. POSTN has also been reported to form a network structure in interaction with several proteins, including collagen V (Gopinath et al. 2022). Lumican participates in the structural organization of tissues (Giatagana et al. 2021). Barreto et al. (2020) reported that LUM, through its pathogen-related molecular pattern (PAMP), as a mediator of TLR4 activation, causes inflammation, cartilage destruction, and polarization of macrophages in osteoarthritis. Choi et al. (2022) reported that serine is involved in the proliferation of breast cancer cells and that the PHGDH gene, which encodes the enzyme phosphoglycerate dehydrogenase, catalyzes the synthesis of this amino acid.

As a result, this bioinformatics analysis identified ten hub genes that might play an important role in mastitis induced by E. coli. SOD2, COL1A2, COL3A1, POSTN, ALDH18A1, and CBS might be the core genes related to mastitis. They are can be considered as candidate genes in a breeding program of mastitis prevention and early detection. Further experimental studies are required to confirm the findings of the present analysis.