Abstract

The role of the extracellular matrix- (ECM-) receptor interaction signature has not been fully clarified in gastric cancer. This study comprehensively analyzed the differentially expressed ECM-related genes, their relation to clinicopathologic features, and their prognostic application in gastric cancer. Differentially expressed genes between tumorous and matched normal tissues in The Cancer Genome Atlas (TCGA) and validation cohorts were identified by a paired t-test. Consensus clusters were built to examine the correlation between clinicopathologic features and subclusters. Then, the least absolute shrinkage and selection operator (lasso) method was used to construct a risk score model. Correlation analyses were performed to reveal the relation between risk score-stratified subgroups and clinicopathologic features or significant signatures. In TCGA (26 pairs) and the validation cohort (134 pairs), 25 ECM-related genes were significantly upregulated and 11 genes were downregulated in gastric cancer. ECM-based subclusters were slightly related to clinicopathologic features. We constructed an eight-gene risk score. The risk score model predicted the outcome of patients with gastric cancer well in both the training (HR: 1.807, 95% CI: 1.292-2.528) and validation (HR: 1.866, 95% CI: 1.347-2.584) cohorts. In addition, risk score-based subgroups were associated with angiogenesis, cell adhesion molecules, complement and coagulation cascades, TGF-beta signaling, and mismatch repair-relevant signatures. In univariate (HR: 1.845, 95% CI: 1.382-2.462) and multivariate (HR: 1.756, 95% CI: 1.284-2.402) analyses, the risk score was an independent risk factor in gastric cancer. Our findings revealed that ECM compositions became accomplices in the tumorigenesis, progression, and poor survival of gastric cancer.

1. Introduction

As a common tumor of the digestive system, gastric cancer is the fifth most common malignant tumor and the third leading cause of cancer death in the world [1, 2]. Because gastric cancer often follows an occult course, it is of great significance to clarify its pathogenesis and to find effective markers for it.

In recent years, studies have shown that ECM remodeling, namely, the synthesis, distribution, and degradation of the extracellular matrix (ECM), is closely connected to the differentiation, proliferation, invasion, and metastasis of malignant tumors [3]. The ECM constitutes the main part of the extracellular microenvironment [4]. It is a complex, organized structure built from a variety of insoluble extracellular macromolecules in defined proportions. It is the site of cell survival and activity, with physical functions such as connection, support, water retention, pressure resistance, and protection. In addition, through integrins or other cell surface receptors, it can interact directly with cells to regulate their growth, metabolism, function, migration, proliferation, and differentiation, and thus to adjust the functions of whole tissues and organs [4]. Recent studies on solid tumors such as breast cancer and ovarian cancer have suggested that the ECM undergoes a remodeling process during tumor progression that is similar to the one seen in embryonic development. The reconstructed ECM then forms a loose microenvironment for cancer cells, giving rise to high proliferation, low differentiation, invasion, and metastasis of tumor cells [5]. Therefore, the identification of prominent ECM-relevant tumor markers that provide biological insight into the development and progression of gastric cancer would be of clinical value. In this study, the differentially expressed ECM-relevant markers were identified between gastric cancer and normal tissues. Using a least absolute shrinkage and selection operator (lasso) regression model, we found that the ECM-relevant markers had considerable value for predicting the prognosis of gastric cancer.

2. Methods

2.1. Datasets

We downloaded The Cancer Genome Atlas-Stomach Adenocarcinoma (TCGA-STAD) data from the UCSC Xena browser (https://xena.ucsc.edu/) [6]. The RNA-sequencing data were unified into fragments per kilobase million (FPKM) values. The validation data GSE29272 [7] and GSE62254 [8] were downloaded from Gene Expression Omnibus datasets (https://www.ncbi.nlm.nih.gov/geo/).

2.2. Genes of Researched Signatures

We retrieved all ECM-receptor interaction-related genes (KEGG hsa04512) from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/) [9]. In addition, genes of cell adhesion molecules (CAMs) (KEGG hsa04514), complement and coagulation cascades (KEGG hsa04610), TGF-beta signaling pathway (KEGG hsa04350), base excision repair (KEGG hsa03410), DNA replication (KEGG hsa03030), nucleotide excision repair (KEGG hsa03420), and mismatch repair (KEGG hsa03430) were also identified from KEGG (Table 1).

2.3. Building a lasso Regression Model

We conducted a univariate analysis of each ECM-receptor interaction-related gene. The genes that were significant in the univariate analysis were then selected for the establishment of a lasso regression model. The lasso regression model was built with the R package “glmnet” [10]. According to the lasso model, each patient was assigned a risk score. Patients whose risk score exceeded the cut-off value were placed in the high-risk group; all others were placed in the low-risk group.
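The assignment of risk scores and risk groups described above can be sketched as follows. This is a minimal illustration in Python rather than the R “glmnet” workflow used in the study. The gene names, coefficients, and expression values are hypothetical placeholders, and the median cut-off is an assumption made only for this example, since neither the fitted coefficients nor the cut-off is stated in this section.

```python
from statistics import median

# Hypothetical lasso coefficients (NOT the fitted values from the study).
COEFFICIENTS = {"GENE_A": 0.25, "GENE_B": -0.10, "GENE_C": 0.40}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' relative to a cut-off.

    When no cut-off is supplied, the median score is used. The median is an
    assumption for illustration, not the cut-off reported by the authors.
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression profiles for three hypothetical patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 3.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 0.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

Because the score is a relative quantity, a cut-off chosen in one cohort would need to be re-examined before being applied to data generated with a different platform or normalization.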

2.4. Statistical Analyses

We identified differentially expressed genes between tumorous and matched normal tissues in TCGA and validation cohorts by a paired t-test. Consensus clusters were built with the R package “ConsensusClusterPlus” [11]. We computed consensus matrices of TCGA for k from 2 to 9. Gene set enrichment analysis (GSEA) was used to analyze the most enriched gene sets of the high- and low-risk groups [12, 13]. The R packages “clusterProfiler” [14], “org.Hs.eg.db,” “enrichplot,” and “GOplot” [15] were applied to perform GO analyses and visualize the results. The package “GSVA” was applied to obtain single-sample gene set enrichment analysis (ssGSEA) scores of the relevant signatures [16]. The package “survminer” was used to visualize the survival time of the high- and low-risk groups. A P value < 0.05 was considered to indicate a statistically significant difference. All analyses were conducted with R (https://www.r-project.org/). Hazard ratios are shown with their 95% confidence interval (95% CI).
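For readers less familiar with the paired t-test used for the tumor versus matched normal comparison, the statistic can be sketched as below. This is an illustrative Python version using only the standard library, not the code used in the study, and the expression values are invented toy numbers. Converting the statistic into a P value requires the Student t distribution, which is available, for example, through `scipy.stats.ttest_rel` in Python or `t.test(..., paired = TRUE)` in R.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(tumor, normal):
    """Return (t, degrees_of_freedom) for paired tumor/normal measurements.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-patient
    differences (tumor - normal) and sd is the sample standard deviation.
    """
    if len(tumor) != len(normal):
        raise ValueError("paired samples must have equal length")
    diffs = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(diffs)
    if n_pairs < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n_pairs)), n_pairs - 1


# Toy expression values for one gene in four hypothetical patients.
tumor_expr = [5.0, 6.0, 7.0, 9.0]
normal_expr = [4.0, 4.0, 6.0, 5.0]
t_stat, dof = paired_t_statistic(tumor_expr, normal_expr)
```

A positive statistic here means higher expression in tumor tissue; the sign is what separates upregulated from downregulated genes once significance has been established.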

3. Results

3.1. Differentially Expressed Genes of an ECM-Receptor Interaction Signature

In the TCGA cohort, 26 pairs of tumorous and matched normal tissues were enrolled in the study. As shown in Figure 1(a) and Supplementary Figure 1, the expression of 46 ECM-receptor interaction-related genes differed significantly between tumorous and adjacent tissues. In an independent cohort with 134 pairs of tumorous and matched normal tissues, 36 of these genes showed consistent expression changes. AGRN, CD47, COL11A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, COL5A3, COL6A3, COMP, DAG1, HMMR, ITGA2, ITGA4, ITGAV, ITGB8, LAMB1, LAMB3, LAMC2, SPP1, THBS2, VWF, and SDC1 were significantly upregulated in gastric cancer, while 11 genes (CD36, CHAD, COL4A6, ITGA8, ITGA9, LAMA2, RELN, SV2C, TNXB, LAMB4, and LAMC3) were downregulated in tumorous tissues (Figure 1(b) and Supplementary Figure 2).
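The cross-cohort filtering described above, which keeps only genes that change significantly and in the same direction in both cohorts, can be sketched as follows. This is a hedged Python illustration: the function name, the gene labels, and the difference values are hypothetical, and each input is assumed to contain only genes that already passed the significance test in that cohort.

```python
def concordant_genes(discovery, validation):
    """Split genes into those consistently up or down in both cohorts.

    Each argument maps gene -> mean paired difference (tumor - normal) for
    genes that were significant in that cohort. Genes missing from either
    cohort, or changing in opposite directions, are dropped.
    """
    shared = discovery.keys() & validation.keys()
    up = {g for g in shared if discovery[g] > 0 and validation[g] > 0}
    down = {g for g in shared if discovery[g] < 0 and validation[g] < 0}
    return up, down


# Hypothetical significant genes and their mean differences in each cohort.
discovery_cohort = {"GENE_1": 1.2, "GENE_2": -0.8, "GENE_3": 0.5, "GENE_4": -0.3}
validation_cohort = {"GENE_1": 0.9, "GENE_2": -1.1, "GENE_3": -0.4}

up_genes, down_genes = concordant_genes(discovery_cohort, validation_cohort)
```

In this toy input, GENE_3 is discarded because its direction differs between cohorts and GENE_4 because it is not significant in the validation cohort.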

3.2. Building Consensus Clusters and Correlation between Clinicopathologic Features and Clusters

We computed consensus matrices of TCGA for k from 2 to 9 (Figure 2(a) and Supplementary Figure 3). In consideration of discrimination and simplicity, we chose k = 2 to build the consensus clusters. Principal component analysis (PCA) showed that the two consensus clusters were separated to a certain degree (Figure 2(b)). Patients in cluster 2 had worse outcomes than patients in cluster 1 (Figure 2(c)). In addition, the clusters were slightly related to the histologic grade, cancer type, tumor stage, and TNM stage, while showing no correlation with PIK3CA, KMT2D, PCLO, FAT4, ARID1A, LRP1B, and TP53 mutations (Figures 2(d)–2(f) and Supplementary Table 1).

3.3. Establishment of the lasso Regression Model

To better predict the outcome of gastric cancer patients, we calculated the hazard ratio with 95% confidence interval for all ECM-receptor interaction-related genes; 25 of them met the significance threshold and were entered into the lasso regression model (Table 2). Figures 3(a) and 3(b) show the solution paths and partial likelihood deviances from the fitting of the lasso regression model. The risk score was then derived from the model (Figure 3(c)). GSEA showed the top enriched gene sets: protein complex binding, GTPase activity, organ morphogenesis, integrin binding, and regulation of biological quality (Figures 3(d)–3(h)). GO analyses revealed the top enriched biological process (BP), molecular function (MF), and cellular component (CC) terms (Figure 3(i)). The circular plot showed that 17 genes were highly related to the GO terms (Figure 3(j)).

3.4. Predictive Ability of the Risk Model

In the TCGA cohort, the risk model could predict the outcome of patients with gastric cancer (HR: 1.807, 95% CI: 1.292-2.528), and its reliability and credibility were stronger than those of the consensus clusters (Figure 4(a)). Moreover, in an independent cohort (GSE62254), the risk model still provided good predictive accuracy (HR: 1.866, 95% CI: 1.347-2.584) (Figure 4(c)). The distribution of survival time, risk score, and gene expression showed that patients in the high-risk group had shorter survival time in both the TCGA (Figure 4(b)) and validation cohorts (Figure 4(d)).
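The hazard ratios and confidence intervals reported above can be checked for internal consistency. Assuming Wald-type intervals, as are typically reported for Cox regression (an assumption, since the interval method is not stated here), the hazard ratio is exp(β) and the 95% CI is exp(β ± 1.96·SE), so the interval should be symmetric around log(HR). The following Python sketch recovers log(HR) and an approximate standard error from the reported values; the function names are illustrative only.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(hr, lower, upper):
    """Recover log(HR), an approximate SE, and the log-scale asymmetry.

    Assumes a Wald interval: exp(log_hr - Z_95 * se) to exp(log_hr + Z_95 * se).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    midpoint = (math.log(lower) + math.log(upper)) / 2
    return {"log_hr": log_hr, "se": se, "midpoint_gap": abs(log_hr - midpoint)}


def interval_from(log_hr, se):
    """Rebuild the 95% CI on the hazard-ratio scale from log(HR) and SE."""
    return math.exp(log_hr - Z_95 * se), math.exp(log_hr + Z_95 * se)


# Values reported in the text for the training and validation cohorts.
training = log_scale_summary(1.807, 1.292, 2.528)
validation = log_scale_summary(1.866, 1.347, 2.584)
rebuilt_lower, rebuilt_upper = interval_from(training["log_hr"], training["se"])
```

For both cohorts the gap between log(HR) and the log-scale midpoint of the interval is well below 0.01, which is what rounding to three decimals would produce under this assumption.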

3.5. Correlation between Risk Groups and Clinicopathologic Features

To explore the mechanisms underlying the risk groups, we compared relevant signatures between the high- and low-risk groups. As shown in Figure 5(a), the angiogenesis, cell adhesion molecules, complement and coagulation cascades, and TGF-beta signaling signatures were enriched in the high-risk group, whereas the mismatch repair-relevant signatures (base excision repair, DNA replication, nucleotide excision repair, and mismatch repair) were not enriched in this group. Furthermore, the risk stratification was highly correlated with the histologic grade, cancer type, tumor stage, and living status (Figure 5(b)).

3.6. Univariate and Multivariate Analyses of the Risk Score and Clinicopathologic Features

In the univariate analysis, age (HR: 1.641, 95% CI: 1.140-2.362), the lymph node stage (HR: 1.318, 95% CI: 1.124-1.545), the TNM stage (HR: 1.535, 95% CI: 1.233-1.910), the tumor stage (HR: 1.277, 95% CI: 1.020-1.601), and the risk score (HR: 1.845, 95% CI: 1.382-2.462) were risk factors for gastric cancer (Figure 6(a)). In the multivariate analysis, age (HR: 1.951, 95% CI: 1.337-2.849) and the risk score (HR: 1.756, 95% CI: 1.284-2.402) were the main risk factors for gastric cancer (Figure 6(b)).

4. Discussion

Gastric cancer is characterized by insidious onset, easy metastasis, early misdiagnosis, and a high recurrence rate [17]. Due to the lack of a simple domestic screening system, most patients with gastric cancer are at a late stage when first diagnosed, which greatly affects the efficacy of clinical therapy and their quality of survival [18]. Within this context, biochemical tumor markers have received increasing attention because they are noninvasive, safe, simple, inexpensive, and easy to monitor dynamically [19]. For gastric cancer, many tumor markers have been identified from the perspective of genetic traits or genetic modification. In this study, we showed that many ECM-relevant molecules were also effective tumor markers in gastric cancer, with important value for clinical application.

In previous research, ECM-relevant molecules have been identified as progression and prognostic biomarkers in several other solid tumors, where they have been used to inform clinical decisions and overall outcomes. For example, in colon adenocarcinoma (CAC), COL1A2, THBS2, and COL1A1 were related to prognosis [20]. In addition, the level of ITGA5 in CAC was significantly linked to overall survival (OS) and might serve as an independent prognostic indicator [21]. In neuroblastoma, SDC3 expression was associated with improved prognosis [22]. By contrast, a high expression level of SDC3 was associated with poor prognosis in patients with renal cell carcinoma [23]. In lung cancer, COL5A1 was highly expressed in patients with recurrence and short survival [24]. SPP1 was upregulated in tumor tissues, and low expression of SPP1 was significantly related to a better outcome [25]. Moreover, FN1 has been reported as a likely signature biomarker for predicting responses to treatment in lung cancer [26]. In contrast to these cancer types, ECM-receptor interaction-relevant genes have been poorly studied as progression and prognostic biomarkers in gastric cancer. Using the KEGG database, we systematically examined 84 ECM-receptor interaction-relevant genes in this study and found that most of them were differentially expressed in gastric cancer tissues. On the basis of these genes, we divided patients into two subclusters. As expected, the subclusters exhibited good prognostic performance. To better predict survival with ECM-receptor interaction-relevant genes, lasso regression analysis was then conducted. Among these genes, eight (VTN, SV2B, CD36, VWF, ITGB5, SDC2, COL5A2, and THBS1) were retained, and an eight-gene risk score model was constructed from them.
The risk score model performed well in predicting the prognosis of gastric cancer. The eight genes may be potential prognostic markers for gastric cancer.

In a variety of tumors, such as cervical neoplasia [27], ovarian cancer [28], and prostate cancer [29], VTN has been considered a promising biomarker; it encodes vitronectin, an adhesive glycoprotein that connects cells with the ECM. Recently, a report also revealed that VTN was a poor prognostic factor in gastric cancer [30]. Likewise, VWF, which encodes von Willebrand factor, a platelet adhesion glycoprotein, has been widely used as a biomarker in cancer and has also been identified as a new therapeutic target in gastric cancer [31]. THBS1, which encodes thrombospondin 1, takes part in angiogenesis and tumor progression, and its increased expression was significantly correlated with tumor differentiation [32]. COL5A1, which encodes an alpha chain of type V collagen, was a promising prognostic marker considered to have good potential for the treatment of patients with gastric cancer as well [33]. The expression of CD36 was reported to be related to gastric cancer metastasis via O-GlcNAcylation [34]. However, the current literature mostly explores the role of a single gene in gastric cancer and rarely links genes together to explore their combined effect on gastric cancer treatment. ITGB5, which encodes integrin-β5, is thought to be involved in the regulation of tumor initiation and progression by mediating links between cells and the ECM. ITGB5 has been reported as a potential predictive biomarker in glioblastoma [35], hepatocellular carcinoma [36], and cervical cancer [37]. Gene expression analysis showed that ITGB5 expression was elevated in gastric tumor tissue [38]. Nevertheless, the function of ITGB5 in gastric cancer is not yet fully elucidated. SV2B and SDC2, which encode a member of the synaptic vesicle protein 2 family and syndecan 2, respectively, have not been fully studied in gastric cancer. SV2B was identified as a key prognosis-associated marker in glioblastoma multiforme and prostate cancer [39, 40].
In spite of this, the study of SV2B in tumors is still limited. By comparison, SDC2 has been well studied in various tumors, especially colorectal cancer, lung cancer, prostate cancer, and esophageal squamous cell carcinoma [41–45]. In light of the discussion above, we consider that SV2B and SDC2 deserve further study in gastric cancer. Disruption of ECM organization results in a loss of its regularity, which will compromise gastric cancer foci. ECM compositions became accomplices in the tumorigenesis, progression, and poor survival of gastric cancer. The aberrant ECM signature should be simultaneously inhibited in the treatment of gastric cancer [46].

We further investigated the possible mechanisms underlying the differences between the low- and high-risk groups. There was a significant difference in angiogenesis between the two groups. It has been suggested that angiogenesis provides nutrients for tumor growth and pathways for cell metastasis [47]. Consistent with this, the angiogenesis signature was upregulated in the high-risk group. Angiogenesis depends on the migration and proliferation of vascular endothelial cells [48]. In this process, endothelial cells must attach to each other and to the extracellular matrix to form and expand new microvessels. The ECM is one of the critical influences on the survival of vascular endothelial cells [49]. Thus, we speculated that these differentially expressed genes could promote the formation of tumor blood vessels and further affect the development and prognosis of tumors. Moreover, cell adhesion molecules are one of the main mediators between cells and the ECM. Changes in cell adhesion molecules could affect multiple signaling pathways, thereby affecting the pathophysiology of cancer tissues [50]. In addition to possible changes in angiogenesis and cell adhesion molecules, complement and coagulation cascades were also affected in gastric cancer, which might participate in tumor progression and prognosis. Increasing evidence has indicated that complement and coagulation cascades are significantly involved in signaling in gallbladder cancer [51], clear cell renal cell carcinoma [52], small-cell lung cancer [53], epithelial ovarian cancer [54], bladder cancer [55], and head and neck cancer [56]. In gastric cancer, Gu et al. reported that complement and coagulation cascades were among the significantly enriched pathways [57]. However, research on this pathway in gastric cancer is insufficient, and there is no direct evidence that its upregulation is connected with the prognosis of gastric cancer.
In this study, we also found that the TGF-β signaling pathway was upregulated in gastric cancer, which is in line with the results of existing research. The dysregulated pathway could promote the generation of ECM [58], leading to tissue fibrosis. An overactivated TGF-β signaling pathway could induce tumor growth and metastasis by promoting epithelial-mesenchymal transition and angiogenesis [59]. These results suggest that the relationship between the eight significant genes and TGF-β merits further research.

Furthermore, the downregulation of base excision repair and nucleotide excision repair signatures in the high-risk group was consistent with current research in gastric cancer. In particular, DNA mismatch repair is one of the most prevalent pathways involved in a damaged base excision repair system. Absence of base excision repair could result in the accumulation of DNA damage, leading to malignant transformation and poor prognosis. This imbalance was also associated with DNA polymorphism regulation, and such uncorrected DNA variants were likely related to cancer risk [60]. Defects in nucleotide excision repair would lead to increased instability of the genome. In addition, unrepaired DNA damage possibly increased genetic susceptibility to cancers and the risk of carcinogenesis [61]. Thus, according to the results above, the association between excision repair and the eight significant genes deserves further exploration.

Recent research suggested that the impact of age as an independent risk factor in gastric cancer may differ depending on the cancer stage [62]. Although the finding of age as an independent risk factor in this study has some value, large-scale clinical data are urgently needed to verify it and thus to guide the establishment of a clinical treatment scheme. We identified a risk score model to predict the prognosis of patients with gastric cancer and validated it in an independent cohort. As a simple and convenient assessment, it could provide a useful reference. However, we need to acknowledge that the risk score is a relative value, which varies between institutes and detection methods. After unifying the testing methods, we need to collect as many samples as possible to identify a cut-off value that can guide oncologists.

5. Conclusions

In conclusion, we performed comprehensive analyses to investigate the role of an ECM-receptor interaction signature in gastric cancer. ECM compositions became accomplices in the tumorigenesis, progression, and poor survival of gastric cancer.

Abbreviations

ECM: Extracellular matrix
TCGA: The Cancer Genome Atlas
lasso: Least absolute shrinkage and selection operator method
STAD: Stomach adenocarcinoma
FPKM: Fragments per kilobase million
KEGG: Kyoto Encyclopedia of Genes and Genomes
CAMs: Cell adhesion molecules
GSEA: Gene set enrichment analysis
ssGSEA: Single-sample gene set enrichment analysis
CAC: Colon adenocarcinoma
OS: Overall survival
BP: Biological process
MF: Molecular function
CC: Cellular component
HR: Hazard ratios
95% CI: 95% confidence interval
logFC: log2(fold change)
PCA: Principal component analysis.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors have no conflicts of interest to declare.

Authors’ Contributions

X. Yang and M. He searched the literature and designed the study. L. Chen collected the data. X. Yang and Y. Mao performed the data analyses. X. Yang and L. Chen were in charge of the validation cohort analyses. Z. Hu and M. He processed the figures. All authors participated in manuscript writing. All authors read and approved the final manuscript. Xiangchou Yang and Liping Chen contributed equally to this work.

Acknowledgments

We are very grateful to TCGA database. We appreciate Chen Yang’s help in bioinformatics. This work was supported by the Wenzhou Science and Technology Project (grant number 2019Y0216).

Supplementary Materials

Supplementary Figure 1: paired t-test of ECM-receptor interaction-related genes with statistical difference of gastric cancer in the training cohort (TCGA). Supplementary Figure 2: paired t-test of ECM-receptor interaction-related genes in the validation cohort (GSE29272). Supplementary Figure 3: consensus matrices of the training cohort for k from 3 to 9. Supplementary Table 1: patients’ clinical characteristics and their cluster classification.