Abstract

Background. Although accumulating evidence suggested that a molecular signature panel may be more effective for the prognosis prediction than routine clinical characteristics, current studies mainly focused on colorectal or colon cancers. No reports specifically focused on the signature panel for rectal cancers (RC). Our present study was aimed at developing a novel prognostic signature panel for RC. Methods. Sequencing (or microarray) data and clinicopathological details of patients with RC were retrieved from The Cancer Genome Atlas (TCGA-READ) or the Gene Expression Omnibus (GSE123390, GSE56699) database. A weighted gene coexpression network was used to identify RC-related modules. The least absolute shrinkage and selection operator analysis was performed to screen the prognostic signature panel. The prognostic performance of the risk score was evaluated by survival curve analyses. Functions of prognostic genes were predicted based on the interaction proteins and the correlation with tumor-infiltrating immune cells. The Human Protein Atlas (HPA) tool was utilized to validate the protein expression levels. Results. A total of 247 differentially expressed genes (DEGs) were commonly identified using TCGA and GSE123390 datasets. Brown and yellow modules (including 77 DEGs) were identified to be preserved for RC. Five DEGs (ASB2, GPR15, PRPH, RNASE7, and TCL1A) in these two modules constituted the optimal prognosis signature panel. Kaplan-Meier curve analysis showed that patients in the high-risk group had a poorer prognosis than those in the low-risk group. Receiver operating characteristic (ROC) curve analysis demonstrated that this risk score had high predictive accuracy for unfavorable prognosis, with the area under the ROC curve of 0.915 and 0.827 for TCGA and GSE56699 datasets, respectively. This five-mRNA classifier was an independent prognostic factor. Its predictive accuracy was also higher than all clinical factor models. A prognostic nomogram was developed by integrating the risk score and clinical factors, which showed the highest prognostic power. ASB2, PRPH, and GPR15/TCL1A were predicted to function by interacting with CASQ2/PDK4/EPHA67, PTN, and CXCL12, respectively. TCL1A and GPR15 influenced the infiltration levels of B cells and dendritic cells, while the expression of PRPH was positively associated with the abundance of macrophages. HPA analysis supported the downregulation of PRPH, RNASE7, CASQ2, EPHA6, and PDK4 in RC compared with normal controls. Conclusion. Our immune-related signature panel may be a promising prognostic indicator for RC.

1. Introduction

Previously, colon cancer (CC) and rectal cancer (RC) are considered a single tumor entity (called colorectal cancer (CRC)) [1]. However, recent studies indicate that there are significant differences in epidemiology, pathology, molecular mechanisms, and the response to treatments [1]. The risk of developing RC is estimated to be four times higher than that of CC, and RC patients have a lower 5-year survival rate than CC patients (60% versus 72%) due to a poor response to current treatment options [13]. Thus, it is of particular importance to explore approaches to early separate RC patients with a high death risk and then provide improved specialized care to further reduce the overall mortality rate.

With the developments in sequencing technology and bioinformatics, recent scholars suggest that a molecular signature panel may be more effective for the prognosis prediction than routine clinical characteristics (such as the tumor-node-metastasis (TNM) stage) [4, 5]. Li et al. identified a four-mRNA signature panel as an independent prognostic factor for CRC. This four-mRNA signature panel can effectively predict the prognosis of CRC patients, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.730. The stratified analysis indicated that the patients belonging to the same T stage (T3+T4), N stage (N1+N2), or TNM stage (III+IV) can also be stratified into the high-risk and low-risk groups using this 4-gene signature panel [6]. The study of Zuo et al. revealed that a six-mRNA signature panel had a significant prognostic value to discriminate the high-risk patients from the low-risk patients, with an AUC of 0.711 and 0.683 for 3-year and 5-year survival, respectively. This 6-gene signature panel was an independent factor of OS after adjustment for clinical factors and can predict different survival outcomes in patients with the early (or advanced) TNM stage [7]. Sun et al. developed a 12-gene expression signature panel to precisely predict the prognosis for CC patients, which could distinguish poor from good prognosis patients within stage II/III [8]. Chen et al. found that the signature panel consisting of 16 gene pairs formed by 24 genes had a better prognostic ability than the TNM stage (AUC: 0.724 vs. 0.703; concordance index (C-index): 0.869 vs. less than 0.8) at 5 years [9]. Although there were also studies to identify mRNAs associated with the prognosis of RC patients [1013], no reports specifically focused on the signature panel and compared its prognostic values with clinical factors.

In the present study, we aimed to (1) screen RC-related genes by weighted gene coexpression network analysis (WGCNA) [14], (2) develop a reliable mRNA signature panel for the prediction of overall survival (OS) in patients with RC using the least absolute shrinkage and selection operator (LASSO) method [15, 16], (3) validate its superior prognostic performance to various clinical features by stratified analysis and comparison of the AUC and C-index, and (4) establish a clinicopathologic-mRNA nomogram to improve the prediction accuracy clinically.

2. Materials and Methods

2.1. Data Access

The RNA-seq expression data (fragments mapped per kilobase of exon per million reads mapped, level 3) were retrieved from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov) database using “TCGA-rectal adenocarcinoma (READ)” as the keyword. There were 177 READ cases and 10 controls in this dataset. However, only 162 READ cases provided clinical information. Thus, TCGA dataset (including 162 cases and 10 controls) served as the training dataset for our following analyses. Furthermore, GSE123390 (platform: Affymetrix Human Transcriptome Array 2.0) and GSE56699 (platform: Illumina HumanHT-12 WG-DASL V4.0 R2 expression beadchip) microarray datasets were also obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database using “rectal cancer” as the keyword. GSE123390 was used for the WGCNA model validation because it investigated the mRNA expression profile in human RC tissues () and controls (). GSE56699 was used for the survival model validation because it provided the prognosis information for 61 of 72 RC cases. The study flowchart is illustrated in Figure 1.

2.2. Identification of Differentially Expressed Genes (DEGs)

In TCGA-READ and GSE123390 datasets, the DEGs were screened between RC and controls using the limma package of R (version 3.34.7; https://bioconductor.org/packages/release/bioc/html/limma.html) [17]. and were defined as the statistical threshold. All DEGs in these two datasets were subjected to the hierarchical clustering analysis using the pheatmap package of R (version 1.0.8; https://cran.r-project.org/web/packages/pheatmap), and the heat map generated was used to assess the heterogeneity of gene expression patterns between RC and controls. The Draw Venn Diagram online tool (http://bioinformatics.psb.ugent.be/webtools/Venn) was utilized to identify the shared DEGs between TCGA-READ and GSE123390 datasets. The functions of common DEGs were analyzed using the gProfiler tool (http://biit.cs.ut.ee/gprofiler/gost). Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome pathways with an were considered to be statistically significant.

2.3. WGCNA

To screen genes associated with the development of RC, WGCNA was performed using the WGCNA package in R (version 1.61; https://cran.r-project.org/web/packages/WGCNA/index.html) [14], by which highly correlated mRNAs could be clustered into the same coexpression modules. WGCNA included six steps: (1) calculation of the expression and connectivity correlations of mRNAs between TCGA-READ and GSE123390 datasets; (2) selection of the soft threshold power () according to the scale-free topology criterion; (3) calculation of the topological overlap matrix dissimilarity between genes in TCGA-READ to build the dendrogram and identification of modules ( and ) by the Dynamic Tree Cut method [18]; (4) assessment of the preservation ( and ) of modules in two datasets using the modulePreservation statistics [19]; (5) enrichment of DEGs to modules using the hypergeometric algorithm [20]; and (6) association of coexpression modules with clinical information.

2.4. Protein-Protein Interaction (PPI) Network

The PPI pairs among DEGs in crucial modules were identified using the Search Tool for the Retrieval of Interacting Genes (STRING; version 11.0; https://string-db.org) database [21]. Only interactions with a combined were selected to construct the PPI network using Cytoscape software (version 3.4; http://www.cytoscape.org/) [22]. The functions of genes in the PPI network were analyzed using the KEGG, GO, and Reactome implements in the STRING. Significant GO terms, KEGG pathways, and Reactome pathways were chosen using an as the statistical threshold.

2.5. Development of the Prognostic Signature Panel

The univariate Cox regression analysis was used to screen OS-associated genes from the DEGs in the preserved modules. The DEGs with a log-rank in the univariate analysis were entered into the multivariate Cox regression model to identify independent predictors. An L1-penalized (LASSO) Cox proportional hazard model in the penalized package (version 0.9-5; http://bioconductor.org/packages/penalized/) [15, 16] was further applied on these independent prognostic DEGs to select the optimal subset of signature panels. The risk score model was established based on the expression levels of prognostic DEGs (ExpDEGs) and their LASSO coefficients ():

The patients were divided into the high-risk group and the low-risk group by using the median risk score as the cutoff. The OS differences between the high-risk group and the low-risk group were compared according to the Kaplan-Meier survival curve analysis and log-rank test. The predictive accuracy of the risk score was estimated through the AUC calculated from the ROC curve. These analyses were first carried out for TCGA-READ dataset and then validated in the GSE56699 dataset.

Moreover, univariate and multivariate Cox analyses were applied using TCGA-READ cohort to evaluate whether the risk score was independent of other clinical variables for the prognosis prediction. Kaplan-Meier survival curve analysis was used to identify whether the risk score was also an effective tool for stratification of patients with the same clinical characteristics. A nomogram that incorporated the risk score and clinical prognostic factors was developed for the prediction of 3-year and 5-year OS rates. The predictive power of the nomogram was assessed in terms of the AUC and the C-index (which was calculated using the survcomp package, http://www.bioconductor.org/packages/release/bioc/html/survcomp.html).

2.6. Analysis of Immune Cell Infiltration

The associations between the expression of selected prognostic genes and the abundance of six tumor-infiltrating immune cells in TCGA-PRAD (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) were estimated from the Tumor Immune Estimation Resource (TIMER; https://cistrome.shinyapps.io/timer/) based on Spearman’s correlation test. adjusted by the tumor purity was considered significant.

2.7. Verification of Protein Expressions of Prognostic Genes

The immunohistochemical results for the prognostic-related DEGs and their interaction genes in rectal and normal tissues were downloaded from the Human Protein Atlas (HPA) to confirm their protein expression levels.

3. Results

3.1. DEGs between RC and Normal Controls

Based on the threshold of and , 1053 DEGs (including 830 downregulated and 223 upregulated) were screened between 162 RC tissues and 10 normal control tissues of TCGA dataset, while 1711 DEGs (including 728 downregulated and 983 upregulated) were identified between 28 RC tissues and 5 normal control tissues of the GSE123390 dataset. The volcano plot and heat map of these two datasets are shown in Figures 2(a) and 2(b) and Figures 2(c) and 2(d), respectively. In addition, the Draw Venn Diagram online tool was used to investigate the intersection of DEGs in these two datasets. As a result, 268 overlapped DEGs were found, but the expression trend was consistent only in 247 DEGs (Figure 2(e); Table S1).

These 247 genes were subjected to the gProfiler online toolset for the function enrichment analysis. The results showed that 9 GO molecular function terms (such as glycosaminoglycan binding: ribonuclease A family member 7 (RNASE7)), 57 GO biological process terms (such as response to endogenous stimuli: calsequestrin 2 (CASQ2), pyruvate dehydrogenase kinase 4 (PDK4), and C-X-C motif chemokine ligand 12 (CXCL12); ion transport: CASQ2 and CXCL12; response to organic substance: CASQ2, TCL1 family AKT coactivator A (TCL1A), and CXCL12; response to chemical: PDK4 and EPH receptor A6 (EPHA6); chemotaxis: CXCL12 and EPHA6; and regulation of heart contraction: CASQ2), 7 KEGG pathways (such as calcium signaling pathway: CASQ2), and 5 Reactome pathways (such as muscle contraction: CASQ2; GPCR ligand binding: CXCL12) were enriched (Table S2).

3.2. Identification of RC-Related Modules by WGCNA

The correlation analysis showed that there were positive correlations in the expression level (, ) and the connectivity (, ) of RNAs between the training dataset TCGA and the validation dataset GSE123390. Using TCGA dataset, the soft threshold power 10 was chosen to create networks with a scale-free topology ( reached 0.9 for the first time; the mean connectivity was equal to 1). The dendrogram displayed that the genes were clustered into 9 modules using TCGA dataset (Figure 3(a)). These modules were also validated in the analysis of the GSE123390 dataset (Figure 3(b)). Among them, blue, brown, grey, red, turquoise, and yellow modules were considered to be preserved because of their and value < 0.05 (Table 1). Brown and yellow modules were significantly enriched by DEGs (enrichment and value < 0.05), suggesting they may be particularly crucial for the development of RC (Table 1). The genes in the brown and yellow modules were also found to be significantly associated with the pathologic M, pathologic N, pathologic T, pathologic stage, survival time, and death of RC patients (Figure 3(c)).

3.3. Construction of a PPI Network

The 77 DEGs in the brown and yellow modules were uploaded to the STRING database to obtain their interaction relationships. As a result, 281 interaction pairs between 69 DEGs were identified (such as peripherin- (PRPH-) PTN, ankyrin repeat and SOCS box containing 2- (ASB2-) CASQ2/PDK4/EPHA6, and G protein-coupled receptor 15 (GPR15)/TCL1A-CXCL12), which were used to construct the PPI network (Figure S1). Function enrichment analysis obtained 30 GO biological process terms (such as regulation of heart contraction: CASQ2; regulation of ion transport: CASQ2 and CXCL12; regulation of tissue remodeling: PDK4; negative regulation of leukocyte tethering or rolling: CXCL12; and negative regulation of dendritic cell apoptotic process: CXCL12), 4 KEGG pathways (such as metabolism of xenobiotics by cytochrome P450 and chemical carcinogenesis), and 4 Reactome pathways (such as muscle contraction: CASQ2) (Table S3).

3.4. Development of a Prognostic Risk Score

The univariate Cox regression analysis identified that 35 DEGs were significantly associated with OS, including EPHA6 (hazard ratio ), CASQ2 (), and PDK4 () (Table S4). Eight of them were screened as independent prognostic predictors after the multivariate Cox regression analysis (Table S5). The use of a LASSO-based Cox PH model further identified that 5 DEGs (ASB2, GPR15, PRPH, RNASE7, and TCL1A) might constitute the optimal signature panel for the prognosis prediction (Table 2).

The prognostic risk score was estimated for each patient in the training TCGA dataset and the validation GSE56699 dataset by the following formula: . Based on the median value of the risk score, the patients were divided into the low-risk group and the high-risk group. Kaplan-Meier curve analysis showed that patients in the high-risk group had a significantly poorer prognostic outcome than those in the low-risk group (TCGA: , 95% confidence interval -14.88, and , Figure 4(a); GSE56699: , 95% -25.50, and , Figure 4(b)). ROC curve analysis demonstrated that this risk score had high predictive accuracy for unfavorable prognosis, with the AUC of 0.915 (Figure 4(c)) and 0.827 (Figure 4(d)) for TCGA and GSE56699 datasets, respectively.

Univariate and multivariate Cox regression analyses were also performed using the risk score and other clinical factors to confirm the independence of our five-mRNA signature panel. The results displayed that after multivariable adjustments for clinicopathological factors, the risk score remained significantly associated with patients’ OS (Table 3), suggesting this five-mRNA-based classifier was an independent prognostic factor. In addition, the risk score was shown to have the ability to further classify the population with age above 65 years (), pathologic M0 (), pathologic N0 (), and pathologic stage II () and predict their different prognosis results (Figure 5), indicating its prognostic superiority to routine clinical factors. Moreover, the time-dependent ROC curve analysis also demonstrated that the predictive accuracy of the risk score (; C-) was higher than that of age (; C-), pathologic M (; C-), pathologic N (; C-), and pathologic stage (; C-) and the model with all clinical factors (; C-) (Figure 6(a)). Therefore, the risk score should be integrated with the clinical factors to better predict the prognosis clinically, based on which a prognostic nomogram was developed (Figure 6(b)). As expected, the AUC (0.976) and the C-index (0.913) of the nomogram were higher than those of any clinical factor model and the risk score model (Figure 6(a)).

3.5. Correlations between mRNA Levels of Prognostic Genes and Tumor-Infiltrating Immune Cells

TIMER analysis revealed that there was a significant correlation between ASB2/TCL1A expression and B cells, CD4+ T cells, and dendritic cells; the expression of GPR15 was positively associated with the abundance of B cells and dendritic cells (DCs); the expression of PRPH positively correlated with the infiltration levels of CD4+ T cells and macrophages (Figure 7). No significant association was observed between the expression of RNASE7 and infiltration levels of all six immune cells (Figure 7).

3.6. Validation of Protein Expressions of Prognostic Genes

The expressions of 5 signature genes and their interaction genes were validated using the immunostaining results from the HPA database. The results supported the downregulation of PRPH, RNASE7, CASQ2, EPHA6, and PDK4 in RC compared with normal controls (Figure 8). GPR15, TCL1A, and PTN were not detected in both RC tissues and normal rectal tissues. There was no evidence of immunostaining for ASB2 and CXCL12. No rectal samples were collected to investigate the expression of PDK4 in RC.

4. Discussion

In the present study, we established a risk score model based on five prognostic DEGs (ASB2, GPR15, PRPH, RNASE7, and TCL1A). ROC curve analysis indicated that this 5-mRNA signature panel can accurately predict the prognosis for patients with RC, with the AUC of 0.915 and 0.827 for the training and validation datasets, respectively. The prognostic performance of our risk score seemed to be better than that of previously reported signature panels developed for CRC (such as 4 genes: for the external dataset and 0.607 for the internal dataset [4]; 6 genes: [7]; and 9 genes: [23]) or CC (16 gene pairs: [9]; 9 genes: [22]). In addition, in the study of Zuo et al. [7], they performed a subgroup analysis to confirm whether the 6-gene signature panel was effective for colon adenocarcinoma (COAD) and READ. As a result, the AUC was, respectively, 0.653 and 0.74 for COAD and READ, which were both lower than that of our signature panel. Moreover, in line with other signature panels identified for CRC or CC patients [69, 24], our risk score was demonstrated to be an independent factor for the prognosis prediction and stratify the survival of patients with the same TNM stage. Also, the AUC and the C-index of the risk score were higher than those of age (; C-), pathologic M (; C-), pathologic N (; C-), and pathologic stage (; C-) and the model with all clinical factors (; C-). These findings reveal that our risk score may serve as an effective molecular biomarker to predict the poor prognosis of patients with RC. To better guide prognostication in clinical practice, some authors suggest that molecular prognostic models and clinicopathological models should be combined [2427], which showed the highest predictive power compared with anyone. In agreement with these studies, our results also showed that the nomogram that integrated the five-mRNA classifier and four clinical risk factors (age, pathologic M, pathologic N, and pathologic stage) had the highest AUC (0.976) and C-index (0.913).

Although all of the 5 signature genes were not included in the previous signature panels for CRC [69, 24], some of them were found to be associated with the progression and prognosis of CRC (including ASB2, GPR15, TCL1A, and PRPH) [2831]. High ASB2 expression was shown to predict a short relapse-free survival for patients with CRC [28]. Deletion of ASB2 in hematopoietic cells inhibited the shortening of the colon and the tumor load in mice. Function analysis indicated that ASB2 may exert tumor-promoting roles by decreasing Th1, Th17, and cytotoxic CD8+ T cell response which are beneficial for protection against tumor progression [28]. Consistent with the study of Spinner et al. [28], our results also demonstrated that patients with a high level of ASB2 may have a 12.505-fold higher risk of possessing a worse OS rate than those with low expression. Also, ASB2 was predicted to interact with the chemotaxis-associated EPHA6 gene which could be hypermethylated to result in downregulated EPHA6 expression by anti-inflammatory interleukin-6 [32, 33], a Th17 cell biomarker [34]. The knockdown of EPHA6 decreased prostate cancer cell invasion in vitro and reduced lung and lymph node metastasis in vivo [35]. High EPHA6 expression was associated with a lower OS rate in patients with breast cancer [36] and our RC (). In addition to EPHA6, we also predicted that ASB2 may be involved in RC by influencing the expressions of CASQ2 and PDK4. Highly expressed CASQ2 [37] and PDK4 [38] were reported to be significantly correlated with poor OS and disease-free survival in cancer patients. Overexpression of PDK4 promoted cell proliferation, invasion, and tumor growth in vivo [38]. These prognosis conclusions of CASQ2 () and PDK4 () were also confirmed in our study on RC. TCL1A is crucial for cancer development by expressing in a subpopulation of immune B cells (CD3-/CD19+/CD10+/CD34-). A high TCL1A/CD20 (B cell) ratio or TCL1A expression was shown to correlate with improved survival [39, 40]. In agreement with other cancers [39, 40], our study also showed that TCL1A was a protective risk for the survival of RC patients () and positively associated with the abundance of B cells. Furthermore, we predicted TCL1A may interact with CXCL12, a chemokine gene that was speculated to be involved in the regulation of the dendritic cell apoptotic process in our function enrichment analysis. High CXCL12 expression was reported to confer a survival advantage for breast cancer patients [41] and stage III CC [42]. Silencing of CXCL12 by transforming growth factor-β in mesenchymal stromal cells of the primary tumor site promoted the tumor metastasis by increasing the expression of CXCR7, a CXCL12 receptor [43]. A meta-analysis showed that immunotherapy with DCs significantly improved OS at 6 months, 1 year, 3 years, and 5 years of patients with hepatocellular carcinoma [44]. Coculture of DCs significantly inhibited liver cancer stem cell growth in vitro and in vivo [45]. Consistent with these findings, we also found that the expression of TCL1A was significantly positively correlated with the infiltration of DCs. Using TCGA and the genotype-tissue expression data, Wang and Wang found that GPR15 was significantly lowly expressed in COAD and READ compared with normal tissues [30], which was validated in our study. GPR15 expression was significantly positively correlated with the prognosis of patients with COAD (that is, the high expression had a longer OS) [30], which was also observed in our study of READ. Wang and Wang believed that GPR15 may be a tumor suppressor by regulating a serial of genes enriched in immune systems and increasing the infiltration of B cells (in neck squamous carcinoma, lung adenocarcinoma, and stomach adenocarcinoma), CD4+ T cells, and DCs (in neck squamous carcinoma and stomach adenocarcinoma) [30]. Similarly, we found that the expression of GPR15 was positively associated with the levels of tumor-infiltrating immune B cells and DCs and predicted that GPR15 could interact with DC-related CXCL12 to participate in RC progression. CD133+ human umbilical hematopoietic progenitor cells were revealed to promote the proliferation and invasion of CRC cells in vitro and enhance tumor growth and metastasis in vivo by upregulating PRPH [31]. However, the mechanisms of PRPH in CRC remained unclear. In this study, we speculated that PRPH may function by interacting with downstream PTN. Tumor-associated macrophages increased the proportion of cancer stem cells in lymphoma by secreting PTN [46]. Upregulated PTN promoted tumor cell proliferation and inhibited apoptosis and chemosensitivity by activating the NF-κB pathway [46, 47]. Meta-analysis showed that high expression of PTN was significantly associated with an advanced TNM stage and a poor OS in tumor patients [48]. Similar to these studies, we also reported that PRPH was positively associated with tumor-associated macrophages.

Although RNASE7, encoding an antimicrobial peptide, was not demonstrated to be associated with CRC, the roles of itself and its family members in other cancers may indirectly verify our conclusions. Scola et al. reported that the expression of RNASE7 was gradually reduced during the malignant transformation process, showing the highest expression in healthy skin and the lowest expression in oral squamous cell carcinoma [49]. The low expression of RNase family members contributed to the loss of immune defense against bacterial infections [5053], which is an important cause for the initiation of cancer. The knockdown of RNase L increased prostate cancer cell migration [54] and enhanced tumor growth and metastasis following implantation in the mouse prostate [55], the mechanism of which was related with an increased cell surface expression of integrin β1 and activation of the focal adhesion kinase-sarcoma pathway and the Ras-related C3 botulinum toxin substrate 1-guanosine triphosphatase activity [54]. Colorectal tumors with lower levels of RNase H2 exhibited a significantly shorter survival time [56]. In line with these studies, we also demonstrated that RNASE7 was downregulated in RC and patients with a higher level of RNASE7 had a longer OS compared with controls.

Some limitations should be acknowledged in this study. First, the prognostic signature panel was developed and validated based on the survival information retrospectively collected from the public datasets (TCGA and GSE56699). Prospective trials needed to be performed in our hospital to further verify the prognostic value of this signature panel. Second, the expressions of these signature DEGs were also identified using public TCGA and GSE123390 datasets. In these datasets, the number of samples in the normal group was quite smaller than that in the cancer group. This imbalance may cause a statistical problem. Consistent sample size in the RC and control groups should be designed to further confirm their expressions. Third, clinical (PCR, immunohistochemistry, and Pearson’s correlation), in vitro (coimmunoprecipitation, knockdown, overexpression, or coincubation of immune cells), and in vivo (tumor transplantation, mimics, siRNA transfection, and immunotherapy) experiments should be conducted to explore the PPI relationships between our signature genes (PRPH-PTN, ASB2-CASQ2/PDK4/EPHA67, and GPR15/TCL1A-CXCL12) and assess the functions of our signature genes in the progression of RC (especially RNASE7, which was not reported in CRC previously). Fourth, other grouped variable selection methods (such as Elastic net and CoxBoost) [57] for identification of prognostic signature panels should be used individually or jointly with LASSO to identify more effective prognostic indicators for RC. Fifth, the expression, prognostic power, and functions of signature genes should be compared between the CC and RC samples.

5. Conclusion

Our study developed a five-mRNA signature panel (ASB2, GPR15, PRPH, RNASE7, and TCL1A) as an immune-related prognostic biomarker for RC. This signature panel exhibited excellent accuracy to stratify the patients with a higher death risk. The nomogram that combined the risk score and clinical features (age, pathologic M, pathologic N, and pathologic stage) may be more effective in guiding the clinical decision-making of personalized treatment.

Data Availability

The datasets generated for this study can be found in GEO (http://www.ncbi.nlm.nih.gov/geo/; GSE123390, GSE56699) and TCGA (https://gdc-portal.nci.nih.gov/).

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

HH, XSL, TXS, and HKP conceived the study. HH and XSL collected the data and performed the statistical analysis. CAD, LF, and WJZ participated in the investigation of the literature and interpretation of the results. HH and XSL drafted the article. TXS and HKP revised the manuscript. All authors have read and approved the final version of the article. He Huang and Shilei Xu contributed equally to this work.

Supplementary Materials

Figure S1: protein-protein interaction network constructed by crucial module genes. Red, upregulated expression; green, downregulated expression. Table S1: common DEGs identified in GSE123390 and TCGA datasets. Table S2: function enrichment for the common differentially expressed genes in two datasets. Table S3: function enrichment for genes in the PPI network. Table S4: univariate Cox regression analysis for DEGs associated with OS. Table S5: multivariate Cox regression analysis for DEGs associated with OS. (Supplementary Materials)