Abstract

Purpose. DNA methylation alterations play important roles in initiation and progression of clear cell renal cell carcinoma (ccRCC). In this study, we attempted to identify differentially methylated mRNA signatures with prognostic value for ccRCC. Methods. The mRNA methylation and expression profiling data of 306 ccRCC tumors were downloaded from The Cancer Genome Atlas (TCGA) to screen differentially methylated lncRNAs and mRNAs (DMLs and DMMs) between bad and good prognosis patients. Uni- and multivariable Cox regression analyses and LASSO Cox-PH regression analysis were used to select prognostic lncRNAs and mRNAs. Corresponding risk scores were calculated and compared for predictive performance in the training set using Kaplan-Meier OS and ROC curve analyses. The optimal risk score was then identified and validated in the validation set. Function enrichment analysis was conducted. Results. This study screened 461 DMMs and 63 DMLs between good prognosis and bad prognosis patients, and furthermore, nine mRNAs and six lncRNAs were identified as potential prognostic molecules. Compared to nine-mRNA status risk score model, six-lncRNA methylation risk score model, and six-lncRNA status risk score model, the nine-mRNA methylation risk score model showed superiority for prognosis stratification of ccRCC patients in the training set. The prognostic ability of the nine-mRNA methylation risk score model was validated in the validation set. The nine prognostic mRNAs were functionally associated with neuroactive ligand receptor interaction and inflammation-related pathways. Conclusion. The nine-mRNA methylation signature (DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1) may be a useful prognostic biomarker and tool for ccRCC patients. The present results would be helpful to elucidate the possible pathogenesis of ccRCC.

1. Introduction

Clear cell renal cell carcinoma (ccRCC) is the most common histological subtype of renal cell carcinoma which accounts for more than 90% of cancers in the kidney [1]. ccRCC is characterized by high rates of progression and mortality, and patients have a dire prognosis [2]. Traditional clinical predictors of outcome such as clinical stages, grade, and necrosis could not achieve satisfactory results for ccRCC patients with similar clinical features but different outcomes [3]. Therefore, novel sensitive prognostic biomarkers for ccRCC are demanded.

Aberrant DNA methylation is a critical factor for initiation and progression of cancer [4]. Associations of DNA methylation with prognosis of ccRCC patients are gaining attention. For instance, based on genome-wide CpG methylation profiling, Wei et al. develop a five-CpG-based classifier as a reliable tool for survival prediction of ccRCC patients [5]. Using The Cancer Genome Atlas (TCGA) data, a recent study by Chen et al. finds that a four-gene DNA methylation signature is tightly related to prognosis of patients with kidney renal clear cell carcinoma and is able to serve as a prognostic predictor [6]. Moreover, a promoter methylation classifier panel of 172 differentially methylated CpGs is reported to be capable of classifying prognosis of nonmetastatic ccRCC patients [7]. Long noncoding RNAs (lncRNAs) modulate gene expression at various levels and facilitate cancer phenotypes via interaction with other cellular macromolecules, such as DNA, RNA, and protein [8, 9]. Important implications of lncRNAs have been demonstrated for tumorigenesis of ccRCC [10]. A large number of lncRNAs gain DNA methylation in ccRCC, and several lncRNAs with promoter methylation are related to outcome of patients [11]. However, aberrantly methylated lncRNAs which can function as prognostic biomarkers for ccRCC patients remain elusive.

In this paper, we tried to identify potential mRNA methylation signature and lncRNA methylation signature for survival prediction of ccRCC patients through a comprehensive analysis of mRNA and lncRNA methylation data downloaded from TCGA dataset, and the predictive performances of methylation or status risk scores were compared based on these prognostic lncRNAs or mRNAs to select the best prognostic model for ccRCC.

2. Materials and Methods

2.1. Datasets and Preprocessing

Methylation and gene expression data of 309 ccRCC tumor tissue samples and 24 normal control tissue samples were downloaded from TCGA repository using Illumina Infinium Human Methylation 450 BeadChip platform and Illumina HiSeq 2000 RNA Sequencing platform. Patients without complete survival information were excluded from the study, and the resulting 306 patients with ccRCC tumor tissue samples were used as the training set in the current study. E-MTAB-3274 [12] dataset consisting of methylome data of 107 ccRCC samples was obtained from EBI ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) based on Illumina Infinium Human Methylation 450 BeadChip platform. Of these samples, 102 samples had corresponding survival information and were used as the validation set.

According to probe sites and ID information, we identified lncRNAs and mRNAs from TCGA set and E-MTAB-3274 dataset based on HUGO Gene Nomenclature Committee (HGNC) [13] database (http://www.genenames.org/) that enrolls 4112 lncRNAs and 19201 protein coding genes. As a result, 13919 mRNAs and 1028 lncRNAs were identified.

2.2. Differential Methylation Analysis between Good and Bad Prognosis Samples

In the training set, differentially methylated lncRNAs and mRNAs (DMLs and DMMs) were identified between good prognosis patients (being alive with OS time lasting for more than 60 months) and bad prognosis patients (being dead with OS time lasting for less than 36 months). Significant DMLs and DMMs were defined as those with and . Next, two-way hierarchical clustering analysis of these significant DMLs and DMMs was carried out based on centered Pearson correlation algorithm.

2.3. Correlation Analysis between DNA Methylation and Expression of Significant DMLs and DMMs

We analyzed relationships of overall methylation and expression levels of preselected significant DMLs and DMMs through calculating Pearson correlation coefficient (PCC) and Spearman correlation coefficient (SCC) [14]. Subsequently, PCC of methylation and expression levels of each individual DML or DMM was computed. These DMLs and DMMs with significant negative correlations ( value <0.05) were selected to be applied in further analysis.

2.4. Prognostic Risk Scoring Model Building

In order to construct risk scoring models for predicting survival in ccRCC patients, we employed a step-by-step strategy to identify the prognostic DMLs and DMMs in the training set. Firstly, we performed univariable Cox regression analysis to assess the association between methylation levels of the aforementioned DMLs and DMMs with patients’ OS time. The resulting lncRNAs and mRNAs were considered significant if their values <0.05. Secondly, the significant lncRNAs and mRNAs were chosen to undergo a multivariable Cox regression analysis. The significant RNAs ( value <0.05) were determined to be independent predictors of prognosis and were then used to fit a L1-penalized (LASSO) Cox-PH regression [15] model with estimation of optimal lambda value through performing 1,000 cross-validations. The optimal feature combination was selected using LASSO regularized regression algorithm in penalized package. CV and lambda values were obtained by feature filtering. The small lambda value usually resulted in small penalty term value as well as over fitting phenomenon; therefore, we chose the largest lambda value which is corresponding to the lambda parameter value. Eventually, the resulting optimal set of predictive lncRNAs and mRNAs was applied to build risk scoring models based on their methylation levels or status together with the estimated multivariable Cox regression coefficients. Methylation status of an individual lncRNA or mRNA was defined according to optimal cutoff point of methylation value (Monte-Carlo value <0.05) which was determined based on patients’ survival through performing X-Tile Bio-Informatics Tool [16] (high expression, ; low expression, ). We utilized the following equations to quantify methylation or status risk scores for survival prediction:

,

Methylation risk score .

β RNAn suggests regression coefficient of RNAn; status RNAn and methylation RNAn suggest methylation status and values of RNAn, respectively.

Risk score was estimated for each patient using above formula. With median risk score as cutoff point, the training set or the validation set was dichotomized into a high-risk group and a low-risk group.

2.5. Statistical Analysis

Kaplan-Meier survival plots as well as log-rank test were used to analyze difference in OS time between two risk groups. Receiver operating characteristic (ROC) curve was plotted to compare sensitivity and specificity of different risk scores in predicting OS time of patients. Based on the training set, associations of clinical features and risk score status with patients’ OS were calculated using uni- and multivariable Cox regression analyses. Nomogram was built combining the optimal risk score status with prognostic clinical features. Calibration plots were used to evaluate predictive capability of the nomogram. Different packages of the R software (version 3.4.1) were used to implement all these bioinformatics and statistical analyses in the study: limma [17] package for differential methylation analysis and differential expression analysis; pheatmap [18] package (https://cran.r-project.org/ web/packages/pheatmap/index.html) for hierarchical clustering analysis; survival package (version 2.41-1, http://bioconductor.org/packages/survivalr/) for uni- and multivariable Cox regression analyses and Kaplan-Meier OS analysis; cor.test (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor. test. html) function for calculation of PCC and SCC values; pROC package for ROC curve analysis; penalized package (version 0.9.50) for LASSO Cox-PH regression model; rms package [19] (version 5.1-2, https://cran.r-project.org/web/packages/rms/index.html) for nomogram analysis; gene set enrichment analysis (GSEA) [20] software (http://software.broadinstitute.org/gsea/index.jsp) for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Significance was defined as value <0.05.

2.6. Functional Enrichment Analysis

According to the optimal risk score, the training set was divided into a high-risk group and a low-risk group. Between the two risk groups, differentially expressed genes (DEGs) were identified based on the threshold of and and then underwent KEGG pathway enrichment analysis. Significant signaling pathways were selected in the present study.

3. Results

3.1. Detecting DMLs and DMMs between Good Prognosis and Bad Prognosis Patients

The training set was consisted of 68 bad prognosis patients and 52 good prognosis patients. As shown in Figures 1(a) and 1(b), we found a total of 524 DMRs ( and ) between the good prognosis patients and the bad prognosis patients, including 461 DMMs and 63 DMLs. Furthermore, the 461 DMMs were composed of 119 hypomethylated mRNAs and 342 hypermethylated mRNAs, and the 63 DMLs were comprised of 34 hypomethylated lncRNAs and 29 hypermethylated lncRNAs. Results of two-way hierarchical clustering analysis for these patients were shown in a heatmap (Figure 1(b)). Based on methylation value, ccRCC patients were categorized into two groups.

3.2. Correlation between DNA Expression and Methylation

For the total of 524 identified DMRs, the overall methylation level was negatively correlated with the overall expression level (, value = 1.252-04; , value = 3.589-04). A significant negative correlation between methylation and expression of each individual DMR was observed for 222 DMRs, consisting of 27 lncRNAs and 195 mRNAs. These 222 DMRs were selected for further analysis.

3.3. Identifying a Nine-mRNA Methylation Signature and a Six-lncRNA Methylation Signature for Survival Prediction

Using data in the training set, we performed univariable Cox regression analysis of the 222 DMRs. Totally 124 resulting DMRs were significantly correlated with patients’ OS time ( value <0.05), including 113 mRNAs and 11 lncRNAs. Furthermore, 31 DMRs were determined to be independent prognostic factors in multivariable Cox regression analysis ( value <0.05), consisting of 24 mRNAs and 7 lncRNAs. Consequently, the 31 DMRs were used to fit a LASSO Cox-PH regression model. Using the optimal lamda value, as shown in Table 1 and supplement Figure 1 (Figure S1), we retrieved an optimal panel of 9 predictive mRNAs (, ; DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL) and an optimal panel of 6 predictive lncRNAs (, ; FAM138B, HCG11, KIAA0087, MIAT, PSORS1C3, and SNHG11).

Based on methylation levels or status of the 9 prognostic mRNAs or the 6 prognostic lncRNAs and their multivariable Cox regression coefficients, we designed formulas for calculating methylation or status risk scores as follows:

,

,

,

.

Nine-mRNA methylation risk score, nine-mRNA status risk score, six-lncRNA methylation risk score, and six-lncRNA status risk score were calculated for each patient in the training set, respectively. With median risk score as threshold, the training set was divided into a high-risk group and a low-risk group. For each risk score, difference in OS time was significant between the two risk groups according to results of Kaplan-Meier survival analysis and log-rank test (Figures 2 and 3). The nine-mRNA methylation risk score displayed markedly smaller value than the other three risk scores (nine-mRNA methylation risk score, and value = 3.109-10; six-lncRNA methylation risk score, and value = 1.984-05; nine-mRNA status risk score, and value = 9.245-10; six-lncRNA status risk score, and value = 7.318-05, Table 2). Validation was performed in the validation set to assess predictive performance of the four risk scores. Only the nine-mRNA methylation risk score could dichotomize the validation set into two risk groups with significantly different OS time (nine-mRNA methylation risk score, and value = 1.915-02; six-lncRNA methylation risk score, and value = 8.507-02; nine-mRNA status risk score, and value = 6.722-02; six-lncRNA status risk score, and value = 2.528-01, Table 2).

Results of ROC curve analysis were depicted in Figures 2 and 3 and Table 3. AUC values of nine-mRNA methylation risk score for the training set and the validation set were 0.987 and 0.884, respectively, which are higher compared to the other risk scores. All above observations suggest that the nine-mRNA methylation risk score performs better than other risk scores for survival prediction of ccRCC patients.

3.4. Characterization of Four Independent Prognostic Features and Construction of a Composite Nomogram

The univariable Cox regression analysis in the training set indicated that age, histologic grade, pathologic N, pathologic T, pathologic stage, hemoglobin, platelet qualitative, and nine-mRNA methylation risk score were significantly related to patients’ OS time ( value <0.05, Table 4). Furthermore, histologic grade (, , and value = 1.56-02), pathologic stage (, , and value = 2.26-02), platelet qualitative (, , and value = 2.28-02), and nine-mRNA methylation risk score model (, , and value = 2.17-04) were found to be independent predictive factors of prognosis in multivariable Cox regression analysis (Table 4, Figure 4). It revealed that prognostic value of the nine-mRNA methylation risk score model was independent of clinical factors. To integrate the nine-mRNA methylation risk score with the three prognostic clinical variables, a composite nomogram was built (Figure 5(a)). Calibration curves for 5-year survival probability showed sound agreement between predicted and actual probabilities of 5-year survival (Figure 5(b)).

3.5. Functional Implications of the Nine Prognostic mRNAs

To investigate functional characteristics of the nine prognostic mRNAs in ccRCC biology, we performed functional enrichment analysis on the identified DEGs between the high-risk group and the low-risk group of the training set classified by the nine-mRNA methylation risk score. We found 715 DEGs between the two risk groups, consisting of 190 downregulated genes and 525 upregulated genes (Figures 6(a) and 6(b)). These genes were significantly involved in cytokine-cytokine receptor interaction, neuroactive ligand receptor interaction, toll-like receptor signaling pathway, and cell receptor signaling pathway (Table 5).

4. Discussion

Considering the prognostic potential of DNA methylation for ccRCC, the present work conducted a genome-wide analysis of mRNAs and lncRNAs methylation profiling of ccRCC samples from TCGA systematically, with a view to discover novel and reliable prognostic biomarkers. There were 461 DMMs and 63 DMLs between good prognosis and bad prognosis patients. Among which, 24 DMMs and 7 DMLs were identified to be independent predictive factors by uni- and multivariable Cox regression analyses. Eventually, a nine-mRNA methylation signature and a six-lncRNA methylation signature were determined by LASSO Cox-PH regression model. Four prognostic prediction systems were established, and four risk score models were generated, including nine-mRNA methylation risk score model, nine-mRNA status risk score model, six-lncRNA methylation risk score model, and six-lncRNA status risk score model. The nine-mRNA methylation risk score model performed better than other DML-based risk scores in predicting survival outcome of ccRCC patients. This study suggests that the nine-mRNA methylation risk score is reliable and powerful for classifying ccRCC patients according to prognosis.

Our study identified a prognostic nine-mRNA methylation signature, comprising of DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1. DMRTA2, a member of the DMRT family, is involved in central nervous system development [21]. Methylation of DMRTA2 is associated with survival of patients with triple-negative breast cancer [22]. DRGX encodes dorsal root ganglia homeobox protein that is a transcription factor implicated in embryonic development of the nervous system. There is evidence that DGRX may be related to serum concentration of monocyte chemoattractant protein-1, an inflammation biomarker, in subjects treated with fenofibrate, an anti-inflammatory and triglyceride-lowering drug [23]. FAM167A, also namely C8orf13, belongs to the FAM167 (SEC) family. The polymorphisms of FAM167A-BLK region are linked to susceptibility of autoimmune diseases [24, 25]. Low expression of FAM167A is related to longer survival time of ccRCC patients [26]. FGGY encodes protein FGGY carbohydrate kinase domain containing proteins that phosphorylate carbohydrates [27]. FGGY is significantly upregulated during neurogenic skeletal muscle atrophy [28]. FOX family members play roles in regulating cell growth and differentiation [29], and FOXI2 expression is associated with survival of ccRCC patients [30]. KRTAP proteins are major components of hair shaft, playing crucial roles in human hair shaft keratinization [31]. KRTAP2-1 shows an association with chemosensitivity of colorectal cells to oxaliplatin [32]. TCTEX1D family members contain a conserved domain similar to C-terminus of TCTEX1, with little information regarding their biological function [33]. TCTEX1D1 is found to be hypermethylated and downregulated in endometrial cancer [34]. Tau Tubulin Kinase 1 encoded by TTBK1 is implicated in regulating microtube assembly and stabilization. TTBK1 phosphorylation activity is linked to several neurodegenerative diseases like Alzheimer’s disease [35]. UBE2QL1 protein is related to protein ubiquitination. There is evidence that UBE2QL1 is a candidate renal tumor suppressor gene [36]. Aberrant methylation of UBE2QL1 has been found in HBV-related hepatocellular carcinoma [37]. As far as we know, for the first time, the nine differentially methylated mRNAs are reported to be of prognostic value for ccRCC patients.

The training set of this study was dichotomized by the nine-mRNA methylation risk score model into a high-risk group and a low-risk group. Function enrichment analysis showed that the DEGs between the two risk groups were significantly enriched in cytokine-cytokine receptor interaction, neuroactive ligand receptor interaction, toll-like receptor signaling pathway, and cell receptor signaling pathways. Mounting evidences demonstrate that neuroactive ligand receptor interaction pathway is an important pathway for ccRCC [38, 39]. Toll-like receptors are critical participants in inflammation responses. Cancer-related inflammation has been accepted as an important hallmark of cancer [40]. These findings reveal that the nine prognostic mRNAs may be involved in neuroactive ligand receptor interaction and inflammation-related pathways. Further functional annotation of these mRNAs is needed to deepen our understanding of their biological implications in ccRCC prognosis.

It should be noted that this study is an extensive bioinformatics study based on published data. The results of these studies should also be further validated in vitro or in vivo models. We hope that these useful clues will help other researchers to carry out relevant research.

5. Conclusion

We report and validate predictive value of a nine-mRNA methylation risk score model for risk stratification of ccRCC patients. The nine-mRNA methylation signature, including DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1, may be a useful prognostic biomarker for ccRCC patients. Further experimental investigations are required to confirm this prognostic signature.

Data Availability

All data used and/or analyzed in this study are available from TCGA database (https://portal.gdc.cancer.gov) or the EBI Array database (https://www.ebi.ac.uk/arrayexpress/).

Conflicts of Interest

The authors declare that they have no competing interests.

Supplementary Materials

Figure S1: the lambda parameter curve (left) and coefficient distribution chart (right) of (a) lncRNA and (b) mRNA. The lambda parameter curve was gained basing on cross-validation likelihood screening. The horizontal axis and vertical axis represent the different values of lambda and CVL, respectively, and the intersection of red dotted lines indicates the value of lambda parameters when CVL reaches the maximum value. The coefficient distribution charts for optimal prognosis were screened basing on Cox-PH model using L1-penalized regularized regression algorithm. (Supplementary Materials)