Introduction

Male breast cancer (BCa) represents ~1% of all newly diagnosed cancers in men1 and ~1% of all breast cancers2. Research into this rare disease has been limited, with treatment largely extrapolated from knowledge about female BCa3. Surgical management is usually modified radical mastectomy, with a minority of patients being offered breast conserving treatment4. Local and systemic treatment is largely informed by treatment indications and regimens used in female breast cancer. However, for adjuvant endocrine treatment, the use of aromatase inhibitors (AIs) alone is not recommended, with tamoxifen for at least 5 years indicated for ER/PgR positive tumors3,5. Where AIs are indicated, for example in metastatic male BCa, pituitary blockade with an LHRH agonist or orchiectomy is recommended6.

Genetic counselling is recommended for all men with BCa, regardless of family history, due to strong links between male BCa and BRCA2 mutations, seen in 10% of men with BCa3,7,8,9. Transcriptomic multiparametric assays are now integrated into clinical management guidelines for early female BCa10 both as prognostic tools and to identify patients for adjuvant chemotherapy11,12. Most guidelines refer exclusively to female BCas with respect to the use of these multiparametric assays. Data relating to these tests in male BC are from retrospective series, most with small numbers of cases limited to evaluation of prognosis for single tests13,14,15,16; however, we are not aware of any analyses which provide comparative data on multiple signatures with respect to patient outcome. We developed a method to compare signatures using a combined quantitative mRNA array covering key molecular signatures17, which have been trained against the results of the same signatures measured by the original methodology18. We describe here an analysis of male BCa samples from the EORTC cohort19 using these “trained” signatures to compare the result of each test and to determine the association between test result and prognosis in the context of a multi-institutional male BCa cohort.

Results

mRNA profiling

Of the 1483 patients included in the parental study, 699 (47.1%) met the eligibility criteria of the research project; the main reasons for exclusion were missing tissue, event status, or event dates (see Supplementary Fig. 1). As previously reported19, no evidence of selection bias due to missing data was identified. Of these 699 patients, 389 had samples with sufficient material for extraction and 381 samples yielded sufficient RNA.

All 381 samples assayed were successfully analyzed using the custom NanoString gene expression panel and passed the quality control (Supplementary Fig. 1, Supplementary Table 1).

Distribution of gene signatures and concordance

Table 1 details the distribution of risk classification across tests, which differs markedly from one gene signature to another. The proportion of high risk patients ranged from 15.7 to 63.5% and the proportion of low risk patients from 9.4 to 53.5% (Table 1). For tests with 3 risk groups, the proportion of intermediate risk cases ranged from 38.3 to 55.6%. Cross-tabulations for all test combinations are shown in Supplementary Tables 2–15. When cross-tabulating the pairs of gene signatures, the C-index values ranged from 40.3 to 78.2% and kappa values ranged from 0.17 to 0.58, indicating slight to moderate agreement between the gene signatures (Table 2). Using the Prosigna-trained sub-type classification, 0.8% of cases were Basal-like, 3.4% HER2-enriched, 20.2% Luminal A and 75.6% Luminal B.

Table 1 Distribution of risk scores by test.
Table 2 Cross-tabulation of risk classification: low or low + intermediate vs high.

Survival analysis

Seventy-four patients experienced a locoregional recurrence or distant progression qualifying as an event for the TTR endpoint, of whom 55 (74.3%) had a distant progression as first event. Sixty-one patients experienced a distant recurrence qualifying as an event for the TTDR endpoint and 38 patients died after a distant recurrence (BCSS endpoint). Seventy-four patients died in the absence of a distant progression and these deaths were considered as competing risks.

When competing risks (deaths not preceded by distant recurrence) were accounted for, the cumulative incidence of BCa-specific events was consistently lower in patients classified as low risk. The 5-year cumulative incidence of locoregional or distant recurrence in low risk patients ranged from 3.7% (95%CI: 0.3–16.3) (Prosigna-trained) to 10.2% (95%CI: 5.6–16.2) (95-gene; Figs. 1 and 2, Supplementary Figs. 2–5) and from 3.7% (95%CI: 0.3–16.3) (Prosigna-trained) to 8.2% (95%CI: 4.1–14.0) (95-gene) for distant recurrence. For high risk patients, the 5-year cumulative incidence of locoregional or distant recurrence ranged from 20.0% (95%CI: 14.2–26.5) (95-gene) to 24.6% (95%CI: 13.8–37.1) (IHC4-RNA) and from 17.8% (95%CI: 12.2–24.2) (95-gene) to 22.1% (95%CI: 15.4–29.7) (Prosigna-trained) for distant recurrence. Regarding BCSS, the 5-year cumulative incidence was below 5% for low risk patients for all tests, whilst for high risk patients the rates ranged between 9.8% (95%CI: 5.8–14.9) and 13.7% (95%CI: 7.1–22.4).

Fig. 1: Oncotype DX-trained cumulative incidence of clinical outcomes by risk category.

a Oncotype DX-trained risk classification for Time to Relapse; b Oncotype DX-trained risk classification for Time to Distant Relapse; c Oncotype DX-trained risk classification for Breast Cancer Specific Survival. Cumulative incidence rates for low (blue line), intermediate (green line), and high (red line) Oncotype DX-trained results with corresponding 95%CI (shaded areas) estimated by cumulative incidence function accounting for deaths not preceded by a distant relapse as competing risks. Total events/risk group (Events/Total) represent all events observed during follow-up (up to 12 years). CIF = cumulative event frequency (percent) at 5 years with estimated 95% confidence intervals (95% CI).

Fig. 2: ROR-PT-trained cumulative incidence of clinical outcomes by risk category.

a ROR-PT-trained risk classification for Time to Relapse; b ROR-PT-trained risk classification for Time to Distant Relapse; c ROR-PT-trained risk classification for Breast Cancer Specific Survival. Cumulative incidence rates for low (blue line), intermediate (green line), and high (red line) ROR-PT-trained results with corresponding 95%CI (shaded areas) estimated by cumulative incidence function accounting for deaths not preceded by a distant relapse as competing risks. Total events/risk group (Events/Total) represent all events observed during follow-up (up to 12 years). CIF = cumulative event frequency (percent) at 5 years with estimated 95% confidence intervals (95% CI).

For all signatures evaluated, there was evidence of prognostic information: each signature provided statistically significant separation of patients into low, intermediate (where applicable), and high risk groups with respect to TTR, TTDR, and BCSS in univariate analyses (Gray test p < 0.01, except for the IHC4-RNA signature) (Figs. 1 and 2, Supplementary Figs. 2–5, Table 3). In contrast, in multivariate-adjusted analyses for TTR and TTDR, the effect of the gene signatures was no longer significant (Table 3).

Table 3 5-year estimate of cumulative incidence and hazard ratio by outcome, risk category and gene signature.

The 5-year AUC from time-dependent ROC analysis ranged from 0.63 (95%CI: 0.54–0.72) (IHC4-RNA) to 0.72 (95%CI: 0.64–0.80) (Prosigna-trained) for TTR and from 0.65 (95%CI: 0.55–0.74) (IHC4-RNA) to 0.73 (95%CI: 0.64–0.82) (Prosigna-trained) for TTDR. Regarding BCSS, the AUC was about 0.75 for each gene signature and was highest for the 95-gene signature (AUC = 0.82, 95%CI: 0.71–0.93) (Supplementary Figs. 6–8).

Discussion

The use of molecular prognostic assays in female BCa is now well established, but evidence relating to their performance in male BCa patients is sparse. In this study we show, using computational methods to recapitulate multiple BCa prognostic signatures, evidence for the prognostic impact of multiple gene signatures. However, we also show evidence of discordance between signatures, applied to the same case, similar to that seen in female BCa20. This study highlights the potential utility of molecular prognostic signatures in male breast cancer but suggests that more research is needed if we are to fully understand the potential value of different approaches to assessing prognosis and directing treatment, using molecular tools, in men with breast cancer.

Critical to our study is the close correlation between the computationally derived “signature trained” scores and true results, as shown in our recent paper18. For ROR-PT results, the correlation coefficient between “trained” and true assay results was 0.93, and 90% of cases fell within the same risk category (low, intermediate, high; see ref. 18) when true and “trained” results were compared. Similarly, for “Oncotype DX-trained” results the correlation coefficient between true and “trained” results was 0.87, with 75% of results giving the same risk category (see ref. 18) and only 1% of cases disagreeing by more than 1 risk category. For MammaPrint-trained results, which were calculated only as categorical high versus low risk groups, over 90% of cases were classified in the same risk group by “trained” and true results18. Full details of these results are reported elsewhere18. For Genomic Grade Index and IHC4, we did not have access to actual assay results to enable us to train signatures as we did for other signatures, and for Endopredict we did not have enough genes covered to allow recapitulation of this signature.

Using a common analysis platform and computational methods to recapitulate prognostic scores, we found that all molecular signatures tested (Oncotype DX-trained21,22, Prosigna-ROR-PT23,24, MammaPrint25,26,27, Genomic Grade Index28, the mRNA-based IHC4 signature29, and our novel 95-gene signature17) were able to segregate male BCas into high and low prognostic risk groups. All signatures were associated with significant differences in 5-year time to recurrence, time to distant relapse and breast cancer-specific survival between low and high risk groups in univariate analyses (Figs. 1 and 2, Supplementary Figs. 2–5). However, due to the relatively small number of breast cancer-specific events, we were unable to demonstrate a statistically significant prognostic impact of the majority of signatures in multivariate analysis when adjusting for key clinico-pathological covariates (age, grade, nodal status and tumor size) and treatment variables (adjuvant chemotherapy, radiotherapy, endocrine treatment). There have been few reports on the utility of prognostic signatures, developed using female BCas, when applied to male BCas and these have largely focused on the utility of Oncotype DX13,15. Previous studies showed an 81% 5-year BCa-specific survival for men with recurrence scores >31, slightly lower than the 86.3% 5-year BCSS observed for men with recurrence scores >25 in the current study, but commensurate with the different thresholds used13,15. In the study by Massarweh et al.15, 27.8% of men exhibited RS > 25 compared with 12.4% with scores >31. Given the modest number of events in both studies, we believe our results are broadly comparable to those reported by Massarweh et al.15. Results from a similar study by Wang et al.14 show a higher all-cause mortality rate in all risk groups, but are limited by failure to exclude competing causes of death, which accounted for almost 50% of events in our current study. This high percentage may be due to the fact that male BCa patients are older and have more co-morbidities than their female counterparts. We are unaware of studies reporting patient outcome in male BCa when stratified by tests other than Oncotype DX, making it more challenging to draw comparisons between studies using these molecular assays. With respect to the 50-gene signature driving molecular subtypes (Prosigna/PAM50), a study by Sanchez-Munoz et al.16 profiled 67 invasive male BCas using the NanoString panel, identifying 60% of cases as Luminal B, 30% Luminal A and 10% HER2-enriched, which is consistent with our findings18. However, we are not aware of any studies of male BCas reporting Prosigna risk scores.

As with prior comparisons in female BCa20, we demonstrate poor agreement between risk signatures in male BCa, with kappa values ranging from 0.17 to 0.58 (Table 2). This modest agreement reinforces observations from larger cohorts of female BCas that different molecular risk scores based on limited mRNA panels may not capture all features related to risk in this population. This conclusion is supported by multiple analyses showing the added value of combining multiple risk signatures in female BCa30 and our own recent data highlighting the modest AUCs associated with different molecular signatures17 with respect to predicting outcome. Despite the different methodologies used, AUCs of time-dependent ROC curves at 5 years for male breast cancer cases fall within the same range as the AUCs reported in female patients17. This provides no indication that different cut points for risk would apply to male rather than female breast cancer; however, given the small sample size of the present study, it is premature to exclude this possibility entirely. All signatures assessed would appear appropriate for use in male breast cancer patients.

There are several key limitations to our current research project. Firstly, we have used computational methods to calculate the relevant risk signatures rather than the original assays as used in the clinical setting. This limitation is offset in part by the use of a training and validation approach to benchmark Oncotype DX, Prosigna, and MammaPrint results against true assay results18, but remains a limitation for other tests. Secondly, the analyses were conducted in a retrospective dataset in which not all data were systematically collected in all patients. In particular, the cause of death was not reported for a substantial number of patients, leading to a substantial proportion of competing risks. As a result, we have not presented overall survival (all causes) data since this would be confounded by the lack of data as to cause of death in many patients. Although this study represents one of the largest cohorts of male breast cancers analyzed to date, the sample size, and in particular the number of breast cancer-related events, limits the statistical power of this analysis. In particular, we were not able to compare the impact of multiple tests performed in sequence due to a lack of statistical power, nor were we able to assess the potential impact of tests on chemoprediction. Notwithstanding these limitations, we are able to show the ability of a number of existing multiparametric tests (including MammaPrint, Oncotype DX, Prosigna ROR-PT, Genomic Grade Index and a novel 95-gene signature) to provide useful prognostic information in male breast cancer. These data provide evidence to support the utility of multiple prognostic assays in the context of male breast cancer. Further research to identify the optimal prognostic approach to male breast cancer, perhaps including genomic features such as mutations and copy number alterations, is warranted in addition to investigating the role of intratumoural heterogeneity.

Methods

Patients and samples

The retrospective cohort study of the EORTC/TBCRC/BIG/NABCG International Male Breast Cancer Program enrolled male patients with histologically proven BCa, diagnosed between 1990 and 2010, across multiple participating institutions19. Ethics approval was provided by the University of Toronto (#30035); a waiver of consent was approved since patient contact was not feasible due to death or loss to follow-up and the research involved no risk to patients, whose identity was coded and confidentially protected. Patients with all disease stages (early, locally advanced, and metastatic) were included, irrespective of the treatment received. Availability of a formalin-fixed paraffin-embedded (FFPE) tissue sample of good quality was mandatory for enrollment. Biological material was handled and analyzed centrally according to published guidelines for adoption across BCa clinical trials, conducted by BIG and NABCG, in 200831. Patients in this research project were selected from the retrospective cohort study based on the following exclusion criteria: ineligibility for the analysis of the parental retrospective cohort; metastatic (M1/MX) disease; ER-negative status per central pathology or local pathology (if central pathology not available); HER2-positive or unknown status based on central pathology; and insufficient information for assessment of recurrence-free survival. In addition, samples with insufficient RNA or which failed the quality control criteria were excluded. All institutions participating in the retrospective cohort study obtained ethical approval from their institutions including consent waivers.

RNA extraction and expression profiling using NanoString

Profiling of all samples was performed using mRNA extracted and analyzed using the NanoString codeset as described previously17 at the Ontario Institute for Cancer Research (OICR).

Derivation of signature-trained risk stratification scores from candidate assays

Based on our study comparing two different approaches to the generation of simulated risk scores18, we selected a training and validation approach based on results obtained from the OPTIMA prelim study20 to best fit risk stratification scores generated for this study to those derived from the relevant commercial assay. For all tests we used the suffix “-trained” to distinguish the computationally derived assay scores from the commercially derived scores, e.g., Oncotype DX-trained vs. Oncotype-DX®. For each of the commercial tests, cases were grouped into pre-defined risk categories according to the following cut-points: Oncotype DX (low risk <18, intermediate risk 18–25, high risk ≥25); Prosigna (low risk <40, intermediate risk 40–60, high risk ≥61); MammaPrint (low risk and high risk as described in ref. 18). We modified the original cut point for “high risk” for the Oncotype DX test in line with reported results from the TAILORx trial11,32 and our previously reported results from OPTIMA prelim20. For “Prosigna”, results refer throughout to the ROR-PT risk score in clinical use, which includes tumor pathological size. For the Genomic Grade Index (GGI), the suffix “-like” refers to recapitulation of the risk score as previously described though not trained against a benchmark dataset. The IHC4-mRNA signature is similarly modelled to estimate risk from the transcriptomic expression of ER, PgR, Ki67, and HER2, originally based on the immunohistochemical signature described by Cuzick et al.33. The 95-gene signature has been previously described by our group18.
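As an illustration of the grouping step only, the cut-points listed above can be applied as in the following minimal Python sketch. The function and dictionary names are hypothetical and are not taken from the study's analysis code. The boundary handling is also an assumption: the text lists 25 under both the intermediate and high Oncotype DX categories, and leaves Prosigna scores between 60 and 61 unassigned, so here a score of exactly 25 is treated as high risk and any Prosigna score below 61 that is at least 40 is treated as intermediate.

```python
from bisect import bisect_right

# Cut-points as listed in the text. Boundary handling is an ASSUMPTION:
# a score equal to the upper cut-point is assigned to the high risk group.
CUT_POINTS = {
    "Oncotype DX-trained": (18, 25),  # low <18, intermediate 18 to <25, high >=25
    "Prosigna-trained": (40, 61),     # low <40, intermediate 40 to <61, high >=61
}
LABELS = ("low", "intermediate", "high")


def risk_category(test: str, score: float) -> str:
    """Map a continuous risk score to a risk category for the given test."""
    cuts = CUT_POINTS[test]
    # bisect_right counts how many cut-points are <= score (0, 1 or 2).
    return LABELS[bisect_right(cuts, score)]
```

For example, `risk_category("Oncotype DX-trained", 21)` falls between the two cut-points and is therefore labelled intermediate under these assumptions.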

Statistical analyses

Results from the expression profiling using NanoString were provided to EORTC to perform the statistical analysis of clinical data, long term outcomes, and local and central pathology data. Descriptive statistical analysis was performed for patient characteristics, disease characteristics, and treatment(s) administered.

Risk classifications (low, intermediate where applicable, high) defined by the different gene signatures were cross-tabulated to assess concordance and agreement of classification across signatures. Concordance index and kappa agreement coefficients and their corresponding 95% confidence intervals (CI) were estimated. When cross-tabulating gene signatures with different numbers of categories (i.e., three categories such as low, intermediate, high versus two categories such as high, low), the intermediate category was combined with the low category and Cohen’s simple kappa was estimated, while for ternary versus ternary comparisons the weighted kappa was used.
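The agreement statistics described above can be sketched as follows. This is a minimal Python illustration, not the SAS procedure used in the study; the function names are hypothetical, confidence intervals are not computed, and linear disagreement weights are an assumption because the text does not state which weighting scheme was applied for the weighted kappa.

```python
from collections import Counter


def collapse_intermediate(labels):
    """Merge the intermediate category into low for 3-vs-2 category comparisons."""
    return ["low" if x == "intermediate" else x for x in labels]


def cohen_kappa(a, b, categories, weighted=False):
    """Cohen's kappa for two classifications of the same cases.

    categories must be ordered (e.g. low, intermediate, high).
    weighted=True uses linear weights |i - j| / (k - 1), an ASSUMPTION.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    counts = Counter((idx[x], idx[y]) for x, y in zip(a, b))
    row = [sum(counts[(i, j)] for j in range(k)) / n for i in range(k)]
    col = [sum(counts[(i, j)] for i in range(k)) / n for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = sum(disagreement(i, j) * counts[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    # Undefined (division by zero) if all cases fall in a single category.
    return 1.0 - observed / expected
```

A three-category signature would first be passed through `collapse_intermediate` before being compared with a two-category signature using `categories=["low", "high"]`.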

The prognostic value of the gene signatures was assessed for the following endpoints: time to distant relapse (TTDR), defined as the time until the first distant progression; time to relapse (TTR), defined as the time until the first loco-regional recurrence or distant progression; and breast cancer-specific survival (BCSS), defined as the time until breast cancer-related death, i.e., death preceded by a distant relapse. For these endpoints, deaths in the absence of distant relapse were considered competing risks. The endpoints were calculated from the time of first diagnosis of BCa. Patients without an event for the above endpoints were censored at the last date known alive.
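To make the endpoint coding concrete, the sketch below derives a (time, status) pair for TTDR from a single patient record. It is an illustrative Python example under stated assumptions: the `Patient` fields and the status codes (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical and do not come from the study database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical status codes for a competing-risks analysis.
CENSORED, EVENT, COMPETING = 0, 1, 2


@dataclass
class Patient:
    # All times are in years from first diagnosis; None means not observed.
    distant_relapse: Optional[float]
    death: Optional[float]
    last_known_alive: float  # only used when neither event was observed


def ttdr(p: Patient) -> Tuple[float, int]:
    """Time to distant relapse, with death without relapse as a competing risk."""
    if p.distant_relapse is not None:
        return p.distant_relapse, EVENT
    if p.death is not None:
        return p.death, COMPETING
    return p.last_known_alive, CENSORED
```

TTR and BCSS would follow the same pattern, with the event of interest replaced by the first loco-regional or distant recurrence, or by death after a distant relapse, respectively.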

The event rates at 5 years and corresponding 95% confidence intervals were estimated by the cumulative incidence method. Cumulative incidence functions between the risk groups were compared based on the Gray test at a significance level of 0.05. Fine and Gray models were used to estimate the univariate and adjusted hazard ratio (HR) and their corresponding 95%CI. The multivariate models were adjusted for known prognostic clinico-pathological variables (age, grade, nodal status and tumor size) and treatment variables (adjuvant chemotherapy, radiotherapy, endocrine treatment) and the multivariate p-value was estimated with the use of a Wald test. Due to the low number of events for BCSS, only univariate analyses were conducted for this endpoint. The proportional hazard assumption was checked graphically using a plot of the log cumulative hazard. The analyses were not adjusted for multiple testing.
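For readers unfamiliar with the cumulative incidence method, the following minimal Python sketch shows the standard nonparametric estimator of the cumulative incidence function in the presence of competing risks. It is an illustration only: the function name and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions, confidence intervals are omitted, and it is not the SAS implementation used for the reported analyses.

```python
def cumulative_incidence(times, status, horizon):
    """Cumulative incidence of the event of interest (status 1) by `horizon`.

    status codes (ASSUMED): 0 = censored, 1 = event of interest, 2 = competing event.
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_interest = d_competing = censored = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            d_interest += s == 1
            d_competing += s == 2
            censored += s == 0
            i += 1
        # Increment: P(event-free just before t) * hazard of the event of interest at t.
        cif += event_free * d_interest / at_risk
        event_free *= 1.0 - (d_interest + d_competing) / at_risk
        at_risk -= d_interest + d_competing + censored
    return cif
```

Unlike one minus the Kaplan-Meier estimate, this estimator does not treat competing deaths as censored, so the cumulative incidences of the event of interest and of the competing event together never exceed 1.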

The ability of the gene signatures to predict clinical outcome at 5 years was assessed by time-dependent receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC). The underlying method of ROC curves has been extended to the setting of censored observations and presence of competing risks34. Time-dependent ROC curves at 5 years were plotted and the corresponding AUCs estimated for each endpoint (time to relapse, time to distant relapse, breast cancer-specific survival) and for each gene signature with the exception of MammaPrint. As described previously18, only dichotomized risk categories were available when training the algorithm for MammaPrint, preventing any AUC analysis with this signature. Cases were patients who experienced the event of interest in the first 5 years of follow-up, while controls were defined as patients who were either event-free at 5 years or experienced a competing event in the first 5 years of follow-up.
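The case and control definitions above can be illustrated with a short Python sketch. It is a deliberate simplification and not the method implemented in the timeROC package: patients censored before the horizon are simply dropped here, whereas the cited approach34 accounts for them by weighting. The function names and status codes (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical.

```python
def status_at_horizon(time, status, horizon=5.0):
    """Classify a patient as 'case', 'control', or None (unknown) at the horizon."""
    if time <= horizon and status == 1:
        return "case"        # event of interest within the horizon
    if time > horizon or status == 2:
        return "control"     # event-free at the horizon, or earlier competing event
    return None              # censored before the horizon: status unknown


def naive_auc(scores, times, statuses, horizon=5.0):
    """Unweighted AUC: probability that a case has a higher score than a control."""
    labelled = [(s, status_at_horizon(t, e, horizon))
                for s, t, e in zip(scores, times, statuses)]
    cases = [s for s, label in labelled if label == "case"]
    controls = [s for s, label in labelled if label == "control"]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Ties between a case and a control count as half a concordant pair.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

Dropping early-censored patients can bias the estimate when censoring is frequent, which is the reason a weighted estimator is preferred in practice.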

Analyses were performed with SAS software, version 9.4 (SAS Institute) and the time-dependent ROC curves were plotted in R, version 4.0.0, with the timeROC package.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.