Introduction

Two main treatment strategies are currently recognized to prevent clinical relapses, disability accumulation, and inflammatory activity on magnetic resonance imaging (MRI) in patients with relapsing-remitting multiple sclerosis (RRMS): escalation and early intensive treatment [1].

Escalation consists of starting with less potent but safer immunomodulatory drugs and gradually switching to high-efficacy treatments in case of breakthrough disease.

The rationale behind the escalation strategy is that patients at an earlier disease stage may respond optimally to safer, low-efficacy immunomodulatory injectable or oral treatments [2], such as interferon beta (IFNB), glatiramer acetate (GA), teriflunomide (TFN), and dimethyl fumarate (DMF). If disease activity persists despite therapy, escalation to high-efficacy treatments with non-selective intravenous (iv) immunosuppressant agents (mitoxantrone (MTX), cyclophosphamide (CYC)), or with selective immunosuppression by either oral drugs (fingolimod (FNG), cladribine (CLB)) or monoclonal antibodies (natalizumab (NTZ), alemtuzumab (ALZ), ocrelizumab (OCR), rituximab (RTX)), is warranted to avoid further relapses and future disability. However, this approach requires careful evaluation of treatment response, an issue on which there is no consensus regarding definition and monitoring [3]. In addition, the escalation approach may cause patients to miss the so-called therapeutic window, i.e., high-efficacy treatments may be postponed until neurodegeneration prevails over inflammation [4, 5].

Early (i.e., shortly after diagnosis) intensive treatment includes durable therapy with high-efficacy drugs (NTZ, FNG, OCR, RTX) and induction with potent immunosuppressant drugs, followed by a maintenance immunomodulating therapy. This latter approach requires short-term administration of immunosuppressant agents whose biological effects are long-lasting and not quickly reversible after treatment discontinuation [6], such as CYC, MTX, ALZ, and CLB. Therefore, although NTZ, OCR, RTX, and, to some extent, even FNG are considered high-efficacy treatments, they are not suitable as induction agents [7, 8].

The rationale behind the induction strategy relies on “resetting” the immune system in order to achieve early disease control [7, 8]. Exposure to induction immunosuppression should ideally be as short as possible, to minimize the risk of malignant neoplasms [9, 10] and opportunistic infections [11].

Currently in clinical practice, escalation is considered suitable for most patients [1], while induction is mainly restricted to patients with aggressive RRMS [12]. However, the escalation approach has recently been criticized as inadequate or insufficient to confer the greatest possible long-lasting therapeutic effect [13]. Because randomized clinical trials comparing escalation with induction in MS are lacking, which further contributes to uncertainty about how to start treatment [5, 14], observational data with a quasi-experimental design might help to fill this gap. Therefore, we evaluated the long-term (10-year) effectiveness of initial treatment with an escalation versus an induction approach in patients with RRMS, using retrospective data from a multicenter, local MS registry.

Methods

Study Design

This was an independent, multicenter, post-marketing study.

We retrospectively analyzed data of patients affected by RRMS who regularly attended 5 tertiary MS outpatient clinics in Italy: S. Camillo-Forlanini Hospital, Rome; S. Andrea Hospital, Rome; Policlinico “A. Gemelli,” Rome; ASST Spedali Civili di Brescia, Montichiari (BS); Rehabilitation Unit ‘Mons. L. Novarese’ Hospital, Moncrivello (VC).

Clinical data were prospectively collected by each MS center following the local medication monitoring plan and hospital guidelines and then stored in an ad hoc electronic database developed for this study. In no way did this study interfere with the usual care and monitoring received by patients.

Participants

We collected data on previously untreated patients starting a disease-modifying therapy with either an immunomodulatory agent (IFNB or GA), possibly switched to high-efficacy treatments (MTX, CYC, NTZ, ALZ, FNG, or CLB) in case of treatment failure (escalation group), or with iv immunosuppression (MTX or CYC), followed or not by maintenance treatments (induction group).

The inclusion criteria at treatment start (henceforth defined as “baseline”) were as follows:

  (1) Age < 55 years
  (2) RR disease course [15]
  (3) Treatment-naïve status
  (4) < 5 years since the first demyelinating event, the time frame in which disease activity correlates most strongly with long-term disability [16, 17] and in which treatment yields the greatest reduction in the risk of transition to secondary progressive MS [18]
  (5) An Expanded Disability Status Scale (EDSS) score ≤ 4.0 [19], indicating a threshold where disability accumulation is mainly driven by inflammation rather than by neurodegeneration [20]
  (6) Available brain MRI scan performed within 1 month before initial treatment with escalation or induction strategy (baseline MRI scan)
  (7) “Active” disease, defined as either ≥ 2 relapses in the pre-treatment year or 1 relapse with residual disability and ≥ 1 gadolinium-enhancing lesion at baseline MRI scan (this definition of “active” disease was adapted from a previous observational study exploring the effectiveness and safety of MTX as induction strategy in “aggressive” RRMS [21])
  (8) At least two clinical evaluations per year including disability scoring with EDSS performed by certified neurologists (www.neurostatus.net)

The exclusion criteria at baseline were as follows:

  (1) Primary or secondary progressive MS [15]
  (2) Patients lost to follow-up before 10 years of observation for reasons other than death

Outcome Definition

Main Outcome

Time to reach the disability milestone of EDSS score ≥ 6.0, corresponding to the ability to walk only with unilateral support and for < 100 m without resting, confirmed at two or more consecutive visits and sustained (stable or higher) over the entire follow-up [19]. We adopted this outcome instead of the classical 0.5- or 1.0-point EDSS worsening [22] to set a robust endpoint based on a clinically significant milestone for patients with MS.

Secondary Outcome

EDSS score assessed 10 years after the treatment start.

Additional Outcome

Serious adverse events (SAEs), defined as any untoward medical occurrence that at any dose resulted in death, was life-threatening, required inpatient hospitalization or caused prolongation of existing hospitalization, resulted in persistent or significant disability/incapacity, might have caused a congenital anomaly/birth defect, or required intervention to prevent permanent impairment or damage (http://ichgcp.net/12-adverse-event-ae).

Statistical Analysis

Categorical data were presented as count (proportion); continuous data were presented as mean ± standard deviation (SD) or median (interquartile range), unless indicated otherwise.

We collected the following baseline variables: sex, age, disease duration (i.e., the time elapsed since symptom onset), EDSS score, number of relapses in prior year, absence or presence of gadolinium-enhancing lesions on brain MRI scan.

Differences in baseline characteristics between escalation and induction groups were tested with the Fisher exact test or the Mann-Whitney U test for categorical or continuous variables, as appropriate.

As patients were not randomized to treatment group (induction or escalation), we performed a 1:1 ratio matching procedure combining an exact matching on sex with a propensity score (PS)–based nearest-neighbor matching within a caliper of 0.05 (without replacement). Individual PS values were estimated by use of logistic regression with the aforementioned baseline characteristics as covariates (sex, age, disease duration, EDSS score, number of relapses in prior year, absence or presence of gadolinium-enhancing lesions), and treatment group as the dependent variable. The validity of PS-based matching was tested by analysis of standardized differences (|d|), with |d| > 0.20 considered as imbalance [23].
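The matching step described above can be sketched as follows. This is a minimal illustration, assuming that the PS values have already been estimated and that the caliper of 0.05 is applied on the PS scale; the greedy matching order, the `Patient` record, and the function name `match_one_to_one` are hypothetical choices made for this sketch and are not taken from the software used in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    pid: str    # hypothetical patient identifier
    sex: str    # used for exact matching
    ps: float   # propensity score, assumed to be estimated beforehand


def match_one_to_one(
    treated: List[Patient],
    controls: List[Patient],
    caliper: float = 0.05,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    A control is eligible only if it has the same sex as the treated
    patient (exact matching) and its PS lies within the caliper.
    """
    available = list(controls)
    pairs = []
    for t in treated:
        candidates = [
            c for c in available
            if c.sex == t.sex and abs(c.ps - t.ps) <= caliper
        ]
        if not candidates:
            continue  # no acceptable match: the treated patient is dropped
        best = min(candidates, key=lambda c: abs(c.ps - t.ps))
        available.remove(best)  # without replacement
        pairs.append((t.pid, best.pid))
    return pairs


# Made-up example data, for illustration only
induction = [Patient("I1", "F", 0.62), Patient("I2", "M", 0.30),
             Patient("I3", "F", 0.90)]
escalation = [Patient("E1", "F", 0.60), Patient("E2", "F", 0.66),
              Patient("E3", "M", 0.33), Patient("E4", "M", 0.70),
              Patient("E5", "F", 0.50)]
pairs = match_one_to_one(induction, escalation)
print(pairs)
```

In this toy example, patient I3 has no same-sex control within the caliper and is left unmatched. With greedy matching, the result can depend on the order in which treated patients are processed.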

We compared escalation and induction on the risk of reaching EDSS score ≥ 6.0 (primary outcome) using a Cox regression model stratified by matched pairs [24]. The time elapsed from baseline to the last visit over the 10-year follow-up or to the occurrence of the outcome (whichever came first) was entered as the main time variable in the models. This procedure allowed us to exclusively select patients with similar baseline characteristics and to obtain a comparable follow-up length for each pair [25]. Graphic inspection of log-minus-log survival plots confirmed the proportional hazard assumption in the post-matching Cox analysis.
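For 1:1 matched pairs and a single binary covariate (treatment strategy), the partial likelihood of a pair-stratified Cox model reduces to a simple count: only pairs in which one patient reaches the outcome while the other is still at risk are informative, and the maximum-likelihood HR is the ratio of the two counts. The sketch below illustrates this special case only. The function name, the tuple layout, and the decision to ignore pairs with tied event times are assumptions made here, and the example data are invented.

```python
import math
from typing import List, Tuple

# One matched pair, laid out (hypothetically) as:
# (time_induction, event_induction, time_escalation, event_escalation)
Pair = Tuple[float, bool, float, bool]


def paired_cox_hazard_ratio(pairs: List[Pair], z: float = 1.96):
    """HR (induction vs escalation) from a pair-stratified Cox model.

    Valid only for 1:1 pairs with treatment as the sole covariate.
    Pairs with two events at the same time are ignored (a simplification).
    """
    ind_first = 0  # induction patient reaches the outcome first
    esc_first = 0  # escalation patient reaches the outcome first
    for t_ind, e_ind, t_esc, e_esc in pairs:
        # A patient censored at time t is still considered at risk at t.
        if e_ind and (t_ind < t_esc or (t_ind == t_esc and not e_esc)):
            ind_first += 1
        elif e_esc and (t_esc < t_ind or (t_esc == t_ind and not e_ind)):
            esc_first += 1
        # otherwise the pair carries no information about the HR
    if ind_first == 0 or esc_first == 0:
        raise ValueError("HR is not estimable without both kinds of pairs")
    hr = ind_first / esc_first
    se_log_hr = math.sqrt(1 / ind_first + 1 / esc_first)
    ci = (hr * math.exp(-z * se_log_hr), hr * math.exp(z * se_log_hr))
    return hr, ci


# Invented follow-up data (years, outcome reached?), for illustration only
example_pairs = [
    (4.0, True, 6.0, True),      # induction first
    (10.0, False, 3.0, True),    # escalation first
    (10.0, False, 7.5, True),    # escalation first
    (10.0, False, 10.0, False),  # no event: uninformative
    (8.0, True, 5.0, True),      # escalation first
    (2.0, False, 9.0, True),     # induction censored early: uninformative
    (10.0, False, 6.0, True),    # escalation first
]
hr, ci = paired_cox_hazard_ratio(example_pairs)
print(f"HR = {hr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Models with additional covariates, time-varying covariates, or n:1 matching need a general-purpose Cox routine with a strata argument rather than this closed form.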

To assess robustness of the results, we also conducted several post-estimation sensitivity analyses as follows:

  (1) After inserting in the Cox model the time since first symptom to EDSS score ≥ 6.0 (rather than the time since baseline) as main time variable
  (2) After entering in the Cox model not only the treatment strategy, but also all the baseline variables
  (3) After entering the time (years) on high-efficacy treatments as time-varying covariate
  (4) After re-running the Cox model with the “best” n:1 matching procedure among ratios of 2:1, 3:1, and 4:1 to provide more precise estimations (through larger sample size) without compromising the balance across covariates [26]

Between-group comparison of the last EDSS score at 10 years (secondary outcome) was carried out in matched pairs by the Wilcoxon signed-rank test.
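The paired comparison of final EDSS scores can be illustrated with a small implementation of the Wilcoxon signed-rank test. This is a sketch under stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, which may differ from the options of the statistical package used in the study; the function name and the example scores are invented.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (Z, two-sided p) for paired samples x and y.

    Zero differences are discarded; tied absolute differences receive
    average ranks; the variance is corrected for ties.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_correction = 0.0
    i = 0
    while i < n:
        j = i
        # extend j over the block of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_correction += (t ** 3 - t) / 48
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_correction
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return z, p


# Invented 10-year EDSS scores for six matched pairs, for illustration only
edss_escalation = [5.0, 6.0, 3.5, 6.5, 4.0, 5.5]
edss_induction = [4.5, 4.5, 3.5, 5.0, 4.5, 4.0]
z, p = wilcoxon_signed_rank(edss_escalation, edss_induction)
print(f"Z = {z:.2f}, p = {p:.3f}")
```

A positive Z here means that escalation scores tend to exceed the induction scores of the matched partners. With few pairs, an exact test would be preferable to the normal approximation.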

Lastly, in the whole sample, we compared the proportions of patients with SAEs in the two groups with a logistic regression analysis adjusted for baseline variables.

Two-tailed p values < 0.05 were considered significant.

Results

Participants

We examined records from 3851 patients in the escalation group and 132 in the induction group who started treatment from 1998 to 2009 (see Fig. 1 for the study flowchart).

Fig. 1 Study flowchart of patients’ disposition

Within the escalation group, 738 out of 3851 were included in the analysis. Excluded patients had longer disease duration and fewer pre-treatment relapses and were less likely to have gadolinium enhancement at the baseline MRI scan (p values < 0.05).

Within the induction group, 75 out of 132 patients were included in the analysis. Excluded patients had longer disease duration and higher EDSS score at baseline (p values < 0.05).

Findings in the Whole Cohort

All patients in the escalation group (n = 738) started with high-dose, high-frequency IFNB-1a or IFNB-1b; of them, 394 (53.4%) required one or more high-efficacy treatments after a median time of 3.5 (2 to 5.5) years, namely NTZ (n = 234), FNG (n = 74), MTX (n = 62), ALZ (n = 8), CYC (n = 7), CLB (n = 5), and RTX (n = 4). The remaining 344 (46.6%) continued to receive the same initial treatment (n = 190) or switched, after a median time of 6 (3 to 6.5) years, to a different low-efficacy treatment such as DMF (n = 76), GA (n = 37), TFN (n = 26), or azathioprine (AZA; n = 15) over the 10-year follow-up (Fig. 2).

Fig. 2 Treatment sequencing after escalation and induction at 10-year follow-up (note that one patient in each group underwent autologous hematopoietic stem cell transplantation after high-efficacy treatments)

Patients in the induction group (n = 75) started with monthly or bi-monthly infusions for 6–24 months of either MTX (n = 55) at a dosage of 8 mg/m² (maximum cumulative dose, 140 mg/m²) or CYC (n = 20) at a dosage of 500 to 1000 mg/m². They received a median (range) number of 8 (6–12) MTX infusions and 10 (6–12) CYC infusions, respectively. When necessary, the dosage was adjusted on the basis of blood cell count to avoid myelotoxicity and to ensure the expected level of immunosuppression. Immediate post-induction MRI data were available for only 45 patients. All available MRI scans showed suppression of inflammatory activity (absence of gadolinium-enhancing lesions).

Immediately after the last immunosuppression infusion, 42 (56.0%) patients received a maintenance treatment with IFNB (n = 20), GA (n = 14), or AZA (n = 8). Twenty-four (32.0%) patients started IFNB (n = 22) or AZA (n = 2) because of the occurrence of relapses (n = 16) or isolated MRI activity (n = 8) after a median time of 1.25 (1 to 2) years from the last immunosuppression infusion. The remaining 9 (12.0%) patients did not receive any treatment following induction immunosuppression since they were free from disease activity and did not accumulate disability over the 10-year follow-up period (see also Fig. 2).

Despite induction, 26 (34.7%) patients required more efficacious treatments, such as NTZ (n = 22) or FNG (n = 4), over the follow-up period. One patient in each group underwent autologous hematopoietic stem cell transplantation after failing multiple high-efficacy treatments.

Considering the whole unmatched sample, 111/738 (15.0%) and 21/75 (28.0%) patients reached the outcome at follow-up in the escalation and induction groups, respectively (p = 0.008). The median final EDSS scores were 2.5 (1.5 to 4.5) and 4.5 (3.5 to 6.0) in the escalation and induction groups, respectively (p < 0.001).

PS-Based Matching Procedure

Table 1 shows the baseline variables of patients eligible for analysis in the unmatched cohort and after matching in a 1:1 ratio. We observed a significant imbalance in pre-matching baseline characteristics across treatment groups due to the older age (p = 0.002) and higher EDSS score (p < 0.001) in the induction group. This between-group imbalance did not persist after the matching procedures that retained 150 patients (75 per group). No covariate exhibited large imbalance (|d| < 0.20) after the re-sampling procedures, and the standardized mean difference of PS values decreased by approximately 99%, from 0.999 to 0.006, indicating a significant improvement in the overall match.
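The balance check based on standardized differences can be sketched as follows. The formulas are the commonly used pooled-SD definitions for continuous and binary covariates, which we assume correspond to the |d| statistic described in the Methods; the function names and the toy age values are hypothetical. The last lines recompute the relative reduction in the standardized mean difference of PS values from the two figures reported above.

```python
import math
from statistics import mean, variance
from typing import Sequence


def std_diff_continuous(treated: Sequence[float], control: Sequence[float]) -> float:
    """|d| for a continuous covariate, using the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


def std_diff_binary(p_treated: float, p_control: float) -> float:
    """|d| for a binary covariate, given the proportion in each group."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    return abs(p_treated - p_control) / pooled_sd


# Toy ages (years), for illustration only
age_induction = [30, 34, 28, 36, 32]
age_escalation = [29, 33, 27, 35, 31]
d_age = std_diff_continuous(age_induction, age_escalation)
print(f"|d| for age = {d_age:.3f}, imbalanced: {d_age > 0.20}")

# Toy proportions with gadolinium-enhancing lesions
d_gd = std_diff_binary(0.60, 0.56)
print(f"|d| for Gd+ = {d_gd:.3f}, imbalanced: {d_gd > 0.20}")

# Relative reduction of the standardized difference of PS values (0.999 to 0.006)
reduction = (0.999 - 0.006) / 0.999
print(f"reduction = {100 * reduction:.1f}%")
```

Both functions assume some variability in the covariate; a covariate that is constant in both groups would give a zero pooled SD and needs separate handling.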

Table 1 Baseline characteristics of the included patients before and after the matching procedure

Follow-up Data in the Re-matched Samples

After matching in a 1:1 ratio, we found that the proportion of patients reaching the primary outcome was lower in the induction (n = 21, 28.0%) than in the escalation group (n = 29, 38.7%) (hazard ratio [HR] = 0.48, p = 0.024) (Fig. 3A). Findings from the sensitivity analyses, providing results consistent with the core analysis, are shown in Table 2. When the disease duration was set as main time variable instead of time from baseline, we observed that patients in the escalation group were more prone to approach the disability accrual of natural MS history than those in the induction group (Fig. 3B).

Fig. 3 Time since treatment start (A) and since symptom onset (B) to EDSS score ≥ 6.0 by initial treatment strategy. The gray area overlaid in part B indicates the estimated median time since symptom onset to EDSS score ≥ 6.0 in natural history studies ranging from 14 to 20 years (adapted from Confavreux and Vukusic [27])

Table 2 Time to EDSS score ≥ 6.0 by initial treatment strategy (hazard ratios < 1.0 indicate more favorable outcome for induction)

At 10-year follow-up, the median final EDSS scores in the 1:1 ratio re-sampled cohort were 5.0 (3.5–6.5) and 4.5 (3.5–6.0) after escalation and induction, respectively (Z = 1.75, p = 0.08). We found consistent results even after matching in a 1:2 ratio (Z = 2.04, p = 0.042).

Baseline Variables Associated with Worse Outcome

We explored if there were associations between the risk of reaching the main outcome and baseline variables other than treatment strategy (Table 3). As expected, the baseline EDSS score was associated with an increased risk of reaching the outcome after both escalation (HR = 3.15, p < 0.001) and induction (HR = 5.10, p < 0.001). Within the escalation group, a longer disease duration at treatment start was also associated with an increased risk of reaching the outcome (HR = 1.38 for each year of delay, p = 0.014). Within the induction group, the only other factor associated with increased risk was an older age at treatment start (HR = 1.08 for each year, p = 0.007). We found no association between the 10-year primary outcome and the type of induction agent (CYC versus MTX), the number of iv infusions of CYC or MTX, or the maintenance therapy following the induction immunosuppression (p values > 0.2).

Table 3 Association between the risk of reaching the outcome of EDSS score ≥ 6.0 at 10 years and baseline variables

Safety Data

Considering the whole unmatched population, SAEs occurred more frequently after induction (n = 8, 10.7%) than after escalation (n = 18, 2.4%): crude odds ratio (OR) = 4.78 (95% CI 2.00 to 11.39; p = 0.001), adjusted OR = 3.36 (95% CI 1.26 to 8.95; p = 0.015). We found no difference between MTX and CYC in terms of SAE occurrence (p = 0.98). After causality assessment, the treatment associated with the SAE occurrence was permanently discontinued in 11 patients, including 3 cases of NTZ-related progressive multifocal leukoencephalopathy; in 5 cases, the SAE occurred off-treatment after MTX (leukemia, n = 3; breast cancer, n = 1) or ALZ (Graves’ disease, n = 1) administration; 4 cases of solid malignancies occurred while on IFNB (n = 3) or NTZ (n = 1), but they were probably related to previous exposure to MTX or CYC (see Table 4 for details).
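As a worked check, the crude OR can be recomputed from the counts given above (8 of 75 patients with SAEs after induction, 18 of 738 after escalation). The sketch below uses the standard Wald interval on the log-odds scale, which is an assumption about how the interval was obtained; the function name is hypothetical. It reproduces the crude OR of 4.78 and gives an interval close to the reported one, with small rounding differences possible. The adjusted OR cannot be recomputed without patient-level data.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval.

    a, b: exposed group with / without the event
    c, d: reference group with / without the event
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the text: induction 8/75, escalation 18/738
crude_or, lower, upper = odds_ratio_wald(8, 75 - 8, 18, 738 - 18)
print(f"crude OR = {crude_or:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```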

Table 4 Serious adverse events (SAEs) reported over 10 years of follow-up

Discussion

In this independent, multicenter, retrospective study, we conducted a PS-based matched analysis to explore the long-term effects of an initial treatment approach with escalation or induction in patients with RRMS. As per inclusion criteria and also through the PS-based matching procedure, we selected a sample of patients with evidence of “active” disease (the median pre-treatment annualized relapse rate was 2 and approximately 60% of them had an active MRI scan at baseline) and poor prognostic factors (the median EDSS score at baseline was 2.5) in spite of a relatively short disease duration (2 years on average).

In our cohort, induction was associated with an approximately 50% reduced risk of reaching EDSS ≥ 6.0 as compared to escalation, albeit with a worse safety profile. Notably, the probability of reaching EDSS ≥ 6.0 in the escalation group overlaps, especially after 10 years from disease onset, with the estimated median disease duration since symptom onset to EDSS score ≥ 6.0 reported in natural history studies, ranging from 14 to 20 years (see also Fig. 3B) [27]. The analysis of baseline variables associated with worse outcome also confirms that (1) the longer the time to treatment start, the worse the long-term outcomes when starting with escalation [18], and (2) younger age is associated with better outcome after induction [21].

The growing availability of therapeutic agents for RRMS has prompted renewed interest in the issue of treatment algorithm, but at the same time it is a matter of concern because of the unknown long-term immunologic and safety risks of sequencing multiple therapies [28]. While escalation is the more widely adopted treatment strategy, induction is restricted to patients at risk of rapid disability accrual, mainly because of the poor definition of the target population and the increased risk of immunosuppression-related toxicity [2]. Although an induction approach is well established in hematologic malignancies [29] and in other autoimmune diseases (e.g., rheumatoid arthritis) [30], data supporting initial treatment with induction agents in RRMS are limited to a few small clinical trials on MTX [31, 32], one observational study on CYC [33], and larger clinical trials on ALZ [34, 35] and CLB [36]. These studies have provided considerable results on relapse rate, time to sustained disability worsening, and MRI measures in the short-term period [31–36]. Our results are also in line with two real-world retrospective studies showing that early intensive treatments (including NTZ and ALZ) were associated with a lower risk of transition to secondary progressive MS [18] and of disability accumulation [13] than low-efficacy drugs.

Our study has the merit of a longer follow-up period (10 years) and of including only patients initially treated with immunosuppressive agents suitable for induction. However, in our sample, a certain proportion of patients required high-efficacy treatments with monoclonal antibodies or FNG even after induction. Although this proportion was lower after induction than after escalation (34.7% versus 53.4%), one can argue that not all patients may benefit from early immunosuppression to reset the immune system over the long-term period.

Albeit encouraging, findings on early immunosuppression have raised relevant safety concerns; thus, its use has been restricted to patients with aggressive MS [12]. In our study, induction was associated with an approximately fourfold increased risk of SAEs, especially malignant neoplasms, as compared to escalation. Treatment-related leukemia or lymphoma and infertility have been reported with both MTX and CYC [9, 10]. Specific side effects of dose-related cardiotoxicity and bladder malignancies have been associated with MTX and CYC, respectively [37, 38]. Consequently, administration of MTX and CYC has now been replaced by newer agents such as ALZ and CLB, despite some attempts to profile patients who will have a more favorable risk:benefit profile [39]. Two pragmatic, randomized clinical trials are currently ongoing to elucidate whether an early “aggressive” therapy approach, including induction with ALZ, will be associated with better outcome as compared with starting treatment with platform injectable or oral drugs: the TRaditional versus Early Aggressive Therapy for MS (TREAT-MS, ClinicalTrials.gov no. NCT03500328) and the Determining the Effectiveness of Early Intensive Versus Escalation Approaches for the Treatment of RRMS (DELIVER-MS, ClinicalTrials.gov no. NCT03535298).

Our study is not without limitations, mainly due to its retrospective design, small sample size (especially for the induction group), and comparison of patients in different treatment eras (MTX and CYC are no longer prescribed given the increased availability of newer drugs). Our data refer to specific treatments and are affected by the era in which they were collected; therefore, they cannot be generalized and extended to all current induction treatments. Although we are confident that our statistical approach, based on PS matching and pairwise comparisons [24, 40], allowed us to compare data of patients with similar baseline characteristics over a long-term follow-up, we cannot rule out indication, selection, and hidden biases [41]. Indication bias refers to the lack of randomization to treatment exposure [41]. Selection bias is mainly attributable to the exclusion from the analysis of a greater proportion of patients in the escalation than in the induction group as a consequence of both the eligibility criteria and the PS-matching procedure, which is prone to the so-called bias due to incomplete matching [42]. This implies that the hypothesis regarding the superiority of induction versus escalation must be restricted only to older patients with higher EDSS scores (see also Table 1). Regarding hidden bias, we cannot exclude that induction was adopted in some patients due to unmeasured prognostic factors (not encompassed among the baseline variables), including (but not limited to) symptom onset, cognitive deficit, MRI characteristics including lesion burden, black holes, discernible brain atrophy, and infratentorial and/or spinal cord involvement [12].

In conclusion, our study provides real-world evidence that initial treatment with an induction agent is associated with more favorable long-term benefits as compared to an escalation approach in patients with RRMS; this comes at the price of an increased risk of SAEs and limitations in future treatment sequencing. Moreover, our data suggest that the best candidate for induction is a younger patient in the early disease stage. Being based on observational data collected in the real-world setting and retrospectively analyzed, our findings should be considered only hypothesis-generating. We are also aware that immunosuppressant agents such as MTX and CYC are no longer in wide use for RRMS, but we hope that this study may open an avenue for future investigation aimed at clarifying whether the newer induction agents, namely ALZ and CLB, could provide a more advantageous long-lasting risk:benefit profile.