Key Points

The current review examines the usefulness of pharmacologic treatments in improving cognitive function in persons with multiple sclerosis (MS).

In conclusion, there is insufficient evidence to support the use of pharmacologic intervention to improve cognitive function in persons with MS.

Higher-quality randomized controlled trials are needed to establish the cognitive efficacy of pharmacologic treatments for MS-related cognitive dysfunction, with cognition as the primary endpoint. Researchers are urged to use standardized criteria (such as the American Academy of Neurology criteria) to guide their research designs.

Clinicians should consider effect sizes of studies before deciding whether to prescribe certain medications to ameliorate cognitive symptoms.

1 Introduction

Multiple sclerosis (MS) is a progressive, autoimmune, inflammatory disease that affects myelination and axonal integrity in the central nervous system. Cognitive impairment occurs in approximately two-thirds of persons with MS [1, 2], most prominently in the domains of speed of information processing and learning and memory [2, 3]. Cognitive impairment can be extremely disruptive to symptom management, medication adherence, instrumental activities of daily living (e.g. managing finances, driving), employment, and independence among persons with MS [3,4,5]. Various medications have shown efficacy in reducing annualized relapse rates [6], brain lesions as detected by magnetic resonance imaging [7], and disability progression as determined by the Expanded Disability Status Scale [8] among persons with MS; however, these metrics do not account for the hidden disabling symptoms of MS, such as fatigue and cognitive impairment. Moreover, cognitive endpoints traditionally have not been incorporated into phase III MS pharmaceutical trials, and, when they are, they are typically not primary endpoints. This results in studies that are underpowered and/or poorly designed for the purpose of examining cognitive outcomes. Therefore, our knowledge base regarding the efficacy of pharmacologic treatments in improving cognitive function in persons with MS is limited.

There is currently no standard intervention for cognitive impairment in MS, although research has investigated various pharmacologic, behavioral, and brain stimulation treatments. Specifically regarding medications, a 2013 Cochrane Review concluded that there is “no convincing evidence to support the efficacy of pharmacological symptomatic treatment for MS-associated memory disorder” due to the poor quality of extant literature [9]. A more recent review published in 2016 similarly outlined methodological problems in pharmaceutical trials but argued that newer disease-modifying therapies (DMTs) and certain symptomatic therapies show promise in benefitting cognitive function [10]. Both articles included only randomized controlled trials (RCTs), which are the gold standard in evaluating therapeutic effects. Neither article recommended any medication for standard clinical use to improve cognitive function, because of the limited available evidence.

The current systematic review aimed to evaluate the efficacy of pharmacologic treatments in improving cognitive function among persons with MS, using the American Academy of Neurology (AAN) classification of evidence criteria and standardized effect size measures (when possible) for comparison across studies. Given the lack of established standard pharmacologic treatment for cognitive impairment in MS, and limited research base in general, we included all medications that were the subject of cognitive efficacy investigation in persons with MS, in order to present a comprehensive overview of the literature. Although the current review focused primarily on RCTs in establishing conclusions regarding various medications, a brief discussion of all relevant studies (including non-RCTs) was also included to address limitations of the literature as a whole.

2 Methods

A literature search was conducted of the PubMed and PsycINFO databases, using the following keywords: cognition, cognitive, neuropsychological, multiple sclerosis, disease modifying therapy, drug, medication, processing speed, attention, working memory, executive functioning, learning, and memory. Additionally, to identify abstracts that may not explicitly refer to cognition, particularly in studies where cognition is not the primary endpoint, ‘PASAT’ and ‘SDMT’ were used as keywords as they are the most commonly used tests in these studies [11]. Only original, English-language research articles published in peer-reviewed journals between 1990 and January 2020 with human adult subjects were included in the current review. To be included, studies had to utilize at least one objective measure of cognition; studies using only subjective reports of cognition were excluded. Case studies, editorials, book chapters, and review articles were excluded, although citations in book chapters and review articles were cross-referenced and relevant articles were extracted. The initial search yielded 141 articles; 95 articles were screened based on the aforementioned inclusion and exclusion criteria, and a final sample of 87 articles was selected for review (see Fig. 1).

Fig. 1 Study selection process

Classification of evidence was determined using the 2017 AAN criteria for therapeutic trials [12] (see electronic supplementary Table 1 for criteria). Four study authors reviewed the final sample of 87 articles using a structured review table and criteria. Each article was independently assessed by two reviewers who rated the article’s classification of evidence. For each article, if there was disagreement, the two reviewers discussed their rationales and reached a consensus. If no consensus was reached, a third reviewer was asked to weigh in. Cohen’s d was calculated as the measure of standardized effect size using a web calculator (https://www.psychometrica.de/effect_size.html) for RCTs and controlled studies. Effect sizes were only calculated for positive studies (i.e., studies indicating significant treatment effects). For studies that provided Cohen’s d, the provided values were included in this review. For studies with unbalanced groups, we used modules #2 and #3 in the web calculator (analogous to Hedges’ g), which corrected for the unequal sample sizes among groups. For small sample sizes (total sample size in both groups [n] < 50), we multiplied the effect sizes by the bias correction factor \(\frac{n - 3}{n - 2.25} \times \sqrt{\frac{n - 2}{n}}\). Data were treated parametrically, even in studies that utilized non-parametric tests (e.g. Wilcoxon signed-rank test) due to the lack of test statistics presented in most of these studies required to calculate non-parametric effect sizes. Repeated measures (e.g. pre- and post-treatment) were treated as independent measures (using module #3 in the web calculator) because most studies did not provide correlation between pre- and post-treatment values, as was needed to account for repeated measure effects; calculation of Cohen’s d based on the methods established by Morris and DeShon [13, 14] was used in these cases.
Some studies did not provide sufficient information to calculate effect sizes, including studies that did not provide means and standard deviations (SDs) and studies that only provided medians and ranges/confidence intervals. Therefore, no effect sizes were provided for these studies.
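As an illustration of the effect size calculation described above, the following minimal Python sketch computes Cohen’s d from group means and SDs and applies the small-sample bias correction factor. The pooled-SD formula (weighted by group size) is a standard textbook form and is an assumption here; the source states only that the web calculator was used, so this may not match the calculator’s modules exactly. All function and parameter names are hypothetical.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference using a pooled SD weighted by group size.

    Assumption: this is the common pooled-variance form; it is not taken
    from the web calculator's source.
    """
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def bias_correction_factor(n_total):
    """Correction factor from the Methods: (n - 3)/(n - 2.25) * sqrt((n - 2)/n)."""
    return (n_total - 3) / (n_total - 2.25) * math.sqrt((n_total - 2) / n_total)


def corrected_effect_size(mean_t, sd_t, n_t, mean_c, sd_c, n_c, small_n=50):
    """Apply the correction only when the total sample size is below small_n."""
    d = cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c)
    n_total = n_t + n_c
    if n_total < small_n:
        d *= bias_correction_factor(n_total)
    return d


# Hypothetical input values, chosen only to exercise the functions.
example = corrected_effect_size(10.0, 2.0, 10, 8.0, 2.0, 10)
```

With the hypothetical inputs above (two groups of 10, a 2-point mean difference, SD of 2), the uncorrected d is 1.0 and the correction factor shrinks it by roughly 9%, which shows how much the adjustment matters in the small trials common in this literature.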

Table 1 Summary of RCTs for disease-modifying therapies

3 Results

Eighty-seven articles published between 1990 and January 2020 were included in this review. For classification of evidence using the AAN criteria, reviewers initially disagreed on 31 articles; they reached a consensus after discussion on 30 of these, and a third reviewer weighed in on the remaining article. The medications were divided into three categories for the purpose of this review: DMTs, symptomatic therapies, and other therapies. The efficacy of these medications on cognitive function was reviewed. For brevity, this review analyzed only the primary cognitive endpoints if the studies specified them. A study was considered negative if the treatment effect was found on a secondary endpoint but not a primary endpoint. If primary endpoints were not specified, all cognitive endpoints were analyzed; however, the lack of a primary endpoint specification was noted as a weakness that downgraded the AAN classification of evidence. Secondary analyses of RCTs that only included a subset of the original treatment allocation groups were designated as class III due to increased participant selection bias (i.e. not all participants had equal opportunity to be allocated to each group). As a general trend, the proportion of positive studies (i.e. findings of significant drug efficacy) tended to increase as the quality of evidence (AAN classifications) decreased. Indeed, class IV observational studies accounted for the highest number of positive studies (number of positive studies: class IV = 33; class III = 24; class II = 24; class I = 6) (see Fig. 2). The following sections discuss each medication in detail individually, with an emphasis on RCTs. Tables 1, 2, 3, 4, 5 and 6 summarize the main findings, effect sizes, and evidence classifications of the studies reviewed. Data are divided into RCTs (Tables 1, 2, 3), non-randomized, controlled/quasi-controlled studies (Table 4), and observational studies (Tables 5, 6).

Fig. 2 Number of positive studies, stratified by (A) study type and (B) AAN class of evidence. The proportion of positive studies increased as the quality of evidence decreased. AAN American Academy of Neurology

Table 2 Summary of RCTs for symptomatic treatments
Table 3 Summary of RCTs for other treatments
Table 4 Summary of non-randomized, controlled studies (class III)
Table 5 Summary of all other observational studies (class IV) for disease-modifying therapies
Table 6 Summary of all other observational studies (class IV) for symptomatic and other treatments

3.1 Disease-Modifying Therapies (DMTs)

Disease-modifying therapies are injectable or oral medications designed to prevent relapses and slow down the progression of MS. Various DMTs have different mechanisms of action, but most are immunosuppressive or immunomodulatory.

3.1.1 Interferon β-1a and Interferon β-1b

Seventeen studies examined the effects of interferon (IFN) β-1a and IFN β-1b, of which four were RCTs. Two studies compared IFNs β-1a and -1b, four studies examined IFN β-1a only, and 11 studies examined IFN β-1b only. Of 13 non-RCTs, five were class III controlled studies and eight were class IV observational studies. Only 8 of the 17 studies investigated cognition as the primary endpoint, and all but one of these were class III or IV.

Among studies comparing IFNs β-1a and -1b, one small class II RCT compared two IFN β-1a groups (Avonex and Rebif) with one IFN β-1b group (Betaferon) among 63 relapsing-remitting (RR) MS patients (21 per group) [15]. The IFN β-1a groups significantly improved on more neuropsychological measures than the IFN β-1b group [five and six of eight versus one of eight measures on the Brief Repeatable Neuropsychological Battery (BRNB)] after 1 year of treatment [15]. However, because the study included more than two primary measures (in fact, there were eight measures), the chances of type I error (i.e. false positive rate) were considerably increased. When we directly calculated and compared the effect sizes of IFN β-1a with IFN β-1b across the five and six statistically significant measures, they were small to negligible (Cohen’s d = 0.04–0.25 for Avonex, and Cohen’s d = 0.02–0.36 for Rebif, compared with Betaferon). A class III non-randomized, controlled study found no significant differences between the IFN β-1a and IFN β-1b groups [16].
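To illustrate why a large number of primary measures inflates the type I error rate, the following sketch computes the familywise error rate, i.e. the probability of at least one false positive across several tests. It rests on two assumptions that are not stated in the source: a per-test significance level of 0.05 and statistical independence among the measures. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests.

    Assumes independent tests, each run at significance level alpha
    (both are illustrative assumptions, not details from the study).
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


# Eight measures, as in the battery used in the study discussed above.
fwer_eight = familywise_error_rate(8)
```

Under these assumptions, eight uncorrected tests carry roughly a one-in-three chance of at least one spurious significant result, compared with 5% for a single test. Subtests of a neuropsychological battery are likely to be correlated, so the true figure would differ from this independence-based estimate.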

In studies evaluating the cognitive effect of IFN β-1a relative to placebo, the class II IMPACT RCT, conducted among 436 secondary-progressive MS patients, found a statistical trend toward improvement (small effect: Cohen’s d = 0.20) on the Paced Auditory Serial Addition Test (PASAT) over 2 years in the treatment group relative to the placebo group [17]. Of note, although the study by Cohen et al. otherwise met the criteria for class I evidence, it was designated as class II because cognition was not the sole primary endpoint [17]. Therefore, these findings would need to be replicated in studies specifically targeting cognition. A class III secondary analysis of a larger RCT found a significant treatment effect [18], and two class IV observational studies favored high-dose (44 µg) over low-dose (22 µg) IFN β-1a treatment [19, 20].

In studies examining the cognitive efficacy of IFN β-1b relative to placebo, a class II RCT conducted in 73 primary/transitional progressive MS patients found no significant treatment effect over placebo on the BRNB over 2 years [21]; however, it was unclear whether allocation was concealed in this study. A class II secondary report of the BENEFIT trial found a significant but small treatment effect (Cohen’s d = 0.23) for IFN β-1b relative to placebo on the PASAT 2 years after the trial among 439 patients with clinically isolated syndrome [22]. Since the original BENEFIT trial was not designed to examine cognitive endpoints, these findings would need to be replicated in cognition-focused studies. Apart from observational follow-ups of the two class II RCTs described above, there were three class III studies (one was a secondary analysis of a larger RCT and two were non-randomized, controlled studies), two of which found significant treatment effects [23, 24]. Of the remaining three class IV observational studies without adequate comparator groups, only one was positive [25].

In sum, there was no class I evidence for the cognitive efficacy of IFN β-1a or IFN β-1b. Class II RCTs demonstrated small to negligible or no treatment effects. Studies of lower quality yielded mixed findings.

3.1.2 Glatiramer Acetate

Seven studies examined the cognitive efficacy of glatiramer acetate. Cognition was not the primary endpoint in any of these studies. Only one study was an RCT (class II), two were class III non-randomized, controlled studies, and four were observational studies without comparator groups. The class II RCT did not detect a significant improvement on the BRNB after 2 years of glatiramer acetate treatment over placebo among 248 RRMS patients [26]. Since this study was a secondary analysis of an RCT that did not a priori specify cognitive endpoints, it only met criteria for class II. The two class III studies yielded mixed results [27, 28], and three of four class IV studies were positive [29,30,31].

3.1.3 Natalizumab

There were no prospective RCTs that examined the cognitive efficacy of natalizumab. Five of 13 studies were class III (one was a secondary analysis of a subset of larger RCTs, and four were non-randomized, controlled studies), and the remainder were class IV observational studies without comparator groups. Cognition was the primary endpoint in 8 of 13 studies. The class III secondary analysis of the AFFIRM and SENTINEL RCTs among 942 RRMS patients (subsets of original trials) found a 43% reduced risk of declining 0.5 SDs on the PASAT score 2 years following natalizumab treatment, compared with placebo in the AFFIRM trial [32]. However, it is important to note that the threshold of 0.5 SDs may not represent a clinically significant decline; studies more typically use at least one SD as the threshold to document change [33]. Three of four remaining class III studies were positive (of which two studies only found significant effects on one of 10 measures) [34,35,36]. The eight class IV observational studies without comparator groups were all positive [37,38,39,40,41,42,43,44].

3.1.4 Fingolimod

Two studies examined the cognitive efficacy of fingolimod: one class II and one class III. Cognition was not the primary endpoint in either study. A class II pooled analysis of the FREEDOMS and FREEDOMS II RCTs showed significant improvement on the PASAT after the first 6 months of treatment over placebo in 1556 RRMS patients, with a negligible effect size (Cohen’s d = 0.13) [45]. The class III open-label (rater-blinded) GOLDEN RCT conducted in 157 cognitively impaired RRMS patients found no significant treatment effect of fingolimod over IFN β-1b after 18 months [46]. Of note, the authors stated that imbalance in baseline characteristics (i.e. disease severity and baseline cognitive test scores) and dropout pattern may have favored the IFN comparator group.

3.1.5 Other DMTs

Additionally, there were one class II, one class III, and two class IV studies for all other DMTs, only one of which was an RCT. Cognition was the primary endpoint in only one of four studies and not in the RCT. The class II RCT, the DECIDE trial conducted in 1841 RRMS patients, found a statistically significant treatment effect on the Symbol Digit Modalities Test (SDMT) for daclizumab β over 96 weeks compared with IFN β-1a, with a negligible effect size (Cohen’s d = 0.11) [47]. The study was classified as class II because there was no prespecification of cognitive endpoints in the original trial, as required by AAN criteria. Of note, daclizumab β has been discontinued by pharmaceutical companies due to reports of encephalitis in Europe.

3.1.6 Summary

In sum, there is a paucity of good-quality evidence in support of the cognitive efficacy of DMTs in persons with MS. There was no class I evidence for this drug type, and the majority of studies were class III and IV. Class II investigations showed either small/negligible or no significant treatment effects. Although many class III and IV observational studies yielded positive results (particularly for natalizumab), these studies suffered from a myriad of methodological limitations (e.g. absence of randomization or equivalent comparator groups, more than two primary cognitive endpoints), which restrict their generalizability. Therefore, at this time, there is insufficient evidence to support the use of DMTs to improve cognitive function in persons with MS.

3.2 Symptomatic Therapies

Symptomatic therapies may be prescribed for MS patients for specific symptoms, such as mobility or fatigue, as a supplement to DMTs. Given that symptomatic therapies do not target the immunopathology of MS, most studies utilized MS samples of mixed phenotypes.

3.2.1 Dalfampridine

Dalfampridine (also known as fampridine or 4-aminopyridine) treats walking difficulties in persons with MS. Twelve studies investigated the cognitive efficacy of dalfampridine, five of which were RCTs: one class I, two class II, and two class III. The remaining seven studies were class IV observational studies without comparator groups. Only 5 of 12 studies investigated cognition as primary endpoints. The class I RCT was conducted with 120 cognitively impaired MS patients [48]. Significant improvement on the SDMT was observed after 12 weeks of treatment relative to placebo, with a medium effect size (d = 0.60). The treatment effect disappeared during a 4-week washout period after the treatment phase. A class II preliminary report of an RCT in 21 RRMS patients found significant improvement after 20 weeks of treatment compared with placebo on only 12 of 35 measures used in this study, with small to medium effect sizes on the significant measures (Cohen’s d between 0.18 and 0.46) [49]. The study was limited due to the large number of outcome measures administered without specification of one or two primary outcomes, as required by the AAN criteria. In contrast, another class II RCT did not find a significant treatment effect on the SDMT after 12 weeks among 57 cognitively impaired MS patients [50]. This study was classified as class II because it did not provide sufficient information regarding its inclusion criteria and randomization. Both class III RCTs used within-subject, crossover designs. Both were class III due to a lack of equivalent treatment order groups at baseline and insufficient examination or accounting for carryover effects. One of the class III studies, the FAMPKIN extension trial, consisted of a 2-year observational period (when the whole sample received treatment) followed by a crossover RCT phase [51]. 
Only 20 of 32 patients from the extension trial completed the RCT phase, during which cognitive performance was superior during the dalfampridine condition compared with the placebo condition, on only one of eight measures. The other class III crossover RCT in 20 MS patients found no significant cognitive improvement during the 2-week treatment condition relative to the placebo condition [52]. Of the remaining seven class IV observational studies, six were positive [53,54,55,56,57,58].

In sum, there was one class I study in support of the cognitive efficacy of dalfampridine, with a medium effect size. For the remaining studies, higher-quality works (class II and III RCTs) yielded mixed results, while lesser quality class IV observational investigations were more likely to be positive. Thus, more work needs to be done to confirm the results of the one class I study for dalfampridine.

3.2.2 Cognition-Enhancing Medications

Cognition-enhancing medications include dementia medications (e.g. for Alzheimer’s disease) and supplements purporting cognitive benefit (e.g. Ginkgo biloba). Although some stimulants are used to improve cognition [e.g. for attention-deficit/hyperactivity disorder (ADHD)], they are not included in this section because they are subsumed under the category of ‘Stimulants’ in Sect. 3.2.3, along with other stimulants that are typically prescribed for wakefulness (e.g. for narcolepsy and fatigue). Nine studies on dementia medications were identified: seven studies for cholinesterase inhibitors (three for donepezil and four for rivastigmine) and two studies for memantine, an N-methyl-d-aspartate (NMDA) receptor antagonist. Three studies on Ginkgo biloba were also included. All but three studies (two were neuroimaging studies involving rivastigmine and one examined Ginkgo biloba) studied cognition as primary endpoints.

Two class I RCTs and one class IV observational study examined donepezil. Both class I RCTs were conducted by the same research group, who started with a smaller single-center trial followed by a larger multicenter trial to confirm previous results. In the smaller trial among 68 mildly cognitively impaired MS patients, significant improvement was observed on the Selective Reminding Test (SRT) after 24 weeks of donepezil treatment relative to placebo with a medium effect size (Cohen’s d = 0.49) [59]. However, the larger RCT of 120 mildly cognitively impaired MS patients did not show a significant donepezil treatment effect compared with placebo [60], and therefore did not confirm previous results.

Four RCTs examined rivastigmine: one class I, two class II, and one class III. The class I RCT on 60 mildly cognitively impaired MS patients found no significant treatment effect, relative to placebo, on the Wechsler Memory Scale (WMS) general memory score after 12 weeks [61]. One of the class II RCTs was a small functional magnetic resonance imaging (fMRI) study with 15 MS patients [62]. Those researchers found a significant but small treatment effect on the modified PASAT after a single dose of rivastigmine compared with placebo (Cohen’s d = 0.25). The study was classified as class II due to the lack of specification of primary endpoints. The other class II RCT had a larger sample size with 81 cognitively impaired MS patients, but did not find a significant treatment effect for rivastigmine over placebo on the SRT after 16 weeks [63]. The last class III RCT was also an fMRI study and did not detect a significant treatment effect on the BRNB in a single (investigator)-blind, crossover design among 15 MS patients [64]. In sum, three of four RCTs (including a class I study) did not find significant cognitive treatment effects for rivastigmine among persons with MS.

One class II and one class III RCT examined memantine. Both studies used samples of mildly cognitively impaired MS patients (114 and 62, respectively) and found no significant treatment effect for memantine over placebo on the PASAT after 16 and 52 weeks [65, 66].

Two class II and one class III RCT examined Ginkgo biloba. All three studies specified more than two primary endpoints, which violated AAN class I criteria. The class II studies conducted in 38 and 120 cognitively impaired MS patients found no significant treatment effects relative to placebo after 12 weeks [67, 68]. A class III pilot RCT among 21 MS patients found medium to large treatment effects compared with placebo on only two measures [Cohen’s d = 0.55 for the Visual Threshold Serial Addition Test, and d = 0.78 for the California Verbal Learning Test (CVLT) intrusions] out of a large number of measures (total number of measures unspecified) [69], which inflated the type I error rate.

In sum, there was consistently high-quality evidence (class I and II) that cognition-enhancing medications used for dementia (e.g. donepezil, rivastigmine, memantine) and Ginkgo biloba did not have a positive treatment effect in individuals with MS.

3.2.3 Stimulants

Nine studies investigated the cognitive efficacy of central nervous system stimulants: three on l-amphetamine sulfate, one on methylphenidate, one on lisdexamfetamine dimesylate, one on mixed amphetamine salts, one on armodafinil, and two on modafinil. Amphetamine-based stimulants and methylphenidate are typically prescribed to treat inattention for individuals with ADHD, and modafinil/armodafinil are used to treat fatigue in persons with MS (as well as narcolepsy). All but two studies (both for modafinil) examined cognition as the primary endpoint.

l-Amphetamine sulfate was investigated in one class I, one class II, and one class III study. In the class I RCT among 136 cognitively impaired MS patients, no significant treatment effect relative to placebo was found after 29 days, on the primary endpoint SDMT, an information processing speed measure [70]. A class II post hoc reanalysis of this trial divided patients based on baseline memory impairment (median split) [71]. Significant and large treatment effects for l-amphetamine compared with placebo were found on the CVLT-II and the Brief Visual Memory Test-Revised (BVMT-R) delayed recall scores (Cohen’s d = 0.94 for CVLT-II and d = 1.00 for BVMT-R), but only among individuals with memory impairment at baseline. In contrast, a preliminary class III crossover RCT among 19 cognitively impaired MS patients found significant medium-size treatment effects for a single dose of l-amphetamine 45 mg relative to placebo on information processing speed measures [PASAT, SDMT, and part A of the Trail-Making Test (Cohen’s d = 0.36–0.45)]; there was no significant change on memory measures [72]. Lower doses of l-amphetamine (15 or 30 mg) were not found to be more efficacious in improving cognitive function than placebo in this study. Still, the investigation by Benedict et al. was limited in its lack of specification of primary cognitive measures, as well as failure to examine crossover effects or baseline equivalency of treatment order groups [72]. Efficacy of a single dose of mixed amphetamine salts was investigated in a class I RCT among 49 cognitively impaired MS patients [73]. Significant improvement on the SDMT was evidenced in the treatment group relative to placebo, with a medium effect size (Cohen’s d = 0.47). No significant change was observed on the PASAT, the other primary endpoint. A class II RCT examined lisdexamfetamine dimesylate within a sample of 63 cognitively impaired MS patients [74]. 
Significant improvement on the SDMT was observed after 4 weeks of treatment up to the highest tolerable dose compared with placebo, with a medium effect size (Cohen’s d = 0.62). The improvement on SDMT was maintained for another 4 weeks of treatment. Again, no significant improvement was observed on the other primary endpoint, the PASAT. Of note, 76% of enrolled participants prematurely dropped out of this trial, which increased bias in the final results. A smaller class II RCT examined methylphenidate in 26 low average to cognitively impaired RRMS patients [75]. The investigators found significant improvement on the PASAT after a single dose of the medication compared with placebo. Effect sizes were medium and large (Cohen’s d = 0.52 for the three-second trial and 0.71 for the two-second trial). Limited details regarding randomization and blinding were presented in this paper.

In sum, there were contradictory class I findings with regard to the cognitive efficacy of amphetamine-based formulations; two RCTs (class II and III) reported significant treatment effects. Methylphenidate and lisdexamfetamine dimesylate each had a single RCT demonstrating efficacy (neither of which was class I).

Three class II RCTs and one class III secondary analysis of a larger RCT investigated modafinil and armodafinil. The larger class II RCT conducted with 121 fatigued MS patients showed contradictory findings after 8 weeks, favoring the modafinil group on the SDMT and favoring the placebo group on the PASAT [76]. The study was limited in its lack of detailed account of blinding. Both of the smaller class II RCTs were within-subject, crossover designs and utilized samples of cognitively impaired MS patients: 30 patients taking armodafinil and 16 patients taking modafinil [77, 78]. Both studies found a significant treatment effect on only one of multiple measures (8 and 11, respectively); one of the significant findings was on a predefined secondary endpoint and not a primary endpoint; therefore, probabilities for type I error were high. A class III secondary analysis of a larger RCT observed a significant treatment effect [79].

3.2.4 Other Symptomatic Therapies

Amantadine is also used to treat fatigue in persons with MS. Two class III studies (one was a prospective crossover RCT and one was a secondary analysis of a larger RCT) examined amantadine in 24 [80] and 45 fatigued MS patients [81]. Cognition was not the primary endpoint in either study. Neither study found significant treatment effects compared with placebo after 10 days and 6 weeks, respectively.

3.2.5 Summary

In sum, in contrast to studies on DMTs, most studies on symptomatic therapies were RCTs and examined cognition as primary endpoints (see Figs. 3 and 4). They also tended to use samples of cognitively impaired patients, increasing sensitivity. Furthermore, symptomatic therapies generally yielded stronger treatment effect sizes (medium range) than DMTs (small to negligible) in RCTs that yielded positive findings (see Fig. 5). However, conclusions were difficult to draw because there were many contradictory findings in these RCTs (negative findings were not illustrated in Fig. 5). The best evidence was for dalfampridine, with one class I RCT in support of its cognitive efficacy and a medium treatment effect size. However, multiple lower-quality RCTs yielded mixed findings for dalfampridine. In contrast, there was quality evidence (class I and II) that demonstrated no significant treatment effects for cognition using dementia medications (e.g. donepezil, rivastigmine, memantine) or Ginkgo biloba. Lastly, other symptomatic therapies yielded largely mixed results in terms of their cognitive efficacy. Thus, overall, there is insufficient evidence at this time to support the use of symptomatic therapies to improve cognitive function in persons with MS.

Fig. 3 Summary of studies based on (A) study type and (B) AAN class of evidence. For disease-modifying therapies, the number of studies increased as the quality of evidence decreased. For symptomatic therapies, the majority of studies were randomized controlled trials, and AAN classification levels were evenly distributed. AAN American Academy of Neurology

Fig. 4

Proportion of cognition-focused studies, stratified by medication and study type. For disease-modifying therapies, more than half of the studies did not specify cognition as the primary endpoint. For symptomatic therapies, most studies examined cognition as the primary endpoint and these studies tended to be higher in quality compared with other medication types

Fig. 5

Summary of effect sizes in RCTs based on medication type. Effect sizes are expressed as Cohen’s d. Weighted effect size = ratio of positive measures to total measures × effect size. Only positive RCTs are included (n = 20). Additionally, there were three positive RCTs with insufficient data to calculate Cohen’s d, and 18 negative RCTs. On average, effect sizes for disease-modifying therapies were negligible, and effect sizes for symptomatic therapies were in the medium range. RCTs randomized controlled trials
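
The weighting described in the caption of Fig. 5 can be illustrated with a minimal Python sketch. It assumes that "effect size" refers to a single summary Cohen's d for a study's positive measures; the function name and the input values are hypothetical and are not taken from any study in this review.

```python
def weighted_effect_size(cohens_d: float, positive_measures: int, total_measures: int) -> float:
    """Scale an effect size by the share of outcome measures that were positive.

    Follows the caption's definition:
    weighted effect size = (positive measures / total measures) x effect size.
    """
    if total_measures <= 0:
        raise ValueError("total_measures must be a positive integer")
    if not 0 <= positive_measures <= total_measures:
        raise ValueError("positive_measures must lie between 0 and total_measures")
    return (positive_measures / total_measures) * cohens_d


# Hypothetical illustration only: a medium effect (d = 0.5) observed on
# 2 of 8 cognitive measures is down-weighted to a quarter of its size.
print(weighted_effect_size(0.5, 2, 8))
```

Under this definition, a study with a medium effect on only a few of many measures contributes a much smaller weighted value than a study with the same effect on every measure.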

3.3 Other Therapies

This section includes medications that are not DMTs and do not target a specific MS symptom (e.g. ambulatory disability, cognition). Cognition was not the sole primary endpoint in any of these studies. Two studies examined recombinant human erythropoietin (EPO): one class II and one class IV. The class II RCT among 50 progressive MS patients found no significant treatment effect after 24 weeks of high-dose EPO treatment on part B of the Trail-Making Test [82]. Schreiber et al. [82] did not specify cognition as the sole primary endpoint, and the study was therefore classified as class II. Two class II RCTs examined statins for 24 months and provided limited or no evidence for their cognitive efficacy. One study found significant improvement using simvastatin over placebo on one screening measure out of 15 total measures among 140 secondary-progressive MS patients; primary endpoints were not specified in this study [83]. The other study did not find a significant treatment effect using atorvastatin among 154 RRMS patients; there was a high dropout rate and only 63% of enrolled participants were part of the final analyses [84]. A class II RCT compared high- and low-dose estrogen as supplemental therapy for IFN β-1a in 142 female RRMS patients [85]. After 24 months, the high-dose estrogen group (estrogen + IFN) included significantly fewer patients with cognitive impairment than the no-estrogen group (IFN only) and had a lower risk of developing cognitive impairment (high-dose vs. no-estrogen, odds ratio 0.27). Low-dose estrogen did not have a significant treatment effect. Classification as class II was due to the lack of a primary endpoint specification. Three studies examined methylprednisolone: one class III non-randomized, quasi-controlled study [86] and two class IV observational studies [87, 88]. All three found significant treatment effects, but there were no class I or II studies to establish generalizable validity.

In sum, there was limited quality evidence to support the cognitive efficacy of medications reviewed in this section, and therefore insufficient evidence to support their use to improve cognitive function in persons with MS at this time.

4 Discussion

The current review aimed to systematically evaluate the cognitive efficacy of pharmacologic treatments for MS (including disease-modifying, symptomatic, and other therapies) based on papers published in peer-reviewed journals between 1990 and January 2020. Overall, none of the medications reviewed yielded consistently positive, high-quality evidence in support of their cognitive efficacy, which is in line with the most recent Cochrane review [9]. The best evidence was for dalfampridine, with one class I RCT that found a medium-sized treatment effect. However, other RCTs with dalfampridine of lower classifications yielded largely mixed findings. In contrast, there was consistently high-quality evidence (class I and II) that cognition-enhancing medications used for dementia (e.g. donepezil, rivastigmine, memantine), and Ginkgo biloba, did not have a positive treatment effect in individuals with MS. One notable finding in this review was that, overall, lower-quality observational studies (i.e. class III quasi-controlled studies or class IV uncontrolled studies) tended to yield more positive findings than higher-quality studies (i.e. RCTs), as illustrated in Fig. 2. A focus on lower-quality studies may give clinicians a biased impression of the overall literature base. Furthermore, due to the scarcity of research, optimal dosage and treatment duration for the medications reviewed are unknown, as existing studies vary in dosage and treatment intervals. Therefore, much work (e.g. higher-quality RCTs, comparisons of different dosages and treatment durations) is needed before formal recommendations can be made about any of the medications reviewed. We recommend that future clinical trial researchers guide their research designs using standardized criteria, such as the AAN criteria, to minimize bias.

Interestingly, there was no quality evidence demonstrating cognitive efficacy for DMTs, the gold-standard treatment for MS. In fact, from our review, there were no class I studies, and class II and III studies either demonstrated no treatment effect or yielded small to negligible effect sizes. Furthermore, few good-quality studies on DMTs focused on cognition as a primary endpoint, compared with symptomatic therapies, which limited the validity and generalizability of positive findings. Additional high-quality, cognition-focused investigations of DMTs should be conducted, given the pervasiveness and insidious impact of cognitive impairment among persons with MS [1, 2, 89]. Cognition-focused investigations would allow researchers to consider cognition-related factors in study design and analysis, and to calculate power based on the cognitive outcomes. For example, to increase sensitivity for a particular cognitive treatment, it may be advantageous to recruit only cognitively impaired individuals. This was done in studies of most of the symptomatic medications (but not DMTs), as the primary goal in these studies was to evaluate cognitive benefit. Therefore, unsurprisingly, symptomatic therapy investigations were generally of higher quality (e.g. RCTs with multiple class I studies) than DMT investigations. That being said, there were many contradictory findings for symptomatic treatments. Additional inquiries into variables that moderate treatment response may help explain the mixed results.

One reason for the mixed results pervasive in this review may be the heterogeneity of cognitive assessment procedures. Some studies used single primary endpoints, some used consensus batteries for MS (e.g. BRNB), and others used study-specific batteries of neuropsychological tests. Studies utilizing a battery of tests (without specification of one or two primary endpoints) may find statistically significant effects in a subset of the tests administered, and the number of significant measures may vary among studies, resulting in difficulty in interpretation (i.e. how many significant tests in a battery signify an overall positive study?). However, strictly adhering to the AAN criterion of including only one or two primary endpoints (meant to reduce type I error) presents its own challenges for cognition-focused investigations. There is no single ‘gold standard’ neuropsychological test. In fact, performance variability within a battery of neuropsychological tests occurs even in cognitively healthy individuals, with a difference of as much as six SDs between the highest and lowest scores [90]. In the MS literature, there is increasing consensus to use the SDMT as the standard primary endpoint in clinical trials [91]. Although the SDMT measures the most prevalent cognitive deficit in MS, i.e. information processing speed, it does not account for impairment in other cognitive domains, such as learning and memory or executive functions (in which impairments are also common among persons with MS). Therefore, guidelines regarding best practice in selecting primary cognitive endpoints for clinical trials, considering the aforementioned issues, are warranted. Nevertheless, future studies should prespecify primary endpoints in order to minimize type I error, in accordance with the AAN criteria.
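
The type I error concern raised above can be made concrete with a short Python sketch. It computes the probability of at least one false-positive result when several tests are each evaluated at the same threshold, under the simplifying assumption that the tests are independent. Scores within a neuropsychological battery are typically correlated, so this is an illustration rather than an exact figure, and the function name and battery sizes are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n_tests independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so the chance that no test is falsely significant is (1 - alpha) ** n_tests.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical battery sizes: one or two prespecified endpoints versus
# an unadjusted battery of 15 measures.
for n in (1, 2, 15):
    print(f"{n:>2} tests: {familywise_error_rate(n):.3f}")
```

Under these assumptions, an unadjusted battery of 15 measures carries a chance of roughly 0.54 of producing at least one spuriously significant result, compared with 0.05 for a single prespecified endpoint.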

A systemic problem identified in the current review (and in research studies in general) is the overreliance on statistical significance using the p value. Indeed, the present review revealed that among studies (some high in quality) that found statistically significant effects (with p values below the conventional 0.05 threshold; most studies did not adjust for multiple comparisons), some yielded small or even negligible effect sizes (Cohen’s d < 0.20). Overreliance on the p value has been cited as one of the reasons for the reproducibility crisis. Furthermore, a statement from the American Statistical Association stressed that the p value does not measure the size of an effect or the importance of a finding [92]. Instead, statisticians recommend using effect sizes, and confidence intervals around the effect sizes, to denote the significance of a result [93, 94]. Therefore, when evaluating whether to prescribe a certain medication to improve cognitive function in MS patients based on published findings, clinicians are urged to consider the effect sizes and confidence intervals rather than relying on p values alone.
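
As a minimal sketch of reporting an effect size with its confidence interval, the Python function below computes Cohen's d from group means and SDs, together with an approximate 95% interval based on a common large-sample formula for the standard error of d. The formula choice, the function name, and the input values are assumptions made for illustration and do not describe the calculations of any study in this review; like the effect sizes in this review, the sketch ignores correlations between repeated measurements.

```python
import math


def cohens_d_with_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Return (d, lower, upper) for two independent groups.

    d uses the pooled SD; the interval is d +/- z * SE, where SE is the
    large-sample approximation
    sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))).
    """
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two participants")
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, d - z * se, d + z * se


# Hypothetical trial: 30 treated and 30 placebo patients, test scores with
# SD 10, and a 5-point group difference in favour of treatment.
d, lower, upper = cohens_d_with_ci(55.0, 10.0, 30, 50.0, 10.0, 30)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With these hypothetical inputs, d is 0.5 (conventionally a medium effect), yet the approximate interval extends slightly below zero, which shows how a point estimate alone can overstate what a small trial establishes.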

The current review offers unique strengths compared with past reviews on the same topic. The inclusion of a standardized effect size measure provides an objective metric that can be compared across studies. The use of evidence classification helps to account for potential sources of bias in the studies reviewed. However, a few limitations must be acknowledged. We only calculated effect sizes for positive findings in controlled studies. It is possible that some negative studies had strong effect sizes but did not reach statistical significance due to small sample sizes. We decided not to calculate effect sizes for negative findings because one could argue that small samples may not be representative of the population being investigated. Nevertheless, we note this as a limitation, as it may underestimate the overall effects observed in the literature. Furthermore, we were unable to calculate effect sizes for some studies due to insufficient data, such as missing means and SDs. Due to these missing data, along with variability in the study endpoints, we were unable to conduct a quantitative meta-analysis. Another limitation of our effect size calculations is the treatment of all available data as parametric. Although some studies utilized non-parametric tests for their main analyses, the majority did not provide the relevant statistics from which effect sizes could be calculated. Thus, we used the provided means and SDs to calculate Cohen’s d. Lastly, we did not account for correlations between repeated measurements in our effect size calculations because most studies did not provide this information.

5 Conclusions

According to the current systematic review, there is insufficient evidence to support the use of pharmacologic treatments to improve cognitive function in persons with MS. There were many contradictory findings observed in this review, which may be due to unidentified variables that moderate treatment response and/or a lack of standardization in assessment procedures. Higher-quality RCTs with cognition as the primary outcome are needed to establish the cognitive efficacy of pharmacologic treatments for MS-related cognitive dysfunction. Future clinical trial researchers are urged to utilize standardized criteria (such as the AAN criteria) to guide their study designs. There was also an overreliance on statistical significance in determining the overall cognitive efficacy of a medication, although statistically significant effects may not be clinically meaningful. Clinicians should consider effect sizes of medications before deciding whether to prescribe them to ameliorate cognitive symptoms.