Introduction

Life expectancy is increasing rapidly for a number of reasons, such as better health care and hygiene, healthier lifestyles, improved food security, and lower child mortality rates (World Health Organization, 2020). We now live longer and healthier lives than our ancestors just a few generations ago. Nevertheless, this dramatic increase in life expectancy has not been accompanied by a proportionate increase in quality of life, particularly for the elderly, who suffer from numerous age-related conditions. Rather, the increase in longevity has increased the risk of disease, disability, dementia, and advanced aging prior to death (Kassebaum et al., 2016).

The term “dementia” is generally understood as a behavioral or cognitive decline sufficiently serious to affect a person’s capacity to undertake everyday tasks and not attributable to psychiatric disorders (G. M. McKhann et al., 2011). Dementia due to Alzheimer’s disease (ADD) accounts for an estimated 60 to 80 percent of dementia cases (Association, 2019) and, according to a recent survey, has overtaken cancer as the most feared disease (Alzheimer’s Disease International 2018, September). ADD is marked by a gradual cognitive decline that progresses continuously over a long period, and it is thought to begin two decades or more before symptoms emerge (Association, 2019; Monsell et al., 2014; Resnick et al., 2010; Savonenko et al., 2015; Wilson et al., 2010). ADD is well known to affect various cognitive processes, with substantial episodic memory impairment from the initial stages of the disease as well as deterioration in semantic memory, language, inhibitory control, attention, visuospatial function, and executive function (Bondi et al., 2017; Chau et al., 2015; Crawford et al., 2015b; Crawford & Higham, 2016; Hellmuth et al., 2012; T. J. Shakespeare et al., 2015; Whitehead et al., 2018).

Current research has identified three stages of Alzheimer’s disease (AD): preclinical AD, mild cognitive impairment (MCI) due to AD, and ADD (Association, 2019; Jack et al., 2018). Preclinical AD spans the period from the first neuropathologic brain lesions to the onset of the first clinical symptoms of AD (Bruno Dubois et al., 2016). MCI is marked by cognitive decline greater than expected for the individual's age and level of education that does not significantly disrupt activities of daily living (Gauthier et al., 2006; G. M. McKhann et al., 2011). MCI can be categorized by clinical presentation as amnestic MCI (aMCI) or nonamnestic MCI (naMCI), and by the number of cognitive domains affected as single-domain or multiple-domain MCI (Roberts & Knopman, 2013). The number of affected domains has important implications for understanding the extent of the underlying brain pathology, disease severity, and the likelihood of progression to dementia. aMCI is characterized primarily by memory deficits, whereas naMCI involves problems with thinking skills, impaired decision making and judgment, and difficulty taking the sequential steps needed to perform relatively complex tasks (Khan, 2016). Patients with MCI convert to ADD at an average rate of nearly 15% per year, although this conversion rate varies considerably with the MCI diagnostic criteria used (Libon et al., 2014; Mitchell & Shiri-Feshki, 2009). In general, individuals with aMCI tend to progress to ADD, whereas those with naMCI tend to progress to non-AD dementias (Gauthier et al., 2006; Khan, 2016). Overall, MCI may be transient or persistent, or it may progress to dementia due to neurodegenerative diseases such as AD (Mitchell & Shiri-Feshki, 2009).

With the aging of the population, the number of individuals with ADD is expected to rise. Current treatments are ineffective: a number of AD drugs have been tested, but no approved drug can cure the disease, and all drugs on the market provide only symptomatic relief. Additionally, the diagnosis of ADD relies largely on documenting cognitive decline, by which time the disease has already caused severe brain damage (G. M. McKhann et al., 2011). There is therefore a need for early diagnosis so that the onset of symptoms can be delayed or prevented (Cummings et al., 2019).

Various approaches, such as genetic testing, biological markers, and structural and functional neuroimaging, have been proposed to improve screening and timely identification of cognitive decline. Among them, biological markers may offer the most promising path toward an easy and accurate way to detect MCI and ADD before symptoms begin (Jack et al., 2018). Several potential AD biomarkers are being studied to assess their ability to detect prodromal AD and to offer objective, dependable measures of disease progression (Goldman & Van Deerlin, 2018). A well-known biomarker used to evaluate the risk or presence of AD is amyloid beta, which is detectable in cerebrospinal fluid (CSF) and blood plasma (Jack et al., 2018; Nakamura et al., 2018). Other indicators of early AD include cortical and subcortical alterations; damage to the limbic area, cerebral cortex, hippocampus, and subcortical nuclei; and changes in eye function (Braak & Braak, 1995; Daffner et al., 1992; Katz & Rimmer, 1989). The biomarkers currently used in AD studies are either expensive or invasive; hence, we believe that widespread use requires the development of affordable or noninvasive biomarkers for screening or monitoring neuropathological changes.

Eye tracking (ET) technology is becoming popular owing to the development of accurate, affordable, portable, and easy-to-use eye trackers. ET can be employed in various environments, enabling research with diverse population groups (Bueno et al., 2019). The eye shares many neural and vascular similarities with the brain, and numerous cortical and subcortical regions that are affected by AD participate in triggering and regulating eye movements (EMs). Consequently, ET can provide an indirect window onto neuronal and cognitive functioning (Broerse et al., 2001; Holmqvist et al., 2011; Jamadar et al., 2013; McDowell et al., 2008). Thus, ET may offer a way to monitor the preclinical, MCI, and ADD stages that is potentially sensitive to the cognitive disease process.

ET metrics can be applied to different aspects of oculomotor behavior, such as fixations, smooth pursuit, vergence, vestibulo-ocular movements, optokinetic movements, saccades, and pupil responses (Borys & Plechawska-Wójcik, 2017; Duchowski, 2007). Fixations keep the eye steady during purposeful gaze when the head is stationary. Smooth pursuit movements hold the image of a moving target on the fovea centralis. Vergence movements move the two eyes in opposite directions so that the image is positioned on both foveae. Vestibulo-ocular reflexes stabilize images on the retina during rapid head movements. Saccades rapidly shift the fovea to a new target (Mack et al., 2013). Pupil responses (dilation and constriction) are physiological changes in pupil size. To date, fixations, smooth pursuit, and saccades are the EM components most commonly assessed in ET tasks for AD (Daffner et al., 1992; Garbutt et al., 2008; Pavisic et al., 2017).

EMs and pupillary responses provide information about executive function that can be assessed with oculometrics such as saccade amplitude, saccade latency, saccade peak velocity, fixation duration, latency to pupil constriction, peak pupil constriction, baseline pupil diameter, and other measures presumed to reflect the neural mechanisms of goal-directed behavior, decision making, learning, memory, and attention (Borys & Plechawska-Wójcik, 2017; Eckstein et al., 2017; Holmqvist et al., 2011; Hutton, 2008; Luna et al., 2008; Marandi & Gazerani, 2019). These characteristics make EMs a relatively inexpensive biomarker for cognitive evaluation and for tracking the progression of AD, with potential for wide implementation (Anderson & MacAskill, 2013; Molitor et al., 2015).
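Oculometrics such as saccade latency, amplitude, and peak velocity are derived from the raw gaze trace. The sketch below shows one simple way to do this, assuming a one-dimensional horizontal gaze signal in degrees, a known sampling rate, and a velocity threshold of 30°/s for saccade onset and offset. The threshold, the function name `first_saccade_metrics`, and the unfiltered two-point velocity estimate are illustrative assumptions, not the procedure used by any of the reviewed studies; eye-tracker software typically applies filtering and more elaborate detection criteria.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SaccadeMetrics:
    latency_ms: float
    amplitude_deg: float
    peak_velocity_deg_s: float


def first_saccade_metrics(
    positions_deg: Sequence[float],
    sample_rate_hz: float,
    target_onset_index: int = 0,
    velocity_threshold_deg_s: float = 30.0,  # assumed onset/offset threshold
) -> Optional[SaccadeMetrics]:
    """Detect the first saccade after target onset in a 1-D gaze trace.

    Returns None if no sample pair reaches the velocity threshold.
    """
    dt = 1.0 / sample_rate_hz
    # Two-point velocity: velocities[i] describes the step from sample i to i + 1.
    velocities = [
        (positions_deg[i + 1] - positions_deg[i]) / dt
        for i in range(len(positions_deg) - 1)
    ]
    # Saccade onset: first velocity at or above threshold after target onset.
    onset = next(
        (i for i in range(target_onset_index, len(velocities))
         if abs(velocities[i]) >= velocity_threshold_deg_s),
        None,
    )
    if onset is None:
        return None
    # Saccade offset: last consecutive velocity sample at or above threshold.
    offset = onset
    while (offset + 1 < len(velocities)
           and abs(velocities[offset + 1]) >= velocity_threshold_deg_s):
        offset += 1
    amplitude = abs(positions_deg[offset + 1] - positions_deg[onset])
    peak_velocity = max(abs(v) for v in velocities[onset:offset + 1])
    latency_ms = (onset - target_onset_index) * dt * 1000.0
    return SaccadeMetrics(latency_ms, amplitude, peak_velocity)


# Illustrative trace at 1000 Hz: steady fixation, then a 10-degree step over 20 ms.
trace = [0.0] * 200 + [0.5 * k for k in range(1, 21)] + [10.0] * 100
print(first_saccade_metrics(trace, sample_rate_hz=1000.0))
```

In practice, latency is measured from the sample at which the target appears, which is why the target onset index is a parameter.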

ET-based evaluation of EMs, in particular examination of saccade properties, is especially helpful for assessing disease stage in patients with mild motor function disorders and cognitive impairments, such as ADD (Anderson & MacAskill, 2013). In addition, laboratory-based ET, especially testing of saccade properties, can provide relevant information about progression or reversion in neurodegenerative diseases (Anderson & MacAskill, 2013). Two main categories of saccadic EMs can be distinguished: visually guided saccades (also known as reflexive, refixation, or prosaccades) and voluntary (or volitional) saccades. A visually guided saccade is an involuntary orienting response to a new event in the visual field, whereas voluntary saccades result from purposeful activity in paradigms such as antisaccades, memory-guided saccades, or predictive saccades. In antisaccades, gaze is directed to the location opposite the peripheral target (Hallett, 1978). In memory-guided saccades, participants fixate a central stimulus while a peripheral target is briefly flashed to mark the location of the upcoming saccade; when the central fixation point disappears, they make a saccade to the remembered target location. In predictive saccades, participants direct their gaze in anticipation of a target that appears at a predictable location at a fixed temporal rate (Broerse et al., 2001) (Fig. 1).

Fig. 1

Saccadic paradigms. (A) Visually guided saccade: a visual stimulus is shown randomly to the right or left of a central fixation point, and participants are instructed to respond with quick and accurate EMs toward it. (B) Antisaccade: EMs are directed toward the position in the visual field opposite the stimulus. (C) Memory-guided saccade: participants are instructed to inhibit the reflexive EM when a new stimulus appears and to withhold the saccade until the central fixation point disappears. When the saccade is initiated, no visual information about the location of the previously displayed target is available. (D) Predictive saccade: a visible target steps between spatial locations in a predictable temporal sequence

Within these paradigms, many conditions are possible. One of the most widely used manipulations in saccade tasks concerns the timing between the offset of the central fixation stimulus and the appearance of the peripheral target. In standard “step” trials, the central fixation offset coincides with the appearance of the peripheral target. In “gap” trials, the central fixation offset precedes the appearance of the peripheral target, whereas in “overlap” trials, the central fixation stimulus remains visible after the peripheral target appears (Hutton, 2008; see Fig. 2).
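The three conditions differ only in when the central fixation stimulus disappears relative to the onset of the peripheral target. A minimal sketch of that rule follows, assuming both times are recorded in milliseconds from trial start; the function name `classify_trial` and the optional `tolerance_ms` parameter (to absorb display-frame jitter) are hypothetical.

```python
from enum import Enum


class Condition(Enum):
    GAP = "gap"
    STEP = "step"
    OVERLAP = "overlap"


def classify_trial(
    fixation_offset_ms: float,
    target_onset_ms: float,
    tolerance_ms: float = 0.0,
) -> Condition:
    """Classify a trial from the timing of fixation offset and target onset.

    Times are in ms from trial start. Use float("inf") for fixation_offset_ms
    when the fixation stimulus stays on for the whole trial.
    """
    # Positive interval: the fixation stimulus disappears before the target appears.
    interval = target_onset_ms - fixation_offset_ms
    if interval > tolerance_ms:
        return Condition.GAP
    if interval < -tolerance_ms:
        return Condition.OVERLAP  # fixation still visible after target onset
    return Condition.STEP  # offset and onset coincide (within tolerance)


print(classify_trial(1000, 1200))          # Condition.GAP (200 ms gap)
print(classify_trial(1000, 1000))          # Condition.STEP
print(classify_trial(float("inf"), 1000))  # Condition.OVERLAP
```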

Fig. 2

Basic trial structure for saccade paradigms, showing (A) gap, (B) step, and (C) overlap conditions

The gap effect refers to the shorter saccade latency in gap trials than in the other conditions. Several mechanisms, mostly related to the maintenance and release of fixation, have been proposed to explain it (Pratt et al., 2006). One explanation is that the removal of the fixation point in gap trials allows attention to disengage before the target appears, leading to shorter saccade latencies, including very short-latency saccades referred to as express saccades (Fischer & Ramsperger, 1984), whereas in overlap trials visual attention remains engaged at fixation and saccades are suppressed, leading to longer latencies (Fischer et al., 1993; Fischer & Breitmeyer, 1987; Fischer & Ramsperger, 1984). Other researchers have interpreted the offset of the fixation stimulus in gap trials as an alerting signal that reduces saccade latencies (Reuter-Lorenz et al., 1995). Overall, the gap effect appears to reflect both an attentional disengagement (“fixation release”) component and a warning component. The fixation release component has been suggested to be regulated by low-level neural circuits in the superior colliculus (Reuter-Lorenz et al., 1991). According to this account, removal of the central stimulus reduces the activity of fixation neurons in the superior colliculus, thereby disinhibiting movement cells and facilitating the initiation of the subsequent saccade (Hutton, 2008).
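The gap effect is usually quantified as a latency difference between conditions. The sketch below computes it as the mean latency in step or overlap trials minus the mean latency in gap trials, so a positive value indicates faster responses in gap trials. The 80 ms cutoff for discarding anticipatory responses and the dictionary layout of the input are assumptions (cutoffs vary across laboratories, and some studies use medians rather than means).

```python
from statistics import mean
from typing import Dict, List

ANTICIPATION_CUTOFF_MS = 80.0  # assumed cutoff; laboratories use different values


def _valid_mean(latencies_ms: List[float]) -> float:
    """Mean latency after discarding anticipatory responses."""
    kept = [lat for lat in latencies_ms if lat >= ANTICIPATION_CUTOFF_MS]
    if not kept:
        raise ValueError("no valid latencies after filtering")
    return mean(kept)


def gap_effects(latencies_ms: Dict[str, List[float]]) -> Dict[str, float]:
    """Gap effect(s) in ms: mean step/overlap latency minus mean gap latency."""
    gap_mean = _valid_mean(latencies_ms["gap"])
    return {
        f"{condition}_minus_gap": _valid_mean(latencies_ms[condition]) - gap_mean
        for condition in ("step", "overlap")
        if condition in latencies_ms
    }


# Hypothetical latencies for one participant; the 60 ms response is discarded.
example = {"gap": [150.0, 170.0, 60.0], "step": [200.0, 220.0], "overlap": [240.0, 260.0]}
print(gap_effects(example))  # {'step_minus_gap': 50.0, 'overlap_minus_gap': 90.0}
```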

The anti-effect refers to the shorter latency of visually guided saccades relative to antisaccades (Hallett, 1978; Hallett & Adams, 1980; Douglas P. Munoz & Everling, 2004), which may be attributed to the additional cognitive processes required in antisaccade trials. The brain areas controlling saccadic EMs have been identified in preclinical and clinical lesion and neuroimaging studies (McDowell et al., 2008; Pierrot-Deseilligny et al., 2004). Basic visually guided saccades and more sophisticated voluntary saccades are generated by similar core neural circuits, with additional brain areas supporting the relevant cognitive functions (McDowell et al., 2008). Both types of saccades have identifiable neural pathways linked to their respective cognitive processes (Broerse et al., 2001; McDowell et al., 2008), and sensorimotor programming may rely on different cortical and subcortical networks depending on the nature of the saccadic paradigm. The network involved in generating visually guided saccades includes subcortical regions (striatum, thalamus, superior colliculus, and cerebellar vermis) as well as frontal, occipital, and parietal cortical regions. This network integrates spatial attention, visual processing, and a dedicated motor system but places limited demands on higher-order executive functions. Nevertheless, a wide variety of higher-order processes, such as attention and knowledge acquisition, have been found to influence performance on visually guided saccades (Hutton, 2008).

In volitional saccades, the greater demand on higher-level executive control leads to increasingly complex patterns of brain activation. Antisaccade trials require at least two processes beyond those of visually guided trials: inhibition of the reflexive saccade toward the target, and transformation of the stimulus location into a voluntary motor command to look in the opposite direction. Antisaccade execution involves a fronto-parieto-subcortical network comprising the dorsolateral prefrontal cortex (DLPFC), supplementary eye field (SEF), frontal eye fields (FEFs), anterior cingulate cortex (ACC), posterior parietal cortex, thalamus, and striatum (Hutton & Ettinger, 2006). Broadly, antisaccade trials activate the oculomotor network more strongly than visually guided trials and may also recruit additional brain areas, such as the DLPFC and ACC, that are not required in visually guided trials. Activity in these areas is also observed during other voluntary saccades (such as memory-guided and predictive saccades), each of which requires sophisticated executive processes. These additional demands are met by changes in activity within the saccade circuitry and by recruitment of additional brain areas. The antisaccade task engages a wide range of cognitive processes, such as decision making, working memory, goal-oriented behavior, knowledge acquisition, and attention (Jamadar et al., 2013). Visual cortical activity is modulated according to task requirements and can predict the type of saccade to be generated, likely through a top-down control process (Broerse et al., 2001; McDowell et al., 2008).

Recent research using saccadic paradigms has provided evidence of specific abnormalities that are strongly associated with performance on conventional neuropsychological tests (Crawford & Higham, 2016; Crawford et al., 2013; Lagun et al., 2011). Several studies have found that the EMs of patients with MCI and patients with ADD differ from those of healthy age-matched controls (Boxer et al., 2006; Chehrehnegar et al., 2019; Garbutt et al., 2008; Heuer et al., 2013; Holden et al., 2018; Peltsch et al., 2014; Yang et al., 2011, 2013). However, there is still considerable ambiguity about which parameters are relevant for distinguishing between controls and patients with AD. Differences in saccade paradigm formats may account for a substantial part of the variance observed across studies; hence, the assessment of methodological approaches is of particular importance. In addition, the magnitude and significance of the anti-effect (longer reaction times on antisaccade trials than on visually guided trials), the gap effect, and antisaccade task measures vary considerably across studies and laboratories. These measures include antisaccade latency, latency of incorrect prosaccades, spatial accuracy measures (such as the amplitude of correct and incorrect saccades and the final eye position of correct responses), and errors (uncorrected prosaccades toward the target). Error rates in particular have been found to vary in healthy individuals, with some studies reporting rates as low as 5% and others as high as 25% (Hutton & Ettinger, 2006). Furthermore, the time to correct errors (the interval between an incorrect prosaccade and the subsequent corrective antisaccade) has not been examined in depth in patients with MCI or patients with ADD.
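The antisaccade measures listed above can be derived from trial-level data. The following sketch assumes a hypothetical `AntisaccadeTrial` record holding whether the first saccade was correct, its latency from target onset, and, for error trials, the onset of any corrective antisaccade. It computes the total and uncorrected error rates, the mean time to correct errors, and the anti-effect (mean correct antisaccade latency minus mean prosaccade latency); the field names and the use of means are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class AntisaccadeTrial:
    correct: bool                                # first saccade went away from the target
    latency_ms: float                            # onset of the first saccade after target onset
    correction_onset_ms: Optional[float] = None  # onset of corrective antisaccade (error trials)


def antisaccade_summary(
    trials: List[AntisaccadeTrial], prosaccade_latencies_ms: List[float]
) -> dict:
    errors = [t for t in trials if not t.correct]
    corrected = [t for t in errors if t.correction_onset_ms is not None]
    correct_latencies = [t.latency_ms for t in trials if t.correct]
    correction_times = [t.correction_onset_ms - t.latency_ms for t in corrected]
    return {
        # All first saccades toward the target, as a proportion of trials.
        "error_rate": len(errors) / len(trials),
        # Errors never followed by a corrective antisaccade.
        "uncorrected_error_rate": (len(errors) - len(corrected)) / len(trials),
        # Time from error onset to corrective antisaccade onset.
        "mean_time_to_correct_ms": mean(correction_times) if correction_times else None,
        # Anti-effect: correct antisaccade latency minus prosaccade latency.
        "anti_effect_ms": mean(correct_latencies) - mean(prosaccade_latencies_ms),
    }


# Hypothetical data: two correct antisaccades, one corrected and one uncorrected error.
trials = [
    AntisaccadeTrial(True, 300.0),
    AntisaccadeTrial(True, 320.0),
    AntisaccadeTrial(False, 180.0, 400.0),
    AntisaccadeTrial(False, 200.0),
]
print(antisaccade_summary(trials, prosaccade_latencies_ms=[200.0, 220.0]))
```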

A recent meta-analytic review of the literature on visually guided and volitional saccade paradigms found that patients with ADD, but not patients with MCI, had longer visually guided saccade latencies than controls. Additionally, in the volitional antisaccade task, antisaccade latencies did not differentiate the patient groups from healthy controls, but the frequency of antisaccade errors was significantly higher in the patient groups than in controls (Kahana Levy et al., 2018). A main limitation of that review was that it used saccade latency only from the gap condition to calculate effect sizes; saccade latency and error rates in other formats, such as step and overlap, were not explored. This raises questions about the relevance of these unexplored conditions for distinguishing between patients with AD and controls.

As a step toward improving the clinical usability of the EM technique, we review existing original articles and conduct meta-analyses comparing the saccadic EM performance of patients with MCI and patients with ADD with that of normal controls across saccadic paradigms (e.g., visually guided vs. antisaccade paradigms) and conditions (e.g., gap, overlap, or step conditions).

Methods for Systematic Review and Meta-Analysis

Protocol and Registration

This systematic review has been registered with the International Prospective Register of Systematic Reviews (University of York Centre for Reviews and Dissemination, 2020; registration no. CRD42019138926; available from https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019138926) and is guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Liberati et al., 2009; Moher et al., 2009).

Eligibility Criteria

We considered the following study designs: randomized controlled trials (RCTs), nonrandomized controlled trials, and observational designs such as cohort, cross-sectional, and case–control studies that investigated saccadic EMs in patients with MCI and patients with ADD in comparison with a healthy age-matched control group. The diagnosis of MCI (of any etiology) had to be based on one or a combination of the following criteria: Petersen criteria (Petersen et al., 1999), revised Petersen criteria (Petersen, 2004; Petersen et al., 2001), Winblad criteria (Winblad et al., 2004), Matthews criteria (Matthews et al., 2008), revised Matthews criteria for MCI (Artero et al., 2006), Clinical Dementia Rating (CDR) = 0.5 (Morris, 1993), or the National Institute on Aging-Alzheimer's Association (NIA-AA) core clinical criteria (Albert et al., 2011). For ADD, we accepted the following criteria, alone or in combination: National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria (G. McKhann et al., 1984; G. M. McKhann et al., 2011), DSM-III (American Psychiatric, 1986) and DSM-IV (Diagnostic and statistical manual of mental disorders: DSM-IV 1994), DSM-IV-TR (Diagnostic criteria from DSM-IV-TR 2000), International Statistical Classification of Diseases and Related Health Problems ICD-10 (International statistical classification of diseases and related health problems, 2004), and Dubois criteria (B. Dubois et al., 2007, 2010). To be included in this review, articles had to be written in English and published in a peer-reviewed journal between January 1980 and July 2020. When several articles were published from the same parent study or dataset, only one article was included in the analysis, selected on the basis of the completeness of the information it provided. All other articles published from shared datasets were excluded for reasons of non-independence, as they could bias the results (Liberati et al., 2009; von Elm et al., 2004). Finally, studies were excluded if they did not have an appropriate control group (e.g., controls who were children <18 years), if participants with MCI or ADD were not diagnosed according to the specified criteria, or if insufficient data were provided to calculate or estimate effect sizes and attempts to contact the corresponding authors were unsuccessful.

Information Sources

We searched for published articles indexed in MEDLINE, EMBASE, and CENTRAL databases. A manual search of references and forward citations of relevant systematic reviews and relevant original research articles was also carried out to ensure that all potential studies were captured. The searches were concluded by July 30, 2020.

Search Strategy

The search strategy was developed through a review of the published literature and in consultation with a reviewer experienced in systematic reviews, and it was adapted for each database. The MEDLINE, EMBASE, and CENTRAL search strategies are presented in Tables S1–S3 in the Online Resource.

Study Selection

All identified articles were imported into EndNote (Ver. X9, Thomson Reuters, USA), and duplicate records were removed. The remaining articles were uploaded to Covidence systematic review software (The Cochrane Collaboration, 2020, July 22), where two reviewers (OJ and DDN) independently screened titles and abstracts against the eligibility criteria using the registered protocol. Corresponding authors were contacted when the information in the published article was insufficient to decide eligibility. Disagreements were resolved by consensus or by a third reviewer (KJU). Only records included by both reviewers passed to the final review stage. The full texts of all papers not excluded on the basis of title or abstract were screened, and the reference lists of eligible studies were checked manually to ensure that no potentially relevant articles were missed. The numbers of articles included and excluded at each phase were recorded as recommended and are presented in a PRISMA flowchart (Moher et al., 2009; see Fig. 3).

Fig. 3

Flow diagram of study selection according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement

Data Extraction

Data were extracted independently by two reviewers (OJ and DDN) using a Microsoft Excel (2016) spreadsheet tailored to the requirements of this systematic review. Disagreements were resolved through discussion with a third reviewer (KJU). If numerical data were not reported in the results section, the reviewers extracted them from figures with WebPlotDigitizer version 4.2. The authors of five studies were contacted about missing data needed to calculate effect sizes, and follow-up emails were sent within one month if no response to the first email was received. Two of these authors responded and provided the necessary information.

Data Items

The extracted data included the article title, first author, country, study design, demographic characteristics of the sample (i.e., age in years, % male, education in years), diagnostic criteria for cognitive status, scores on standard assessments of cognitive status (i.e., Mini-Mental State Examination, Montreal Cognitive Assessment), study population (e.g., MCI, ADD) and sample size per group, ET device and technique, oculomotor paradigm, saccade task condition (e.g., gap, step, overlap), saccade parameters (e.g., mean latency, amplitude, gain, errors, omissions, and anticipations), and main findings and conclusions (Table 1). Where necessary, summary statistics were obtained through additional calculations: the standard deviation (SD) from the standard error (SE) and sample size, and SEs from confidence intervals (CIs) and p values for both absolute (difference) measures and ratio measures.
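These conversions follow standard formulas, such as those in the Cochrane Handbook: \(\mathrm{SD} = \mathrm{SE}\sqrt{n}\) for a group mean, and, for a difference measure, \(\mathrm{SE} = (\text{upper} - \text{lower})/(2z)\) from a CI or \(\mathrm{SE} = |\text{estimate}|/z\) from a two-sided p value. The sketch below uses the normal distribution from the Python standard library; for small samples the Handbook recommends replacing z with the corresponding t value, and ratio measures should be log-transformed before applying the CI and p-value formulas. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def sd_from_se(se: float, n: int) -> float:
    """SD of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """SE of a difference measure from a symmetric, normal-based CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return (upper - lower) / (2 * z)


def se_from_p(estimate: float, p_two_sided: float) -> float:
    """SE of a difference measure from its two-sided p value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(estimate) / z


print(sd_from_se(5.0, 25))                # 25.0
print(round(se_from_ci(10.0, 30.0), 2))  # 5.1
print(round(se_from_p(12.0, 0.05), 2))   # 6.12
```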

Risk of Bias in Individual Studies

The Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) assessment tool was used because the study aimed to evaluate the efficacy of using various conditions during saccade-based EM as a screening, diagnostic, or monitoring method for patients with MCI and patients with ADD (Sterne et al., 2016). This tool covers seven bias domains arising before, at, and after intervention: (1) confounding, (2) selection of participants, (3) classification of interventions, (4) deviations from intended interventions, (5) missing outcome data, (6) measurement of outcomes, and (7) selection of the reported result. Risk of bias was rated as 0, no information; 1, low risk; 2, moderate risk; 3, serious risk; or 4, critical risk. Two authors (OJ and DDN) independently assessed the risk of bias of the included articles, and disagreements were resolved by consensus.
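ROBINS-I guidance derives the overall judgment from the domain-level judgments: a study is at critical risk if any domain is critical, at serious risk if any domain is serious (and none is critical), at moderate risk if all domains are low or moderate, and at low risk only if all domains are low. The sketch below encodes that rule on the 0 to 4 scale used here. Treating any domain rated 0 (no information) as making the overall judgment "no information" when no domain is serious or critical is a simplification of the guidance, which refers to key domains; the names `Risk` and `overall_risk` are hypothetical.

```python
from enum import IntEnum
from typing import Sequence


class Risk(IntEnum):
    NO_INFORMATION = 0
    LOW = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


def overall_risk(domains: Sequence[Risk]) -> Risk:
    """Overall ROBINS-I judgment from the seven domain judgments (simplified)."""
    if len(domains) != 7:
        raise ValueError("ROBINS-I has seven bias domains")
    worst = max(domains)
    if worst >= Risk.SERIOUS:
        return Risk(worst)  # any serious or critical domain drives the overall judgment
    if Risk.NO_INFORMATION in domains:
        return Risk.NO_INFORMATION  # simplification: any domain lacking information
    return Risk(worst)  # low only if all domains are low, otherwise moderate


print(overall_risk([Risk.LOW] * 6 + [Risk.MODERATE]).name)  # MODERATE
```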

Summary Measures

Effect sizes were expressed as standardized mean differences using Hedges’ g (unbiased), which includes a correction for the demonstrated tendency of studies with relatively small samples to overestimate the true population effect (Hedges, 2016; Hedges & Olkin, 2014). For comparison, we also reported the difference in means (mean difference, MD), given by \(\mathrm{MD} = M_1 - M_2\). There are several formulations of the standardized mean difference (SMD); the one implemented in RevMan is Hedges’ adjusted g, which is similar to Cohen's d but includes an adjustment for small-sample bias. Hedges’ g is given by \(g = \frac{M_1 - M_2}{\mathrm{SD}_{\mathrm{pooled}}}\left(1 - \frac{3}{4N - 9}\right)\), where \(M_1\) is the mean response of the patient group, \(M_2\) is the mean response of the control group, and \(N = N_1 + N_2\) is the overall sample size including both patient and control groups (Hedges & Olkin, 2014). The pooled SD is calculated as \(\mathrm{SD}_{\mathrm{pooled}} = \sqrt{\frac{(N_1 - 1)\,\mathrm{SD}_1^2 + (N_2 - 1)\,\mathrm{SD}_2^2}{N_1 + N_2 - 2}}\),

where \(N_1\) and \(N_2\) are the sample sizes of the patient and control groups, and \(\mathrm{SD}_1\) and \(\mathrm{SD}_2\) are the SDs of the patient and control groups, respectively. All effects were calculated such that a positive effect size corresponds to a longer latency or a higher error frequency in the patient groups (MCI and ADD) than in the control group during visually guided and antisaccade tasks.
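The calculation above can be sketched as follows, with group 1 as the patient group and group 2 as the control group, so that a positive g indicates longer latencies or more errors in patients. The variance formula, \(V_g = J^2\left(\frac{N_1 + N_2}{N_1 N_2} + \frac{d^2}{2(N_1 + N_2)}\right)\) with \(J = 1 - 3/(4N - 9)\), follows Borenstein et al. and is an assumption here; RevMan uses a slightly different approximation for the standard error of g, so results may differ slightly. The example values are hypothetical.

```python
import math
from typing import Tuple


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled SD of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(
    m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int
) -> Tuple[float, float]:
    """Hedges' g (group 1 = patients, group 2 = controls) and its variance."""
    n = n1 + n2
    d = (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)  # Cohen's d with pooled SD
    j = 1 - 3 / (4 * n - 9)                      # small-sample correction factor
    g = j * d
    var_d = n / (n1 * n2) + d**2 / (2 * n)       # assumed variance formula (Borenstein et al.)
    return g, j**2 * var_d


# Hypothetical latencies (ms): patients 250 +/- 40, controls 220 +/- 30, 20 per group.
g, var_g = hedges_g(250, 40, 20, 220, 30, 20)
print(round(g, 3), round(math.sqrt(var_g), 3))  # 0.832 0.324
```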

Synthesis of Results

A random-effects model was assumed because heterogeneity in effect sizes was expected to exceed that explained by sampling error alone (Deeks JJ, 2019; Rothstein et al., 2013). To address the primary aim of this review, the results from different saccade paradigms were pooled by condition (gap, step, and overlap) to determine an overall mean effect size (Hedges & Olkin, 2014). Review Manager version 5.3 (Cochrane, London, UK) and JASP version 0.13.1 were used to compute the mean effect size and its 95% CI.

Heterogeneity of effect sizes was detected using the Chi² (chi-square, Q) statistic and quantified using the I² statistic (Cochrane handbook for systematic reviews of interventions, 2020). Chi² is calculated as the weighted sum of squared deviations of each study’s effect size from the overall mean effect size and provides a significance test for heterogeneity (Borenstein et al., 2011). I² describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance) (Cochrane handbook for systematic reviews of interventions, 2020). It is calculated as \(I^2 = \left(\frac{Q - \mathrm{df}}{Q}\right) \times 100\%\),

where Q is the Chi² statistic and df is its degrees of freedom, with negative values set to zero (Higgins & Thompson, 2002; Higgins et al., 2003). In this meta-analysis, I² values of 25%, 50%, and 75% represented low, moderate, and high degrees of heterogeneity, respectively (Higgins et al., 2003). However, I² is a measure of relative heterogeneity, and a high I² may be observed even when absolute heterogeneity is small. Therefore, Tau² (τ²) was also calculated as a measure of the extent of variation among the effects observed in the different studies. τ² is defined as the variance of the true effect sizes and provides an estimate of the between-study variance in a random-effects model (Cochrane handbook for systematic reviews of interventions, 2020). Finally, we used strategies to address heterogeneity, such as rechecking the data and conducting subgroup analyses (Deeks JJ, 2019).
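The heterogeneity statistics and the random-effects pooled estimate can be computed together. The sketch below uses the DerSimonian–Laird moment estimator of τ², which is the random-effects method used by RevMan 5 (JASP's default estimator may differ, for example restricted maximum likelihood), and returns the pooled estimate with its 95% CI, Q, df, I², and τ². Inputs are per-study effect sizes (such as Hedges' g) and their variances; the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, Sequence


def random_effects_dl(
    effects: Sequence[float], variances: Sequence[float], level: float = 0.95
) -> Dict[str, object]:
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    # Fixed-effect (inverse-variance) weights and mean, used to compute Q.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # DerSimonian-Laird estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # I^2 as a percentage, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Two hypothetical studies with clearly different effects.
result = random_effects_dl([0.0, 1.0], [0.1, 0.1])
print(result["Q"], result["tau2"], result["I2"])  # 5.0 0.4 80.0
```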

Additionally, when several independent patient groups were compared with a single control group (Chehrehnegar et al., 2019; Crawford et al., 2019; Heuer et al., 2013; Holden et al., 2018; Peltsch et al., 2014; Wilcockson et al., 2019; Yang et al., 2011, 2013) or when effects were calculated at different time points in the same sample (Crawford et al., 2015), averaging the effect sizes across these comparisons would have discarded essential moderator information and would therefore not have been appropriate. Accordingly, effect sizes for each of these nonindependent comparisons were included separately. To avoid underestimating the error variance associated with each effect size, the sample size of each shared group used to calculate the standard errors was divided by the number of comparisons in which that group was included (Cochrane handbook for systematic reviews of interventions, 2020).
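Dividing the sample size of a shared group among its comparisons, while leaving its mean and SD unchanged, enlarges the standard error attached to each comparison instead of counting the same participants twice. A minimal sketch, with a hypothetical `GroupStats` record and illustrative numbers:

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GroupStats:
    mean: float
    sd: float
    n: float  # may become fractional after splitting


def split_shared_group(group: GroupStats, n_comparisons: int) -> GroupStats:
    """Divide a shared group's sample size among the comparisons that reuse it."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return replace(group, n=group.n / n_comparisons)


def se_of_mean(group: GroupStats) -> float:
    return group.sd / math.sqrt(group.n)


# Hypothetical control group compared with both an MCI group and an ADD group.
controls = GroupStats(mean=220.0, sd=30.0, n=40)
per_comparison = split_shared_group(controls, 2)
print(per_comparison.n, round(se_of_mean(controls), 2), round(se_of_mean(per_comparison), 2))
# 20.0 4.74 6.71
```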

Risk of Bias Across Studies

Publication bias was assessed by visual inspection of a funnel plot and with Egger’s linear regression test (significance threshold P < 0.1) (Egger et al., 1997). Statistical analyses were conducted using Review Manager Version 5.3 software (Cochrane, London, UK: The Cochrane Collaboration, 2014) and JASP (Version 0.13.1) [Computer software] (JASP Team, 2020).
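Egger's test regresses each study's standardized effect (effect divided by its SE) on its precision (1/SE); an intercept that differs from zero suggests funnel-plot asymmetry. The sketch below fits that ordinary least-squares regression with the standard library and returns the intercept, its SE, the t statistic, and the degrees of freedom (k - 2). A two-sided p value can then be obtained from a t distribution, for example with `scipy.stats.t.sf(abs(t), df) * 2`, and compared with the 0.1 threshold. The function name and example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def egger_test(effects: Sequence[float], standard_errors: Sequence[float]) -> Dict[str, float]:
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1 / se for se in standard_errors]                   # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and standard error of the intercept (ordinary least squares).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (k - 2) * (1 / k + x_bar**2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t, "df": k - 2}


result = egger_test([2.0, 1.5, 1.0, 0.875], [1.0, 0.5, 0.25, 0.125])
print({name: round(value, 3) for name, value in result.items()})
```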

Additional Analyses

Subgroup analyses were conducted to determine whether paradigm, clinical diagnosis (MCI or ADD), and outcome (latency or error rate) in saccade paradigms contributed to the observed effect sizes. Chi², I², and Tau² values were calculated to detect and quantify heterogeneity across studies. All statistical analyses were performed using Review Manager software, version 5.3 (RevMan 5.3), and JASP computer software, version 0.13.1. Where a meta-analysis was not feasible because of a limited number of studies, a narrative summary was produced.

Results

Study Selection

The database search generated 5887 references, of which 738 were duplicates, resulting in 5149 unique articles. Of these, 4966 were excluded because they did not meet the selection criteria. Subsequently, 183 full texts were assessed for eligibility, and 148 studies were excluded after full-text review based on our inclusion criteria. Thirty-six studies met the eligibility criteria; however, two studies from the same research group had identical numerical outcomes (Crawford et al., 2005, 2013), so only the later study (Crawford et al., 2013) was retained, leaving 35 studies included in the synthesis. Of these, eight studies (Bourgin et al., 2018; Bylsma et al., 1995; Currie et al., 1991; Mosimann et al., 2004; Pavisic et al., 2017; L. F. Scinto et al., 1994; T. Shakespeare et al., 2015; Verheij et al., 2012) did not meet the data availability inclusion criterion because the temporal format of the reported saccade paradigm could not be determined or the gap and overlap results were combined; these studies were therefore excluded from the meta-analysis. The remaining 27 studies (Abel et al., 2002; Alichniewicz et al., 2013; Boucart et al., 2014a, b; Boxer et al., 2006, 2012; Chehrehnegar et al., 2019; Crawford et al., 2013, 2015, 2019; de Boer et al., 2016; Garbutt et al., 2008; Hershey et al., 1983, 2013; Holden et al., 2018; Kaufman et al., 2012; Laurens et al., 2019; Lenoble et al., 2015, 2018; Mosimann et al., 2005; Noiret et al., 2018; Peltsch et al., 2014, 2020; Shafiq-Antonacci et al., 2003; Wilcockson et al., 2019; Yang et al., 2011, 2013) were included in the quantitative analysis (meta-analysis; see Fig. 3). Of the 34 studies included in the qualitative synthesis, 31 had defined saccade conditions, with 24 (77%) conducting ET in the gap condition.

Study Characteristics

Of the 35 studies included in this review, 8 (23%) were conducted in France (Boucart et al., 2014a, b; Bourgin et al., 2018; Holden et al., 2018; Laurens et al., 2019; Lenoble et al., 2015, 2018; Noiret et al., 2018). Seven (20%) were conducted in the United States (Boxer et al., 2006, 2012; Bylsma, 1995; Garbutt et al., 2008; Hershey et al., 1983; Heuer et al., 2013; L. F. M. Scinto et al., 1994) and 8 (23%) in the United Kingdom (Crawford et al., 2013, 2015, 2019; Mosimann et al., 2005; Pavisic et al., 2017; Polden et al., 2020; T. Shakespeare et al., 2015; Wilcockson et al., 2019). The rest (34%) were conducted in Australia (Abel et al., 2002; Currie et al., 1991; Shafiq-Antonacci et al., 2003), Germany (Alichniewicz et al., 2013), Canada (Kaufman et al., 2012; Peltsch et al., 2014), China (Yang et al., 2011, 2013), the Netherlands (de Boer et al., 2016; Verheij et al., 2012), Switzerland (Mosimann et al., 2004), and Iran (Chehrehnegar et al., 2019). Two studies (Bylsma, 1995; Crawford et al., 2015) were longitudinal prospective cohort studies, whereas the rest were matched case–control studies.

The 35 included studies comprised 2435 participants in total: 1252 controls and 1183 patients (386 with MCI and 797 with ADD). All studies that reported gender included both male and female participants. The characteristics of the included studies are described in Table 1.

Risk of Bias Within Studies

Of the 35 studies assessed using the ROBINS-I risk of bias assessment tool (Table S4 in the Online Resource), 25 studies were rated as having a moderate risk of bias (Abel et al., 2002; Alichniewicz et al., 2013; Boucart et al., 2014b; Bourgin et al., 2018; Boxer et al., 2006; Bylsma, 1995; Chehrehnegar et al., 2019; Crawford et al., 2015; Crawford et al., 2019; Currie et al., 1991; de Boer et al., 2016; Hershey et al., 1983; Holden et al., 2018; Kaufman et al., 2012; Laurens et al., 2019; Lenoble et al., 2018; Mosimann et al., 2004; Mosimann et al., 2005; Peltsch et al., 2014; L. F. M. Scinto et al., 1994; Shafiq-Antonacci et al., 2003; T. J. Shakespeare et al., 2015; Verheij et al., 2012; Wilcockson et al., 2019; Yang et al., 2011). Ten studies were rated as having a low risk of bias (Boucart et al., 2014a; Boxer et al., 2012; Crawford et al., 2013; Garbutt et al., 2008; Heuer et al., 2013; Lenoble et al., 2015; Noiret et al., 2018; Pavisic et al., 2017; Polden et al., 2020; Yang et al., 2013).

Synthesis of Results

1. Qualitative Synthesis

We performed a qualitative analysis using the variables reported in most of the included studies. The common parameters were latency and gain (or amplitude) in the prosaccade and antisaccade paradigms, and error rate in the antisaccade paradigm. The variables were analyzed according to the differences observed between patients and controls. To avoid repeating the quantitative synthesis, the analysis focused on the parameters excluded from the meta-analysis and on the most widely reported parameters. A summary of the analyzed articles is listed in Table 1.

1.1 Latency

Most studies placed the target stimuli in the horizontal plane. Of these, 14 studies also reported placing the target stimuli in the vertical plane separately or in combination with the horizontal plane target stimuli.

In twenty-four studies, the saccade latency of patient groups (MCI and ADD) was compared with that of controls using gap conditions (Abel et al., 2002; Boucart et al., 2014a; Boucart et al., 2014b; Boxer et al., 2006; Boxer et al., 2012; Chehrehnegar et al., 2019; Crawford et al., 2015; Crawford et al., 2013; Crawford et al., 2019; de Boer et al., 2016; Garbutt et al., 2008; Heuer et al., 2013; Holden et al., 2018; Lenoble et al., 2015; Lenoble et al., 2018; Mosimann et al., 2004; Mosimann et al., 2005; Pavisic et al., 2017; Peltsch et al., 2014; Polden et al., 2020; T. J. Shakespeare et al., 2015; Wilcockson et al., 2019; Yang et al., 2011; Yang et al., 2013). Fourteen studies used overlap conditions (Boxer et al., 2006; Boxer et al., 2012; Chehrehnegar et al., 2019; Crawford et al., 2015; Crawford et al., 2013; Garbutt et al., 2008; Laurens et al., 2019; Mosimann et al., 2004; Mosimann et al., 2005; Peltsch et al., 2014; Polden et al., 2020; T. Shakespeare et al., 2015; Yang et al., 2011; Yang et al., 2013). Eight studies used step conditions (Abel et al., 2002; Alichniewicz et al., 2013; Currie et al., 1991; Hershey et al., 1983; Holden et al., 2018; Kaufman et al., 2012; Noiret et al., 2018; Shafiq-Antonacci et al., 2003). In 4 studies, the condition used could not be determined (Bourgin et al., 2018; Bylsma et al., 1995; L. F. Scinto et al., 1994; Verheij et al., 2012).

1.1.1 Prosaccade Latency

Thirty studies compared prosaccade latency between patients and controls (Abel et al., 2002; Alichniewicz et al., 2013; Boucart et al., 2014a; Boucart et al., 2014b; Bourgin et al., 2018; Boxer et al., 2006; Boxer et al., 2012; Bylsma, 1995; Chehrehnegar et al., 2019; Crawford et al., 2015; Crawford et al., 2013; de Boer et al., 2016; Garbutt et al., 2008; Hershey et al., 1983; Heuer et al., 2013; Holden et al., 2018; Laurens et al., 2019; Lenoble et al., 2015; Lenoble et al., 2018; Mosimann et al., 2004; Mosimann et al., 2005; Noiret et al., 2018; Peltsch et al., 2014; Polden et al., 2020; L. F. M. Scinto et al., 1994; Shafiq-Antonacci et al., 2003; T. J. Shakespeare et al., 2015; Verheij et al., 2012; Yang et al., 2011; Yang et al., 2013). Nine of these studies had an MCI group, either alone (Alichniewicz et al., 2013) or together with an ADD group (Chehrehnegar et al., 2019; Heuer et al., 2013; Holden et al., 2018; Laurens et al., 2019; Peltsch et al., 2014; Polden et al., 2020; Yang et al., 2011; Yang et al., 2013). Overall, 87% of the studies found that patients had longer latencies than controls, and no study reported significantly longer latencies in the control group.

1.1.2 Antisaccade Latency

There were 15 studies (Alichniewicz et al., 2013; Bourgin et al., 2018; Boxer et al., 2006; Boxer et al., 2012; Chehrehnegar et al., 2019; Crawford et al., 2013; Crawford et al., 2019; Currie et al., 1991; Garbutt et al., 2008; Heuer et al., 2013; Holden et al., 2018; Mosimann et al., 2005; Noiret et al., 2018; Peltsch et al., 2014; Shafiq-Antonacci et al., 2003; Wilcockson et al., 2019) that compared antisaccade latency between patients and controls. Eight of these studies had an MCI group, either alone (Alichniewicz et al., 2013) or together with an ADD group (Chehrehnegar et al., 2019; Crawford et al., 2019; Heuer et al., 2013; Holden et al., 2018; Laurens et al., 2019; Peltsch et al., 2014; Wilcockson et al., 2019). Overall, 80% of the studies found that patients had longer latencies than controls, and no study reported significantly longer latencies in the control group.

1.1.3 Antisaccade Error Latency

Of the 7 studies (Bourgin et al., 2018; Boxer et al., 2006; Crawford et al., 2013; Crawford et al., 2019; Garbutt et al., 2008; Heuer et al., 2013; Noiret et al., 2018) that reported the latency of error responses (prosaccades during antisaccade tasks), only one (Heuer et al., 2013) had an MCI group. All seven studies found that patients had longer error latencies than controls.

1.2 Antisaccade Error Rate

Of the 14 studies (Alichniewicz et al., 2013; Bourgin et al., 2018; Boxer et al., 2006; Boxer et al., 2012; Crawford et al., 2013; Garbutt et al., 2008; Heuer et al., 2013; Holden et al., 2018; Kaufman et al., 2012; Mosimann et al., 2005; Noiret et al., 2018; Peltsch et al., 2014; Shafiq-Antonacci et al., 2003; Wilcockson et al., 2019) that reported the antisaccade error rate (or the proportion of correct antisaccades), only 5 studies (Alichniewicz et al., 2013; Holden et al., 2018; Peltsch et al., 2014; Wilcockson et al., 2019) had an MCI comparison group. All 14 studies found that patients made antisaccade errors more frequently than controls.

1.3 Gain or Amplitude

We examined gain or amplitude in both PS and AS. Of the 11 studies reporting gain or amplitude, 10 (91%) found hypometric saccades in patients (Boxer et al., 2012; Bylsma, 1995; Chehrehnegar et al., 2019; Crawford et al., 2015; Crawford et al., 2013; Garbutt et al., 2008; Mosimann et al., 2004; Mosimann et al., 2005; L. F. M. Scinto et al., 1994; Shafiq-Antonacci et al., 2003). No study reported significantly smaller amplitudes in the control group. In 9 of these 10 studies, comparisons were made only between age-matched controls and patients with ADD. The only study with both MCI and ADD patient groups (Chehrehnegar et al., 2019) reported hypometric saccades in patients, consistent with the other studies.

2. Meta-analysis (Quantitative Analysis)

We conducted a meta-analysis of the visually guided and antisaccade paradigms for each saccade condition, comparing saccades in patient groups (MCI and ADD combined) with those in healthy age-matched controls. We then conducted subgroup analyses to examine whether clinical diagnosis (MCI vs. ADD), outcome (latency vs. error rate), and paradigm (prosaccade vs. antisaccade) influenced the results. Effect sizes were calculated from study means and standard deviations as standardized mean differences (SMDs), expressed as bias-corrected Hedges’ g, and pooled using a random-effects model.
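The effect size calculation described above can be sketched as follows. This minimal Python illustration makes three assumptions. Group 1 is the patient group, so a positive g means longer latencies or more errors in patients. The small-sample correction is J = 1 - 3/(4 df - 1). The variance uses a common large-sample approximation, and software packages differ slightly in the exact variance formula. The function `hedges_g` and the input values are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected SMD (group 1 minus group 2) and an approximate sampling variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled                                   # Cohen's d
    j = 1 - 3 / (4 * df - 1)                                    # small-sample correction factor J
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))    # large-sample variance of d
    return j * d, j ** 2 * var_d                                # g and Var(g) = J^2 * Var(d)


# Hypothetical prosaccade latencies (msec): 20 patients vs. 20 controls
g, var_g = hedges_g(250.0, 50.0, 20, 230.0, 50.0, 20)
print(f"Hedges' g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

With 20 participants per group, J is about 0.98. The correction matters mainly for the small samples typical of the included case–control studies.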

2.1 Gap

The first stage of the meta-analysis included 54 effect sizes for the gap condition, comparing controls with patients (MCI and ADD combined). These were derived from latency measures in the visually guided paradigm and from latency and error frequency in the antisaccade paradigm. The overall weighted mean effect size in the gap condition was moderate (SMD: 0.52, CI: [0.37, 0.68], Chi2 = 210.12, df = 53, p < 0.001, Tau2 = 0.24, I2 = 75%) (Fig. A1 in the Online Resource). The I2 value indicated substantial heterogeneity, suggesting the presence of potential moderators.
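A pooled SMD with a 95% CI can be produced in several ways. The sketch below is a minimal Python illustration that assumes the DerSimonian-Laird random-effects model with a Wald-type interval. Each study is weighted by 1/(v + Tau2) rather than 1/v, which widens the interval when studies disagree. The function name and the three example effect sizes are hypothetical.

```python
import math


def random_effects_pool(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooled estimate, Wald-type CI, and Tau2."""
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - z * se, pooled + z * se), tau2


# Hypothetical SMDs and variances for three studies
pooled, (lo, hi), tau2 = random_effects_pool([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(f"SMD: {pooled:.2f}, CI: [{lo:.2f}, {hi:.2f}], Tau2 = {tau2:.2f}")
# SMD: 0.50, CI: [0.16, 0.84], Tau2 = 0.05
```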

Accordingly, in the second stage of analysis, we used the paradigm type (prosaccade and antisaccade) as a moderator variable. The subgroup analysis revealed the following (prosaccade, k = 27, Chi2 = 51.10, df = 26, p = 0.002, Tau2 = 0.09, I2 = 49%; antisaccade, k = 27, Chi2 = 141.70, df = 26, p < 0.001, Tau2 = 0.31, I2 = 82%).
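I2 is derived from Chi2 and its degrees of freedom as I2 = (Chi2 - df)/Chi2 x 100%, truncated at zero. This means the I2 values reported for the gap condition can be checked directly from the reported Chi2 and df. The minimal sketch below recomputes them and reproduces the reported 75%, 49%, and 82%. The function name is hypothetical.

```python
def i_squared(chi2, df):
    """Higgins' I2 (%) from Cochran's Q (reported as Chi2) and its df, truncated at zero."""
    return max(0.0, (chi2 - df) / chi2) * 100


# Chi2 and df as reported for the gap condition
reported = {
    "gap, all effect sizes": (210.12, 53),
    "gap, prosaccade": (51.10, 26),
    "gap, antisaccade": (141.70, 26),
}
for label, (chi2, df) in reported.items():
    print(f"{label}: I2 = {i_squared(chi2, df):.0f}%")
# gap, all effect sizes: I2 = 75%
# gap, prosaccade: I2 = 49%
# gap, antisaccade: I2 = 82%
```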

For the prosaccade group, the I2 value indicated moderate heterogeneity, so the mean effect size was considered the best estimate for the data. In prosaccade studies, the overall weighted mean effect size was moderate (SMD: 0.30, CI: [0.13, 0.46] and MD: 15.88, CI: [7.42, 24.34]). This suggests a significant difference in prosaccade latency between the patient and control groups (Fig. 4A).

Fig. 4

Forest plot of effect sizes and their confidence intervals comparing patients and controls in the gap condition for (A) prosaccade latency (msec), (B) antisaccade latency (msec), and (C) antisaccade error rate (%)

Subgroup analysis of the prosaccade paradigm using clinical diagnosis (MCI and ADD) as a moderator revealed the following: ADD group, k = 19, Chi2 = 47.6, df = 16, p < 0.001, Tau2 = 0.19, I2 = 66%; MCI group, k = 8, Chi2 = 1.14, df = 5, p = 0.95, Tau2 = 0.000, I2 = 0%. The I2 value indicated moderate heterogeneity in the ADD group and homogeneity in the MCI group, so the mean effect size was considered the best estimate for the data. The overall weighted mean effect size was moderate in ADD studies (SMD: 0.39, CI: [0.17, 0.62] and MD: 21.37, CI: [9.80, 32.93]) and small in MCI studies (SMD: 0.09, CI: [-0.10, 0.28] and MD: 3.98, CI: [-4.58, 12.55]). This suggests that patients with ADD had significantly longer saccadic latencies than controls, whereas there was no significant difference between patients with MCI and controls (Fig. A2a, b in the Online Resource). A direct comparison of prosaccade latency between patients with MCI and patients with ADD revealed the following: k = 8, Chi2 = 19.66, df = 7, p = 0.006, Tau2 = 0.18, I2 = 64%. The I2 value indicated moderate heterogeneity, so the mean effect size was considered the best estimate for the data. The overall weighted mean effect size between ADD and MCI was moderate (SMD: 0.45, CI: [0.08, 0.81] and MD: 24.03, CI: [4.78, 43.27]). This suggests a significant difference in saccadic reaction times between patients with ADD and patients with MCI in the prosaccade paradigm (Fig. A2c in the Online Resource).

For the antisaccade group, the Chi2 and I2 values indicated substantial heterogeneity, suggesting the presence of potential moderators. In the antisaccade studies, the mean overall effect size was moderate (SMD: 0.73, CI: [0.50, 0.97]).

Subgroup analysis of the antisaccade paradigm using outcome (latency and error rate) as a moderator revealed the following: for latency, k = 15, Chi2 = 41.86, df = 14, p < 0.001, Tau2 = 0.13, I2 = 67%; for error rate, k = 12, Chi2 = 80.64, df = 11, p < 0.001, Tau2 = 0.50, I2 = 86%. For the latency outcome, the I2 value indicated moderate heterogeneity, so the mean effect size was regarded as the best estimate for the data. For the error rate outcome, the I2 value indicated substantial heterogeneity, suggesting the presence of potential moderators. In the antisaccade studies, the mean overall effect size was moderate in latency studies (SMD: 0.44, CI: [0.21, 0.66] and MD: 34.37, CI: [16.94, 51.80]) (Fig. 4B). It was large in error rate studies (SMD: 1.16, CI: [0.72, 1.60] and MD: 26.10, CI: [15.35, 36.84]) (Fig. 4C). This suggests significant differences in both saccade latency and error frequency between patients and controls.

Subgroup analysis of the antisaccade latency outcome using clinical diagnosis as the moderator revealed the following: ADD group: k = 8, Chi2 = 42.23, df = 7, p < 0.001, Tau2 = 0.28, I2 = 83%; MCI group: k = 7, Chi2 = 15.16, df = 6, p = 0.02, Tau2 = 0.07, I2 = 60%. In the ADD studies, the I2 value indicated substantial heterogeneity, suggesting the presence of potential moderators. For the MCI group, the I2 value indicated moderate heterogeneity, so the mean effect size was considered the best estimate for the MCI latency data (Fig. A3a, b in the Online Resource). The mean overall effect size was moderate in both ADD studies (SMD: 0.55, CI: [0.15, 0.95] and MD: 40.47, CI: [10.19, 70.75]) and MCI studies (SMD: 0.35, CI: [0.10, 0.60] and MD: 28.55, CI: [6.14, 50.96]). This suggests that both patients with ADD and patients with MCI had antisaccade latencies significantly different from those of healthy controls. An additional analysis compared antisaccade latency directly between the patient groups (MCI vs. ADD): k = 7, Chi2 = 24.37, df = 6, p < 0.001, Tau2 = 0.19, I2 = 75%. The I2 value indicated high heterogeneity, suggesting the presence of potential moderators. Between MCI and ADD, the overall weighted mean effect size was moderate (SMD: 0.30, CI: [-0.07, 0.67] and MD: 20.70, CI: [-6.44, 47.85]). This suggests no significant difference in antisaccade latency between patients with ADD and patients with MCI (Fig. A3c in the Online Resource).
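Whether a pooled SMD differs significantly from zero can be read directly from its 95% CI. An approximate z statistic and two-sided p value can also be recovered from the CI. The minimal Python sketch below assumes a symmetric Wald-type interval (estimate ± 1.96 × SE). It applies this to the two direct ADD versus MCI comparisons in the gap condition reported above. The published values are rounded, so the recovered statistics are approximate. The function name is hypothetical.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p value from a point estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)      # a Wald CI spans estimate +/- z_crit * SE
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|)) for a standard normal
    return z, p


# Direct ADD vs. MCI comparisons in the gap condition, as reported in this section
for label, est, lo, hi in [("prosaccade latency", 0.45, 0.08, 0.81),
                           ("antisaccade latency", 0.30, -0.07, 0.67)]:
    z, p = z_and_p_from_ci(est, lo, hi)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

The prosaccade comparison yields p < 0.05, whereas the antisaccade comparison does not. This is consistent with the conclusions drawn in the text.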

Subgroup analysis of the error rate outcome using clinical diagnosis as the moderator revealed the following: ADD group: k = 7, Chi2 = 33.97, df = 6, p < 0.001, Tau2 = 0.36, I2 = 82%; MCI group: k = 5, Chi2 = 15.57, df = 4, p = 0.004, Tau2 = 0.17, I2 = 74%. In the ADD group, the I2 value indicated high heterogeneity, suggesting the presence of potential moderators. In the MCI group, the I2 value indicated moderate heterogeneity, so the mean effect size was considered the best estimate for the data (Fig. A4a, b in the Online Resource). The mean overall effect size was large in ADD studies (SMD: 1.59, CI: [1.09, 2.09] and MD: 36.46, CI: [22.05, 50.86]) and moderate in MCI studies (SMD: 0.55, CI: [0.14, 0.97] and MD: 10.98, CI: [2.58, 19.38]). This suggests that both patients with ADD and patients with MCI made errors significantly more frequently than healthy controls. The direct comparison of antisaccade error rate between MCI and ADD revealed the following: k = 5, Chi2 = 32.15, df = 4, p < 0.001, Tau2 = 0.46, I2 = 88%. The I2 value indicated high heterogeneity, suggesting the presence of additional moderators. The overall weighted mean effect size was moderate (SMD: 0.53, CI: [-0.11, 1.17] and MD: 13.02, CI: [-3.36, 29.40]), with confidence intervals that included zero (Fig. A4c in the Online Resource).

2.2 Step

The first stage of the meta-analysis included 12 effect sizes for the step condition, derived from the visually guided and antisaccade paradigms for MCI and ADD groups together (Chi2 = 14.54, df = 11, p = 0.20, Tau2 = 0.05, I2 = 24%). The Chi2 and I2 values indicated homogeneity, so the mean effect size was considered the best estimate for the data. The overall weighted mean effect size was large (SMD: 0.84, CI: [0.59, 1.08]). This suggests significant differences in outcomes between patients and healthy age-matched controls (Fig. B1 in the Online Resource).

Accordingly, in the second stage of analysis, we used the paradigm type as a subgroup moderating variable (prosaccade, k = 5, Chi2 = 5.09, df = 4, p = 0.28, Tau2 = 0.03, I2 = 21% (Fig. 5A); antisaccade, k = 7, Chi2 = 6.68, df = 6, p = 0.35, Tau2 = 0.02, I2 = 10%). The Chi2 and I2 values indicated homogeneity in both subgroups, so the mean effect size was considered the best estimate for the data. In prosaccade studies, the overall weighted mean effect size in MCI and ADD studies was moderate (SMD: 0.67, CI: [0.33, 1.01] and MD: 46.98, CI: [17.30, 76.66]) (Fig. 5A). This suggests a significant difference in saccadic latency between patients and controls. Across the antisaccade studies, the mean overall effect size was large (SMD: 1.00, CI: [0.70, 1.30]), implying significant differences in latency and error rate between patients and controls. Owing to the small number of studies, we did not perform subgroup analyses comparing each patient group (MCI and ADD) separately with healthy controls.

Fig. 5

Forest plot of effect sizes and their confidence intervals comparing patients and controls in the step condition for (A) prosaccade latency (msec), (B) antisaccade error rate (%), and (C) antisaccade latency (msec)

In the subgroup analysis of the antisaccade paradigm, we used outcome (latency and error rate) as a moderator (for error rate, k = 4, Chi2 = 0.45, df = 3, p = 0.93, Tau2 = 0.00, I2 = 0%; for latency, k = 3, Chi2 = 3.90, df = 2, p = 0.14, Tau2 = 0.16, I2 = 49%). The Chi2 values indicated homogeneity. The I2 values indicated homogeneity for error rate and moderate heterogeneity for latency. Thus, the mean effect size was considered the best approximation for the data. In studies with error rate as an outcome, the mean overall effect size was large (SMD: 1.18, CI: [0.82, 1.54] and MD: 25.52, CI: [18.13, 32.92]) (Fig. 5B). This suggests a significant difference in the frequency of errors between patients and controls. In studies with latency as an outcome, the mean overall effect size was moderate (SMD: 0.74, CI: [0.10, 1.39] and MD: 93.55, CI: [12.75, 174.35]). This suggests a significant difference in saccadic reaction times between patients and controls (Fig. 5C).

2.3 Overlap

The first stage of the meta-analysis included 30 effect sizes of the overlap condition that were derived from the visually guided and antisaccade paradigms for MCI and ADD groups together (Chi2 = 83.67, df = 29, p < 0.001, Tau2 = 0.18, I2 = 65%). The I2 values indicated moderate heterogeneity; therefore, the mean effect size was considered the best estimation for the data. The overall weighted mean effect size was medium (SMD: 0.50, CI: [0.30, 0.69]), suggesting a significant difference between patients and controls (Fig. C1 in the Online Resource).

Accordingly, in the second stage of analysis, we used the paradigm type as a subgroup moderator variable (prosaccade, k = 20, Chi2 = 39.79, df = 19, p = 0.003, Tau2 = 0.11, I2 = 52%; antisaccade, k = 10, Chi2 = 34.62, df = 9, p < 0.001, Tau2 = 0.28, I2 = 74%). For both subgroups, the I2 value indicated moderate heterogeneity, so the mean effect size was considered the best estimate for the data. In the prosaccade overlap studies, the overall weighted mean effect size in MCI and ADD studies was moderate (SMD: 0.34, CI: [0.14, 0.55] and MD: 26.87, CI: [11.72, 42.01]). This indicates a significant difference in saccadic latency between the patient and control groups (Fig. 6A). In the antisaccade studies, the mean overall effect size was moderate (SMD: 0.79, CI: [0.40, 1.18]), suggesting a significant difference in outcomes (latency and error) between patients and controls.

Subgroup analysis of the prosaccade paradigm using clinical diagnosis (MCI and ADD) as a moderator revealed the following: ADD group, k = 13, Chi2 = 30.00, df = 12, p = 0.003, Tau2 = 0.16, I2 = 60%; MCI group, k = 7, Chi2 = 2.85, df = 6, p = 0.83, Tau2 = 0.00, I2 = 0%. The I2 value indicated moderate heterogeneity in the ADD group and homogeneity in the MCI group. In the ADD studies, the mean overall effect size was moderate (SMD: 0.50, CI: [0.22, 0.79] and MD: 36.78, CI: [16.53, 57.03]). In MCI studies, it was small (SMD: 0.08, CI: [-0.14, 0.29] and MD: 6.88, CI: [-10.69, 24.45]). This suggests a significant difference in prosaccade latency between patients with ADD and controls, but no significant difference between patients with MCI and controls (Fig. C2a, b in the Online Resource). An additional analysis compared prosaccade latency directly between the patient groups (MCI vs. ADD): k = 6, Chi2 = 22.64, df = 5, p < 0.001, Tau2 = 0.32, I2 = 78%. The I2 value indicated high heterogeneity, suggesting the presence of additional moderators. Between the patient groups, the mean effect size was small (SMD: 0.26, CI: [-0.27, 0.79] and MD: 34.70, CI: [-23.25, 92.65]). This suggests no significant difference in saccadic latency between patients with ADD and patients with MCI (Fig. C2c in the Online Resource).

For the antisaccade group, the Chi2 and I2 values indicated high heterogeneity, suggesting the presence of additional moderators. In the analysis of the antisaccade paradigm, we used outcome (latency vs. error rate) as a subgroup moderating variable (for latency, k = 6, Chi2 = 24.32, df = 5, p < 0.001, Tau2 = 0.33, I2 = 79%; for error, k = 4, Chi2 = 17.34, df = 3, p < 0.001, Tau2 = 0.47, I2 = 83%). In the antisaccade studies, the mean overall effect size was moderate in latency studies (SMD: 0.73, CI: [0.19, 1.27] and MD: 66.05, CI: [12.65, 119.45]) (Fig. 6B). It was large in error studies (SMD: 0.91, CI: [0.28, 1.54] and MD: 22.42, CI: [5.00, 39.84]) (Fig. 6C). This suggests significant differences in antisaccade latency and error frequency between patients and healthy age-matched controls. We did not perform further subgroup analyses because of the small number of studies.

Fig. 6

Forest plot of effect sizes and their confidence intervals comparing patients and controls in the overlap condition for (A) prosaccade latency (msec), (B) antisaccade latency (msec), and (C) antisaccade error (%)

Gap Effect

Gap Effect for Controls

The meta-analysis included 12 effect sizes derived from the visually guided and antisaccade paradigms for control groups together (Chi2 = 58.79, df = 11, p < 0.001, Tau2 = 0.27, I2 = 81%). The I2 value indicated substantial heterogeneity, suggesting the presence of additional moderators. In control studies, the overall weighted mean effect size was large (SMD: 1.25, CI: [0.91, 1.59] and MD: 85.80, CI: [51.24, 91.44]; Fig. 7A).

Fig. 7

Forest plots of effect sizes and their confidence intervals, comparing prosaccade latency (msec) between overlap and gap conditions for (A) controls and (B) patients

Gap Effect for Patients

The first stage of the meta-analysis included 16 effect sizes derived from the visually guided and antisaccade paradigms for patient groups together (Chi2 = 83.90, df = 15, p < 0.001, Tau2 = 0.51, I2 = 82%). The I2 value indicated substantial heterogeneity, suggesting the presence of additional moderators. In patient studies, the overall weighted mean effect size was large (SMD: 1.23, CI: [0.83, 1.63] and MD: 82.02, CI: [59.54, 105.50]; Fig. 7B).

Subgroup analysis using clinical diagnosis as the moderator variable revealed the following: ADD, k = 11, Chi2 = 48.34, df = 10, p < 0.001, Tau2 = 0.49, I2 = 79%; MCI, k = 5, Chi2 = 34.70, df = 4, p < 0.001, Tau2 = 0.71, I2 = 88%. For both patient groups, the I2 value indicated high heterogeneity, suggesting the presence of additional moderators. In both ADD and MCI patient studies, the overall weighted mean effect size was large: ADD (SMD: 1.29, CI: [0.81, 1.76] and MD: 84.12, CI: [56.59, 111.64]), MCI (SMD: 1.12, CI: [0.33, 1.92] and MD: 77.9, CI: [31.61, 124.21]) (Fig. D1a, b in the Online Resource).

Anti-effect

Anti-effect for Controls

The meta-analysis included 10 effect sizes derived from the visually guided and antisaccade paradigms for control groups together (Chi2 = 136.72, df = 9, p < 0.001, Tau2 = 0.77, I2 = 93%). The Chi2 and I2 values indicated high heterogeneity, suggesting the presence of potential moderators. In control studies, the overall weighted mean effect size was large (SMD: 1.16, CI: [0.59, 1.73] and MD: 75.63, CI: [51.71, 99.55]) (Fig. 8A).

Anti-effect for Patients

The first stage of the meta-analysis included 15 effect sizes derived from the visually guided and antisaccade paradigms for patient groups together (Chi2 = 46.38, df = 14, p < 0.001, Tau2 = 0.20, I2 = 70%). The I2 value indicated moderate heterogeneity, so the mean effect size was regarded as the best approximation for the results. In patient studies, the overall weighted mean effect size was large (SMD: 0.99, CI: [0.71, 1.26] and MD: 89.86, CI: [63.66, 116.06]) (Fig. 8B). This suggests a significant difference in latency between the antisaccade and prosaccade paradigms.

Subgroup analysis using clinical diagnosis as a moderator variable indicated the following: ADD, k = 9, Chi2 = 22.57, df = 8, p = 0.004, Tau2 = 0.19, I2 = 65%; MCI, k = 6, Chi2 = 23.05, df = 5, p < 0.001, Tau2 = 0.125, I2 = 78%. For the ADD group, the I2 value indicated moderate heterogeneity, so the mean effect size was regarded as the best approximation for the data. In both ADD and MCI patient studies, the overall weighted mean effect size was large: ADD (SMD: 0.90, CI: [0.55, 1.25] and MD: 89.60, CI: [54.08, 125.13]), MCI (SMD: 1.11, CI: [0.65, 1.57] and MD: 90.63, CI: [48.83, 132.43]) (Fig. E1a, b in the Online Resource). This suggests a significant difference between antisaccade and prosaccade latency when each patient group is considered separately.

Fig. 8

Forest plot of effect sizes and their confidence intervals, comparing latencies (msec) between antisaccade and prosaccade in gap and overlap conditions for (A) controls and (B) patients

Summary

Overall, the results suggest that visually guided and antisaccade paradigms using gap, step and overlap conditions may be used to distinguish patients (MCI and ADD) from controls and MCI from ADD within patient groups when using prosaccade and antisaccade latency and error rate variables (Table 2). In addition, the magnitude of the effect size for both the gap effect and anti-effect is large in patients (MCI and ADD), similar to findings reported in healthy controls.

Table 1 Summary of studies that compared MCI or ADD patients to age-matched controls in saccade paradigms
Table 2 Summary of meta-analysis results for saccadic eye movements

Risk of Bias Across Studies

Funnel plots and Egger’s tests were used to evaluate publication bias in this meta-analysis. The results indicated publication bias for the gap (Z = 2.603, p = 0.009) and overlap (Z = 3.368, p = 0.002) conditions. No publication bias was identified for the step condition (Z = 0.967, p = 0.334), for which the pooled results were stable (Fig. 9).
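Egger's test assesses funnel plot asymmetry. It regresses each study's standardized effect (effect divided by its standard error) on its precision (1/SE). An intercept that departs from zero suggests small-study effects such as publication bias. The sketch below implements this classic ordinary least squares formulation in plain Python as an illustration of the idea. Software implementations differ: some fit a meta-regression with the standard error as a moderator and report a z statistic instead. This sketch is therefore not assumed to reproduce the Z values above. The function name and example data are hypothetical.

```python
import math


def egger_test(effects, ses):
    """Classic Egger regression of effect/SE on 1/SE by ordinary least squares.

    Returns the intercept, its standard error, and the t statistic (df = k - 2).
    """
    k = len(effects)
    if k < 3 or len(ses) != k:
        raise ValueError("need at least three studies with one SE per effect size")
    y = [e / s for e, s in zip(effects, ses)]     # standardized effects
    x = [1.0 / s for s in ses]                    # precisions
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("all standard errors are equal; the slope is undefined")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)     # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    if se_intercept == 0.0:
        raise ValueError("points lie exactly on a line; t statistic is undefined")
    return intercept, se_intercept, intercept / se_intercept


# Hypothetical studies in which smaller studies (larger SE) report larger effects
b0, se_b0, t = egger_test([0.3, 0.4, 0.6, 0.8, 1.1], [0.1, 0.2, 0.3, 0.4, 0.5])
print(f"intercept = {b0:.2f} (SE {se_b0:.2f}), t = {t:.2f}, df = 3")
```

With k studies, the t statistic is referred to a t distribution with k - 2 degrees of freedom.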

Fig. 9

Funnel plot depicting the effect size (x axis) by their standard error (y axis) for (A) gap, (B) step, and (C) overlap

Discussion

In this review, we assessed variations in saccadic EMs between patients (MCI and ADD) and healthy age-matched controls. We conducted a qualitative synthesis and a meta-analysis comparing saccadic performances based on (1) conditions (gap, step, and overlap) between interparticipant groups on the same paradigm, (2) gap effect (gap vs. step/overlap), in controls and patients (MCI and ADD) and (3) anti-effect (latencies in antisaccade vs. prosaccade), in controls and patient groups.

First, we examined saccades in controls and patients (MCI and ADD) together in three conditions. In the gap condition, the fixation point is extinguished approximately 200 msec before the target appears. In the step condition, the fixation point is extinguished at the moment the target appears. In the overlap condition, the fixation point remains visible after the target appears. The meta-analysis results showed that, regardless of condition (gap, step, or overlap), saccadic EMs may be used to distinguish control and patient groups (MCI and ADD). This may suggest that cognitive function is related to saccadic EM deficits, as both patient groups performed worse than controls.

Second, we used paradigms (prosaccade and antisaccade paradigms), outcomes (latency and error rate), and diagnosis (MCI and ADD) in the three conditions as moderators to ascertain if they contributed to the observed differences. Overall, in both saccadic paradigms, when we compared controls and patients, visually guided saccade paradigms revealed a moderate effect size, whereas the antisaccade paradigms indicated a large effect size. The larger effect size in antisaccade paradigms may reflect impaired processing and defective higher-order cognitive control processes (such as working memory, decision making and inhibition) in patients during the antisaccades compared to visually guided trials which do not require these additional higher-order processes. The processing impairment may imply that both mechanisms involved in the antisaccade paradigm—inhibition of reflexive misdirected saccades and triggering of intentional correct antisaccades—may be impaired in patients.

Next, we compared controls and patients in the gap, step, and overlap conditions based on saccade latencies and error rates. We found longer latencies in patients than in controls in all conditions. Saccadic latency reflects visual processing, target selection, and motor programming, and it depends on stimulus properties, such as luminance, and on the nature of the cognitive task (Leigh & Kennard, 2004). Longer latencies in patients may therefore indicate three kinds of deficit. The first is in the use and interpretation of visual information. The second is in selecting a single object from a field of multiple objects as the goal of a movement. The third is in transforming abstract codes into the spatially and temporally coordinated patterns of muscle contraction that produce EMs. Longer saccade latencies may also reflect poor disengagement, shifting, and re-engagement of visual attention.

When we compared the latencies in the gap condition to those in step and overlap conditions, we found a gap effect, manifested by a significant reduction in prosaccade latency in the gap condition compared with the overlap and step conditions. The gap paradigm elicited shorter latency saccades than the step and overlap conditions in both patients and controls. The gap latencies are generally shorter than in other conditions because the gap stimuli primarily release the eye fixation mechanism for a change in gaze direction and provide a warning cue when the fixation stimulus is offset. There is a drop in fixation neuronal discharge approximately 100 msec into the gap period and a slow buildup of low-frequency activity among a subset of saccade neurons in both the SC and FEF (Dias & Bruce, 1994; Dorris & Munoz, 1995; Dorris et al., 1997; Everling & Munoz, 2000; D. P. Munoz & Wurtz, 1995). The rostral pole of the midbrain superior colliculus, which projects to omnipause neurons, plays an important role in the release of fixation and warning components. When we compared the gap effect size in patients and controls, we found a seemingly large effect in both groups, although the mean magnitude of the effect was larger in controls (1.25 in controls vs. 1.23 in patients). However, further quantitative analyses to investigate the gap effect significance between controls and patients were not feasible since the variance (SD) in the difference for latency between gap and overlap conditions from individual studies could not be obtained. These findings may suggest differences in the neuronal activity of the fixation neurons and saccade neurons during the gap period, with patients having a slower decline in fixation neuronal activity and/or a slower buildup of saccade neuronal activity. 
This is consistent with previous studies comparing younger and older adults, which found that the absolute size of the gap effect varied between age groups while the relative decrease in latency remained constant (Pratt et al., 1997). Crawford et al. (2013) likewise found that the size of the gap effect did not differ significantly between older controls and patients with ADD, although it did differ significantly for younger controls.

When we compared latencies in visually guided (prosaccade) tasks with those in antisaccade tasks (i.e., the anti-effect), the meta-analysis showed a large effect size, with significantly longer latencies in antisaccade than in prosaccade tasks in both controls and patients. The longer reaction time in antisaccades reflects the additional processing and higher-order cognitive control that they require. However, we could not test whether the anti-effect differed significantly between controls and patients, because the variance (SD) of the within-subject latency difference between the visually guided and antisaccade paradigms could not be obtained from the individual studies.

Furthermore, patients made more antisaccade errors than controls, suggesting that patients have difficulty inhibiting reflexive saccades, possibly because of DLPFC and ACC lesions and insufficient top-down inhibition of saccade neurons in the FEF and SC before target appearance (Douglas P. Munoz & Everling, 2004).

Finally, in the prosaccade paradigm, latency in the gap and overlap conditions may help distinguish MCI from ADD: compared with controls, we found a medium effect size in the ADD group and a small effect size in the MCI group, with no overall difference between patients with MCI and healthy controls. Similarly, in the antisaccade paradigm with the gap condition, error frequency differed between patients with MCI and patients with ADD when each group was compared with controls, with a large effect size in the ADD group and a medium effect size in the MCI group, and no overall difference between patients with MCI and controls. Patients with MCI therefore appear to perform better than patients with ADD in tasks involving increased cognitive load, visual attention, disengagement, and attention shifting, as few significant differences emerged when they were compared with controls. When we directly compared ADD groups with MCI groups in the gap condition, we found a statistically significant overall medium effect size for prosaccade latency, and small to medium effect sizes for antisaccade latency and error rate (with marginal CIs). Similarly, a medium effect size (with a marginal CI) was observed for prosaccade latency in the overlap condition. Only a limited number of studies (4 to 5) were available for these direct comparisons between ADD and MCI, and more studies are required to confirm the results.
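Because the direct ADD-versus-MCI comparisons rest on only 4 to 5 studies, it helps to see how pooling behaves with so few inputs. The following is a minimal sketch that assumes a DerSimonian-Laird random-effects model; the pooling method is not restated in this section, so that choice is an assumption. The sketch pools per-study effect sizes and their variances. With so few studies the between-study variance (tau squared) is poorly estimated, and the confidence interval widens accordingly. Inputs are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effect sizes under a DerSimonian-Laird random-effects model.
    Returns (pooled effect, 95% CI lower, 95% CI upper, tau^2)."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical ADD-vs-MCI effect sizes (Hedges' g) and variances from five studies
pooled_g, ci_low, ci_high, tau2 = dersimonian_laird([0.2, 0.9, 0.4, 1.1, 0.6], [0.04] * 5)
print(f"pooled g = {pooled_g:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], tau^2 = {tau2:.3f}")
```

The interval uses a normal approximation. With this few studies, small-sample adjustments such as the Hartung-Knapp method are often preferred, which tends to widen the interval further.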

Limitations and Future Directions

This review has several limitations. First, our results were derived from observational studies, which are prone to bias, mainly from unmeasured confounding factors. We used the risk of bias assessment to quantify and limit bias among the studies finally included. Additionally, our primary analysis grouped comparisons by saccade task condition, which limited the number of comparisons available for conditions that few of the selected studies manipulated, such as the step condition.

Because the analysis focused on horizontal saccades, the results might have differed had the focus been on vertical saccades, since horizontal and vertical saccades are generated by distinct groups of premotor neurons (Leigh & Kennard, 2004; Takahashi & Shinoda, 2018). Additionally, several studies included only one of the two patient groups (MCI or ADD) rather than both; therefore, we were unable to carry out subgroup analyses.

In addition, studies provided inadequate information on, or differed in, their categorization of saccades (such as anticipatory and predictive saccades), which may have affected the reported saccade parameters. Only some studies applied specific criteria that clearly differentiated saccade behaviors, such as anticipatory and express saccades. The range of saccade behaviors, such as memory-guided, predictive, and reflexive saccades, could not be explored in depth (Leigh & Kennard, 2004). Additionally, antisaccade metrics such as error rate, correct antisaccade latency, and error latency were not defined in all studies. Because studies likely defined these measures differently, reflecting a lack of agreement on the definitions of saccadic parameters, it was impractical to determine precise differences between controls and patients.

When extracting data, we relied on data extraction software such as WebPlotDigitizer, whose accuracy depends on the quality of the figures in the published manuscripts, so extracted values may deviate from the actual results. In addition, we could not further test whether the anti-effect, the gap effect, and the differences between saccade conditions differed significantly between controls and patients, because the variance (standard deviation) of these within-subject differences could not be obtained from the studies.
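The dependence on figure quality follows from how plot digitizers work: they map pixel coordinates to data values by calibrating each axis against reference points of known value. The following minimal sketch assumes a linear axis calibrated from two tick marks and is not WebPlotDigitizer's own code. It shows how a small error in locating a bar top or error-bar end translates into an error in the extracted latency. The function name and pixel coordinates are hypothetical.

```python
def make_axis_map(px1, val1, px2, val2):
    """Return a function mapping a pixel coordinate to a data value on a linear axis,
    calibrated from two reference ticks given as (pixel, value) pairs."""
    scale = (val2 - val1) / (px2 - px1)  # data units per pixel
    return lambda px: val1 + (px - px1) * scale


# Hypothetical y-axis: pixel row 400 is 0 ms, pixel row 100 is 300 ms (rows increase downward)
to_ms = make_axis_map(400, 0.0, 100, 300.0)
mean_latency = to_ms(180)  # bar top located at row 180
shifted = to_ms(182)  # same bar top misplaced by 2 pixels
print(mean_latency, shifted, mean_latency - shifted)
```

Here each pixel spans 1 ms, so a 2-pixel misplacement shifts the extracted mean by 2 ms. In a smaller or lower-resolution figure each pixel spans more milliseconds, and the same misplacement produces a proportionally larger error.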

We searched several databases broadly but restricted the language of publication. Only studies published in English were considered, which is one of the main limitations of this review, and relevant studies published in other languages may therefore have been missed.

Another potential limitation of this review is the possibility of publication bias. Overall, many of the included studies reported statistical comparisons between controls and patients in the gap and overlap conditions that did not reach significance. Generally, the best way to minimize the impact of publication bias in a systematic review is to include trial registries and unpublished studies or grey literature (Lau et al., 2006; Sterne et al., 2011). Because we included only published articles, it is likely that some completed studies were not published because of inconclusive results. Besides publication bias, possible explanations for funnel plot asymmetry include poor methodological quality leading to exaggerated effects in smaller studies, true heterogeneity across studies, artefactual causes, and chance (Sterne et al., 2011).
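Funnel plot asymmetry is often assessed formally with Egger's regression test. In this test, each study's standardized effect (effect / SE) is regressed on its precision (1 / SE), and an intercept far from zero indicates small-study effects. This section does not state which asymmetry test, if any, was used. The following is therefore a minimal sketch of Egger's test using ordinary least squares, with NumPy as an assumed dependency and hypothetical data. It reports the intercept, its standard error, and a t statistic with k - 2 degrees of freedom.

```python
import numpy as np


def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry.
    Regresses effect/SE on 1/SE; returns (intercept, its standard error, t statistic)."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    y = effects / ses  # standardized effects
    x = 1.0 / ses  # precision
    X = np.column_stack([np.ones_like(x), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - 2
    sigma2 = resid @ resid / dof  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS covariance of (intercept, slope)
    intercept, se_int = beta[0], np.sqrt(cov[0, 0])
    return intercept, se_int, intercept / se_int


# Hypothetical studies: smaller studies (larger SE) report larger effects
ses = [0.1, 0.15, 0.2, 0.3, 0.4, 0.5]
effects = [0.3, 0.35, 0.45, 0.6, 0.8, 0.95]
b0, se_b0, t = egger_test(effects, ses)
print(f"intercept = {b0:.2f} (SE {se_b0:.2f}), t = {t:.2f} on {len(ses) - 2} df")
```

Tests of this kind have low power with few studies and are generally recommended only when at least 10 studies contribute to a comparison (Sterne et al., 2011). This is one reason asymmetry in small subsets should be interpreted cautiously.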

Because we included MCI of any etiology, the results might have differed had we focused on MCI due to AD.

Finally, we mostly examined studies that used gap and overlap stimulus paradigms to test saccades, and we mainly used latency in these conditions to calculate mean differences because it was the most commonly reported measure. Other saccade parameters, such as amplitude, gain, and peak velocity, as well as fixation, need to be investigated to determine whether they differ significantly between controls and patients. Future studies should also explore the step condition, a wider range of saccade behaviors (such as anticipatory, reflexive, and express saccades), smooth pursuit, mixed tasks, and other neurological or psychiatric pathologies that affect saccades. In addition, visually guided eye movements have been shown to be affected by disease, ageing, and ethnicity (Polden et al., 2020), so future research should examine saccade performance in relation to these variables.

Conclusion

The main goal of the current study was to determine whether different saccade paradigms and conditions could distinguish patients with MCI and patients with ADD from controls, and to validate the gap effect and anti-effect in patients with MCI and ADD compared to controls. We found that, in general, patients can be distinguished from controls by prosaccade and antisaccade latencies and by the frequency of antisaccade errors, regardless of the saccade condition. Antisaccade paradigms were more effective than prosaccade paradigms in distinguishing patients from controls, as shown by a large effect size for antisaccade paradigms and a moderate effect size for prosaccade paradigms. During prosaccades in the gap and overlap conditions, when each patient group was compared with controls, patients with ADD had significantly longer latencies than patients with MCI. These latencies, corresponding to a moderate effect size in ADD and a small effect size in MCI, could be used to differentiate the two groups. Similarly, during antisaccades in the gap condition, patients with ADD made significantly more errors than patients with MCI when each group was compared with controls. These errors, corresponding to a large effect size in ADD and a moderate effect size in MCI, could be used to differentiate the two groups. The absolute size of the gap effect varied between participant groups, but the relative decrease in latency remained constant, with both groups showing a large effect size. The anti-effect magnitude was similar in patients and controls; however, patients with MCI had longer antisaccade latencies than patients with ADD, corresponding to a moderate effect size in ADD and a large effect size in MCI.
In conclusion, the results offer compelling evidence supporting the use of the gap effect, the anti-effect, and specific saccade paradigms and conditions to distinguish between patients with MCI and patients with ADD, as well as between patients and controls.