Introduction

Parkinson’s disease (PD) is a common neurodegenerative disorder, for which only symptomatic therapies are available. Efforts to develop neuroprotective or preventive treatments will benefit from a reliable biomarker. Ideally, such a biomarker can identify PD in its early stages, differentiate between PD and other neurodegenerative parkinsonian disorders, track disease progression, and quantify treatment effects.

In PD, abnormal accumulation of α-synuclein in neurons impairs synaptic signaling, causing disintegration of specific neural networks [1]. Neuro-imaging with [18F]-fluorodeoxyglucose positron emission tomography (18F-FDG PET) can capture synaptic dysfunction in vivo. The radiotracer 18F-FDG provides an index for the cerebral metabolic rate of glucose, which is strongly associated with neuronal activity and synaptic integrity [2].

Brain regions with altered 18F-FDG uptake in PD have been identified with univariate group comparisons using Statistical Parametric Mapping (SPM) [3,4,5,6,7]. However, because metabolic activity is correlated in functionally interconnected brain regions, analysis of covariance is more suitable to assess whole-brain networks. Multivariate disease-related patterns can be identified with the Scaled Subprofile Model and Principal Component Analysis (SSM PCA). Subsequently, a disease-related pattern can be used to quantify the 18F-FDG PET scans of new subjects [8,9,10]. In this procedure, an individual’s scan is projected onto the pattern, resulting in a subject score. This is a single numeric value which reflects the degree of pattern expression in that individual’s scan.

The PD-related pattern (PDRP) was initially identified by Eidelberg et al. with SSM PCA in 33 healthy controls and 33 PD patients from the USA [11]. This PDRP (PDRPUSA) has served as a reference in many consecutive studies [12]. The PDRPUSA is characterized by relatively increased metabolism of the thalamus, globus pallidus/putamen, cerebellum and pons, and by relative hypometabolism of the occipital, temporal, parietal, and frontal cortices. PDRPUSA subject scores were significantly correlated with motor symptoms and presynaptic dopaminergic deficits in the posterior striatum [13], increased with disease progression [14], and were shown to decrease after effective treatment [15, 16]. PDRPUSA was significantly expressed in patients with idiopathic REM sleep behavior disorder (iRBD), a well-known prodromal disease stage of PD [17], and could discriminate between healthy controls, PD, and patients with multiple system atrophy (MSA) [18, 19].

Because of these properties, PDRPUSA is considered a neuro-imaging biomarker for PD [12]. It is essential that the PDRP is thoroughly validated. In collaboration with Eidelberg et al., PDRPs were identified in independent American, Indian, Chinese, and Slovenian populations [11, 15, 20,21,, 21]. Independently from these authors, the PDRP was recently derived in an Israeli population [22]. These PDRPs were highly similar to the PDRPUSA, although minor deviations in PDRP regional topography were observed in several of these studies, which may be caused by differences in demographics or clinical characteristics of the cohorts.

We previously identified a PDRP in a retrospective cohort of PD patients scanned on dopaminergic medication [23], and subsequently in an independent cohort of prospectively included PD patients who were in the off-state (PDRPNL) [24]. We used code written in-house, and obtained similar results compared with other PDRP studies. Recently, we demonstrated significant expression of the PDRPNL in idiopathic REM sleep behavior disorder (a prodromal stage of PD), PD, and dementia with Lewy bodies [25]. However, the PDRPNL has not been validated in a larger cohort, and correlations with PDRPUSA were not explored.

The aim of the current study was to validate the PDRPNL in several independent cohorts. We were able to test the PDRPNL in 19 controls and 20 PD patients from our own clinic in the Netherlands, in 44 healthy controls and 38 “de-novo” PD patients from Italy, and 19 healthy controls and 49 late-stage PD patients from Spain. In addition, we newly identified a PDRP in Italian and Spanish datasets and performed a cross-validation between these populations. We compared the three PDRPs to the reference pattern (PDRPUSA).

Methods

18F-FDG PET data from the Netherlands

The PDRPNL was previously identified in 18F-FDG PET scans from 17 healthy controls and 19 PD patients (NL1; Table 1) [24]. In these subjects, antiparkinsonian medication was withheld for at least 12 h before PET scanning.

Table 1 Dutch (NL) data

In a previous study, we demonstrated that the PDRPNL was significantly expressed in an independent dataset of 20 PD patients compared with 19 controls (NL2; Table 1) [25]. For the current study, we added scans of 8 patients with the parkinsonian variant of MSA (MSA-P). Patients were diagnosed with probable PD or MSA-P by a movement disorder specialist [26]. 18F-FDG-PET was performed in our clinic as part of routine diagnostic workup. These patients were scanned with the same camera as NL1. However, since the PDRPNL derivation study [24], reconstruction algorithms were updated (Table 1). Antiparkinsonian medication was not routinely withheld in NL2 PD patients.

Table 2 Italian (IT) data

18F-FDG PET data from Italy

The IT dataset consisted of 18F-FDG PET scans from 44 healthy controls and 38 consecutive outpatients with “de-novo,” drug-naïve PD [27] (Table 2). 123I-FP-CIT Single Photon Emission Computed Tomography (DAT SPECT) was abnormal in all Italian PD patients. Disease-related patterns are typically determined on approximately 20 patients and 20 controls. Therefore, 20 controls and 20 patients were randomly selected from the IT dataset for PDRPIT derivation. The remaining 24 controls and 18 patients were used for validation.

18F-FDG PET data from Spain

18F-FDG PET scans from 49 PD patients and 19 controls from Spain (SP) were included from a previous study (Table 3) [28]. Patients in this cohort had long disease durations and were studied in the on state (i.e., antiparkinsonian medication was continued). From this dataset, 19 PD patients were randomly selected for PDRPSP identification. The remaining 30 patients were used for validation.

Table 3 Spanish (SP) data

Identification of PDRPNL, PDRPIT, and PDRPSP

All images were spatially normalized onto an 18F-FDG PET template in Montreal Neurological Institute brain space [29] using SPM12 software (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK).

Identification of the PDRPNL was described previously [24]. For identification of the PDRPIT and PDRPSP, we applied an automated algorithm written in-house, based on the SSM PCA method of Spetsieris and Eidelberg [10], implemented in MATLAB (version 2017b; MathWorks, Natick, MA). Images were masked to remove out-of-brain voxels, log-transformed, and subject and group means were removed. Principal component analysis (PCA) was applied to the residual profiles in voxel space, and the components explaining the top 50% of the total variance were selected for further analysis. For each subject, a score was calculated on each selected principal component (PC). These scores were entered into a forward stepwise logistic regression analysis. The components that could best discriminate between controls and patients [30] were linearly combined to form the PDRP. In this linear combination, each component was weighted by the coefficient resulting from the logistic regression model. All voxel weights in the PDRP were overlaid on a T1 MRI template in Montreal Neurological Institute (MNI) space for visualization. All voxels in the PDRP are used for subject score calculation.

To investigate which regions in each PDRP were stable, a bootstrap resampling was performed within each derivation set (1000 repetitions) [31]. Voxels that survived a one-sided confidence interval (CI) threshold of 90% (percentile method) after bootstrapping were overlaid on a T1 MRI template. The stable regions in the three PDRPs were visually compared.

Validation of PDRPNL, PDRPIT, and PDRPSP

For validation, subject scores for PDRPNL, PDRPIT, and PDRPSP were calculated in patients and controls from the same population. First, images were log-transformed and the subject mean and group mean (originating from the PDRP identification cohort) were removed, resulting in a residual profile for each subject. The subject score is calculated by projecting the subject residual profile on the pattern. To account for differences in data-acquisition, subject scores were always z-transformed to the subject scores of healthy controls that were scanned on the same camera, with identical reconstruction algorithms. If subject scores in validation PD subjects were significantly higher compared with subject scores in controls, the pattern was considered valid.

In this manner, PDRPNL subject scores were calculated in the derivation cohort (NL1) and in the validation cohort (NL2). However, data acquisition was not identical for NL1 and NL2 data. This resulted in a significant difference in PDRPNL subject scores between the NL1 and NL2 healthy control groups (supplementary Fig 1). To correct for these differences, subject scores in NL1 were z-transformed to NL1 healthy controls, such that NL1 control mean was 0 with a standard deviation of 1. Similarly, subject scores in NL2 were z-transformed to NL2 controls.

Subject scores for PDRPIT were calculated in the IT derivation cohort (controls n = 20; PD n = 20) and the IT validation cohort (controls n = 24; PD n = 18). Because all IT scans were acquired with identical protocols, subject scores could be z-transformed to the IT healthy controls from the derivation sample (n = 20).

Subject scores for the PDRPSP were calculated in the SP derivation cohort (controls n = 19; PD n = 19) and the SP validation cohort (PD n = 30). PDRPSP subject scores were z-transformed to the SP controls from the derivation sample (n = 19). As a second SP healthy control cohort was not available, PDRPSP subject scores in PD patients were compared with the PDRPSP subject scores in the derivation healthy controls.

Cross-validation of PDRPNL, PDRPIT, and PDRPSP

Subsequently, PDRPNL subject scores were determined in the IT and SP datasets, PDRPIT subject scores were determined in the NL and SP datasets, and PDRPSP subject scores were determined in the NL and IT datasets. In addition, subject scores for the PDRPUSA were calculated in each dataset in the same manner. Each subject score was then transformed into a z-score with respect to controls from the same camera, such that control mean was 0 with a standard deviation of 1. To determine the performance of each pattern in discriminating between controls and patients, a receiver operating curve was plotted (for each pattern in each dataset) and the area under the curve (AUC) was obtained.

The similarity of the three PDRPs to each other and to the PDRPUSA was tested in two ways. First, in each dataset, the z-scores for each PDRP were correlated with Pearson’s r correlation coefficient. Second, the voxelwise topographies of the different patterns were compared by using volume-of-interest (VOI) correlations over the whole brain. A set of 30 standardized VOIs were selected from a previous study [21, 32], reflecting key nodes of the reference PDRP. Within each VOI, region weights were calculated for each pattern. Subsequently, region weights between any two of the patterns were correlated using Pearson’s r correlation coefficient.

PDRP expression in MSA-p subjects

Subject scores for each PDRP were calculated in 8 MSA-p patients. Subject scores for each PDRP were z-transformed to corresponding subject scores in NL2 controls.

Principal component 1

PDRPUSA [11], as well as the PDRP determined in Chinese [20] and Slovenian [21] populations, consisted of PC1 in isolation. Combinations of components were not considered. There are several methods to decide which components are disease-related and should be included in the final disease-related pattern [10]. In the current study, this decision was based on a forward stepwise logistic regression model, using the Akaike information criterion (AIC) as model selection criterion [30], in order to combine the least possible number of components to obtain the optimum discrimination between controls and patients. It is possible that the optimal model selects one component. If the PDRPs identified in the current study were not based on PC1 in isolation, we repeated all analyses using PC1 alone for each cohort. In that case, we additionally identified PDRPNL-PC1, PDPIT-PC1, and PDRPSP-PC1, and repeated the cross-validation.

Statistical procedures

Between-group differences in PDRP z-scores were assessed using a Student’s t test. Correlations between PDRP and age, disease duration, and UPDRS were examined with Pearson’s r correlation coefficient. Analyses were performed using SPSS software version 20 (SPSS Inc., Chicago, IL) and considered significant for P < 0.05 (uncorrected).

Results

PDRPNL

The first six principal components explained 50% of the total variance. The PDRPNL was formed by a weighted linear combination of principal components 1 and 2 (variance explained 17% and 10%, respectively; Figs. 1a and 2a). PDRPNLz-scores were significantly different between healthy controls and PD patients in both derivation (NL1) and validation (NL2) cohorts (P < 0.0001; Fig. 3a).

Fig. 1
figure 1

Display of PDRPNL (a), PDRPIT (b), and PDRPSP (c). All voxel values of each PDRP are overlaid on a T1 MRI template. Red indicates positive voxel weights (relative hypermetabolism) and blue indicates negative voxel weights (relative hypometabolism).L=left. Coordinates in the axial (Z) and sagittal (X) planes are in Montreal Neurological Institute (MNI) standard space.

Fig. 2
figure 2

Display of stable voxels of each PDRP, determined after bootstrap resampling (90% confidence interval not straddling zero). Overlay on a T1 MRI template. Positive voxel weights are color-coded red (relative hypermetabolism), and negative voxel weights are color-coded blue (relative hypometabolism). L, left. Coordinates in the axial (Z) and sagittal (X) planes are in Montreal Neurological Institute (MNI) standard space.

Fig. 3
figure 3

Subject scores for each PDRP in their respective derivation and validation cohorts. a PDRPNL was identified in 17 HC and 19 PD (NL1) and validated in 19 HC and 20 PD (NL2). Because reconstruction parameters were different for cohort NL1 and NL2, PDRP subject scores were z-transformed to corresponding healthy controls. b PDRPIT was identified in 20 HC and 20 PD, and validated in 24 HC and 18 PD. All subject scores were z-transformed to the 20 HC from the derivation sample. c PDRPSP was identified in 19 HC and 19 PD, and validated in 30 PD. Additional HC for validation were not available. All subject scores were z-transformed to the 19 HC from the derivation sample. Subject z-scores are compared between groups with a Student’s t test. Bars indicate mean and standard deviation

PDRPIT

The first six principal components explained 51% of the total variance. A weighted linear combination of principal components 1 and 2 (variance explained 19% and 8% respectively) could best discriminate between controls and patients in the logistic regression model, and was termed the PDRPIT (Figs. 1b and 2b). PDRPIT subject scores were significantly different between healthy controls (n = 24) and patients (n = 18) in the validation cohort (P < 0.0001; Fig. 3b).

PDRPSP

The first five principal components explained 51% of the total variance. The PDRPSP was formed by a weighted linear combination of principal components 1, 2, and 3 (variance explained 17%, 14%, and 5%, respectively; Figs. 1c and 2c). PDRPSP was significantly expressed in PD patients from the validation set (P < 0.0001, Fig. 3c).

Cross-validation

Each of the PDRPs (including the PDRPUSA) was significantly expressed in PD patients compared with controls, in each of the datasets (Figs. 4a–c and 5). Corresponding ROC-AUCs are reported in Table 4.

Fig. 4
figure 4

Subject scores for each PDRP in the other cohorts (cross-validation). a PDRPNL subject scores are plotted for the Italian (IT) and Spanish (SP) data. b PDRPIT subject scores are plotted for the two Dutch samples (NL1 and NL2) and in SP data. c PDRPSP subject scores are plotted for NL1, NL2, and IT data. Subject scores are z-transformed to healthy control values from the same camera, and compared between groups with a Student’s t test. Bars indicate mean and standard deviation

Fig. 5
figure 5

Subject z-scores for the reference pattern PDRPUSA [11] in each of the datasets. Subject scores are z-transformed to healthy control values from the same camera, and compared between groups with a Student’s t test. Bars indicate mean and standard deviation

Table 4 Cross-validation of patterns

Correlations to UPDRS and disease duration were inconsistent (Table 5). Within each dataset, z-scores of any two PDRPs were significantly correlated. Subject scores on all three patterns were also significantly correlated to subject scores on PDRPUSA (Table 5). Especially, the PDRPNL showed consistent high correlations to PDRPUSA. In addition, a comparison between spatial topographies of the original PDRPUSA versus the PDRPIT, PDRPNL, and PDRPSP showed significant correlations in region weights (Table 6).

Table 5 Correlations between PDRP subject scores and clinical data
Table 6 Region-weight correlations

PDRP subject scores in MSA-p patients

Subject scores for each PDRP were calculated in MSA patients. Subject z-scores on each PDRP were not significantly different between controls and MSA patients (Fig. 6).

Fig. 6
figure 6

Subject scores for each PDRP in eight cases of MSA-p. Subject scores are z-transformed to NL2 controls and compared between groups with a Student’s t test. Bars indicate mean and standard deviation

Principal component 1

As stated above, PDRPNL and PDRPIT were identified as linear combinations of multiple PCs. All analyses were repeated for PDRPNL-PC1, PDPIT-PC1, and PDRPSP-PC1. The PDRPs that were based on combinations of PCs yielded higher diagnostic accuracy (Table 4) compared with patterns based on PC1 alone (Table 7). However, subject scores on PDPIT-PC1, PDRPNL-PC1, and PDRPSP-PC1 did show much higher correlations to subject scores on PDRPUSA (all r > 0.98, P < 0.0001).

Table 7 Receiver operating curve—AUCs using PC1

Discussion

In this study, we cross-validated the previously published PDRPNL [24], and additionally identified a PDRP in an Italian (PDRPIT) and Spanish (PDRPSP) sample. The three patterns were akin to PDRPUSA, and also to the PDRP described in other populations [20, 21]. Topographical similarity to PDRPUSA was confirmed for each of the three PDRPs by a significant correlation of region weights, and a significant correlation in subject scores. Furthermore, PDRPNL, PDRPIT, and PDRPSP were significantly expressed in PD patients compared with controls in both identification and validation cohorts, but were not significantly expressed in MSA-p patients.

The typical PDRP topography is characterized by relative hypermetabolism in the thalamus, putamen/pallidum, pons, cerebellum, and motor cortex. These changes co-vary with relatively decreased metabolism in the prefrontal, parietal, temporal, and occipital cortices [11, 15, 20, 21, 23, 24]. This topography is thought to reflect changes in cortico-striatopallido-thalamocortical (CSPTC) loops and related pathways in PD [33, 34]. In these circuits, a dopaminergic deficit leads to abnormal basal ganglia output, mediated by hyperactivity of the subthalamic nucleus (STN) and its efferent projections. Several studies support a direct relationship between altered STN output and the PDRP topography [16, 35,37,38,38].

The high degree of similarity in PDRP topography across samples is striking considering differences in demographics, clinical characteristics, scanning methods, and reconstruction algorithms. Especially the PDRPNL was highly similar to the reference pattern (PDRPUSA). These two patterns showed the highest subject score correlation and region weight correlation. Furthermore, the PDRPNL achieved the highest AUC in the other cohorts. Like PDRPUSA, PDRPNL was derived in a cohort of off-state patients with a wide range of disease durations (duration 4.4 ± 3.2 years; range 1.5–11.5 years) and severities.

Deviations from the typical PDRP topography are worth exploring further in relation to clinical characteristics. The PDRPIT is unique in that it is, to our knowledge, the first time the PDRP has been derived in “de-novo,” treatment-naïve PD patients. It is likely that these very early-stage patients have a less severe nigrostriatal dopaminergic deficit compared with the more advanced PD patients in PDRPUSA, PDRPNL, and PDRPSP derivation cohorts. This may be reflected by less severe involvement of the frontal cortex in PDRPIT, as nigrostriatal denervation is known to be positively correlated with hypometabolism in the frontal cortex [13, 39].

The PDRPSP was derived in PD patients who were scanned while being on dopaminergic medication. Levodopa is known to decrease metabolism in the cerebellar vermis, putamen/pallidum, and sensorimotor cortex. Levodopa therapy can reduce PDRP expression, but does not completely correct the underlying network abnormalities [16]. It can be assumed that the effect of dopaminergic therapy on PDRP expression is modest in comparison with the effect of disease progression [40]. Indeed, the typical PDRP topography could still be identified in these on-state patients. However, the PDRPSP did not correlate as well to the other patterns, both in terms of subject scores and region weights. It is not clear whether this is related to the advanced disease stage or the effect of treatment. The PDRPSP was characterized by more stable involvement of the occipital cortex, possibly related to the presence of mild cognitive impairment and visual hallucinations, which often occur in advanced PD [41].

Following from the above, it can be concluded that the typical PDRP topography is highly reproducible. Similar topographies have also been identified in studies comparing 18F-FDG-PET scans of healthy controls and PD patients with SPM [3,4,5,6,7]. Such analyses can be supportive in the visual assessment of an 18F-FDG-PET scan in clinical practice. Several studies have evaluated the diagnostic value of observer-dependent visual reads supported by SPM-based comparisons to healthy controls [3, 4, 42,44,44]. A recent meta-analysis (PD versus “atypical” parkinsonism) estimated a pooled sensitivity of 91.4% and specificity of 94.7% for this semi-quantitative approach [45].

The merit of SSM PCA over mass-univariate approaches lies in its ability to objectively quantify 18F-FDG PET scans of patients using the pre-defined patterns. Pattern expression scores were shown useful in differential diagnosis, tracking disease progression, and estimating treatment effects [46]. Although in the current study PDRP z-scores were significantly higher in PD patients compared with healthy controls, there was a considerable overlap in PDRP z-scores between patients and controls in almost every cohort. This overlap is not unique to the current data, and is also apparent in other studies [12].

Some healthy controls appear to express the PDRP. Since we found significant correlations between PDRP z-scores and age in healthy controls, it could be suggested that ageing and PD share certain metabolic features. Metabolic decreases have been reported in the parietal cortex in normal aging [47, 48]. This may cause some overlap with the PDRP. However, the correlation with age in our study was not consistent across all datasets and patterns (Table 5). Furthermore, expression of an age-related spatial covariance pattern was shown to be independent from PDRP expression [49, 50]. Alternatively, a high PDRP z-score in a healthy subject could signal a prodromal stage of neurodegeneration. For instance, subjects with idiopathic REM sleep behavior disorder (a prodromal stage of PD) were shown to express the PDRP years before onset of clinical parkinsonism [17, 25].

Low PDRP z-scores in PD patients could indicate inaccurate clinical diagnosis. A recent meta-analysis of clinicopathologic studies suggests that the clinical diagnosis of PD by an expert, after an adequate follow-up, has a sensitivity of 81.3% and a specificity of 83.5% [51]. Thus, even under ideal circumstances, the diagnosis is inaccurate in a number of patients.

Overlap in pattern expression scores is not only apparent between controls and PD patients, but also between patients with different parkinsonian disorders. For instance, the PDRP may also be expressed in patients with progressive supranuclear palsy (PSP) [52]. This means that the expression score for a single disease-related pattern is inadequate to differentiate between multiple disorders. However, this does not hamper application in differential diagnosis. Previous studies have shown that an algorithm combining multiple disease-related patterns (including the PDRP) with logistic regression could accurately distinguish between parkinsonian disorders. With this method, Tang et al. achieved accurate categorization of patients (n = 167) with an uncertain diagnosis 3–4 years before a final clinical diagnosis was made by an expert clinician masked to the imaging findings [18]. Highly similar results were obtained in an independent sample (n = 129) [19].

In this study, we compared data from different centers. It is well-known that variations in PET scanners and image reconstruction algorithms influence disease-related pattern scores [53,55,55] (supplementary Fig 1). In support of this, we recently identified clear center-specific features in the current data using machine-learning algorithms [56]. Therefore, PDRP subject scores cannot be compared readily between subjects from different centers. In all PDRP studies, this is solved with a z-transformation using the mean and standard deviation of a small healthy control group. This potentially introduces a bias, depending on which controls are selected. However, this issue is not relevant for within subject studies. Therefore, PDRP subject scores may be especially useful in tracking disease progression [14], or treatment effects [16, 35,37,38,38].

This study is methodologically different from previous PDRP studies. The PDRPs identified in this study were formed by a combination of principal components (PCs). These combinations were determined based on a forward stepwise logistic regression model [30]. There are different methods to decide which components are included in the disease-related pattern [10]. Previous studies have always identified the PDRP as PC1 in isolation [11, 20, 21]. The process of component selection is not always described in detail. Automatically choosing PC1 as the disease-related pattern, and disregarding consecutive, smaller PCs, increases the risk information loss. On the other hand, a pattern that combines multiple PCs may give a better fit of the initial sample, but may be limited in its relevance or generality across new datasets. Therefore, we re-evaluated the data and included only PC1 for PDRPIT, PDRPNL, and PDRPSP. Indeed, the PC1 patterns correlated better to the reference pattern (PDRPUSA). However, the patterns that included multiple PCs yielded higher diagnostic accuracy . Apart from component selection, several other decisions and cutoffs may influence pattern identification [10]. More advanced machine-learning algorithms may be of use in determining optimal patterns without the use of arbitrary thresholds and associated loss of potentially useful information [55,57,58,58].

There is increasing interest to apply the PDRP in clinical practice and in therapeutic trials [12]. However, rigorous validation by independent research groups is necessary before widespread application. The current study has contributed to the finding that the PDRP is a universal feature of PD, and it is striking that such similar patterns could be identified in a limited number of 18F-FDG PET scans from three populations, despite overt clinical and methodological heterogeneity. However, our results also show considerable overlap in PDRP subject scores between control and PD groups. Further study is needed to overcome this issue, perhaps by addressing potential center-specific effects in the data or by employing more advanced machine-learning algorithms.