Introduction

Lung cancer is the leading cause of cancer-related death worldwide, with approximately 40% of patients having distant metastases at the time of initial diagnosis [1, 2]. Because appropriate staging is crucial for decisive treatment [3], an accurate and cost-effective method for lung cancer staging must be established.

Hybrid positron emission tomography and computed tomography with the fluorine-18-fluorodeoxyglucose tracer (18F-FDG PET/CT) is a powerful tool for initial staging and restaging of lung cancer because it combines metabolic and anatomic data [3]. Furthermore, 18F-FDG PET/CT can guide surgery and radiotherapy and help predict tumor response to treatment [3]. Previous investigations show that hybrid 18F-FDG PET/CT is more effective than computed tomography (CT) or positron emission tomography alone for tumor, node, metastasis staging [4]. However, its inherent limitations include high cost, lack of the necessary infrastructure in many centers, and high false-positive rates in areas with endemic granulomatous diseases [3].

Whole-body magnetic resonance imaging (WB-MRI) is a non-invasive and radiation-free imaging tool for cancer staging and metastasis detection [5]. Adding diffusion-weighted imaging sequences to WB-MRI (WB-DWI) can further improve diagnostic accuracy [6, 7]. A recent meta-analysis compared the diagnostic performance of 18F-FDG PET/CT and DWI in differentiating malignant from benign pulmonary nodules and masses [8]. Overall, DWI appeared to be equivalent or superior to 18F-FDG PET/CT in classifying malignant lung nodules [8]. Taylor et al recently reported that WB-MRI staging has similar accuracy to current standard methods while reducing staging costs and time [9]. However, no meta-analysis has compared these imaging modalities with respect to diagnostic performance in lung cancer staging and detection of distant metastases. The aim of this systematic review and meta-analysis was to compare the diagnostic performance of 18F-FDG PET/CT, WB-MRI, and WB-DWI in the detection of extrathoracic lung cancer metastases.

Materials and methods

Literature search

This study was performed using the Enhancing the Quality and Transparency of Health Research (EQUATOR) reporting guidelines, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). We searched the PubMed (U.S. National Library of Medicine), Embase (Elsevier), and Cochrane Library (John Wiley & Sons) electronic databases for all accessible literature published up to June 2019. The search algorithm was based on a combination of the equivalent terms listed in Supplementary File 1.

Inclusion and exclusion criteria

To be included, studies had to meet several criteria: (i) performance evaluation of 18F-FDG PET/CT or WB-MRI in M staging of lung cancer; (ii) use of histopathologic analysis or imaging follow-up as the reference standard; and (iii) inclusion of clearly stated values for true positive (TP), false positive (FP), false negative (FN), and true negative (TN). Studies were excluded if they (i) focused on prognosis or therapeutic response instead of M staging; (ii) had a sample of fewer than 10 patients; (iii) were published as a conference abstract, letter, review, animal experiment, comment, or case report; (iv) were not published in English; (v) used 18F-FDG PET that was not hybridized with CT; (vi) used radiotracers other than 18F-FDG; or (vii) used non–whole body MRI. Three researchers reviewed the titles and abstracts of retrieved articles and applied the inclusion and exclusion criteria. The full texts of qualifying articles were retrieved and reviewed to confirm study eligibility. The PRISMA flowchart for the selection process is presented in Fig. 1.

Fig. 1

Study flow diagram of literature search of eligible studies

Assessment of methodologic quality

Studies that met the eligibility criteria were assessed by 2 reviewers using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [10]. This instrument covers 4 domains: patient selection, index test, reference standard, and flow and timing. Each domain is assessed for risk of bias, and the first 3 are also assessed for concerns about applicability. Risk of bias was rated as high, low, or unclear. Only studies with a low risk of bias were included in the present study. Disagreements between reviewers were resolved by consensus.

Data extraction

Literature accepted for analysis was reviewed by 2 analysts using the PRISMA guidelines [11]. Information collected from each study included first author, year of publication, study design, country of patient recruitment, patient enrollment, technical specifications, reference standard, and blinding. Details regarding staging and the numbers of TP, TN, FP, and FN were also gathered from each article.
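As an illustration of how the four extracted counts determine the per-study accuracy measures used in this review, the following minimal Python sketch computes sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from one 2 × 2 table. The class and function names, the 0.5 continuity correction, and the example counts are assumptions made for illustration; they are not taken from the included studies, and the pooled estimates in this paper came from a bivariate model, not from this per-study arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study, index test versus reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy_measures(table: TwoByTwo, correction: float = 0.5) -> dict:
    """Return sensitivity, specificity, PLR, NLR, and DOR for one study."""
    cells = (table.tp, table.fp, table.fn, table.tn)
    # Assumption: when any cell is zero, add 0.5 to every cell, a common
    # continuity correction that keeps the ratios finite.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + correction for c in cells)
    else:
        tp, fp, fn, tn = (float(c) for c in cells)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,                     # equals (tp * tn) / (fp * fn)
    }


# Hypothetical counts, not taken from any included study.
example = accuracy_measures(TwoByTwo(tp=18, fp=7, fn=2, tn=93))
```

Expressing the DOR as PLR/NLR makes explicit that it summarizes both likelihood ratios in a single number, which is why it can be compared across modalities.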

Statistical analysis

Studies were included only if they evaluated both WB-MRI (with or without DWI) and 18F-FDG PET/CT in the same patient sample, so as to minimize methodologic and clinical inter-study heterogeneity. Pooled sensitivities and specificities with 95% confidence intervals (95% CIs) were calculated using the bivariate random-effects model of Reitsma et al [12]. Pooled estimates of positive and negative likelihood ratios (PLRs and NLRs) and diagnostic odds ratios (DORs) were also obtained. Direct comparison of the DORs of MRI and PET/CT was performed using a Z test, and a two-tailed p value of less than 0.05 was considered significant [13,14,15]. Because the DOR does not follow a Gaussian distribution, we used its natural logarithm, which is approximately normally distributed, for this analysis [15]. Summary receiver operating characteristic curves were constructed, and areas under the curve were obtained. Heterogeneity among studies was assessed using the chi-square statistic for the pooled estimates (p < 0.05 indicated significant heterogeneity). We further assessed the heterogeneity of the pooled DOR estimates to test whether heterogeneity was due to the threshold effect. The proportion of variation across studies attributable to heterogeneity rather than chance was estimated by calculating I² values. A Deeks’ funnel plot was planned to assess publication bias, as indicated by an asymmetric appearance [16]. Analyses were conducted using Stata version 12.0 (StataCorp LP).
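The Z test on log-transformed DORs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Stata procedure used in the analysis: the standard error of each ln(DOR) is recovered from its 95% CI, the two estimates are treated as independent (an approximation when both tests are applied to the same patients), and the function names and example values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_dor_se(ci_lower: float, ci_upper: float) -> float:
    """Standard error of ln(DOR), recovered from a 95% CI on the DOR scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_95)


def compare_dors(dor_a: float, ci_a: tuple, dor_b: float, ci_b: tuple) -> tuple:
    """Two-tailed Z test for the difference between two ln(DOR) values.

    Assumption: the two estimates are independent, so their variances add.
    """
    se_a = log_dor_se(*ci_a)
    se_b = log_dor_se(*ci_b)
    z = (math.log(dor_a) - math.log(dor_b)) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-tailed p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical DORs and CIs, chosen only to exercise the function.
z_stat, p_val = compare_dors(100.0, (25.0, 400.0), 20.0, (5.0, 80.0))
```

Working on the log scale makes the confidence intervals symmetric around the point estimate, which is what allows the standard error to be read off the interval width.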

Results

Literature search

The initial literature search yielded 2700 articles, of which 61 were reviewed and 4 were considered eligible. Ohno et al [17] and Chen et al [18] compared 18F-FDG PET/CT with WB-DWI. Ohno and colleagues [17] also performed comparisons with WB-MRI, as did Ohno et al [19] and Yi et al [20]. Because of our focus on M staging and WB-MRI with or without DWI, the other methods analyzed by these authors are not discussed here, and diagnostic capability includes brain imaging.

Summary findings of the eligible studies are shown in Table 1. The four studies included a total of 553 patients, of whom at least 87 had M-stage lung cancer. Chen et al did not report the exact number of patients with metastases, and they were the only authors who used the total number of metastases as the reference standard to calculate diagnostic accuracy (instead of the number of patients with M-stage NSCLC, as the others did) [18]. Technical characteristics of the eligible studies (equipment, sequences, diagnostic parameters) are described in Supplementary Table 1. All WB-MRI studies were performed with contrast media [17, 19, 20]. Both WB-DWI studies used b values of 0 and 1000 s/mm² [17, 18]. The work of Ohno et al [17] was the only study that used a 5-point visual scale (positive if visual scale ≥ 4) as a threshold for probability of malignancy rather than consensus of two radiologists. All studies enrolled Asian patients. A summary of lesion descriptions is shown in Supplementary Table 2.

Table 1 Characteristics of the studies included in the meta-analysis

Methodological quality

Participant selection was considered at low risk of bias in all studies. Regarding the reference standard, most studies were judged as low risk of bias because they used histopathologic analysis and a follow-up of more than 12 months [17] or more than 6 months [18,19,20]. Studies could not be evaluated with respect to risk of bias for flow and timing, as the time intervals between the index tests and reference standard tests were not reported. The results of the QUADAS-2 assessment are presented in Supplementary Figs. 1 and 2.

Heterogeneity between studies and publication bias

The four studies included in this meta-analysis exhibited significant heterogeneity (p < 0.01) with respect to sensitivity and specificity for 18F-FDG PET/CT, WB-MRI, and WB-DWI. For WB-DWI, the specificity p value was less than 0.02. Heterogeneity in specificity was strong for 18F-FDG PET/CT and WB-MRI (I² of 93.9% and 90.9%, respectively) and moderate for WB-DWI (I² of 70%). When measured for the pooled estimate of DORs, heterogeneity was not statistically significant for PET/CT (I² = 37.9%, p = 0.185), WB-MRI (I² = 57.0%, p = 0.098), or WB-DWI (I² = 0%, p = 0.317).
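For readers less familiar with these statistics, the sketch below shows the standard textbook relationship between Cochran's Q and I² for a set of study-level estimates such as ln(DOR) values. It assumes inverse-variance weights and uses hypothetical inputs; whether the Stata routines used in this analysis compute the quantities in exactly this way is an assumption.

```python
def cochran_q_and_i2(estimates: list, variances: list) -> tuple:
    """Return Cochran's Q and I-squared (as a percentage).

    `estimates` are study-level effects (for example ln(DOR) values) and
    `variances` their within-study variances. Assumption: fixed-effect
    inverse-variance weights, as in the usual definition of Q.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is the share of Q in excess of its expected value (df),
    # truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical ln(DOR) values and variances for two studies.
q_stat, i2_pct = cochran_q_and_i2([0.0, 2.0], [0.5, 0.5])
```

Because I² is truncated at zero whenever Q does not exceed its degrees of freedom, a comparison based on only two studies (one degree of freedom) can return I² = 0% even when the two estimates differ.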

Diagnostic accuracy of 18F-FDG PET/CT, WB-MRI, and WB-DWI

Pooled results are shown in Fig. 2. 18F-FDG PET/CT had a pooled sensitivity of 83% (95% CI, 54–95%) and specificity of 93% (95% CI, 87–96%). WB-MRI had a pooled sensitivity of 92% (95% CI, 18–100%) and specificity of 92% (95% CI, 85–95%), whereas WB-DWI had a pooled sensitivity of 78% (95% CI, 46–93%) and specificity of 91% (95% CI, 79–96%). The likelihood ratio syntheses resulted in an overall PLR of 8.7 (95% CI, 2.9–25.6) and NLR of 0.24 (95% CI, 0.08–0.77) for WB-DWI and a PLR of 10.8 (95% CI, 6.4–18.4) and NLR of 0.09 (95% CI, 0.00–3.25) for WB-MRI. For 18F-FDG PET/CT, the overall PLR was 11.7 (95% CI, 6.6–20.9), and the NLR was 0.19 (95% CI, 0.06–0.58). The DOR was 62 (95% CI, 18–212) for WB-DWI, 117 (95% CI, 3–4480) for WB-MRI, and 62 (95% CI, 18–212) for 18F-FDG PET/CT (Table 2). Direct comparison of the DORs of MRI with that of PET/CT revealed no statistically significant difference between imaging modalities (p = 0.186 for WB-DWI; p = 0.638 for WB-MRI). Using a fitted summary receiver operating characteristic curve, the overall areas under the curve for WB-DWI, WB-MRI, and 18F-FDG PET/CT were 0.93 (95% CI, 0.90–0.95), 0.93 (95% CI, 0.91–0.95), and 0.95 (95% CI, 0.93–0.96), respectively (Fig. 3).

Fig. 2

Forest plot of the 18F-FDG PET/CT (a), WB-MRI (b), and WB-DWI (c) studies. The Q statistic and I² are measures of heterogeneity. Heterogeneity was strong and significant for all pooled analyses, except for specificity for WB-DWI

Table 2 Summary of test performance characteristics for the assessment of metastases, including brain analysis, with integrated 18F-FDG PET/CT, WB-DWI, and WB-MRI without DWI

Fig. 3

Summary ROC (SROC) curve of the 18F-FDG PET/CT (a), WB-MRI (b), and WB-DWI (c) studies

Discussion

MRI has high potential as a single-test imaging modality for evaluating patients with non–small cell lung cancer (NSCLC) because of its comparable accuracy to PET/CT, reduced examination time, and lack of ionizing radiation [9, 21, 22]. MRI may also reduce health care costs, as it usually costs about half as much as a PET/CT study [23]. Previous meta-analyses have demonstrated that both modalities (MRI and 18F-FDG PET/CT) have good diagnostic performance in evaluating pulmonary lesions and lymph nodes in NSCLC and in detecting primary and metastatic malignancies [7,8,9, 24, 25]. Our results summarize the few studies that have focused on global M staging in NSCLC using both imaging techniques. Only four studies fulfilled our study criteria [17,18,19,20], and two of those used DWI [17, 18].

The heterogeneity found in the initial analysis (sensitivity, specificity) was most likely due to the threshold effect, as it did not remain significant in the analysis of the pooled estimates of DORs. Other possible explanations for heterogeneity include different sample sizes, the impact of per-patient analysis instead of per-lesion analysis, varying composition of organ and tumor histopathology, and endemic zones of granulomatous disease in China and Korea [26,27,28]. Our analysis showed that the studied techniques had a similar ability to rule out malignancy (NLRs) and to rule it in (PLRs). For a test to be highly useful, it should have an NLR less than 0.1 and a PLR greater than 10. Thus, WB-MRI would be more highly indicated (PLR, 10.8; NLR, 0.09), and WB-DWI should not be used alone (PLR, 8.7; NLR, 0.24) [7]. The DOR, which measures discriminative power, did not differ between diagnostic tests.
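To make the practical meaning of these likelihood ratio thresholds concrete, the following sketch converts a pre-test probability into a post-test probability through the odds form of Bayes' theorem. It uses the pooled WB-MRI likelihood ratios reported in the Results (PLR 10.8, NLR 0.09); the 40% pre-test probability is an illustrative assumption borrowed from the proportion of patients with distant metastases at diagnosis cited in the Introduction, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Update a disease probability with a likelihood ratio (odds form of Bayes)."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Illustrative assumption: 40% pre-test probability of distant metastasis.
PRE_TEST = 0.40
after_positive = post_test_probability(PRE_TEST, 10.8)  # positive WB-MRI
after_negative = post_test_probability(PRE_TEST, 0.09)  # negative WB-MRI
```

Under these assumptions, a positive result raises the probability of metastatic disease from 40% to roughly 88%, and a negative result lowers it to about 6%. The wide confidence interval around the pooled NLR means that the second figure in particular should be read as indicative only.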

Because there were no differences in overall diagnostic performance (i.e., the DORs) between WB-MRI and 18F-FDG PET/CT, WB-MRI appears to be a suitable, accurate alternative to 18F-FDG PET/CT. The use of DWI may provide supplemental information for decision-making in M staging of NSCLC [17]. Ohno et al used 4 modalities (WB-DWI only, WB-MRI without DWI, WB-MRI plus DWI, and 18F-FDG PET/CT) to evaluate lesions. They found that, when brain metastasis assessment was included, specificity and accuracy were lower with WB-DWI alone (87.7% and 81.8%) than with WB-MRI plus DWI (92% and 87.7%), WB-MRI without DWI (92% and 85.7%), and 18F-FDG PET/CT (94.5% and 88.2%) [17]. However, Chen et al [18] demonstrated no difference in the diagnostic performance of WB-DWI and 18F-FDG PET/CT for the detection of metastases. Yi et al [20] found no difference in detection ability for brain and hepatic metastases between WB-MRI and 18F-FDG PET/CT. In comparing diagnostic accuracy for extrathoracic metastases, Ohno et al [19] found WB-MRI to be superior to 18F-FDG PET/CT (98.6% vs. 90.7%, p < 0.05) but not to FDG-PET/MRI.

The incidence of brain metastases in patients with NSCLC ranges from 21 to 54% and increases as overall survival increases [29]. Because 18F-FDG PET/CT provides limited information (inferior soft tissue contrast and high physiologic background activity), brain MRI is the preferred and recommended imaging modality for patients with suspected NSCLC [30]. Ohno et al [17] assessed the actual utility of WB-MRI compared with 18F-FDG PET/CT. This study demonstrated that the diagnostic capability for M staging, excluding brain metastasis evaluation, was inferior for WB-MRI without DWI compared with 18F-FDG PET/CT. Lee et al [31] showed that 18F-FDG PET/CT plus contrast-enhanced brain MRI without DWI had a higher sensitivity than 18F-FDG PET/CT alone (88% vs. 24%; p < 0.001) to detect brain metastases in patients with lung adenocarcinoma.

Bone is the site of 30 to 40% of lung cancer metastases [32], and bone metastasis prevalence was within this range in all 4 of the studies included in this meta-analysis. Ohno et al [17] and Chen et al [18] found that 18F-FDG PET/CT and WB-MRI performed similarly in detecting bone metastases. Yi et al [20] did not perform DWI but did obtain additional T1-weighted turbo spin-echo images that showed bone to be the most frequent site of metastasis (8%). Takenaka et al [33] showed that WB-MRI with or without DWI is more specific and accurate in detecting bone metastases than WB-DWI alone, 18F-FDG PET/CT, or bone scintigraphy on a per-site basis in patients with NSCLC. They also concluded that adopting DWI as an adjunct to WB-MRI could improve diagnostic accuracy.

Our study has some limitations. The number of articles and patients examined was smaller than anticipated. In diagnostic imaging studies, small sample sizes and heterogeneous methods of primary studies can limit the quality of the meta-analysis [34]. We were not able to test for publication bias given the small number of studies included in the meta-analysis. Other limitations include those inherent to this study design, such as selection and publication bias, limited information from reports, and potential for ecological fallacy. Characteristics of metastatic lesions (e.g., size, ADC, and SUV values) were not described in most studies. Last, exclusion of non-English studies may have increased the probability of publication bias.

Conclusions

This meta-analysis of four studies found that WB-MRI and WB-DWI have diagnostic performance similar to that of 18F-FDG PET/CT in M staging of NSCLC. These MRI techniques are less costly and less time-consuming than PET/CT and do not involve ionizing radiation. Further high-quality studies comparing the diagnostic performance of these imaging modalities and of various optimized MRI protocols are needed to determine whether MRI should supplant the current standard approach to M staging.