Introduction

Low back pain (LBP) is the leading cause of disability worldwide and a common presentation to medical services, with an age-standardised point prevalence of 9.4% [1]. LBP is usually benign and self-limiting, but it can be the presenting feature of serious spinal pathology such as malignancy, which occurs in 1.4–5% of presentations [2, 3]. Choosing Wisely, an initiative established by the American Board of Internal Medicine (ABIM) Foundation to avoid unnecessary medical interventions, recommends against spinal imaging in patients with LBP of less than 6 weeks' duration who have no clear indicators of serious pathology [4]. In addition to its economic cost, inappropriate imaging may lead patients to attribute their pain to incidental imaging findings, increasing the likelihood that they seek unnecessary interventions [5, 6]. Conversely, failing to image when indicated may delay timely management of an underlying serious condition.

Multiple studies have investigated rates of inappropriate LBP imaging, with estimates varying markedly from 3.8% to 88.5% [7]. Appropriateness is often judged by red flags: clinical features thought to raise suspicion of serious pathology. However, the nature and number of red flags vary widely between guidelines [8]. Which red flags are included when assessing the appropriateness of LBP imaging matters, given the wide variation in their predictive value [9, 10]. In addition, studies differ in how they define the numerator and denominator used to calculate imaging appropriateness. Both issues likely contribute to uncertainty about how much lumbar imaging is inappropriate.

Here we review the criteria that studies use to assess the appropriateness of LBP imaging in primary care, the extent to which these criteria comply with clinical guidelines, and how studies calculate the proportion of inappropriate imaging.

Methods

This scoping review built on a systematic review and meta-analysis by Jenkins et al. that identified studies assessing the appropriateness of imaging for LBP [7]. We included studies identified by Jenkins et al. that were published in the last 5 years (since 2014), plus studies found by repeating the original search to identify papers published subsequently.

MEDLINE, EMBASE, and CINAHL were searched from 1 January 2018 to 20 February 2019, using the same search terms as Jenkins et al. [7]. Citation lists of included papers were also reviewed. Inclusion criteria were: studies assessing the appropriateness of LBP imaging; studies in a primary care setting; and studies of adult patients. The following article types were excluded: case reports, case series, reviews, and conference abstracts.

One author performed the initial title and abstract screen, identifying studies for full-text review. These were combined with the studies identified by Jenkins et al. from the last 5 years to give a complete list of eligible studies. Two authors performed data extraction using a data collection proforma. Data were collected on the clinical setting of each study, the criteria used to assess imaging appropriateness and their consistency with relevant national guidelines, how studies obtained their information (i.e. chart review or insurance claims data), and the method by which studies calculated the proportion of appropriate imaging.
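One way to picture the proforma is as a fixed record per included study, with one field for each item listed above. The sketch below is a hypothetical structure, not the proforma actually used; the field names, the `DataSource` categories beyond the two named in the text, and the example values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class DataSource(Enum):
    """How a study obtained its information (the first two are named in the Methods)."""
    CHART_REVIEW = "chart review"
    INSURANCE_CLAIMS = "insurance claims data"
    OTHER = "other"  # hypothetical catch-all category


@dataclass
class ExtractionRecord:
    """One row of a hypothetical data collection proforma, one per included study."""
    study_id: str
    clinical_setting: str
    appropriateness_criteria: List[str] = field(default_factory=list)
    guideline_cited: Optional[str] = None             # None if no guideline referenced
    consistent_with_guideline: Optional[bool] = None  # None until assessed
    data_source: DataSource = DataSource.OTHER
    proportion_method: str = ""                        # numerator / denominator used


# Example entry with placeholder values (not data from any included study).
record = ExtractionRecord(
    study_id="study-01",
    clinical_setting="primary care",
    appropriateness_criteria=["neurological impairment", "history of malignancy"],
    data_source=DataSource.CHART_REVIEW,
    proportion_method="inappropriate requests / all LBP imaging requests",
)
print(record)
```

Keeping fields such as `consistent_with_guideline` optional lets a second extractor see which items are still unassessed rather than silently defaulting them.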

Results

The electronic search identified 708 papers. Of these, 674 were excluded after title and abstract review, leaving 34 for full-text review, after which a further 27 were excluded. In total, seven eligible studies were identified from the electronic search. These were combined with the 15 studies from the Jenkins et al. review published since 2014 and one further study identified by reviewing citation lists, giving a total of 23 eligible studies. See Fig. 1 for more detail.
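The screening counts can be checked with simple arithmetic. This minimal sketch recomputes each stage of the flow from the figures reported above; the variable names are illustrative.

```python
# Screening counts taken from the Results text; derived values are computed.
identified = 708                 # records from the electronic search
excluded_title_abstract = 674    # excluded at the title and abstract screen
excluded_full_text = 27          # excluded after full-text review

full_text_reviewed = identified - excluded_title_abstract
included_from_search = full_text_reviewed - excluded_full_text

from_jenkins_since_2014 = 15     # studies carried over from Jenkins et al.
from_citation_lists = 1          # found by reviewing citation lists
total_included = included_from_search + from_jenkins_since_2014 + from_citation_lists

print(full_text_reviewed, included_from_search, total_included)  # 34 7 23
```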

Fig. 1 Flow diagram of included studies

Table 1 describes the included studies, with detail on the criteria used to assess imaging appropriateness, guidelines followed or adapted by the study, and the source from which studies collected data.

Table 1 Included studies with details on indications to image

Fourteen of the 23 (61%) eligible studies considered prolonged symptom duration as an indication for imaging. Two of the nine studies that did not consider symptom duration included a trial of conservative therapy as an indication. Most studies (19/23, 83%) assessed imaging as inappropriate or appropriate in a binary manner, whereas four used a grading system to give an appropriateness score.

A broad range of red flags was used, which can be grouped into three categories: (1) clinical features (23 in total); (2) suspicion of pathology (5 in total); and (3) past medical history (14 in total). The number of red flags considered by each study ranged from 1 to 18. The most frequent clinical feature red flag was neurological impairment, used in 16/23 (70%) of studies. Age was used by seven studies, but with inconsistent cut-offs. One study stratified red flags by age, so that the combination of red flags required for imaging depended on age (e.g. age over 60 years + history of trauma + female gender, corticosteroid use, or increased thoracic kyphosis). This was also the only study that used clusters of red flags rather than relying on individual features [11]. The number of studies endorsing each clinical feature red flag is shown in Fig. 2.
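The difference between individual red flags and clusters can be sketched as two decision rules. Both rules below are hypothetical: the age cut-off of 50 in the individual-feature rule is an arbitrary placeholder, and the cluster rule is only one possible reading of the age-stratified example above (age over 60 AND trauma AND at least one of the listed features), not the published algorithm of [11].

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical subset of red flag features for one patient with LBP."""
    age: int
    trauma: bool = False
    female: bool = False
    corticosteroid_use: bool = False
    thoracic_kyphosis: bool = False
    neurological_impairment: bool = False
    history_of_malignancy: bool = False


def any_single_flag(p: Presentation, age_cutoff: int = 50) -> bool:
    """Individual-feature rule: any one red flag indicates imaging.
    The default age cut-off is a placeholder; studies used different cut-offs."""
    return (p.age > age_cutoff or p.trauma or p.corticosteroid_use
            or p.neurological_impairment or p.history_of_malignancy)


def fracture_cluster(p: Presentation) -> bool:
    """Cluster rule, read here (as an assumption) as: age over 60 AND trauma
    AND at least one of female sex, corticosteroid use, or thoracic kyphosis."""
    return p.age > 60 and p.trauma and (
        p.female or p.corticosteroid_use or p.thoracic_kyphosis)


patient = Presentation(age=55)
print(any_single_flag(patient), fracture_cluster(patient))  # True False
```

Under the individual-feature rule, age alone is enough to trigger imaging; under the cluster rule the same patient would not be imaged, which is why the choice of rule changes what counts as appropriate.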

Fig. 2 Bar chart displaying relative frequencies of clinical features used as red flags for LBP imaging

Far fewer studies (n = 4) considered clinical suspicion of serious pathology as a red flag. Past medical history was used widely, but with marked variation. A history of malignancy was used in 19/23 (83%) of studies. Four studies restricted this to a history of malignancy within the last year, with one study also excluding primary skin and prostate cancers. The frequencies of suspicion of pathology and past medical history red flags are given in Table 2.

Table 2 Relative frequencies of suspicion of pathology and past medical history red flags

A total of 10 guidelines were referenced, and 16/23 (70%) of studies assessed imaging appropriateness in line with a guideline. The remaining seven studies combined or amended guidelines, or did not reference any guideline [19, 20, 21, 22, 28, 30, 33].
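The percentages reported alongside counts in this section can be reproduced with a small helper. The sketch below (function name illustrative) rounds to the nearest whole percent, which appears to be the convention used in the text.

```python
def fmt_proportion(n: int, total: int) -> str:
    """Format a count as 'n/total (p%)', rounding p to the nearest whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{n}/{total} ({100 * n / total:.0f}%)"


# Counts reported in the Results section, out of 23 included studies.
reported = {
    "prolonged symptom duration as an indication": 14,
    "binary appropriateness judgement": 19,
    "neurological impairment as a red flag": 16,
    "history of malignancy as a red flag": 19,
    "assessed in line with a guideline": 16,
    "combined, amended, or no guideline": 7,
}
for label, n in reported.items():
    print(f"{label}: {fmt_proportion(n, 23)}")
```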

The method of calculating the proportion of imaging that was inappropriate varied between studies. The two most common approaches, used in seven studies each, were:

$$\frac{\text{Number of inappropriate imaging requests for LBP}}{\text{Number of LBP imaging requests}}$$

and

$$\frac{\text{Number of inappropriate imaging requests for LBP}}{\text{Number of LBP patients not requiring imaging}}$$

Two studies calculated the proportion of appropriate imaging decisions, which allowed estimation of cases in which imaging had been inappropriately not performed as well as inappropriately performed [29, 30]. Four studies calculated the proportion of all patients presenting with LBP who had imaging, in order to compare interventions. Two studies calculated the number of inappropriate LBP imaging requests as a proportion of all patients presenting with LBP, and one assessed the total number of LBP imaging requests without calculating a proportion.
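To make the differences between these calculation methods concrete, the sketch below computes each of them from one hypothetical 2 × 2 table of imaging decisions (imaged or not, against indicated or not by the chosen criteria). The class and function names and the counts are illustrative assumptions, not data from the included studies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImagingCounts:
    """Counts of imaging decisions judged against an appropriateness standard."""
    imaged_indicated: int          # imaged and indicated: appropriate imaging
    imaged_not_indicated: int      # imaged but not indicated: inappropriate imaging
    not_imaged_indicated: int      # indicated but not imaged: inappropriate non-imaging
    not_imaged_not_indicated: int  # neither imaged nor indicated: appropriate non-imaging

    @property
    def patients(self) -> int:
        return (self.imaged_indicated + self.imaged_not_indicated
                + self.not_imaged_indicated + self.not_imaged_not_indicated)

    @property
    def imaged(self) -> int:
        return self.imaged_indicated + self.imaged_not_indicated

    @property
    def not_indicated(self) -> int:
        return self.imaged_not_indicated + self.not_imaged_not_indicated


def proportions(c: ImagingCounts) -> Dict[str, float]:
    """Each calculation method described above, expressed as a fraction."""
    inappropriate = c.imaged_not_indicated
    return {
        "inappropriate / LBP imaging requests": inappropriate / c.imaged,
        "inappropriate / LBP patients not requiring imaging": inappropriate / c.not_indicated,
        "appropriate decisions / all LBP patients":
            (c.imaged_indicated + c.not_imaged_not_indicated) / c.patients,
        "imaged / all LBP patients": c.imaged / c.patients,
        "inappropriate / all LBP patients": inappropriate / c.patients,
    }


# Hypothetical cohort of 1000 patients with LBP.
counts = ImagingCounts(imaged_indicated=90, imaged_not_indicated=10,
                       not_imaged_indicated=20, not_imaged_not_indicated=880)
for method, value in proportions(counts).items():
    print(f"{method}: {value:.1%}")
```

The same ten inappropriate requests yield 10.0%, 1.1%, or 1.0% depending on the denominator, and only the "appropriate decisions" method uses the 20 patients who were not imaged despite an indication.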

Discussion

This review highlights the widely varying criteria used to assess the appropriateness of imaging for LBP. Most studies used red flag features to define imaging as appropriate, but the list of red flags varied substantially between studies. A Cochrane review assessed the predictive value of red flag features for spinal malignancy in patients presenting with LBP [34]. Frequently used red flags such as age, neurological symptoms, and symptom duration had high false-positive rates; only a previous history of malignancy had moderate predictive value. A further study assessed how well red flag features predicted vertebral fracture, malignancy, infection, or cauda equina syndrome. Combinations of red flags performed well in predicting serious pathology; for example, a history of trauma in an individual older than 70 had a positive predictive value of 20.4. Night pain (pain that wakes a patient from sleep) did not predict any serious pathology [9]. Most studies identified in this review included at least one red flag with limited predictive value for serious pathology, and only one study assessed imaging against clusters of red flags.
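Why individual red flags produce so many false positives follows from Bayes' theorem: when serious pathology is rare, even a reasonably specific feature flags mostly patients without it. A minimal sketch, assuming a hypothetical red flag with sensitivity 0.9 and specificity 0.8 (values chosen for illustration, not taken from [34] or [9]) and using the 1.4–5% frequency of serious pathology cited in the Introduction as the prior probability:

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' theorem: P(serious pathology | red flag present)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical red flag: sensitivity 0.9, specificity 0.8 (illustrative only).
for prevalence in (0.014, 0.05):
    ppv = positive_predictive_value(0.9, 0.8, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

At these prevalences the PPV is only about 6% to 19%, so most patients with this hypothetical flag would not have serious pathology; combining features raises specificity and, with it, predictive value.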

Only a handful of studies used clinician suspicion of serious pathology rather than relying on the presence of individual red flag features. This approach depends on clinical acumen and discretion, but may provide greater clarity for clinicians.

Guidelines were inconsistently followed in the included studies: nearly a third combined, amended, or did not follow guidelines, and studies from similar geographical regions opted for different guidelines. While variation in guideline choice between regions is to be expected, variation between studies within the same country is more surprising, as are authors' decisions to amend, combine, or not use guidelines at all.

This variation in approach makes comparison problematic: a clinical case deemed appropriate for imaging in one study may well have been considered inappropriate in another. It also undermines the substantial effort and resources invested in creating guidelines in the first place.

The use of varying methods to calculate the proportion of inappropriate imaging also limits comparability. Seven of the studies calculated the proportion of inappropriate imaging by dividing the number of inappropriate requests by the total number of imaging requests. Used alone, this method is flawed: it does not capture instances where imaging was inappropriately not performed, and it overestimates the proportion of patients with LBP who undergo inappropriate imaging. The following example illustrates this:

In a study, 1000 people presented with LBP. A total of 100 underwent LBP imaging, 10 of which were deemed inappropriate.

If the total number of LBP imaging requests is used as the denominator, the result, 10/100 (10%), would be construed as 10% of patients presenting with LBP having inappropriate imaging.

If the number of patients presenting with LBP is used as the denominator, one can see that the actual proportion of LBP patients undergoing inappropriate imaging was 10/1000 (1%).

This limitation affects comparability between studies and prevents identification of cases where patients with clinically suspicious LBP are not imaged. The two studies that included inappropriate non-imaging reported that nearly two-thirds of patients with clinically suspicious LBP were not imaged when they should have been [29, 30], suggesting that this is a poorly recognised issue.
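The blind spot of the per-request method can be shown with two hypothetical practices, each seeing 1000 patients with LBP (counts invented for illustration, function names illustrative). Both have the same per-request rate of inappropriate imaging, yet one fails to image most patients for whom imaging is indicated; only a measure that counts non-imaging decisions reveals the difference.

```python
def per_request_rate(imaged_indicated: int, imaged_not_indicated: int) -> float:
    """Inappropriate imaging as a share of all imaging requests."""
    return imaged_not_indicated / (imaged_indicated + imaged_not_indicated)


def missed_imaging_rate(imaged_indicated: int, not_imaged_indicated: int) -> float:
    """Share of patients for whom imaging was indicated but not performed."""
    return not_imaged_indicated / (imaged_indicated + not_imaged_indicated)


# Two hypothetical practices, each seeing 1000 patients with LBP.
# Tuple order: (imaged & indicated, imaged & not indicated,
#               not imaged & indicated, not imaged & not indicated)
practices = {
    "A": (90, 10, 5, 895),
    "B": (18, 2, 40, 940),
}
for name, (ii, ini, nii, _nini) in practices.items():
    print(name,
          f"per-request inappropriate: {per_request_rate(ii, ini):.0%}",
          f"missed imaging: {missed_imaging_rate(ii, nii):.0%}")
```

Both practices report 10% inappropriate imaging per request, but practice B leaves about 69% of indicated patients unimaged, a failure the per-request measure cannot detect.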

This scoping review identified studies from a broad electronic search of three databases, building on a previously published systematic review and meta-analysis [7], which gives confidence that all eligible studies were captured. The granularity of the extracted information has enabled, for the first time, an in-depth comparison of how the appropriateness of imaging for LBP is assessed, the degree to which studies follow clinical guidelines, and how studies calculate the proportion of inappropriate imaging.

The included studies varied in how clearly they described their assessment of imaging appropriateness. Where a guideline was cited without further detail, the cited guideline was reviewed and its appropriateness criteria were extracted. As much detail as possible has been included, and a second author reviewed the extraction to reduce the likelihood that any information was omitted.

This review focuses on LBP imaging in primary care. The findings should not be generalised to secondary or specialist services, as it is likely that the practice in these settings will be substantially different, often with a higher index of suspicion of serious pathology, and greater clinical expertise.

Conclusions

Reducing inappropriate lumbar imaging is a very common Choosing Wisely recommendation, but if we cannot agree on how to define and measure appropriateness, we cannot know how big the problem is or whether progress is being made in solving it. Notably, the Choosing Wisely imaging recommendation does not consider the problem of failing to image when imaging is indicated.

Given its societal and economic impact, efficient assessment and management of LBP is crucial. To this end, care providers are increasingly embedding clinical decision support in online test-ordering systems, but their full benefit will not be realised until the evidence base is clear about which features should indicate imaging. Further work and collaboration are urgently needed to identify and employ an internationally recognised methodology for defining and measuring imaging appropriateness for LBP.