Introduction

Chronic obstructive pulmonary disease (COPD) is one of the most common long-term respiratory conditions with rising burden and mortality worldwide.1,2,3 It is characterised by increasing breathlessness and decline in lung function, punctuated by episodes of acute exacerbations that often lead to hospital admission and result in poor prognosis and gradual deterioration of quality of life.4 Annual healthcare and societal costs of COPD in Europe are estimated to be €48.4 billion.5 Despite the high burden of disease, the large majority of patients with COPD remain undiagnosed6 while experiencing significant morbidity,7 resulting in calls to improve early diagnosis.8,9 Early diagnosis could focus smoking cessation support and allow prescription of treatments that have been shown to reduce risk of exacerbation in those with COPD, and thus has the potential to slow disease progression.

Screening programmes are not yet recommended, partly because of lack of evidence of the long-term benefits,10,11 a view which is upheld in the most recent UK National Screening Committee report.12 However, there are also uncertainties around the performance of available screening tests, including symptom or risk assessment questionnaires and lung function-based measures, alone or in combination.12,13 A recent study compared different screening strategies among current smokers, against post-bronchodilator spirometry. This concluded that microspirometry or peak flow meters had the best performance, but interpretation was limited by a small sample size and low-quality spirometry data.14 Microspirometers are small, relatively inexpensive handheld devices that measure forced expiratory volume in 1 s (FEV1) and in 6 s (FEV6). While microspirometry is not a substitute for confirmatory spirometry, which is more time consuming and measures FEV1 and forced vital capacity (FVC), usually after bronchodilation, the FEV1/FEV6 ratio could be used as a pragmatic initial screening test to identify patients requiring confirmatory spirometry. Microspirometry can be undertaken in office settings and requires less time and patient effort.15,16,17,18

Over the past decade, several studies have explored the accuracy of microspirometers in detecting airflow obstruction.14,19,20,21,22,23,24,25,26,27 However, none of the studies considered the use of microspirometers as the second stage of a screening pathway. Microspirometry as a screening tool is usually performed without bronchodilation, as this contributes to time savings and avoids the need for Salbutamol. However, it remains uncertain how microspirometry performance differs when conducted pre- and post-bronchodilator. Finally, there is little consensus regarding the optimal FEV1/FEV6 cut-point for referral to confirmatory spirometry, with recent studies suggesting ratios of <0.73,22 <0.7521 and <0.78.19

To address the current evidence gaps, we conducted a study in primary care patients with existing respiratory symptoms, including those pre-screened in our linked trial. We aimed to assess the test performance of a microspirometer (Vitalograph Lung monitor) against confirmatory post-bronchodilator spirometry (ndd Easy on-PC), and to explore the effect of using pre- or post-bronchodilator microspirometer data, the impact of using different airflow obstruction criteria and the optimal cut-points.

Results

Follow-up assessments were booked for 1633 participants. Out of the 1500 participants who attended the assessment, 551 took part in the case–control study. Lung monitor and spirometry test data were available for a total of 544 participants (Fig. 1).

Fig. 1

Flow of the participants.

Of those, 349 (64.2%) were male, the mean age was 69.6 (9.1) years, 517 (96.3%) were White British, 382 (74.3%) were overweight/obese and 472 (88.1%) had a positive smoking history. Similar proportions of participants reported Medical Research Council (MRC) Dyspnoea scores of 1–2 and 3–5, one fifth (19.7%) had CAT scores representing high impact on daily life and over half the sample (57.3%) were retired (Table 1).

Table 1 Description of analysis sample, stratified by cases and controls.

A total of 337 (62.0%) participants had airflow obstruction according to the reference test, with over three quarters (n = 264, 78.3%) representing GOLD stage I and II i.e. mild/moderate COPD (Table 1). In comparison with controls, the cases were slightly older (70.4 vs 68.2 years), more likely to be male (68.3% vs 57.5%), a higher proportion had a positive smoking history (90.4% vs 84.3%) and MRC Dyspnoea scores of 3–5 (54.5% vs 40.1%) (Table 1).

Nearly half of the participants (45.5%) reported exacerbations in the past 12 months and 37 (6.8%) reported a respiratory hospitalisation in the past 2 years (Table 1). A higher proportion of cases than controls reported exacerbations in the past 12 months (54.2% vs 31.4%).

Screening accuracy of the pre-bronchodilator lung monitor

Using lower limit of normal (LLN; i.e. the lower 5th percentile) as the cut-off for a positive result, the pre-bronchodilator lung monitor had sensitivity of 50.5% (95% confidence interval (CI) 45.0, 55.9) and specificity of 99.0% (95% CI 96.6, 99.9) (Table 2). The positive predictive value of the lung monitor was estimated to be 76.9% (95% CI 55.7, 89.8) for a population prevalence of 6%, dropping to 61.8% (95% CI 37.8, 81.1) for a population prevalence of 3% and rising to 85.3% (95% CI 68.6, 93.9) for a population prevalence of 10%.

Table 2 Pre-BD lung monitor (FEV1/FEV6 < LLN) against post-BD confirmatory spirometry (FEV1/FVC < LLN).

FEV1 measurements from both devices were highly correlated (r = 0.97; p < 0.001), with the Bland–Altman plot demonstrating good agreement (Supplementary Fig. 1). Comparison of FEV6 from both devices again revealed high correlation (r = 0.95; p < 0.001), though agreement was lower, indicating that the lung monitor systematically underestimated FEV6 values by 0.37 litres (Supplementary Fig. 2).
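As a rough illustration of the agreement analysis described above, the sketch below computes the Bland–Altman bias (mean paired difference) and 95% limits of agreement. The function name and the paired readings are hypothetical and are not study data; the source does not describe how its plots were produced beyond naming the method.

```python
import statistics

def bland_altman(index_values, reference_values):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (index - reference); the limits of agreement
    are bias +/- 1.96 standard deviations of the paired differences.
    """
    diffs = [i - r for i, r in zip(index_values, reference_values)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical FEV6 readings in litres (NOT study data).
monitor_fev6 = [2.0, 3.0, 2.5, 3.5]
spirometer_fev6 = [2.4, 3.3, 2.9, 3.8]

bias, lower, upper = bland_altman(monitor_fev6, spirometer_fev6)
print(f"bias = {bias:.2f} L, limits of agreement = ({lower:.2f}, {upper:.2f}) L")
```

A negative bias in this convention means the index device reads lower than the reference device on average.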

The lung monitor had high discriminatory accuracy (C = 0.90; 95% CI 0.88, 0.93) between cases and controls according to confirmatory spirometry FEV1/FVC < LLN (Supplementary Fig. 3).
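The C statistic reported here can be read as the probability that a randomly chosen case has a lower FEV1/FEV6 ratio than a randomly chosen control. A minimal sketch of that interpretation follows; the function name and the ratios are hypothetical, and the study itself obtained its estimate from receiver operating characteristic curve analysis in Stata.

```python
def c_statistic(case_ratios, control_ratios):
    """Concordance probability for a test where LOWER values suggest disease.

    Counts the share of case/control pairs in which the case has the
    lower ratio; tied pairs count as half.
    """
    concordant = 0.0
    for case in case_ratios:
        for control in control_ratios:
            if case < control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_ratios) * len(control_ratios))

# Hypothetical FEV1/FEV6 ratios (NOT study data).
cases = [0.55, 0.62, 0.70, 0.76]
controls = [0.72, 0.79, 0.81, 0.84]
c_value = c_statistic(cases, controls)
print(c_value)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.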

Screening accuracy of the post-bronchodilator lung monitor

Using post-bronchodilator data for the lung monitor, the sensitivity was 46.6% (95% CI 41.2%, 52.1%) and specificity was 97.1% (95% CI 93.8%, 98.9%) for detecting airflow obstruction (Table 3). The positive predictive value of the lung monitor was estimated to be 50.6% (95% CI 42.6, 58.6) for a population prevalence of 6%.

Table 3 Post-BD lung monitor (FEV1/FEV6 < LLN) against post-BD confirmatory spirometry (FEV1/FVC < LLN).

Comparison of pre- and post-bronchodilator lung monitor test accuracy revealed a borderline significant difference favouring pre-bronchodilator of 3.9% in sensitivity (−0.05%, 7.8%); p = 0.04, but no statistically significant evidence of a difference in specificity (1.9% (−0.9%, 4.7%)); p = 0.10.

Lung monitor and confirmatory spirometry tests were also highly correlated for post-bronchodilator FEV1 (r = 0.97; p < 0.001), with the Bland–Altman plot again demonstrating good agreement (Supplementary Fig. 4). Comparison of post-bronchodilator FEV6 again revealed high correlation (r = 0.97; p < 0.001), though agreement was lower, indicating that the lung monitor systematically underestimated FEV6 values by 0.28 litres (Supplementary Fig. 5).

Discriminatory accuracy of the post-bronchodilator lung monitor was identical to that based on pre-bronchodilator data (C = 0.90; 95% CI 0.87, 0.92; Supplementary Fig. 6).

Sensitivity analyses: optimal cut-points for lung monitor FEV1/FEV6 ratio, relative to confirmatory spirometry FEV1/FVC < LLN

In light of comparable test accuracy of the lung monitor based on pre- and post-bronchodilator data, we explored optimal FEV1/FEV6 cut-points using pre-bronchodilator tests (Table 4).

Table 4 Screening accuracy of pre-bronchodilator lung monitor FEV1/FEV6 cut-points, against post-BD confirmatory spirometry (FEV1/FVC < LLN).

Using an FEV1/FEV6 cut-point of <0.7 to define a positive test for the lung monitor, sensitivity increased to 58.5% (95% CI 53.0, 63.8), specificity was 98.6% (95% CI 95.8, 99.7) and the positive predictive value fell slightly to 72.0% (95% CI 57.4, 83.1) for a population prevalence of 6% (Table 4). However, using a fixed ratio had little effect on the discriminatory accuracy of the lung monitor (C = 0.91; 95% CI 0.89, 0.94; Supplementary Fig. 7).

In our sample, an FEV1/FEV6 cut-point of <0.78 had the best overall test performance with sensitivity of 82.8% (95% CI 78.3%, 86.7%) and specificity of 85.0% (95% CI 79.4%, 89.6%). Using this cut-point would result in the lung monitor missing only 17.2% of true positives and correctly identifying the majority of patients without the disease. Furthermore, this cut-point would result in 57% of those screened requiring confirmatory spirometry. The positive predictive value for a population COPD prevalence of 6% was estimated to be 26.1% (95% CI 25.0, 27.2), meaning that around one in four patients referred for confirmatory spirometry would receive a diagnosis.
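The proportion of screened patients who test positive, and so need confirmatory spirometry, depends on prevalence as well as on sensitivity and specificity. The sketch below, with a hypothetical function name, applies the reported sensitivity and specificity for the <0.78 cut-point to the prevalence in the analysis sample (337 of 544) and, purely as an illustration, to a prevalence of 6%. Inputs are the rounded published figures, so results are approximate.

```python
def fraction_test_positive(sensitivity, specificity, prevalence):
    """Share of everyone screened who tests positive.

    True positives (prevalence * sensitivity) plus false positives
    ((1 - prevalence) * (1 - specificity)).
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos + false_pos

# Reported accuracy for FEV1/FEV6 < 0.78 (rounded values from the text).
SENS, SPEC = 0.828, 0.850

in_sample = fraction_test_positive(SENS, SPEC, 337 / 544)
at_six_percent = fraction_test_positive(SENS, SPEC, 0.06)

print(f"analysis sample: {in_sample:.1%} test positive")
print(f"6% prevalence:   {at_six_percent:.1%} test positive")
```

Under these assumptions the referral burden in a lower-prevalence population would be considerably smaller than in the enriched case–control sample.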

The above pattern was broadly similar when analyses were repeated using the fixed ratio to define obstruction for confirmatory spirometry (FEV1/FVC < 0.7), though sensitivity was slightly lower at each cut-point and specificity remained at 100% until FEV1/FEV6 > 0.7 (Supplementary Table 1). These analyses may reflect the test performance when using the simpler criterion for the lung monitor in countries defining airflow obstruction as FEV1/FVC < 0.7, such as the UK.28

Discussion

We found that the lung monitor has high discriminatory accuracy among patients with existing chronic respiratory symptoms. This supports its suitability, either alone or perhaps in combination with a symptom questionnaire, as a screening test prior to confirmatory spirometry. We further demonstrated that using a bronchodilator with the lung monitor as part of screening offers no performance advantage.

Importantly, the lung monitor demonstrated good test performance despite being delivered with minimal coaching and only requiring a maximum of three blows, rather than the possible six blows to achieve repeatability with confirmatory spirometry.

Using pre-bronchodilator FEV1/FEV6 < LLN, the lung monitor missed half of COPD cases identified by FEV1/FVC < LLN from confirmatory spirometry but detected virtually all non-COPD cases correctly. When using pre-bronchodilator FEV1/FEV6 < 0.70, the lung monitor detected a higher proportion of true positives, the same proportion of true negatives and the discriminatory accuracy remained constant (C = 0.90 vs C = 0.91). Given the added complexity of applying LLN to the lung monitor as it is not connected to computer software, it appears justifiable to apply an FEV1/FEV6 fixed ratio to the lung monitor for purposes of screening, while maintaining the LLN for diagnosing and monitoring COPD.29,30,31,32

Test performance varied considerably depending on the specified cut-point of the pre-bronchodilator FEV1/FEV6 ratio. Our proposed optimal cut-point of <0.78 was similar to previous studies, which had suggested using cut-points of <0.75,21 <0.7819 and <0.80.20 The sensitivity and specificity of the lung monitor in our sample were acceptable for a screening test, missing <20% of COPD cases, while 1 in 4 patients of the 57% referred for confirmatory spirometry were true positives and therefore would be eligible for diagnosis and relevant treatment. While FEV1/FEV6 < 0.78 appeared the most efficient in our sample, if the lung monitor were to be used as a screening test the cut-point could be modified according to the balance of acceptable false negative rates and availability of resources.

We have assessed the screening test performance of one type of microspirometer. One factor affecting accuracy may be the different lung function indices being measured: FEV6 by the lung monitor vs FVC by the ndd device (confirmatory spirometry). We assessed test performance of both devices using FEV1/FEV6 < LLN as the cut-off for a positive result, relative to confirmatory spirometry FEV1/FVC < LLN. The ndd device had sensitivity of 80.4% and specificity of 98.1%, compared with the lung monitor sensitivity of 50.5% and specificity of 99.0%. This suggests that the difference in indices only partly affects performance. Another important difference to consider is the type of sensor used in the two devices for flow/volume measurement (turbine in the lung monitor vs ultrasonic in the ndd), as evidence suggests a degree of inaccuracy in turbine devices.33,34

Our analysis sample had fewer controls than determined by our sample size calculation, containing 207 instead of 248. While the precision around specificity was reduced, the precision around sensitivity estimates was unaffected; the latter being arguably more important in the context of screening.

Using the LLN criteria to define cases in our primary analysis ensured an accurate assessment of lung function, without added ‘noise’ from misdiagnosed patients which can be introduced when using the FEV1/FVC < 0.7 ratio.32 As the majority of previous microspirometry test accuracy studies used the fixed ratio definition of obstruction,19,20,21,24,25,27,35 our study has made a valuable contribution to the evidence base.

Owing to the case–control study being nested within a larger COPD cohort study, the analysis sample had a higher prevalence of COPD and possibly more advanced disease than would be observed in an undiagnosed primary care population reporting respiratory symptoms. Therefore, our study is at potential risk of spectrum bias, as the reported sensitivities and specificities may not fully reflect the test performance of the lung monitor if used as a screening tool within symptomatic patients with lower prevalence of COPD. However, by using Bayes’ Theorem the reported post-test estimates were based on current UK COPD prevalence of 3–10%, mitigating against this risk.

Nearly a third of our sample was a screened population, suggesting that our findings will resonate with potential screening processes, as patients could be selected for microspirometry on the basis of symptom- or risk-based screening tests. Furthermore, the fact that we included patients with chronic respiratory symptoms and a range of lung function severities means that our results may apply to an undiagnosed population with a similar symptom profile. In addition, our study was not restricted to ever-smokers, unlike previous studies.14,19,21,22,25

For practical reasons, the same researcher administered both the lung monitor and confirmatory spirometry. Although researchers only recorded raw FEV1 and FEV6 lung monitor values and did not calculate obstruction from this first test, it is possible that researchers were not entirely blind when administering the confirmatory spirometry to the patient. While this introduced a risk of review bias, this was minimised as researchers received standardised training to give only brief instruction for lung monitor tests and proper coaching for confirmatory spirometry.

Most previous studies have either used only pre-bronchodilator microspirometry19,20,21,23,24,25,26 or post-bronchodilator microspirometry,27 and the only study to measure pre- and post-bronchodilator microspirometry did not report comparative test accuracy.22 By demonstrating the comparability of test performance irrespective of bronchodilation, our study supports the continued use of pre-bronchodilator microspirometers for screening purposes.

Participants performed three blows using the lung monitor, irrespective of blow quality as indicated by the device’s in-built quality alert, with the highest recorded readings being used for analyses. While this follows some previous studies,23,25,26 had we required all lung monitor blows to be technically valid19,20,22 we may have obtained greater FEV1 or FEV6 values for some participants. Furthermore, like most studies we did not assess within-participant repeatability across blows on the lung monitor, though this has been done in at least one study.21

The observed test performance of the lung monitor suggests that it could be reliably used as a screening tool in patients perceived to be at risk of COPD, to select those requiring confirmatory spirometry. The efficiency of the diagnostic spirometry test could therefore be substantially increased, by patients highly unlikely to have airflow obstruction being screened out in advance. Screening at-risk symptomatic patients with a lung monitor rather than referring all patients for confirmatory spirometry also represents financial savings, with the handheld device being approximately one tenth of the cost of diagnostic spirometers. Resource savings could be realised in practices irrespective of whether they conduct confirmatory spirometry ‘in house’ or refer patients to a lung function unit, as both models would reduce the number of patients performing this diagnostic test.

The ability to use a fixed ratio for the lung monitor rather than the LLN to assess airflow obstruction represents a time saving for clinicians, who would otherwise need to use software to refer to reference equations. The comparable test performance of the lung monitor irrespective of bronchodilation supports the use of pre-bronchodilator tests, further contributing to the efficiency and ease of the screening test, a key consideration in the context of time-pressured primary care consultations.

The lung monitor could potentially be administered by any member of a primary care team, as it is a simple device requiring minimal training. This would be beneficial in general practice where staff may be unfamiliar with the device19 and the simplicity may minimise the risk of becoming de-skilled in using the lung monitor, in contrast to confirmatory spirometry where clinicians’ skills can reduce over time if they do not perform the test regularly.36

The simplicity of the lung monitor, the minimal number of required blows and its good test performance suggests that it could be particularly useful as a screening test in patients with poor coordination or lower cognitive ability. Furthermore, our Patient Advisory Group preferred the lung monitor over other microspirometer models suggesting that it may be more acceptable to patients.

While we have suggested optimal cut-points based on the balance of sensitivity and specificity, in practice, the optimal cut-point would be determined by the clinical setting in which the lung monitor was being used. For example, in settings where access to quality confirmatory spirometry may not be available, particularly in low-resource settings, specificity of the lung monitor may be prioritised. In these settings, using thresholds with higher specificity could effectively exclude the majority of those with respiratory symptoms who do not have COPD, thus preventing overdiagnosis.

In addition to use as a screening tool, the accurate measurement of FEV1 may indicate that the device could be used to monitor obstruction severity or lung function decline among diagnosed COPD patients, for example during annual reviews. Further research would be needed to explore this, but the potential time and cost savings afforded by using the lung monitor instead of confirmatory spirometry may be attractive to General Practitioner practices, who would still obtain annual FEV1 values as recommended by bodies such as the National Institute for Health and Care Excellence in the UK.28

Future research could build on preliminary evidence regarding microspirometer screening strategies,13,14 which could be implemented in differing clinical or economic contexts. Using a combination of microspirometry and screening questionnaires for example may prove more efficient than microspirometry alone. Furthermore, rather than using one cut-point to identify patients requiring confirmatory spirometry, certain contexts may warrant using two cut-points to refer only those patients where there is uncertainty about their diagnosis. For example, in low- and middle-income countries where availability of confirmatory spirometry may be limited, a three-tiered approach may be plausible whereby the top proportion of patients are defined test negative, the bottom proportion are defined as test positive and the middle proportion are referred for confirmatory spirometry.
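The three-tiered approach suggested above could be sketched as a simple rule with two cut-points. The function name and the two thresholds used in the example (0.70 and 0.80) are hypothetical placeholders chosen only to make the code run; the source does not propose specific values for such a rule.

```python
def triage(fev1_fev6_ratio, rule_in_below, rule_out_at_or_above):
    """Classify a microspirometry result using two cut-points.

    Ratios below the lower cut-point are treated as test positive,
    ratios at or above the upper cut-point as test negative, and the
    uncertain middle band is referred for confirmatory spirometry.
    """
    if rule_in_below > rule_out_at_or_above:
        raise ValueError("lower cut-point must not exceed upper cut-point")
    if fev1_fev6_ratio < rule_in_below:
        return "test positive"
    if fev1_fev6_ratio >= rule_out_at_or_above:
        return "test negative"
    return "refer for confirmatory spirometry"

# Hypothetical thresholds, for illustration only.
for ratio in (0.62, 0.75, 0.86):
    print(ratio, "->", triage(ratio, rule_in_below=0.70, rule_out_at_or_above=0.80))
```

In practice the two thresholds would need to be chosen and validated for the setting in question, balancing missed cases against the capacity for confirmatory testing.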

Our results show that the Vitalograph lung monitor, which is a cheap and simple device, has acceptable accuracy for use within a screening pathway for undiagnosed COPD among primary care patients with respiratory symptoms. We have established that the test performance of the lung monitor is unaffected by bronchodilation, and our optimum cut-point of FEV1/FEV6 < 0.78 supports previous studies, with no observed advantage of using LLN for this screening test. Our paper makes a valuable contribution to the evidence base concerning potential COPD screening tests, though more work is required to inform the need for a formal screening programme.

Methods

Study design

We conducted a prospective case–control study to evaluate the screening performance of the Vitalograph® lung monitor (Vitalograph Ltd, Buckingham, UK), nested within a large COPD Cohort study.

Participant recruitment

Study participants were drawn from those attending for their 3-year follow-up assessment as part of the Birmingham COPD Cohort Study, which has been reported in detail elsewhere.37 In brief, participants were primary care patients aged ≥40 years, who either had previously clinically diagnosed COPD or had reported chronic respiratory symptoms as part of a case-finding trial.38 Participants from the case-finding trial were invited to join the Cohort study, irrespective of their spirometry results, if they reported chronic cough or phlegm for ≥3 months for at least 2 years, wheeze in the past 12 months or dyspnoea of MRC grade ≥2.

At the 3-year follow-up assessment visit, cohort participants were invited to take part in the additional tests for this case–control study (Fig. 2) and those who agreed were asked to sign a consent form. Those who declined to participate completed the standard Cohort assessment. The National Research Ethics Service Committee West Midlands, Solihull provided approval for both the Birmingham COPD Cohort (11/WM/0304) and the case-finding trial (11/WM/0403).

Fig. 2

Case–control study design.

Data collection and clinical measures

In addition to the lung function tests described below, participants underwent the standard Cohort follow-up assessment, which included various physiological and anthropometric measurements (height, weight, grip strength, exercise capacity) as well as completing questionnaires.

Index test: lung monitor microspirometry

Participants received pre- and post-bronchodilator microspirometry with the Vitalograph lung monitor prior to confirmatory post-bronchodilator spirometry (Fig. 2). The lung monitor measured FEV1 and FEV6 in litres. In contrast to confirmatory spirometry, participants received minimal explanation or coaching when using the lung monitor. Researchers told participants to take a deep breath until lungs were full and blow into the mouthpiece as hard and fast as they could until being told to stop. Researchers demonstrated the correct technique once and then allowed the participant to perform the blows themselves, without additional coaching or encouragement. Participants performed three blows pre-bronchodilator and three blows post-bronchodilator. Technically unsatisfactory blows identified by the in-built quality assessment were recorded on the case report form, but participants were not asked to repeat the blow. The best FEV1 and FEV6 blows were used for analyses, irrespective of quality and which blow attempt they came from.

Positive test results were defined as being below the 5th percentile of the predicted pre-bronchodilator FEV1/FEV6 ratio (i.e. the LLN) using the NHANES III equations.39 Alternative positive test results were also pre-specified, including post-bronchodilator FEV1/FEV6 below the LLN, and various cut-points of the FEV1/FEV6 ratio.

Reference test: post-bronchodilator confirmatory spirometry

Post-bronchodilator confirmatory spirometry was conducted according to American Thoracic Society and European Respiratory Society 2005 guidelines40 by trained researchers using the ndd Easy on-PC spirometer. Participants received 400 μg of Salbutamol and after waiting at least 20 min, performed a minimum of 3 (maximum of 6) blows until repeatability was achieved. Although the lung monitor and spirometry tests were administered by the same researcher, the tests were in effect administered blind of each other, as researchers did not record the FEV1/FEV6 ratio for the lung monitor before administering confirmatory spirometry.

Cases were defined as participants whose FEV1/FVC ratio on confirmatory spirometry was below the LLN, derived using the NHANES III equations. Participants not meeting this criterion formed the controls.

Aims

The primary aim was to assess the pre-bronchodilator test accuracy (sensitivity and specificity) of the lung monitor (FEV1/FEV6) against post-bronchodilator confirmatory spirometry (FEV1/FVC), using the LLN definition of airflow obstruction.

We also aimed to assess the correlation and agreement between lung function measures from both devices and to compare test accuracy of pre- and post-bronchodilator lung monitor data. Finally, to identify the threshold that optimised sensitivity and specificity, we explored the effect of using different FEV1/FEV6 thresholds, including the fixed ratio of <0.7, to define a positive test result on the lung monitor.
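The threshold exploration described above can be illustrated by computing sensitivity and specificity at each candidate cut-point and picking the one with the best balance. The sketch below uses Youden's J (sensitivity + specificity − 1) as the balancing criterion; this is an assumption, as the source does not name the criterion it used. Function names and data are hypothetical.

```python
def sens_spec(ratios, is_case, cut_point):
    """Sensitivity and specificity when ratio < cut_point is a positive test."""
    tp = sum(1 for r, c in zip(ratios, is_case) if c and r < cut_point)
    fn = sum(1 for r, c in zip(ratios, is_case) if c and r >= cut_point)
    tn = sum(1 for r, c in zip(ratios, is_case) if not c and r >= cut_point)
    fp = sum(1 for r, c in zip(ratios, is_case) if not c and r < cut_point)
    return tp / (tp + fn), tn / (tn + fp)

def best_cut_point(ratios, is_case, candidates):
    """Candidate cut-point with the highest Youden's J (an assumed criterion)."""
    def youden(cut_point):
        sensitivity, specificity = sens_spec(ratios, is_case, cut_point)
        return sensitivity + specificity - 1
    return max(candidates, key=youden)

# Hypothetical FEV1/FEV6 ratios and reference-test status (NOT study data).
ratios = [0.55, 0.62, 0.68, 0.74, 0.77, 0.79, 0.81, 0.83, 0.85, 0.82]
is_case = [True, True, True, True, True, False, False, False, False, False]
candidates = [0.70, 0.73, 0.75, 0.78, 0.80]

chosen = best_cut_point(ratios, is_case, candidates)
print(chosen, sens_spec(ratios, is_case, chosen))
```

Other criteria, such as fixing a minimum acceptable sensitivity, would generally select a different threshold.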

Sample size

We calculated that we required a sample size of 248 cases and 248 controls to detect an assumed sensitivity of 85%21,27 while ensuring the lower bound of the CI was >80%.
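As a plausibility check on the stated target, the sketch below computes the lower bound of a 95% confidence interval for a sensitivity of 85% with 248 cases, assuming a simple normal-approximation (Wald) interval. The source does not state which interval method or software was used for the sample size calculation, so this does not attempt to reproduce the figure of 248; the function name is hypothetical.

```python
import math

def wald_lower_bound(proportion, n, z=1.96):
    """Lower bound of a normal-approximation (Wald) CI for a proportion."""
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error

# Assumed sensitivity and planned number of cases, from the text.
lower_bound = wald_lower_bound(0.85, 248)
print(f"lower 95% bound with 248 cases: {lower_bound:.3f}")
```

Under this approximation the lower bound sits just above 0.80, consistent with the stated aim.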

Statistical analysis

We evaluated the diagnostic test accuracy of the lung monitor (index test) for all participants with complete data for the index and reference tests. We estimated sensitivity and specificity of the lung monitor using pre-bronchodilator data. We compared test accuracy of pre- and post-bronchodilator lung monitor blows, using McNemar’s test. Using continuous test values, we assessed the discriminatory accuracy of FEV1 and FEV6 measured by the lung monitor via receiver operating characteristic curve analysis. We then conducted sensitivity analyses using a fixed ratio definition of obstruction to identify a lung monitor FEV1/FEV6 optimal threshold.
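For readers unfamiliar with McNemar's test, the sketch below shows one common variant, the exact (binomial) form, applied to paired test results. It uses only the discordant pairs: participants positive on one test but negative on the other. The source does not state which variant was used, and the counts in the example are hypothetical.

```python
import math

def mcnemar_exact(pos_first_only, pos_second_only):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant pair is equally likely
    to fall in either direction, so the smaller count follows a
    Binomial(n, 0.5) distribution.
    """
    n = pos_first_only + pos_second_only
    if n == 0:
        return 1.0
    smaller = min(pos_first_only, pos_second_only)
    tail = sum(math.comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts among cases (NOT study data):
# positive pre-bronchodilator only vs positive post-bronchodilator only.
p_value = mcnemar_exact(10, 0)
print(f"p = {p_value:.4f}")
```

Concordant pairs do not enter the calculation, which is why the test suits comparisons of two tests performed on the same participants.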

To account for the case–control study design, post-test probabilities (herein referred to as positive predictive values (PPV)) were calculated using Bayes’ Theorem to reflect current COPD prevalence in the UK. For our tables and appendices, we calculated PPVs based on COPD prevalence among adults aged ≥40 years being 3–10%.1,41
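The prevalence adjustment described above follows directly from Bayes' Theorem. The sketch below, with a hypothetical function name, applies it to the reported pre-bronchodilator sensitivity (50.5%) and specificity (99.0%) at prevalences of 3%, 6% and 10%. Because it uses the rounded published figures, its output differs slightly from the values reported in the Results.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Post-test probability of disease given a positive test (Bayes' Theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Rounded sensitivity and specificity for FEV1/FEV6 < LLN, from the text.
SENSITIVITY, SPECIFICITY = 0.505, 0.990

for prevalence in (0.03, 0.06, 0.10):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}")
```

The calculation makes clear why the positive predictive value falls as prevalence falls: the false-positive term grows relative to the true-positive term.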

All analyses were conducted in Stata SE v15.

The paper was written according to the STARD guidance42 for reporting studies of diagnostic accuracy.