Introduction

It is estimated that up to 500,000 people worldwide suffer a spinal cord injury (SCI) every year, with a majority of SCIs occurring due to traffic incidents, falls, or violence [1]. Individuals with SCIs often experience decreased levels of mobility, and many of them use a wheelchair as a primary means of mobility [2]. Despite this regained mobility, these individuals still experience lower levels of mobility in comparison with their ambulatory counterparts and hence are subject to lower levels of physical activity (PA) [3]. Decreased levels of PA in this population have been associated with a high incidence of chronic conditions such as type 2 diabetes, cardiovascular diseases, fatigue, weight gain, pain, and depression [4].

With the rapid growth of wearable devices in recent years, these devices are increasingly used to help people track free-living PA for self-management [5, 6] as well as to support a wide variety of research [7]. One of the common measures provided by these devices is energy expenditure (EE), which is often used to monitor energy balance (i.e., the difference between food intake and EE) for weight management. EE comprises three components: the resting metabolic rate or resting energy expenditure (REE), which contributes 60%–75% of the overall EE; the thermic effect of food, which contributes ~10%; and physical activity energy expenditure (PAEE), which contributes 15%–30% of the overall EE and is the most flexible component because it is easily modifiable through PA participation [8]. For people with SCI, the measured REE is 14%–27% lower than in people without disabilities due to reduced fat-free mass and sympathetic nervous system activity [9]. These individuals, especially those who use wheelchairs for mobility, also tend to have lower PAEE due to the primary use of small muscle groups in the upper body [10]. With lower overall EE, this population is at a higher risk for weight gain and associated health problems. Thus, it is important for wearable devices to provide accurate feedback on everyday EE in people with SCI.

Many wearable devices on the market today are designed to be wrist-worn for better compliance. However, they are calibrated using protocols that involve predominantly lower-extremity movement such as walking and running, which precludes their use in the nonambulatory population. In recent years, there has been some work on developing custom EE predictive models using wearable devices for wheelchair users [10]. Of the few commercial wearable devices used to estimate EE in manual wheelchair users (MWUs) with SCI, ActiGraph activity monitors (ActiGraph, LLC., Pensacola, FL, USA) have been the primary device of choice by researchers [11]. The ActiGraph devices feature a primary accelerometer and an inertial measurement unit, and provide user access to proprietary variables such as accelerometer counts as well as raw sensor signals at various frequencies. Several studies developed custom EE predictive equations based on ActiGraph activity monitors for MWUs with SCI [11,12,13,14,15]. These studies often used a metabolic cart to obtain the criterion measure of EE while participants performed a variety of PA wearing an ActiGraph device on the wrist. They then developed custom EE predictive equations that relate the outputs of ActiGraph devices to the criterion measure. As the performance of these predictive equations can be affected by the activity protocol and evaluation method, it is difficult to determine their comparative validity and to know which one(s) could potentially be used in future work.

The aim of this study was to conduct a literature search for existing EE predictive equations using ActiGraph activity monitors for MWUs with SCI, and evaluate their validity using an out-of-sample dataset. Using the same dataset collected separately from these studies to evaluate these predictive equations will provide an unbiased result to help guide appropriate and informed use of these predictive equations.

Methods

Existing EE prediction equations

A literature search was conducted to collect studies that had developed EE predictive equations based on wearable devices for MWUs. The eligibility criteria were (1) the output of the predictive equation should be in a form related to EE (e.g., overall EE, PAEE, and VO2); (2) the input of the predictive equation should include variables from a wrist-worn ActiGraph activity monitor; and (3) the data used to develop the predictive equations should be from people with physical disabilities leading to the use of a manual wheelchair for mobility, and at least 25% of the sample should be MWUs with SCI. Three databases (PubMed, Institute of Electrical and Electronics Engineers, and Scopus) were used for the search. A set of search terms covering wheelchair users, EE, and activity monitors, including different spellings and synonyms, was used. The search terms were then logically joined by “OR” and “AND”. The end date for the search was March 5th, 2019. The search of the three databases yielded 76 results, four of which met the eligibility criteria [12,13,14,15]. A fifth study was acquired from a university thesis catalog [11]. A flow diagram describing the selection process is shown in Fig. 1.

Fig. 1 PRISMA flow diagram displaying the selection process.

The five sets of EE predictive equations and their related information are summarized in Table 1. It is worth noting that the outputs of the predictive equations differ in either the EE-related variable or the units used. Eq. #1 [12] and #2 [13] predicted PAEE in units of kcal min−1 and kJ min−1, respectively, using per minute vector magnitude counts (VMC), a proprietary activity unit of ActiGraph devices, as the sole predictor variable. The criterion PAEE was obtained by subtracting the measured REE from the EE measured by a metabolic system during activities. Eq. #3 [14] predicted VO2 in units of ml kg−1 min−1. It used statistical features and discrete wavelet transform (DWT) features extracted from the signals of each axis, i.e., x-axis counts (XC), y-axis counts (YC), z-axis counts (ZC), and VMC. The statistical features included the 50th, 75th, and 90th percentiles of each minute for each axis. The DWT features included the first level detail coefficient and the second level approximation coefficient. Eq. #4 [11] was from a non-peer-reviewed source (i.e., a dissertation) and predicted EE in units of kcal min−1 using features from the raw accelerometer signals as well as two basal metabolic rate (BMR) equations, the Mifflin/St. Jeor BMR equation (BMR1) and the World Health Organization (WHO) equation (BMR2), as predictors. Eq. #5 [15] includes two different equations, one for left-handed individuals and one for right-handed individuals. They predict VO2 in units of ml kg−1 min−1 using per minute VMC as the predictor.

Table 1 EE prediction equations for MWUs.

The out-of-sample dataset

The out-of-sample dataset was collected from a study performed at two sites: the Human Engineering Research Laboratories, Pittsburgh, PA, and the James J Peters VA Medical Center, Bronx, NY. This study was approved by the US Department of Veterans Affairs (VA) Central Institutional Review Board. The inclusion criteria were (1) being between the ages of 18 and 65; (2) having an SCI, being at least 1 year post injury, and being medically stable; and (3) using a manual wheelchair as the primary means of mobility for at least 40 h/week.

Participants in the study were asked to refrain from engaging in moderate or vigorous intensity PA from the previous night. They were also asked to refrain from taking caffeine and eating on the day prior to their testing. Once participants gave consent, they completed a demographics questionnaire. Height was recorded to the nearest centimeter using a tape measure while participants lay supine, and weight was recorded to the nearest decimal using a wheelchair weight scale (Detecto, Webb City, MO, US). Participants rested in a seated position for almost half an hour before they were asked to rest in a supine position for the measurement of REE for 20 min. Participants were instructed not to talk and to stay awake during the REE measurement. They then performed a number of activities of daily living (ADL) and exercises in random order. Activities included: resting in a wheelchair; propulsion at self-selected slow, normal, and fast pace on a concrete surface; propulsion up/down a 1:60 sloped tile surface; watching TV; working on a computer; playing basketball; sweeping/vacuuming the floor; loading and unloading a dishwasher; weight lifting; TheraBand exercises; arm ergometry at a self-selected slow and fast pace; folding laundry; and being pushed in their wheelchair. Each of these activities was performed for 10 min with at least a 3-min break. Participants were equipped with a COSMED K4b2 portable metabolic cart (COSMED Inc, Rome, Italy) and wore an ActiGraph GT9X Link on their dominant wrist. The metabolic cart measures VO2 and carbon dioxide production (VCO2), and uses the Weir equation [16] to estimate EE in units of kcal min−1 based on VO2 and VCO2. The ActiGraph GT9X Link was configured to record raw acceleration signals at 30 Hz. The raw signals as well as the activity counts for each axis and the VMC in 1-s epochs were obtained from the ActiGraph ActiLife software (v6.11.9).
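As an illustration of how EE can be derived from the measured gas exchange, the sketch below applies the widely used abbreviated Weir equation (urinary nitrogen term omitted). The exact form and coefficients implemented inside the K4b2 are not stated here, so the function name, the L min−1 input units, and the sample values are assumptions for illustration only.

```python
def weir_ee_kcal_per_min(vo2_l_min: float, vco2_l_min: float) -> float:
    """Abbreviated Weir equation: EE (kcal/min) from VO2 and VCO2 in L/min.

    Assumption: the urinary nitrogen term is omitted, as is common
    when only gas exchange is measured.
    """
    return 3.941 * vo2_l_min + 1.106 * vco2_l_min


# Hypothetical one-minute sample (not study data).
example_ee = weir_ee_kcal_per_min(vo2_l_min=0.60, vco2_l_min=0.50)
print(f"EE = {example_ee:.2f} kcal/min")
```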

Data analysis

Prior to data analysis, steady-state data were extracted from each activity trial. Steady state was defined as VO2 and VCO2 measured by the K4b2 changing by less than 10% within 5 consecutive minutes [17, 18]. When 5 consecutive steady-state minutes were not available, we attempted to extract at least 3 consecutive steady-state minutes; otherwise, data from the activity were discarded [19]. Only steady-state data were used to evaluate the performance of the five sets of EE predictive equations shown in Table 1. To obtain a reliable REE measurement, the first 5 min of resting data were deleted before analysis [20].
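The steady-state rule can be sketched as a sliding-window search. This is a minimal illustration, not the procedure actually used: it assumes that "less than 10% change" means the range (max minus min) within the window is below 10% of the window mean, and the function names and per-minute input lists are hypothetical.

```python
def find_steady_state(vo2, vco2, window=5, tolerance=0.10):
    """Return (start, end) of the first run of `window` consecutive minutes
    in which both VO2 and VCO2 vary by less than `tolerance`, else None."""
    def stable(values):
        mean = sum(values) / len(values)
        # Assumed criterion: range relative to the window mean.
        return mean > 0 and (max(values) - min(values)) / mean < tolerance

    for start in range(len(vo2) - window + 1):
        end = start + window
        if stable(vo2[start:end]) and stable(vco2[start:end]):
            return start, end
    return None


def extract_steady_state(vo2, vco2):
    """Try a 5-min window first, then fall back to 3 min.
    A return value of None means the trial would be discarded."""
    for window in (5, 3):
        span = find_steady_state(vo2, vco2, window=window)
        if span is not None:
            return span
    return None


# Hypothetical per-minute VO2 and VCO2 values (ml/min) for one 10-min trial.
vo2_trial = [400, 700, 900, 950, 960, 955, 970, 965, 600, 500]
vco2_trial = [350, 600, 800, 830, 840, 835, 845, 850, 550, 450]
print(extract_steady_state(vo2_trial, vco2_trial))
```

The half-open index pair can be used directly to slice the per-minute criterion and ActiGraph data for the same trial.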

To make sure the outputs from all equations are consistent for comparison, we performed a number of conversions to convert all outputs to EE in kcal min−1. For Eq. #1 [12] and #2 [13], we added the measured REE from resting in a supine position for each participant to their predicted PAEE to obtain the predicted EE in kcal min−1. For Eq. #3 [14] and #5 [15], we used the caloric equivalent based on the respiratory exchange ratio and the participant weight to convert oxygen consumption in ml min−1 kg−1 to EE in kcal min−1. Equation #4 [11] predicted EE in kcal min−1, and thus no conversion was required. Standardizing and processing of all equations was done using MATLAB 2018b (MathWorks Inc, Natick, MA, USA).
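The conversions described above can be sketched as follows. The helper names are hypothetical, and the caloric equivalent of oxygen is assumed to be 3.941 + 1.106 × RER kcal per liter of O2 (the abbreviated Weir equation divided by VO2); the text does not state which caloric-equivalent table or formula was actually used.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kj_to_kcal(value_kj: float) -> float:
    """Convert kJ (or kJ/min) to kcal (or kcal/min)."""
    return value_kj / KJ_PER_KCAL


def paee_to_ee(paee_kcal_min: float, ree_kcal_min: float) -> float:
    """Overall EE = predicted PAEE + measured supine REE (Eq. #1 and #2)."""
    return paee_kcal_min + ree_kcal_min


def vo2_to_ee(vo2_ml_kg_min: float, weight_kg: float, rer: float) -> float:
    """Convert relative VO2 (ml/kg/min) to EE in kcal/min (Eq. #3 and #5).

    Assumption: caloric equivalent taken from the abbreviated Weir equation.
    """
    vo2_l_min = vo2_ml_kg_min * weight_kg / 1000.0
    kcal_per_l_o2 = 3.941 + 1.106 * rer
    return vo2_l_min * kcal_per_l_o2


# Hypothetical values: an Eq. #2-style PAEE of 8.368 kJ/min with REE of 1.0 kcal/min,
# and an Eq. #3-style VO2 of 10 ml/kg/min for an 80-kg participant at RER 0.85.
print(paee_to_ee(kj_to_kcal(8.368), 1.0))
print(vo2_to_ee(10.0, 80.0, 0.85))
```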

To examine the validity of the five sets of EE predictive equations, an equivalence test between each predictive equation and the criterion measure was performed based on a confidence interval (CI) method [21]. We first obtained the mean criterion EE and the mean estimated EE for each activity across all participants. A regression model was then fitted to the pairs of mean criterion EE (X-axis) and mean estimated EE (Y-axis) for each activity. If the predictive equation is equivalent to the criterion across all activities, the intercept of this regression should be 0 and the slope should be 1. To make sure the intercept describes an average activity, the X and Y values of the regression were further adjusted by subtracting the overall criterion mean (averaged over all activities) [21]. Previous research suggested regression-based equivalence regions of ±10% of the criterion mean for the intercept and (0.9, 1.1) for the slope (i.e., ±10% of the slope of 1), which would be expected for equal means on the two measures [21, 22]. We also tested equivalence regions of ±15% and ±20% of the criterion mean for the intercept and of the slope of 1. To claim equivalence across the array of activities tested at α = 0.05, two 90% CIs, one for the intercept and one for the slope, should fall inside their respective equivalence regions.
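The regression-based equivalence check can be sketched with ordinary least squares in plain Python. This is a simplified illustration under stated assumptions rather than the analysis actually run: the function name is hypothetical, the overall criterion mean is taken as the unweighted mean of the per-activity criterion means, and the critical t value for the 90% CI is supplied by the caller (for 17 activities, df = 15 and t ≈ 1.753; a routine such as `scipy.stats.t.ppf(0.95, df)` could supply it).

```python
import math


def equivalence_regression(criterion_means, estimated_means, t_crit, margin=0.10):
    """Regress centered estimated means (Y) on centered criterion means (X)
    and check whether both 90% CIs fall inside their equivalence regions."""
    n = len(criterion_means)
    grand_mean = sum(criterion_means) / n
    # Center both variables on the overall criterion mean so that the
    # intercept describes an average activity.
    x = [c - grand_mean for c in criterion_means]
    y = [e - grand_mean for e in estimated_means]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))
    se_slope = resid_sd / math.sqrt(sxx)
    se_intercept = resid_sd * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    slope_ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
    intercept_ci = (intercept - t_crit * se_intercept,
                    intercept + t_crit * se_intercept)
    slope_region = (1.0 - margin, 1.0 + margin)
    intercept_region = (-margin * grand_mean, margin * grand_mean)
    equivalent = (intercept_region[0] < intercept_ci[0]
                  and intercept_ci[1] < intercept_region[1]
                  and slope_region[0] < slope_ci[0]
                  and slope_ci[1] < slope_region[1])
    return {"slope": slope, "intercept": intercept, "slope_ci": slope_ci,
            "intercept_ci": intercept_ci, "equivalent": equivalent}


# Hypothetical per-activity means (kcal/min); df = 3, so t(0.95, 3) = 2.353.
criterion = [1.0, 2.0, 3.0, 4.0, 5.0]
overestimating = [1.5 * c for c in criterion]
result = equivalence_regression(criterion, overestimating, t_crit=2.353)
print(result["slope"], result["intercept"], result["equivalent"])
```

Calling the function with `margin=0.15` or `margin=0.20` gives the wider equivalence regions.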

In addition, the mean absolute error (MAE), mean absolute percent error (MAPE), mean signed error (MSE), and mean signed percent error (MSPE) were calculated for each participant by comparing the per minute estimated and criterion EE (kcal min−1). The intraclass correlation (ICC) using a two-way mixed-effects model with absolute agreement, ICC (3, 1), was obtained along with the 95% CIs between the estimated and criterion EE across all participants. As proposed by Koo and Li [23], ICC values higher than 0.9 are considered to indicate excellent reliability, values between 0.75 and 0.9 good, values between 0.5 and 0.75 moderate, and values lower than 0.5 poor reliability. Also, Bland–Altman (BA) plots and analysis were used to compare the per minute estimated EE against the criterion across all activity trials of all participants. The BA analysis plotted the differences between the estimated and criterion EE against their average for each trial of each participant, and calculated the mean difference between the per minute estimated and criterion EE (the ‘bias’) and the 95% limits of agreement (LoA) as the mean difference plus and minus 1.96 times the standard deviation (SD) of the differences [24].
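The error measures and the Bland–Altman statistics follow directly from their definitions. The sketch below assumes that differences are taken as estimated minus criterion and that the SD is the sample SD; the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def error_metrics(estimated, criterion):
    """MAE and MSE in kcal/min; MAPE and MSPE in percent of the criterion."""
    diffs = [e - c for e, c in zip(estimated, criterion)]
    pct = [100.0 * d / c for d, c in zip(diffs, criterion)]
    return {
        "MAE": mean(abs(d) for d in diffs),
        "MAPE": mean(abs(p) for p in pct),
        "MSE": mean(diffs),   # mean signed error, not mean squared error
        "MSPE": mean(pct),
    }


def bland_altman(estimated, criterion):
    """Bias and 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diffs = [e - c for e, c in zip(estimated, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {"bias": bias, "lower_loa": bias - 1.96 * sd,
            "upper_loa": bias + 1.96 * sd}


# Hypothetical per-minute EE values (kcal/min).
est = [2.0, 3.0, 5.0]
crit = [2.5, 2.5, 4.0]
metrics = error_metrics(est, crit)
ba = bland_altman(est, crit)
print(metrics)
print(ba)
```

A small signed error (MSE or MSPE) alongside a large absolute error (MAE or MAPE) indicates that over- and underestimates are cancelling each other.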

We further examined how the predictive equations perform at different intensities of PA. We first averaged the VO2 values (ml kg−1 min−1) from the metabolic cart for each minute and then divided the average by the resting metabolic equivalent of 2.7 ml kg−1 min−1 for individuals with SCI [25] to obtain the metabolic equivalent of task (MET) for the minute. MET values ≤1.5 were classified as sedentary behavior, values between 1.5 and 3 METs as light intensity, and values ≥3 METs as moderate-to-vigorous PA (MVPA) [25]. The MAE, MAPE, MSE, MSPE, and ICC (3, 1) were then calculated for each participant by comparing the per minute estimated and criterion EE for each intensity group. Finally, the same set of measures was calculated for each participant by comparing the per minute estimated and criterion EE for each type of activity. All statistical analysis was performed using IBM SPSS Statistics v. 25 (IBM, Armonk NY, USA).
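The intensity classification reduces to one division and two thresholds. A minimal sketch, with a hypothetical function name and hypothetical per-minute VO2 values:

```python
RESTING_VO2_SCI = 2.7  # ml/kg/min, SCI-specific resting value from the text


def classify_intensity(vo2_ml_kg_min: float):
    """Return (MET, intensity label) for one minute of averaged VO2."""
    met = vo2_ml_kg_min / RESTING_VO2_SCI
    if met <= 1.5:
        label = "sedentary"
    elif met < 3.0:
        label = "light"
    else:
        label = "MVPA"
    return met, label


# Hypothetical per-minute VO2 values (ml/kg/min).
for vo2 in (2.7, 5.4, 10.8):
    met, label = classify_intensity(vo2)
    print(f"VO2 {vo2:4.1f} -> {met:.1f} METs ({label})")
```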

Results

A total of 30 participants were recruited and tested in this study. One participant did not have steady-state REE after the first 5 min of resting data were removed, and thus was not included in the data analysis. Demographic information for the remaining 29 participants is shown in Table 2. The total steady-state activity minutes for all participants ranged from 19 to 104 min with a mean (SD) of 73 (21) min. Based on the criterion VO2, 21% of the time was sedentary, 44% of the time was in light intensity PA, and 35% of the time was in MVPA. The performance of the EE predictive equations in terms of MAE, MAPE, MSE, MSPE, and ICC (3, 1) between the per minute estimated and criterion EE across all participants and intensity groups can be found in Table 3. The same measures for each type of activity across all participants can be found in the Supplementary Material. BA plots in Fig. 2 show the differences between the per minute estimated and criterion EE against their average for each trial of each participant. The ranges between the 95% lower LoA and upper LoA for Eq. #1–#5 [11,12,13,14,15] are 5.01, 4.70, 5.37, 4.74, and 25.09 kcal min−1, respectively.

Table 2 Demographic data.
Table 3 Performance of EE predictive equations for all, sedentary, light, and MVPA activities.
Fig. 2 Bland–Altman plots of the differences between the criterion and estimated EE using the five predictive equations: Nightingale (Eq. #1) [12], Nightingale (Eq. #2) [13], Garcia-Massó (Eq. #3) [14], Tsang (Eq. #4) [11], and Learmonth (Eq. #5) [15]. LoA limits of agreement. Eq. #1–4 axis values differ from those of Eq. #5.

In terms of the equivalence testing, the overall criterion mean of all 17 activities was 2.89 kcal min−1, so the intercept equivalence region is (−0.29, 0.29) at 10%, (−0.43, 0.43) at 15%, and (−0.58, 0.58) at 20%. The slope equivalence region is (0.9, 1.1) at 10%, (0.85, 1.15) at 15%, and (0.8, 1.2) at 20%. The results from the regression predicting the criterion mean from the estimated mean by each predictive equation for the 17 activities are shown in Table 4. For all five sets of EE predictive equations, the 90% CIs for both the intercept and the slope (Table 4) were outside their respective equivalence regions at all levels (10%, 15%, and 20%), and thus none of the equations demonstrated statistical equivalence against the criterion measure.
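The equivalence regions quoted above follow from the overall criterion mean by simple arithmetic, which the short sketch below reproduces (the function name is hypothetical):

```python
CRITERION_MEAN = 2.89  # kcal/min, overall mean across the 17 activities


def equivalence_regions(criterion_mean: float, margin: float):
    """Return (intercept region, slope region), each rounded to 2 decimals."""
    half_width = round(margin * criterion_mean, 2)
    intercept_region = (-half_width, half_width)
    slope_region = (round(1.0 - margin, 2), round(1.0 + margin, 2))
    return intercept_region, slope_region


for margin in (0.10, 0.15, 0.20):
    intercept_region, slope_region = equivalence_regions(CRITERION_MEAN, margin)
    print(f"{margin:.0%}: intercept {intercept_region}, slope {slope_region}")
```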

Table 4 Equivalence testing.

Discussion

This study examined the performance of five sets of published EE predictive equations for MWUs using an independent dataset from 29 MWUs with SCI. The out-of-sample validation showed that these predictive equations did not demonstrate statistical equivalence against the criterion measure, even with the widest (±20%) equivalence regions. Their performance relative to the criterion measure also varied. The MAE (MAPE) for the five sets of predictive equations ranged from 0.87 to 6.41 kcal min−1 (31%–206%) with the ICC estimates ranging from 0.06 to 0.59. From the BA plots in Fig. 2, Eq. #1–#3 [12,13,14] all demonstrated considerable heteroskedasticity (i.e., increasing error as the intensity of activity increases). Eq. #4 [11] showed a tendency (R2 = 0.808) to over-predict with higher intensity activities, whereas for Eq. #5 [15], the negative correlation (R2 = 0.927) implies a tendency to under-predict with higher intensity activities.

Though none of the equations demonstrated statistical equivalence against the criterion measure, the regression (Table 4) based on Nightingale’s Eq. #2 [13] yielded a slope of 1.118 (closest to 1) and an intercept of 0.237 (closest to 0). From Table 3, this equation also showed the lowest MAE and highest ICC. However, it should be noted that when standardizing this equation along with Nightingale’s Eq. #1 [12], we added the measured REE from the metabolic cart to the estimated PAEE to obtain the estimated overall EE. It is therefore expected that these equations yield better accuracy in estimating the overall EE, as the REE needed in such estimations comes from direct measurement rather than from a predictive equation. Thus, the accuracies of Nightingale’s Eq. #1 [12] and Eq. #2 [13] should be interpreted with caution when overall EE prediction is of primary interest.

Similar to both Nightingale’s equations [12, 13], Learmonth’s Eq. #5 [15] also used the VMC as its only predictor variable; however, it yielded far higher errors and lower ICC values than the rest of the equations. One difference between Learmonth’s study [15] and the others was that the former used a protocol including only three propulsion activities at 1.5, 3.0, and 4.5 mph, respectively, while the other studies included a mix of propulsion activities and other ADLs [11,12,13,14]. In addition, there could be other unidentified systematic errors that caused the large estimation errors, as evidenced by the BA plots in Fig. 2 and the large deviations from the ideal slope of 1 and intercept of 0 in the equivalence testing (Table 4).

Tsang’s study [11] included the largest and widest array of activity recordings among the five studies with a total of 24 activities, 13 of which came from a lab session and 11 of which came from a home session. It also had the largest sample size among the five studies. As light-intensity PA accounted for a large portion of the activities, Tsang’s Eq. #4 [11] showed the lowest MAE (MAPE) of 0.43 kcal min−1 (18%) and the highest ICC of 0.73 for light-intensity activities. However, the performance of Tsang’s Eq. #4 [11] fell short for sedentary behavior and MVPA, as shown in Table 3. The lowest MSPE of 15% combined with a poor ICC value for all activities indicates that Tsang’s Eq. #4 [11] overestimated some activities and underestimated others, causing these errors to cancel each other out. In addition, the BA plots in Fig. 2 and the large deviations from the ideal slope of 1 in the equivalence testing (Table 4) also indicate that there might be systematic errors in the modeling process. One of the issues could be related to its use of able-bodied BMR prediction equations, which have previously been shown to demonstrate considerable error when used in individuals with SCI [26].

Garcia-Massó’s Eq. #3 [14] is the only one that uses statistical methods as well as signal processing techniques in the modeling process. Compared with the other equations, it showed reasonable performance, especially considering that it estimates the overall EE directly and thus does not need measured REE for the total EE estimation, as Nightingale’s Eq. #1 [12] and Eq. #2 [13] do. The more complex modeling technique used in Garcia-Massó’s study [14] may have contributed to this reasonable performance in predicting the overall EE.

While these predictive equations seemed to yield moderate-to-large estimation errors for varying reasons, we want to contrast their performance with what was available for the general ambulatory population. A recent study [27] assessed the accuracies of several consumer-grade activity monitors such as the Fitbit Surge (Fitbit Inc., San Francisco CA, United States), Jawbone Up3 (Jawbone Inc., San Francisco, CA, United States), and Apple Watch 2 (Apple Inc., Cupertino, CA, United States) in 44 ambulatory participants. The study found that the Jawbone Up3 gave the best performance with a MAPE (SD) of 28% (27%), whereas the Fitbit Surge had the worst performance with a MAPE (SD) of 67% (80%). The other devices including the Apple Watch 2 performed similarly, with a MAPE (SD) of 49% (47%). Another study assessed the laboratory and daily EE estimates from four consumer-grade devices including the three aforementioned devices in [27] and two research-grade devices. While the MAPE values reported for the three consumer devices differed from those provided by [27], the MAPE for both consumer- and research-grade devices ranged from 20% to 40% for laboratory assessment and from 15% to 34% for 24-hr free-living assessment [28]. A 2018 systematic review and meta-analysis of the validity of activity monitors in estimating EE in the general population found large and significant heterogeneity for many devices and concluded that EE estimates from wrist- and arm-worn devices differ in accuracy depending on activity type [29]. In addition, a pilot study from 2018 assessed the feasibility of using the Fitbit Charge 2 to monitor daily PA and hand-bike training in six wheelchair users with SCI. The study provided descriptive graphs to show the possibility of using total daily step counts to detect training days and interpersonal/intrapersonal variations in the daily PA level; however, it did not incorporate any criterion measure or provide meaningful statistics [30]. These findings are similar to what was found in this paper, with the predictive algorithms varying in accuracy across different activity types and intensities.

Compared with the EE prediction performance for the general ambulatory population, the existing EE prediction algorithms for MWUs with SCI demonstrated greater inaccuracy with MAE (MAPE) ranging from 0.87 to 6.41 kcal min−1 (31%–206%). Extrapolating this error over a 24-hr period would lead to an EE estimation error of 1253–9230 kcal day−1. However, as accuracies of these predictive EE equations vary across different types and intensities of activities, the EE estimation error yielded in the study based on the specific activity protocol may not accurately reflect the daily EE estimation errors in free-living conditions. Nonetheless, future work is needed to develop more accurate EE algorithms for MWUs with SCI. With the growing prevalence of multisensor consumer devices, new EE predictive algorithms could consider incorporating physiological signals such as heart rate to help improve EE prediction accuracy for MWUs with SCI, especially for MVPA and with more practical calibration procedures [31]. In addition, EE estimation could use signal features and patterns extracted from high resolution raw acceleration signals with machine-learning techniques to better classify the activity types and derive more sophisticated EE prediction models. Finally, as REE accounts for a large percent of daily EE, more research is needed to improve REE estimation accuracy for this population based on readily available demographic and anthropometric information.
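The 24-hr extrapolation above is a direct scaling of the per minute MAE by the 1440 min in a day, assuming the same error applies to every minute (the variable names are illustrative):

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# Lowest and highest MAE across the five equations (kcal/min), from the text.
mae_low, mae_high = 0.87, 6.41

daily_error_low = round(mae_low * MINUTES_PER_DAY)
daily_error_high = round(mae_high * MINUTES_PER_DAY)
print(f"{daily_error_low} to {daily_error_high} kcal/day")
```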

Although this is the first study that offers a comparative evaluation of all predictive ActiGraph EE algorithms in MWUs with SCI, the study has a few limitations. The study attempted to follow a systematic review process; however, the search covered only three databases and limited effort was made to locate unpublished work. Thus, the study may not include all relevant EE predictive algorithms. In terms of the out-of-sample data collection, all 29 participants in the study had paraplegia. A larger sample with various levels of diagnosis, including tetraplegia, could further improve our understanding of predictive equation performance. This study only equipped participants with one ActiGraph monitor on their dominant wrist. Garcia-Massó’s study [14] also developed a nondominant wrist equation, which could not be evaluated in this study. Finally, although the activity protocol in our study included a relatively large range of typical daily activities, the activities were not performed in natural settings. Most importantly, the proportion of different types of activities in the protocol does not necessarily reflect the typical activity profile in everyday living. As the estimation accuracy of the EE predictive equations may depend on the types of activities, the results of this evaluation should be interpreted with caution.

Conclusion

EE estimates from all five sets of predictive equations based on ActiGraph monitors for MWUs with SCI failed to fall into the ±10%, ±15%, and ±20% equivalence regions set by the criterion. These equations yielded a MAE of 0.87–6.41 kcal min−1 and a MAPE of 31%–206%. Future work is needed to develop more accurate EE predictive algorithms for MWUs with SCI.