Public Significance Statement

Validated assessment instruments are needed to better understand the impact of COVID-19 on mental health. The current study develops and validates a new measure for the broad assessment of COVID-19-related behaviors, worry, and disability.

In late 2019, a novel coronavirus (SARS-CoV-2), the virus that causes COVID-19, was identified in Wuhan, China, and quickly spread across the globe, resulting in a pandemic (WHO, 2020b). In addition to impacting the physical health of millions of Americans, the COVID-19 pandemic is a significant psychological stressor due to both the threat of the illness itself and the mitigation strategies used to contain the spread (e.g., social distancing). The social, educational, and vocational upheaval caused by the COVID-19 pandemic has further exacerbated fears of illness and death related to the virus and undoubtedly negatively impacted public mental health (Carvalho et al., 2020; Pfefferbaum & North, 2020).

It is imperative to understand cognitive, emotional, and behavioral responses to the pandemic, as such data are crucial for informing population-level interventions. Several authors have already made calls for researchers to rapidly gather data regarding the effects of the pandemic on psychological and social functioning, highlighting that increases in overall distress, incidence of psychiatric conditions, and unhealthy coping behaviors (e.g., substance use) are likely to occur in the coming months (Cullen et al., 2020; Holmes et al., 2020; Pfefferbaum & North, 2020; Reger et al., 2020). Research from past viral outbreaks, such as SARS and Ebola, supports these hypotheses, showing increases in distress, anger, depression, anxiety, substance use, and posttraumatic stress disorder symptoms, even several months after quarantine and other protective measures have ended (Brooks et al., 2020; Hawryluck et al., 2004; Jeong et al., 2016; Mazumder et al., 2020; Mihashi et al., 2009; Sprang & Silman, 2013; Taylor, 2019).

Early reports regarding COVID-19 indicate that people are indeed reporting significant concern about the pandemic and its consequences (Holmes et al., 2020). Within the U.S., in particular, early estimates suggest that approximately 65–70% of individuals may be experiencing moderate to severe levels of psychological distress due to the pandemic (Hsing et al., 2020; Nelson et al., 2020a, b; Rosen et al., 2020; Twenge & Joiner, 2020). There also have been noted increases in feelings of hopelessness, sadness, and worthlessness (Twenge & Joiner, 2020), as well as decreases in feelings of social connection (Hsing et al., 2020). Initial studies have shown that more people are seeking psychiatric care and calling national crisis lines (Bharath, 2020; Lakhani, 2020; Levine, 2020), providing further evidence that the pandemic is posing a significant threat to mental well-being.

Though the initial work regarding COVID-19 provides a foundation for subsequent research, its utility and generalizability remain unclear due to methodological limitations. A number of questionnaires have been developed to assess psychological responses to the pandemic (Hsing et al., 2020; Nelson et al., 2020a, b; Qiu et al., 2020; Rosen et al., 2020; Simione & Gnagnarella, 2020; Taylor et al., 2020). However, existing questionnaires are limited in several ways. First and perhaps most importantly, few of the existing scales have been subjected to rigorous reliability or validity testing, which limits our ability to draw strong conclusions about which aspects of the COVID-19 pandemic are the most distressing and thus most likely to impact mental health. As a result, we are unable to determine the full utility of those measures, which is crucial for understanding reactions to COVID-19 and their effects on mental health, as well as for identifying individuals most at risk for negative outcomes.

To our knowledge, there is only one self-report questionnaire that has been psychometrically tested in a U.S. sample (Taylor et al., 2020). Taylor et al. (2020) developed the COVID-19 Stress Scale (CSS), which is a 36-item measure that assesses fears related to (1) the danger of contracting COVID-19 (e.g., “I am worried that I can’t keep my family safe from the virus”); (2) economic consequences (e.g., “I am worried that grocery stores will close down”); (3) xenophobia (e.g., “I am worried that foreigners are spreading the virus in my country”); (4) compulsive behaviors (e.g., “checking my own body for signs of infection”); and (5) traumatic stress symptoms (e.g., “I had bad dreams about the virus”). The CSS demonstrated a stable factor structure, good to excellent internal consistency, and adequate convergent and discriminant validity. However, it is limited by its sole focus on health-related fear and anxiety reactions to the pandemic. The CSS does not account for the multifaceted nature of COVID-19-related reactions and stressors, thereby limiting its use as a measure of overall distress and impairment.

Second, most of the existing scales do not assess the multifaceted psychological reactions to the pandemic. As would be expected, the COVID-19 pandemic has resulted in a broad range of impacts, including cognitive (e.g., concern about contracting the virus), emotional (e.g., feelings of sadness), and/or behavioral elements (e.g., stockpiling of food and supplies; Holmes et al., 2020; Pfefferbaum & North, 2020). However, most measures have focused primarily on only one or two of these aspects (Nelson et al., 2020a, b; Qiu et al., 2020; Simione & Gnagnarella, 2020). Importantly, these reactions may be differentially associated with outcomes. For example, Rosen et al. (2020) found that behavioral changes (e.g., time spent reading the news) were a stronger predictor of overall anxiety than cognitive factors (e.g., concern about financial impacts). Therefore, clarifying the specific nature of responses to the pandemic is critical for predicting outcomes and ultimately creating targeted interventions.

Third, existing scales are limited in their assessment of stressors related to the pandemic, such as medical concerns, social isolation, financial difficulties, familial stress, and change in everyday routine (Pfefferbaum & North, 2020). Most of the COVID-19 measures evaluate only one or two sources of stress (Hsing et al., 2020; Nelson et al., 2020a, b; Qiu et al., 2020; Simione & Gnagnarella, 2020; Taylor et al., 2020). However, there is initial evidence that specific stressors may be differentially associated with outcomes, as one study found that concern about contracting the virus, but not concerns about social isolation, was a significant predictor of overall psychological distress (Hsing et al., 2020). As such, it is important to capture the many ways in which the pandemic may differentially impact functioning.

Taken together, these findings indicate that the COVID-19 pandemic is a significant psychological stressor that threatens public mental health. As research regarding the impact of COVID-19 progresses, it is imperative to utilize comprehensive and validated measures to ensure systematic, consistent, and generalizable empirical work. Therefore, the primary aim of the current study was to develop and psychometrically evaluate a comprehensive COVID-19 Impact Battery (CIB) consisting of three measures that assess behaviors, worry, and dysfunction in response to the COVID-19 pandemic. The secondary aim was to develop and psychometrically evaluate a brief, single-scale measure (CIB-S) that taps each domain of the CIB to allow for flexibility and rapid data collection on COVID-19 distress.

In the current study, we developed and validated the CIB and CIB-S using a stepwise procedure in line with best-practice measurement development (Boateng et al., 2018; Devellis, 2016). First, we created a pool of potential items based on the authors’ clinical experience with fear and anxiety as well as input from experts in the field of anxiety. We also consulted polling research (Keeter, 2020) on psychological distress during the pandemic. Second, we used exploratory factor analysis (EFA) to evaluate the structure of the CIB and CIB-S in an initial sample (Sample 1) of participants recruited through Amazon Mechanical Turk (Mturk). Third, we conducted confirmatory factor analysis (CFA) to validate the proposed structure across two independent samples (an independent Mturk sample [Sample 2] and a sample of faculty, students, and staff at a Midwestern University [Sample 3]). Fourth, we examined test–retest reliability in Sample 1 from baseline to 1-month follow-up. Finally, we examined (a) convergent and (b) discriminant validity using structural equation modeling (SEM) among Sample 2 participants and (c) construct validity among Sample 3 participants. Specifically, for convergent validity, relations were examined between the CIB/CIB-S and measures of general distress and worry; for discriminant validity, relations were examined with a measure of perceived attentional control. Based on prior studies, we expected general distress and worry to be moderately to highly associated with maladaptive COVID-19 behaviors and COVID-19-related worry, whereas we expected attentional control to be minimally associated with these constructs (Baiano et al., 2020; Manning et al., 2021; Saulnier et al., 2021). Construct validity was tested by examining the overlap between the scales we created and additional indicators of COVID-19 pandemic distress and disability.

Method

Data collection across all three samples involved completion of batteries of self-report questionnaires of varying lengths. These surveys were hosted on the Qualtrics platform. For Samples 1 and 2, both recruited from Amazon Mturk, participants had to have an approval rating of at least 95% across a minimum of 100 completed surveys (see Peer et al., 2014). Prior to initiating a survey, all participants provided informed consent electronically. All participants had to be 18 years of age or older and live in the United States to participate. Study procedures were approved by the Institutional Review Boards of Florida State University and Ohio University, and the study was conducted in accordance with the 1964 Helsinki Declaration and its later amendments.

Participants and Procedures

Sample 1

Sample 1 comprised 249 participants recruited from Mturk. Of these, 74 participants were excluded for failing at least one of seven attention check items, which asked participants to select specified responses or to type a given phrase into a text box. Sample 1 participants completed a self-report survey (Wave 1) and were re-administered the survey 1 month after their initial assessment (Wave 2). Wave 1 data collection began on April 13, 2020, with modal completion on the same day. Wave 2 data collection began on May 14, 2020, with modal completion on the same day. The sample demographics were comparable across waves. Sample 1 included 175 participants at Wave 1 (Mage = 39.05 years, SD = 11.79; 51.4% female) and 122 participants at Wave 2 (Mage = 40.93 years, SD = 12.18; 48.4% female; see Table 1 for sample demographics). Most participants in this sample identified as White (n = 135, 77.1%); a small number identified as Hispanic (n = 15, 8.6%). Within this sample, 47.4% of participants endorsed a 4-year college degree (BA, BS) as their highest level of education achieved. Most participants reported an estimated yearly family income of $75,000 or less (65.1%). At Wave 1, the survey took 29.13 min to complete on average (SD = 15.25 min); at Wave 2, the survey took 43.78 min to complete on average (SD = 20.74 min). The longer completion time at Wave 2 was due to a longer assessment battery. Participants were compensated $4.00 per hour for completing the survey.

Table 1 Demographic characteristics across samples

With respect to a diagnosis of COVID-19 at Wave 1, 4.6% of the sample reported a confirmed diagnosis, whereas 10.9% reported believing they had COVID-19 but had not yet been tested or diagnosed (see Table 2 for COVID-19 sample characteristics). Regarding exposure to COVID-19, 12.0% of the sample reported being exposed to someone with confirmed COVID-19, and 9.7% reported being exposed to someone who had been tested for COVID-19 but was awaiting the results. A reported 5.1% of the sample indicated that someone in their home had contracted COVID-19. With respect to participants’ perception of the approximate size of the COVID-19 outbreak in their area, the distribution of responses appeared relatively normal, with the largest percentage (22.9%) of respondents indicating a “Medium” outbreak. The vast majority (93.7%) of respondents indicated that they were currently under a stay-at-home order and, of those, more than half (58.9%) reported being under that order for 2–4 weeks.

Table 2 COVID-19 participant characteristics

Sample 2

Sample 2 comprised 900 participants recruited from two nonoverlapping Mturk studies. Due to emerging evidence that traditional attention check items can be circumvented by automated or “bot” responding (e.g., Pei et al., 2020), three attention check items using both adversarial questioning (i.e., referring to alternative answers in the questions) and deliberate “typos” (e.g., se1ected) were included in the study. Participants who failed any attention check item were excluded. Data collection for these participants began on April 29, 2020, with modal completion on the same day. Sample 2 included 635 participants at Wave 1 (Mage = 38.52 years, SD = 10; 49.0% female) and 321 participants at Wave 2 (Mage = 40.02 years, SD = 10.54; 53.6% female; see Table 1 for sample demographics). Most participants identified as White (n = 520, 81.9%); a small number identified as Hispanic (n = 71, 11.2%). Within this sample, 46.5% of participants endorsed a 4-year degree (BA, BS) as their highest level of education achieved. Most participants reported an estimated yearly family income of $75,000 or less (60.5%). At Wave 1, the survey took 54.36 min to complete on average (SD = 50.75 min); at Wave 2, the survey took 59.13 min to complete on average (SD = 31.78 min). Participants were compensated $4.25 for the self-report battery.
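For readers implementing similar screens, the exclusion rule reduces to dropping any respondent whose answer to a check item deviates from its keyed response. Below is a minimal sketch in Python (pandas); the column names and keyed answers are hypothetical, not the study's actual items:

```python
import pandas as pd

# Hypothetical check items mapped to their keyed answers; one key is a
# deliberately misspelled option ("se1ected"), as described above.
ATTENTION_KEYS = {"check_1": "se1ected", "check_2": "none of the above", "check_3": "red"}

def exclude_failed_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Drop participants who missed one or more attention check items."""
    passed = pd.Series(True, index=df.index)
    for item, key in ATTENTION_KEYS.items():
        passed &= df[item].astype(str).str.strip().str.lower() == key
    return df.loc[passed]
```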

With respect to a diagnosis of COVID-19 at Wave 1, 1.7% of the sample reported a confirmed diagnosis; of those who did not report a confirmed diagnosis, 3.7% reported believing they had COVID-19 but had not yet been tested or diagnosed (see Table 2 for COVID-19 sample characteristics). With respect to participants’ perception of the approximate size of the COVID-19 outbreak in their area, the distribution of responses appeared relatively normal, with the largest percentage (21.3%) of respondents indicating a “Medium” outbreak. Regarding exposure to COVID-19, 6.6% of the sample reported being exposed to someone with confirmed COVID-19, and 6.0% reported being exposed to someone who had been tested for COVID-19 but was awaiting the results. A reported 1.7% of the sample also indicated that someone in their home had contracted COVID-19. A large majority (87.6%) of respondents indicated that they were currently under a stay-at-home order and, of those, more than half (54.7%) reported being under that order for 4–6 weeks.

Sample 3

Sample 3 comprised 281 participants recruited through an email to all BLINDED FOR REVIEW faculty, staff, and students (see Table 1 for sample demographics). We excluded 22 participants who failed at least one of two attention check items, leaving 259 participants (Mage = 36.40 years, SD = 14.95; 74.4% female). Data collection for these participants began on May 26, 2020, with modal completion on the same day. Most participants in this sample identified as White (n = 236, 91.1%); a small number identified as Hispanic (n = 9, 3.5%). Participants most frequently endorsed a graduate degree (MA, MS, JD, MBA, PhD) as their highest level of education achieved (n = 107, 41.3%). Most participants reported an estimated yearly family income of $75,000 or less (n = 137, 53.1%). The survey took 46.59 min to complete on average (SD = 81.78 min). Participants volunteered for this study and were not monetarily compensated. With respect to a diagnosis of COVID-19, 2.2% of the sample reported a confirmed diagnosis; of those who did not report a confirmed diagnosis, 6.8% reported believing they had COVID-19 but had not yet been tested or diagnosed (see Table 2 for COVID-19 sample characteristics). With respect to participants’ perception of the approximate size of the COVID-19 outbreak in their area, the distribution of responses was positively skewed, with the largest percentage (42.6%) of respondents indicating a “Very Small” outbreak. Regarding exposure to COVID-19, 6.7% of the sample reported being exposed to someone with confirmed COVID-19, and 4.8% reported being exposed to someone who had been tested for COVID-19 but was awaiting the results. A reported 4.8% of the sample also indicated that someone in their home had contracted COVID-19. Less than half (47.0%) of respondents indicated that they were currently under a stay-at-home order and, of those, a majority (73.2%) reported being under that order for 6–8 weeks.

Measures

COVID-19 Impact Battery

CIB Behaviors Scale (Samples 1–3)

This scale was created to measure behavioral patterns in response to the COVID-19 outbreak. Participants responded to this scale by rating the extent to which they “have engaged in the following behaviors in response to COVID-19” using a five-point scale (from 0 = “Not at all” to 4 = “Very much”). Participants rated each of the 22 potential behaviors listed in the scale (e.g., “Hand washing,” “Using hand sanitizer”). This scale was piloted in the present study (see Appendix Table 7).

CIB Worry Scale (Samples 1–3)

This 25-item scale was created to measure worry and distress in response to the outbreak of COVID-19. The items on this measure use a five-point scale (from 0 = "Not at all" to 4 = "Very much"). Participants rated each item (e.g., “I worry that I will lose my employment;” “I worry that I will lose motivation”) according to the degree to which it had caused them distress. This scale was piloted in the present study (see Appendix Table 8).

CIB Disability Scale (Samples 1–2)

We adapted 10 items from the WHODAS II (World Health Organization, 2000a) to measure difficulties resulting from the outbreak of COVID-19. Instructions asked participants to consider difficulties “due to the COVID-19 outbreak” rather than those “due to health conditions.” Item wording was also altered to reflect the adaptation to the COVID-19 outbreak (e.g., “How much have you been emotionally affected by the COVID-19 outbreak?”). Of the 10 items, seven asked participants to rate difficulties on a five-point scale from 0 (“None”) to 4 ("Extreme or cannot do"). Consistent with the WHODAS, participants used this scale to rate the degree of difficulty experienced in the preceding 30 days because of the COVID-19 outbreak. The final three questions assessed the number of days out of the preceding 30 that the disabilities had been present or impairing. Only the first seven items were piloted for the measure (see Appendix Table 9).

COVID-19 Impact Battery Short (CIB-S)

Based on the factor structure and overlap of the COVID-19 scales in the CIB, we constructed a brief scale to broadly capture the psychological impact of COVID-19 on individuals. Due to an administrative error, the CIB Disability items were not administered in Sample 2 at Wave 1. To provide a limited but still strong test of confirmatory support for the CIB-S, Sample 1 Wave 2 data were used to provide additional model fit information as well as additional tests of convergent and discriminant validity.

Convergent and Discriminant Validity

The Positive and Negative Affect Schedule (PANAS; Watson et al., 1988; Samples 1–2)

The 20-item PANAS was used to measure positive affect (PA) and negative affect (NA) in Samples 1 and 2. The PA and NA scales each contain 10 one-word adjectives reflecting PA and NA, respectively. In these studies, participants were asked to rate the degree to which these adjectives applied to their emotional state over the past week using a 5-point Likert-type scale, ranging from 1 (“Very slightly or not at all”) to 5 (“Extremely”). The PANAS PA and NA scales have demonstrated good psychometric properties (Kring et al., 2007; Watson et al., 1988). The PANAS NA demonstrated adequate reliability across samples (Sample 1 α = 0.78; Sample 2 ω = 0.94). The PANAS PA demonstrated adequate reliability in Sample 1 (α = 0.73).

The Brief Penn State Worry Questionnaire (Brief PSWQ; Topper et al., 2014; Sample 2)

Trait worry was assessed using the Brief PSWQ. The Brief PSWQ is a 5-item measure developed to assess trait worry using a 5-point scale (e.g., “Many situations make me worry;” “When I am under pressure, I worry a lot”). The Brief PSWQ has shown good internal consistency (α ranging from 0.84 to 0.91) and is highly correlated with the full PSWQ (r ranging from 0.91 to 0.94; Topper et al., 2014). The Brief PSWQ had excellent reliability (ω = 0.94) in Sample 2.

The Attentional Control Short Straightforward Scale (ACS-SS; Judah et al., 2020; Sample 2)

Trait attentional control (AC) was assessed using the ACS-SS. This scale comprises 12 items capturing two lower-order dimensions of AC, focusing and shifting, as well as a general AC factor. The 12-item ACS-SS has demonstrated excellent psychometric properties (Judah et al., 2020) and was found to correlate moderately (rs from −0.26 to −0.34) with measures of anxiety and depression. The ACS-SS had excellent reliability (ω = 0.88) in Sample 2.

Construct Validity

Several items from the COVID-19 demographics questionnaire, administered to Sample 3, were used to assess construct validity. Participants rated their fear in relation to the health and economic impacts of COVID-19, as well as their feelings of loneliness in response to social distancing, using visual analog scale (VAS) ratings from 0 to 100.

Data Analytic Plan

A battery of measures (CIB) as well as a short version (CIB-S) of this battery were developed in the present study using recommended measure development procedures (e.g., Boateng et al., 2018; Devellis, 2016). Following item selection, items were administered to three separate samples. Sample 1 was used to conduct exploratory factor analysis (EFA) to determine the dimensionality of the measures, to remove poorly fitting items, and to create reduced measures that balanced capturing the breadth of a construct with participant time demands. Further, in line with best practices in structural equation modeling (SEM), we selected at least four items for each lower-order factor when possible. Acceptable items were defined as those that loaded 0.40 or greater on a single factor and less than 0.32 on other factors (Tabachnick et al., 2007). Poorly fitting items were removed in a stepwise manner, with items with no unique loadings removed first and cross-loading items removed second. Acceptable factors were defined as factors with three or more items (Velicer & Fava, 1998). To provide additional information, the “elbow” of the scree plot (Cattell, 1966) and parallel analyses (Horn, 1965) were examined. The “elbow” of the scree plot identifies the point at which eigenvalues level off. Parallel analysis, based on 1000 iterations, compares observed eigenvalues to those generated from random data, with factors whose eigenvalues exceed the 95th percentile of the random distribution considered to have occurred above chance level (Horn, 1965). To develop the CIB-S, a bifactor EFA was fit to all the items that were retained for the CIB factors. Item selection for the CIB-S followed recommendations by Ebesutani et al. (2012) and included: (a) loading > 0.30 on the general factor, (b) loading > 0.30 on a specific factor, and (c) loading uniquely (i.e., only on the general and specific factors).
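To make the factor-retention rule concrete, the following is a minimal sketch of Horn's parallel analysis in Python (numpy only); the paper's analyses were run in Mplus, so this is an illustration rather than the authors' code. It assumes a complete-case participants-by-items matrix X:

```python
import numpy as np

def parallel_analysis(X: np.ndarray, n_iter: int = 1000,
                      percentile: float = 95.0, seed: int = 0) -> int:
    """Return the number of factors supported by Horn's (1965) parallel analysis.

    Observed eigenvalues of the item correlation matrix are compared with
    the chosen percentile of eigenvalues from random normal data of the
    same dimensions; leading factors that beat chance are retained.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        rand[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(rand, percentile, axis=0)
    above = obs > threshold
    # Count the leading factors whose observed eigenvalue beats chance.
    return int(above.argmin()) if not above.all() else p
```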

The next steps in measure development included confirming the factor structure in independent samples. CFAs of the EFA-derived CIB factors were tested in Samples 2 and 3. Following this, convergent, discriminant, and construct validity were tested using SEM. All models were estimated using robust maximum likelihood (MLR) estimation. A non-significant chi-square (χ2) value indicates that the model fits the data well. In addition, CFI values above 0.95 indicate good fit. Finally, RMSEA values below 0.05 indicate good fit, with an upper bound of the RMSEA confidence interval above 0.10 meaning that poor fit cannot be ruled out and a lower bound below 0.05 meaning that good fit cannot be ruled out (Hu & Bentler, 1998). In all SEM models, a Bonferroni correction was used to account for multiple significance tests.
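For illustration, a CFA of this kind can be specified in a few lines with the open-source semopy package. This is a sketch under stated assumptions: the analyses reported here were run in Mplus 8.4 with MLR, whereas semopy defaults to standard maximum likelihood, and the item names and data file below are hypothetical:

```python
import pandas as pd
import semopy

# Three-factor CIB Behaviors CFA using the EFA-retained items
# (hypothetical column names keyed to the item numbers in the text).
DESC = """
Stockpiling =~ beh1 + beh2 + beh3 + beh4
Cleaning    =~ beh6 + beh8 + beh9 + beh10
Avoiding    =~ beh15 + beh16 + beh18 + beh19
"""

data = pd.read_csv("cib_sample2.csv")  # hypothetical data file
model = semopy.Model(DESC)
model.fit(data)
print(semopy.calc_stats(model).T)      # chi2, CFI, RMSEA, among other indices
print(model.inspect())                 # loadings and factor covariances
```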

We conducted tests of internal validity by examining longitudinal measurement invariance and invariance across gender for the CIB factors. Latent mean differences were also examined to determine whether CIB factor means changed over time. For invariance testing involving the CIB Behavior and Worry factors, we used Sample 2 data. Because we did not have CIB Disability items in Sample 2, Wave 1, we used Sample 1 data to test longitudinal measurement invariance for that factor. Invariance was assessed in a stepwise manner, starting with a model with no restrictions (configural model), progressing to a model where the factor loadings were set to equality (metric model), and ending with a model where both factor loadings and item intercepts were set to equality (scalar model). At each step, the model was compared to the preceding model using the Yuan–Bentler χ2 difference test (Satorra & Bentler, 2001). A significant χ2 difference indicates that the more parsimonious (invariant) model fits the data significantly worse than allowing the relevant parameters to vary. In the longitudinal invariance models, correlated residuals were allowed for identical items over time (Sörbom, 1989).
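The scaled difference test cited here can be computed from each model's reported scaled χ2, degrees of freedom, and scaling correction factor. A minimal sketch of the Satorra–Bentler (2001) computation follows; the values in the example call are illustrative, not the paper's:

```python
from scipy.stats import chi2

def scaled_chi2_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference test.

    t0, df0, c0: scaled chi2, df, and scaling correction factor for the
        nested (more constrained, e.g., metric) model.
    t1, df1, c1: the same quantities for the comparison (e.g., configural) model.
    Returns the scaled difference statistic, its df, and the p value.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # difference-test scaling factor
    trd = (t0 * c0 - t1 * c1) / cd            # scaled difference statistic
    ddf = df0 - df1
    return trd, ddf, chi2.sf(trd, ddf)

# Illustrative metric-vs-configural comparison (made-up values):
trd, ddf, p = scaled_chi2_diff(t0=450.0, df0=247, c0=1.10,
                               t1=430.0, df1=235, c1=1.12)
print(f"scaled chi2 diff({ddf}) = {trd:.2f}, p = {p:.3f}")
```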

Participants were prompted to respond to missed questions across surveys, resulting in little missing data due to participant nonresponse. Planned missingness was used in Sample 2, based on recommendations by Rhemtulla and Little (2012), to increase the number of constructs assessed without sacrificing participant response quality. Participants were randomly given 80% of the PANAS NA and ACS-SS items. Missing data were estimated using MLR, and all analyses were conducted in Mplus 8.4 (Muthén & Muthén, 1998–2017).
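The planned-missingness assignment described above (each participant randomly receives 80% of a scale's items) can be sketched as follows (numpy; the dimensions are hypothetical). Items not administered are missing completely at random by design and are handled by the full-information estimator:

```python
import numpy as np

def planned_missingness_mask(n_participants: int, n_items: int,
                             frac_shown: float = 0.80, seed: int = 0) -> np.ndarray:
    """Boolean mask with True where an item is administered to a participant."""
    rng = np.random.default_rng(seed)
    k = int(round(frac_shown * n_items))
    mask = np.zeros((n_participants, n_items), dtype=bool)
    for i in range(n_participants):
        shown = rng.choice(n_items, size=k, replace=False)  # random 80% subset
        mask[i, shown] = True
    return mask

# e.g., 635 participants responding to the 10 PANAS NA items:
mask = planned_missingness_mask(635, 10)
```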

Results

Initial Item Selection Using Exploratory Factor Analysis

EFA of CIB Behaviors Scales (Sample 1, Wave 1)

For the EFA of the Wave 1 COVID-19 Behaviors items, up to three factors were supported based on parallel analysis. The scree plot supported up to four factors. However, one of the factors in the four-factor solution did not contain any uniquely loading items; therefore, item loadings in the three-factor solution were examined (see Appendix Table 7). Several items were removed due to low unique loadings: items 11 (not allowing children to attend school), 12 (reading or watching the news), 20 (avoiding going to work), 7 (wearing a mask in public), and 22 (avoided food takeout/delivery). This resulted in 17 items loading across three factors.

Factor loadings were examined in conjunction with expert analysis of the item content by the research team to select the four optimal items for each factor (see Appendix Table 7 for factor loadings). Items 1–4 were retained for the Stockpiling factor, as they were the only items to load uniquely on this factor. Items 15, 16, 18, and 19 were retained for the Avoiding factor; among the five highest loading items, we opted to select item 18 (avoided taxis or ride-sharing) over item 17 (avoided public transportation), as these two items appeared somewhat redundant. Finally, items 6, 8, 9, and 10 were retained for the Cleaning factor, as they were the only items to load uniquely on this factor.

EFA of CIB Worry Scales (Sample 1, Wave 1)

Parallel analysis supported up to three factors, whereas the scree plot favored up to four. However, removing poorly loading items from the four-factor model ultimately resulted in a three-factor solution fitting the data best; therefore, items were examined from the three-factor solution. Several items (items 7, 17, 4, and 14) were removed due to low unique loadings and/or cross-loadings, resulting in 21 items. Factor loadings were examined in conjunction with expert analysis of the item content by the research team to select the four optimal items for each factor (see Appendix Table 8 for factor loadings). Items 1, 2, 3, and 19 were retained for the Financial Worries factor; items 6, 8, 9, and 22 were retained for the Health Worries factor; and items 20, 21, and 25 were retained for the Catastrophic Concerns factor, as no fourth item loaded uniquely on this factor.

EFA of CIB Disability Scale (Sample 1, Wave 1)

A one-factor model of disability in response to COVID-19 fit the data best based on parallel analysis and examination of the scree plot. All seven items loaded uniquely on this factor and were retained.

EFA of COVID-19 Impact Battery-Short (Sample 1, Wave 1)

A bifactor EFA was fit to the final items from the CIB Behavior, Worry, and Disability EFA solutions. However, the CIB Behavior items did not load on the common factor; in contrast, most of the items from the CIB Worry and Disability scales loaded onto the common factor. Further, a separate bifactor EFA of just the CIB Behaviors items did not provide support for a common factor across the Behaviors factors. In addition, examination of item content suggested that certain behaviors might be adaptive for physical and mental health, depending on the stage of the virus in a person’s community. Therefore, a bifactor EFA was conducted including only the Worry and Disability items. Following procedures similar to Ebesutani et al. (2012) to capture construct breadth, we selected one item from each of the three Worry subdimensions that was among the highest loading items and captured aspects of the pandemic likely to remain relevant: items 1 (I worry I will be unable to provide for my family), 20 (I worry that if I go into quarantine, I will go crazy), and 22 (I worry that I am going to contract COVID-19) were selected from the Worry factors. Two items from the Disability scale (items 1 and 4) were among the highest loading items and were also selected: item 1 assessed difficulty taking care of household responsibilities, and item 4 assessed the ability to concentrate for more than 10 min (see Appendix Table 10).

Confirmatory Factor Analysis to Verify Factor Structures

Separate CFAs were fit to the CIB Behavior, Worry, and Disability solutions derived from the EFAs. To provide a robust test of model fit, CFAs were conducted in Sample 1, Wave 2, Sample 2, and Sample 3. Model fit statistics for these models are provided in Table 3.

Table 3 Model fit indices for all confirmatory factor analyses and structural equation models examined

CIB Behavior Factors (Sample 1, Wave 2; Sample 2; Sample 3)

Table 4 contains factor loadings for the CIB Behavior factors and factor intercorrelations. Across all three samples, the CFA provided adequate to good fit to the data. Across samples, the Stockpiling and Cleaning factors were significantly, positively correlated (rs = 0.38–0.64), as were the Cleaning and Avoiding factors (rs = 0.29–0.50). However, the Stockpiling and Avoiding factors were not significantly correlated (rs from −0.06 to 0.08). Reliability (ω) ranged from 0.89 to 0.93 for the Stockpiling factor, 0.75 to 0.80 for the Cleaning factor, and 0.79 to 0.84 for the Avoiding factor.
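The omega coefficients reported here and throughout the Results follow directly from the standardized loadings of a one-factor (congeneric) solution. A minimal sketch of the computation; the example loadings are illustrative, not the paper's estimates:

```python
import numpy as np

def mcdonald_omega(loadings) -> float:
    """McDonald's omega from standardized factor loadings.

    omega = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2)),
    where 1 - lambda^2 is each item's residual variance under a
    standardized one-factor model.
    """
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2
    residual = (1.0 - lam ** 2).sum()
    return common / (common + residual)

print(round(mcdonald_omega([0.88, 0.85, 0.82, 0.90]), 2))  # -> 0.92 (illustrative)
```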

Table 4 Factor loadings and factor correlations for the CIB behavior factors across samples

CIB Worry Factors (Sample 1, Wave 2; Sample 2; Sample 3)

Table 5 contains factor loadings for the CIB Worry factor indicators and factor intercorrelations. Across all three samples, the CIB Worry factors provided adequate model fit after allowing for a residual correlation between items 6 (I worry that I will get sick and be unable to take care of my family) and 22 (I worry that I am going to contract COVID-19). Moderate to large correlations (rs = 0.59–0.75) were found between the Financial Worry and Health Worry factors. Smaller, but still moderate to large, correlations were found between the Financial Worry and Catastrophic Worry factors (rs = 0.26–0.53). Finally, whereas a significant moderate correlation between the Health Worry and Catastrophic Worry factors was found in Samples 1 and 2 (rs = 0.56–0.57), this correlation was significant but much smaller (r = 0.19) in Sample 3. Reliability (ω) ranged from 0.81 to 0.91 for the Health Worry factor, 0.83 to 0.88 for the Financial Worry factor, and 0.82 to 0.87 for the Catastrophic Worry factor.

Table 5 Factor loadings and factor correlations for the CIB worry factors across samples

CIB Disability Factor (Sample 1, Wave 2; Sample 2, Wave 2; Sample 3)

Table 6 contains factor loadings for the CIB Disability factor. Across Sample 1, Wave 2 and Sample 3, the CIB Disability factor demonstrated excellent fit to the data. Further, all items loaded significantly on the CIB Disability factor. Although item 2 loaded at 0.38 in Sample 1, this item was retained because it loaded at 0.52 in Sample 2 and 0.48 in Sample 3. Reliability (ω) was 0.82 in Samples 1 and 3 and 0.87 in Sample 2.

Table 6 Factor loadings for the CIB disability factor across samples

CIB-S Factor (Sample 1, Wave 2; Sample 2, Wave 2; Sample 3)

Table 3 contains model fit statistics for the CIB-S. The CIB-S fit the data marginally to adequately well in Sample 1, Wave 2 (see Table 3). Excellent fit was achieved after allowing a residual covariance between the two Disability scale items. The CIB-S fit the data poorly to marginally well in Sample 2, Wave 2 (see Table 3). Excellent fit was achieved after allowing a residual covariance between two of the Worry scale items (“I worry I will not be able to provide for my family during this time” and “I worry I will contract COVID”). The CIB-S provided excellent fit in Sample 3 without the need for this residual covariance. In Sample 1, Wave 2, standardized loadings (λs) ranged from 0.35 to 0.87 (average λ = 0.62). In Sample 2, λs ranged from 0.44 to 0.78 (average λ = 0.63). In Sample 3, λs ranged from 0.44 to 0.71 (average λ = 0.54). Reliability (ω) ranged from 0.66 to 0.76.

Test–Retest Reliability for CIB Factors and CIB-S (Sample 1, Waves 1 and 2)

The CIB Behavior factors demonstrated good (r > 0.8) to excellent (r > 0.9) one-month test–retest reliability: Stockpiling r = 0.88; Cleaning r = 0.94; Avoiding r = 0.82. The CIB Worry factors also demonstrated adequate (r > 0.7) to good test–retest reliability: Financial Worries r = 0.83; Health Worries r = 0.70; Catastrophic Concerns r = 0.77. The CIB Disability factor demonstrated adequate test–retest reliability (r = 0.77). Finally, the CIB-S demonstrated excellent test–retest reliability (r = 0.95).

Discriminant and Convergent Validity of the CIB and CIB-S Factors

Model fit for the SEMs examining the relations between the CIB/CIB-S factors and measures of negative affect (PANAS NA), trait worry (Brief PSWQ), and attentional control (ACS-SS) was adequate (see Table 3). Regarding convergent and discriminant validity, we hypothesized that the CIB scales would relate significantly to measures of negative affect and worry, and less so to attentional control. Bonferroni corrections were applied within each sample, setting the significance threshold to p = 0.002 (0.05/24) in Samples 2 and 3.

CIB Behavior Factors (Sample 2, Wave 1)

In this model, the Stockpiling factor was significantly positively correlated with NA scores (r = 0.44, p < 0.001) and PSWQ scores (r = 0.22, p < 0.001) but not ACS-SS scores (r = 0.09, p < 0.05). The Cleaning factor was significantly positively correlated with ACS-SS scores (r = 0.18, p < 0.001), but not NA scores (r = 0.12, p < 0.01) or PSWQ scores (r = 0.09, p > 0.05). Finally, the Avoiding factor did not significantly correlate with NA scores (r = −0.09, p < 0.05), PSWQ scores (r = 0.03, p > 0.05), or ACS-SS scores (r = 0.08, p > 0.05). The correlations that the CIB Stockpiling and Avoiding factors shared with NA scores were significantly stronger than those they shared with ACS-SS scores (ps < 0.05). The factors explained 24.0%, 6.8%, and 2.1% of the variance in NA scores, PSWQ scores, and ACS scores, respectively.

CIB Worry Factors (Sample 2, Wave 1)

The Financial Worry factor was positively correlated with NA (r = 0.66, p < 0.001) and PSWQ scores (r = 0.44, p < 0.001) but not with ACS-SS scores (r = −0.11, p < 0.05). The Health Worry factor was also positively correlated with NA (r = 0.68, p < 0.001) and PSWQ scores (r = 0.56, p < 0.001) but not with ACS-SS scores (r = −0.10, p < 0.05). The Catastrophizing Worry factor was positively correlated with NA (r = 0.63, p < 0.001) and PSWQ scores (r = 0.38, p < 0.001) but not with ACS-SS scores (r = −0.10, p < 0.05). The correlations between the CIB Worry factors and NA scores were significantly stronger than those between the CIB Worry factors and ACS-SS scores (ps < 0.05). The CIB Worry factors explained 60.5%, 30.1%, and 1.5% of the variance in NA scores, PSWQ scores, and ACS scores, respectively.

CIB Disability Factor (Sample 2, Wave 2)

The CIB Disability factor was significantly correlated with NA scores (r = 0.77, p < 0.001), PSWQ scores (r = 0.47, p < 0.001), and ACS scores (r = −0.25, p < 0.001). Further, the relation the CIB Disability factor shared with NA was stronger than the relations this factor shared with the PSWQ (p < 0.001) and the ACS (p < 0.001), and the relation this factor shared with the PSWQ was stronger than its relation with the ACS (p < 0.001). The CIB Disability factor explained 59.3% of the variance in NA scores, 21.9% of the variance in PSWQ scores, and 6.4% of the variance in ACS scores.

CIB-S Factor (Sample 2, Wave 2)

The CIB-S factor was significantly correlated with NA scores (r = 0.79, p < 0.001), PSWQ scores (r = 0.44, p < 0.001), and ACS scores (r = −0.26, p < 0.001). Further, the relation the CIB-S factor shared with NA was stronger than the relations this factor shared with the PSWQ (p < 0.001) and the ACS (p < 0.001), and the relation this factor shared with the PSWQ was stronger than its relation with the ACS (p < 0.001). The CIB-S factor explained 61.6% of the variance in NA scores, 19.3% of the variance in PSWQ scores, and 6.5% of the variance in ACS scores.

Construct Validity of the CIB and CIB-S Factors (Sample 3)

Model fit for the SEMs examining the relations between the CIB/CIB-S factors and VAS ratings of COVID-related fear, social isolation-related loneliness, and economic fallout-related fear was adequate or better, with the exception of the CIB-S model discussed below (see Table 3).

CIB Behavior Factors

The Stockpiling factor was significantly related to COVID-19 fear (r = 0.36, p < 0.001) and economic fear (r = 0.21, p = 0.001), but not loneliness due to social isolation (r = 0.04, p > 0.05). The Avoiding factor was related to COVID-19 fear (r = 0.30, p < 0.001) and economic fear (r = 0.24, p < 0.001) but not loneliness due to social isolation (r = 0.16, p < 0.05). Finally, the Cleaning factor was related to COVID-19 fear (r = 0.49, p < 0.001), loneliness due to social isolation (r = 0.24, p < 0.001), and economic fear (r = 0.33, p < 0.001).

CIB Worry Factors

The Financial Worry factor was significantly related to COVID-19 fear (r = 0.40, p < 0.001), loneliness due to social isolation (r = 0.29, p < 0.001), and economic fear (r = 0.66, p < 0.001). The Health Worry factor was related to COVID-19 fear (r = 0.71, p < 0.001) and economic fear (r = 0.54, p < 0.001) but not loneliness due to social isolation (r = 0.30, p < 0.05). Finally, the Catastrophizing Worry factor was related to loneliness due to social isolation (r = 0.51, p < 0.001) but not to COVID-19 fear (r = 0.15, p < 0.05) or economic fear (r = 0.11, p > 0.05).

CIB Disability Factor

The CIB Disability factor was significantly related to COVID-19 fear (r = 0.42, p < 0.001), loneliness due to social isolation (r = 0.58, p < 0.001), and economic fear (r = 0.38, p < 0.001).

CIB-S Factor

The CIB-S was significantly related to COVID-19 fear (r = 0.66, p < 0.001), loneliness due to social isolation (r = 0.56, p < 0.001), and economic fear (r = 0.67, p < 0.001). The CIB-S model did not fit the data well. However, when models were specified between the CIB-S factor and each of the outcome variables separately, model fit was adequate and the size of the relations between the CIB-S and outcome variables did not differ substantively.

Internal Validity of CIB Factors

Longitudinal Measurement Invariance of CIB Behavior Factors (Sample 2, Waves 1 and 2)

Measurement invariance was used to examine the internal validity of the CIB Behavior factors. To improve the configural model, a residual correlation was included between Behaviors items 1 and 4 at Wave 2. Full metric invariance was not achieved compared to the configural invariance model (∆χ2 = 26.28, ∆df = 12, p = 0.01). Partial metric invariance was achieved by allowing item 10 (“disinfecting packages/mail”) to load freely across waves (∆χ2 = 17.82, ∆df = 11, p = 0.09). This model provided adequate to good fit to the data (χ2 = 418.59, df = 235, p < 0.001, CFI = 0.95, RMSEA = 0.05, 90% CI [0.04, 0.06]). Scalar invariance was not achieved, even after allowing the recommended maximum of four intercepts to be freely estimated. We examined differences in item intercepts (means) using Wald χ2 tests and found significantly higher scores at Wave 1 for items 1, 9, 10, 15, 16, and 19, and significantly higher scores at Wave 2 for item 3. The highest cross-wave correlations were between each factor and the same factor at the next wave: Stockpiling r = 0.81, Avoiding r = 0.61, and Cleaning r = 0.84 (ps < 0.001).

Longitudinal Measurement Invariance of CIB Worry Factors (Sample 2, Waves 1 and 2)

Measurement invariance was used to examine the internal validity of the CIB Worry factors. Full metric invariance was achieved compared to the configural invariance model (∆χ2 = 10.93, ∆df = 11, p = 0.45). Full scalar invariance was achieved compared to the metric invariance model (∆χ2 = 7.37, ∆df = 8, p = 0.50). This model provided adequate to good fit to the data (χ2 = 432.27, df = 202, p < 0.001, CFI = 0.95, RMSEA = 0.06, 90% CI [0.05, 0.07]). Factor mean differences were examined by fixing the Wave 1 latent means to 0 and comparing the Wave 2 means against them. The Financial Worries factor mean was significantly lower at Wave 2 than at Wave 1 (ΔM = −0.11, p < 0.01). In contrast, the Health Worries (ΔM = −0.05) and Catastrophizing Worries (ΔM = 0.01) factor means did not differ significantly between Waves 1 and 2 (ps > 0.21). The highest cross-wave correlation was between each factor and the same factor at the next wave: Financial Worries r = 0.87, Health Worries r = 0.81, and Catastrophizing Worries r = 0.77 (ps < 0.001).

Longitudinal Measurement Invariance of CIB Disability Factor (Sample 1, Waves 1 and 2)

Measurement invariance was used to examine the internal validity of the CIB Disability factor. Full metric invariance was achieved compared to the configural invariance model (∆χ2 = 9.16, ∆df = 7, p = 0.24). Full scalar invariance was achieved compared to the metric invariance model (∆χ2 = 3.38, ∆df = 6, p = 0.76). The full scalar invariance model provided good fit to the data (χ2 = 101.56, df = 82, p = 0.07, CFI = 0.97, RMSEA = 0.05, 90% CI [0.00, 0.07]). The CIB Disability factor score did not differ from Wave 1 to Wave 2 (ΔM = −0.12, p = 0.14).

Longitudinal Measurement Invariance of CIB Short Form (Sample 1, Waves 1 and 2)

Measurement invariance was used to examine the internal validity of the CIB-S. The CIB-S was treated as a two-factor model for measurement invariance due to poor fit of the one-factor model. Full metric invariance was achieved compared to the configural invariance model (∆χ2 = 3.99, ∆df = 5, p = 0.55). Full scalar invariance was achieved compared to the metric invariance model (∆χ2 = 3.18, ∆df = 3, p = 0.37). This model provided marginal to good fit to the data (χ2 = 49.30, df = 32, p = 0.03, CFI = 0.96, RMSEA = 0.07, 90% CI [0.03, 0.11]). The CIB-S total score did not differ from Wave 1 to Wave 2 (ΔM = −0.20, p = 0.11).

Discussion

The impetus for this paper was to address an unmet need for the development and initial validation of a battery of self-report measures to assess the broad impact of COVID-19 on mental health functioning. Using multiple samples, we were successful in validating the CIB. The long version of the battery evaluates three broad domains: COVID-19-related behaviors, worries, and disability. The resulting CIB scales offer a more comprehensive assessment of impact relative to existing instruments that focus on only one aspect of stress or mental health, such as anxiety or worry (Hsing et al., 2020; Nelson et al., 2020a, b; Qiu et al., 2020; Simione & Gnagnarella, 2020; Taylor et al., 2020).

For parsimony and simplicity, we had hoped to derive a single scale assessing COVID-19 impact. However, acknowledging the complicated relations between COVID-19 behaviors and mental health, we opted not to include items from the COVID-19 Behaviors scales when creating a brief unidimensional measure of COVID-19 impact. The level of association between the scales we created supports the CIB as three nonredundant multidimensional scales of COVID-19 impact.

The other main aim of the study, to create a short version to assess COVID-19 impact (i.e., the CIB-S), was driven by recognition that brief measures are more feasible in clinical and medical settings where there are multiple, competing demands. Using methods from other reports aimed at creating briefer measures (Ebesutani et al., 2012), we derived a highly abbreviated measure. Despite the elimination of the behavior items, the CIB-S, consisting of items from the CIB Worry and Disability measures, performed extremely well in reliability and validity testing and therefore appears to be an excellent brief measure of COVID-19 impact.

It is difficult to determine the course of COVID-19. As we have seen, the effects of the pandemic vary dramatically by country, state, and locality, and these impacts also fluctuate significantly over time (Johns Hopkins University & Medicine, 2020). One strength of the current study is that we collected data across multiple time periods, and two of the samples were national, though not normative, samples. The stability of the findings across time points and across areas impacted to varying degrees suggests the measures developed here are likely to remain applicable even as the social, economic, and psychological impacts of COVID-19 change.

The CIB may have a number of clinical applications. In particular, the short form may be a useful screener of distress, whereas the full set of scales may give service providers a better understanding of the broad impact of COVID-19 and thereby help guide treatment. However, we primarily envision the CIB as a tool for researchers. In particular, reliably documenting COVID-19-specific worries and behaviors will be important in grounding many lines of research evaluating the impact of the pandemic on mental health. The pandemic has already resulted in documented psychological distress (Cullen et al., 2020; Holmes et al., 2020; Pfefferbaum & North, 2020; Reger et al., 2020), and the CIB could be very helpful in ascertaining pandemic-specific mechanisms that may account for reported increases in psychological distress and disability.

As with any study, there are a number of limitations to consider. First, we necessarily relied on online data sources to recruit study participants. Thus, these samples are national but not necessarily normative. Although the use of online crowdsourcing is increasingly common and the procedures are generally well-accepted (Sheehan, 2018; Thomas & Clifford, 2017), there are some concerns about these procedures, including the contamination of data by automated or “bot” responses and the representativeness of such samples (Pei et al., 2020). To mitigate some of these concerns, we utilized reliability checks, which are commonly recommended for these data sources (Peer et al., 2014; Pei et al., 2020). Moreover, we did not rely on a single sample but instead replicated the findings across multiple samples collected at different time points by different labs, which helps ensure the reliability of the findings. Reliability was further supported by the inclusion of one community sample collected outside of the Mturk system. While online data collection will likely become increasingly utilized in light of COVID-19, concerns about the representativeness of these samples are still warranted. An additional limitation was that we were unable to compare the CIB to other COVID-19-related measures (e.g., the CSS; Taylor et al., 2020) because we initiated our studies prior to the publication of these scales. It will be important to compare and contrast the merits of these various scales as research in this area continues.

Research on the impact of COVID-19 is still nascent, so it is imperative to utilize comprehensive and validated measures in order to ensure systematic, consistent, and generalizable empirical work. We believe that this COVID-19 impact measure, along with the short form of the measure, will significantly add to our ability to understand the impact of the pandemic on mental health and well-being.