Introduction

There is an increasing demand in health economic evaluations to extend the measurement of benefit beyond health and health-related quality of life (HRQoL) and to capture other important aspects of wellbeing. The ICECAP instruments [the ICECAP-A (ICEpop CAPability measure for adults) and ICECAP-O (ICEpop CAPability measure for older people)] have been developed to serve this purpose and have gained popularity in recent years. These are preference-based wellbeing measures, building on Amartya Sen’s capability approach, which defines wellbeing in terms of an individual's ability and capability to do certain things that are important in life [1]. Their use in economic evaluations has increased [2, 3], and regulatory bodies such as the National Institute for Health and Care Excellence (NICE) encourage their use for measuring the impact of social care interventions [4, 5].

The two instruments differ in the age of their recommended target population. First, the ICECAP-O instrument has been developed to be applied among the older population of 65 years and over [6, 7]. Then, a more general version, the ICECAP-A instrument has been created for the use among the general adult population of 18 years and older [8]. Both measures cover five domains of wellbeing (ICECAP-A: attachment, stability, achievement, enjoyment, autonomy; ICECAP-O: attachment, security, role, enjoyment, control) that were found to be important to their target population in the UK by qualitative research [7, 8]. Although there seem to be overlapping domains between the two measures (attachment, enjoyment, autonomy/control), only autonomy/control is surveyed identically (e.g. ‘I am able to be completely independent’). Attachment and enjoyment are addressed differently, expressing somewhat different angles in the two measures (e.g. attachment: ICECAP-A ‘I can have a lot of love, friendship and support’ vs. ICECAP-O ‘I can have all of the love and friendship that I want’). The largest differences appear in the stability/security and achievement/role domains, where there is more emphasis on concerns (‘thinking about the future’) and feeling valuable (‘doing things that make you feel valued’) in the ICECAP-O, while stability (‘feeling settled and secure’) and evolution (‘achievement and progress’) are in focus in the ICECAP-A.

Several studies have assessed the validity of the ICECAP-A and ICECAP-O measures [5, 6, 9–20]. However, no quantitative studies have compared the measurement properties of the ICECAP-A and ICECAP-O measures among elderly (aged 65+) respondents [21], who are the target population of both instruments. We assume that the ICECAP-A might be more relevant than the ICECAP-O for those individuals over 65 whose life circumstances are more similar to those of younger adults. Moreover, we hypothesize that the ICECAP-O might be more adequate than the ICECAP-A under age 65 for some subgroups of people, especially for those whose health and socioeconomic status more closely resembles that of the elderly (e.g. having health problems, being a pensioner or disability pensioner).

Thus, the aim of the study was to compare measurement properties of the ICECAP-A and ICECAP-O instruments on a sample of adult general population between the ages of 50 and 70. First, we aim to compare discriminatory power, convergent and content validity of the two instruments. Second, we assess agreement and explore the determinants of the differences between the two measures. We are specifically interested if there are any other circumstances apart from age that can motivate/justify the choice between the two instruments in research studies and in economic evaluations.

Methods

The survey and the questionnaire

The data for this study come from a major cross-sectional survey (performed by our research group) with the primary objective to measure the health status and wellbeing of the Hungarian population [9, 22]. The survey was conducted by computer-assisted personal interviews on a representative sample for the Hungarian adult population (N = 2023). In the data collection, quotas were employed to obtain a representative sample in terms of age, gender, and residence. The recruitment of the respondents and the interviews were carried out by a survey company (New Land Media Kft.). Interviewers received specific training on the content and purpose of the survey. The study was approved by the Hungarian Medical Research Council (Nr. 10058-3/2019/EKU). Participation in the study was completely voluntary and respondents’ personal data remained anonymous for the analyses. Respondents provided their written informed consent at the start of the survey.

The survey questions covered socio-demographics (such as age, gender, education, marital status, employment status, household size, monthly net household income, place of residence); health; wellbeing, happiness and satisfaction. The validated Hungarian language versions of the ICECAP-A and ICECAP-O questionnaires (see below) were applied [9]. Respondents between the ages of 50 and 70 filled in the paper-based self-completed versions of both the ICECAP-A and ICECAP-O questionnaires. (Respondents under the age of 50 filled in only the ICECAP-A questionnaire, while respondents over 70 filled in only the ICECAP-O questionnaire.) Our choice of this age group was driven partly by the 65 years age limit of ICECAP-O and partly by the retirement age in Hungary (64–65 years in 2019). The upper age limit (70 years) was set to allow the involvement of individuals over the age of 65 who are in relatively good health while working in a paid job. The lower age limit was set further from the retirement age because activity limitations due to health problems are already substantial at the age of 50 (about 25% [23]) and the share of disability pensioners increases substantially from this age [24]. We used the best–worst scaling method-based UK tariffs to calculate ICECAP-A and ICECAP-O index scores, as tariffs were not available from any other country, including Hungary, at the time of the study [25, 26].

Measurement tools

ICECAP-A and ICECAP-O The descriptive system of the ICECAP-A instrument covers the following: (1) attachment (an ability to have love, friendship and support); (2) stability (an ability to feel settled and secure); (3) achievement (an ability to achieve and progress in life); (4) enjoyment (an ability to experience enjoyment and pleasure) and (5) autonomy (an ability to be independent), while the ICECAP-O items are the following: (1) attachment (love and friendship); (2) security (thinking about the future without concern); (3) role (doing things that make you feel valued); (4) enjoyment (enjoyment and pleasure); and (5) control (independence). A 4-level response scale is applied for each item and respondents are asked to indicate the level that best describes their overall quality of life now. Overall scores range from 0, which represents ‘no capability’, to 1, which represents ‘full capability’. Tariff sets for the instruments were developed using best–worst scaling methods [25, 26], are based on an experience utility approach [27], and are currently available only for the UK population.

Five Well-Being Index (WHO-5) [28, 29]: The WHO-5 is a self-reported measurement tool to assess mental wellbeing. It consists of five statements and respondents are asked to rate their status in relation to the past 2 weeks (5 = all of the time, 4 = most of the time, 3 = more than half of the time, 2 = less than half of the time, 1 = some of the time, 0 = at no time). The final score is the sum of the response scores multiplied by four (score range 0–100), where higher score represents better wellbeing.
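The WHO-5 scoring rule described above can be illustrated with a minimal Python sketch. The function name and the example responses are hypothetical and serve only to show the arithmetic (sum of the five item responses, each coded 0–5, multiplied by four).

```python
def who5_score(responses):
    """Return the WHO-5 score (0-100) from five item responses coded 0-5."""
    if len(responses) != 5:
        raise ValueError("WHO-5 needs exactly five item responses")
    if any(r not in range(6) for r in responses):
        raise ValueError("each response must be an integer from 0 to 5")
    # The raw sum ranges from 0 to 25; multiplying by four rescales it to 0-100.
    return 4 * sum(responses)


example = [3, 4, 2, 5, 3]  # hypothetical respondent, not taken from the survey
print(who5_score(example))  # 68
```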

Satisfaction with Life Scale (SWLS) [30, 31]: The SWLS is a five-item instrument with a 7-point response scale (7 = strongly agree, 1 = strongly disagree). The SWLS score is the sum of the responses, and categories have been established based on the score (31–35: Extremely satisfied, 26–30: Satisfied, 21–25: Slightly satisfied, 20: Neutral, 15–19: Slightly dissatisfied, 10–14: Dissatisfied, 5–9: Extremely dissatisfied).
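As an illustration of the SWLS scoring and categorisation listed above, the following Python sketch sums the five responses and maps the total to the stated categories. The function names and the example respondent are hypothetical.

```python
def swls_score(responses):
    """Sum five SWLS item responses, each coded 1 (strongly disagree) to 7 (strongly agree)."""
    if len(responses) != 5 or any(r not in range(1, 8) for r in responses):
        raise ValueError("SWLS needs five integer responses between 1 and 7")
    return sum(responses)


def swls_category(score):
    """Map an SWLS total (5-35) to the category bands given in the text."""
    if not 5 <= score <= 35:
        raise ValueError("SWLS total must lie between 5 and 35")
    if score >= 31:
        return "Extremely satisfied"
    if score >= 26:
        return "Satisfied"
    if score >= 21:
        return "Slightly satisfied"
    if score == 20:
        return "Neutral"
    if score >= 15:
        return "Slightly dissatisfied"
    if score >= 10:
        return "Dissatisfied"
    return "Extremely dissatisfied"


total = swls_score([6, 5, 6, 4, 5])  # hypothetical respondent
print(total, swls_category(total))  # 26 Satisfied
```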

Visual analogue scales (VAS) Data on current happiness and satisfaction were collected on an 11-point visual analogue scale (VAS, 0 represents no and 10 represents full happiness/satisfaction).

Minimum European Health Module (MEHM) The health status of the respondents was measured by the MEHM [32]. The MEHM consists of three general questions characterizing three different concepts of health: (1) self-perceived health (very good/good/fair/bad/very bad); (2) long-standing illness and (3) activity limitations due to health problems for more than 6 months, measured via the global activity limitation indicator (GALI) (severely limited/limited but not severely/not limited at all).

EQ-5D-5L HRQoL of the participants was assessed by the paper-based self-completed version of the EQ-5D-5L questionnaire [33]. The EQ-5D-5L comprises a descriptive system and a health thermometer (EQ VAS). For each of the five domains of the descriptive system (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), respondents are asked to indicate on a 5-level response scale (1 = no problems, 5 = unable/extreme problem) the level that best describes their current health. This health profile can be converted into an EQ-5D-5L index score. We used Hungarian tariffs (value range −0.848 to 1) to calculate the EQ-5D-5L index score [34]. Similarly, current health status is self-assessed on the EQ VAS with anchors 0 (‘worst health status you can imagine’) and 100 (‘best health status you can imagine’).

Statistical analysis

In the first part of the analysis, we compare the measurement properties of the ICECAP-A and ICECAP-O instruments in terms of discriminatory power, construct validity and convergent validity, following the terminology of the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) taxonomy [35]. In the second part of the analysis, we assess agreement between the two instruments, including the comparison of ICECAP-A and ICECAP-O index scores (hereinafter ICECAP-A and ICECAP-O scores), and explore the determinants of the differences between the ICECAP-A and ICECAP-O scores.

Discriminatory power

First, we present the distribution of answers to both the ICECAP-A and ICECAP-O questionnaire. Second, we compare discriminatory power of the two instruments by calculating Shannon Evenness Index (SEI) [36] for each item of the two measures. The SEI is defined as follows:

$${\text{SEI}} = - \frac{\sum\nolimits_{i = 1}^{l} p_{i} \ln p_{i}}{\ln l},$$

where l is the number of levels in a domain of the descriptive system (in our case 4), and p_i is the proportion of observations in the ith level (i = 1, …, l). SEI ranges between 0 and 1 (1 indicating that all responses are evenly distributed across levels). The higher the SEI, the more the information obtained and the better the discriminatory power.
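The SEI definition can be sketched in a few lines of Python. The function name and the response counts below are hypothetical; the only convention added beyond the formula is the usual one that an empty level contributes zero (0 × ln 0 is taken as 0).

```python
import math


def shannon_evenness(counts):
    """Shannon Evenness Index for one item, from the number of responses per level."""
    total = sum(counts)
    levels = len(counts)
    if total == 0 or levels < 2:
        raise ValueError("need at least two levels and one response")
    entropy = 0.0
    for count in counts:
        if count > 0:  # convention: an empty level contributes nothing
            p = count / total
            entropy -= p * math.log(p)
    # Dividing by ln(l) normalises the index to the 0-1 range.
    return entropy / math.log(levels)


# Hypothetical counts for levels 1 to 4 of a single item.
print(round(shannon_evenness([20, 60, 350, 277]), 3))
```

A perfectly even distribution across the four levels gives an SEI of 1, and a distribution concentrated on a single level gives 0.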

Third, we explore and compare the share of respondents in full capability [i.e. who marked the top level (level 4)] and in sufficient capability [i.e. those who marked at least the second-best level (level 3) on all five items] [1, 37] according to the two measures.
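The two capability thresholds can be expressed as simple predicates on a respondent's five item levels. The sketch below is illustrative only; the function names and the four example respondents are hypothetical.

```python
def is_full_capability(item_levels):
    """True if the top level (4) was marked on every item."""
    return all(level == 4 for level in item_levels)


def is_sufficient_capability(item_levels):
    """True if at least the second-best level (3) was marked on every item."""
    return all(level >= 3 for level in item_levels)


def capability_shares(respondents):
    """Return the shares of respondents in full and in sufficient capability."""
    n = len(respondents)
    full = sum(is_full_capability(r) for r in respondents)
    sufficient = sum(is_sufficient_capability(r) for r in respondents)
    return full / n, sufficient / n


# Hypothetical answers (levels 1-4) on the five items of one instrument.
sample = [(4, 4, 4, 4, 4), (4, 3, 3, 4, 3), (4, 2, 3, 4, 4), (3, 3, 3, 3, 3)]
print(capability_shares(sample))  # (0.25, 0.75)
```

By construction, every respondent in full capability is also in sufficient capability.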

Construct validity

We investigate whether both ICECAP-A and ICECAP-O scores can differentiate between the groups hypothesized to differ in their levels of capability. Subgroup comparisons are carried out by one-way ANOVA tests. The following covariates are used for the subgroup comparison: gender (men/women); age group (50–54, 55–59, 60–64, 65–70); education (primary, secondary, tertiary); employment status (working full time/part-time/self-employed, pensioner, disability pensioner, unemployed, other); having a paid job (yes, no); settlement type (capital, other town, village); marital status (married, partnership, single, widow/widower, divorced, other; married/partnership: yes or no); income quintile based on per capita net monthly household income; self-perceived health (very good, good, fair, bad, very bad); GALI activity limitations for more than 6 months (severely limited, limited but not severely, not limited); long-standing illness (yes, no). Based on previous studies [3, 5, 6, 9, 11], we expected positive association with better health, being married/living with a partner, having higher income, being employed/having a paid job, living with others and living outside the capital. Associations between the ICECAP-A and ICECAP-O scores and covariates were also explored by ordinary least squares (OLS) regression analysis using the same covariates as in the subgroup analyses (see above).
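For readers less familiar with the subgroup comparison, the F statistic behind a one-way ANOVA can be sketched as follows. This is a plain-Python illustration with hypothetical scores and a hypothetical function name; it does not compute the p-value, for which a statistical package would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical index scores for three education groups.
primary = [0.71, 0.80, 0.77, 0.69]
secondary = [0.84, 0.88, 0.79, 0.90]
tertiary = [0.91, 0.87, 0.95, 0.89]
print(one_way_anova_f([primary, secondary, tertiary]))
```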

Convergent validity

To compare convergent validity of the instruments, Pearson’s correlation coefficients were calculated between ICECAP-A and ICECAP-O scores and related measures of HRQoL and wellbeing (EQ-5D-5L index, EQ VAS, happiness, satisfaction, WHO-5, SWLS). Correlations are considered strong if the coefficient is over 0.5, moderate between 0.3 and 0.5 and weak under 0.3 [38].
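A minimal sketch of the correlation step, assuming paired lists of scores, is given below. The function names and the example values are hypothetical; the strength bands follow the thresholds stated above, with values exactly at 0.3 or 0.5 assigned to the moderate band as an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label a coefficient: strong (> 0.5), moderate (0.3-0.5) or weak (< 0.3)."""
    size = abs(r)
    if size > 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# Hypothetical capability scores and EQ VAS values for six respondents.
capability = [0.62, 0.75, 0.81, 0.88, 0.93, 1.00]
eq_vas = [50, 70, 65, 80, 85, 95]
r = pearson_r(capability, eq_vas)
print(round(r, 3), correlation_strength(r))
```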

Furthermore, we also assess correlations between the ICECAP-A and ICECAP-O items. Pearson’s correlation coefficients are estimated between each domain of the two measures. We expect strong correlations between certain questionnaire items. Strong correlation is expected between the ICECAP-O Control and ICECAP-A Autonomy domains, as these have the same wording. Furthermore, both questionnaires have Attachment and Enjoyment domains (although their item levels are formulated differently); thus we expect these pairs to be strongly correlated, and more strongly with each other than with other items.

Agreement and comparison of ICECAP-A and ICECAP-O scores

Using paired t tests, we assess whether ICECAP-A and ICECAP-O scores significantly differ for individuals for the whole sample and in different subgroups of respondents.
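The test statistic of the paired t test can be sketched as follows, assuming one pair of scores per respondent. The function name and the example scores are hypothetical, and the p-value (which requires the t distribution) is deliberately left to a statistical package.

```python
import math
import statistics


def paired_t_statistic(a_scores, o_scores):
    """Return (t, degrees of freedom) for the paired differences O minus A."""
    if len(a_scores) != len(o_scores) or len(a_scores) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [o - a for a, o in zip(a_scores, o_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    # The statistic is the mean difference divided by its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical paired scores for four respondents.
icecap_a = [0.80, 0.90, 0.70, 0.85]
icecap_o = [0.85, 0.92, 0.78, 0.86]
print(paired_t_statistic(icecap_a, icecap_o))
```

A positive statistic indicates that the ICECAP-O score tends to exceed the ICECAP-A score for the same individual.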

To assess agreement between the ICECAP-A and ICECAP-O scores, we calculate intraclass correlation coefficients (ICC) and present the data on a Bland–Altman plot. ICC is calculated using a two-way mixed model of absolute agreement. The ICC can range from 0.00 (no stability/agreement) to 1.00 (perfect agreement). Based on [39], agreement is considered poor for ICC values below 0.5, moderate between 0.50 and 0.75, good between 0.75 and 0.90 and excellent above 0.90. The Bland–Altman plot shows the differences between the ICECAP-A and ICECAP-O scores (ICECAP-O minus ICECAP-A on the y-axis) against the averages of the two scores (x-axis). Horizontal lines show the mean difference and the limits of agreement, which are the mean difference plus and minus 1.96 times the standard deviation of the differences.
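A possible implementation of an absolute-agreement ICC is sketched below. It uses the single-measures form from a two-way ANOVA decomposition; whether single or average measures were used in the analysis is not stated, so this choice, like the function names and example data, is an assumption made for illustration.

```python
def icc_absolute_agreement(ratings):
    """Single-measures, absolute-agreement ICC from rows of (score_1, ..., score_k)."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two respondents and two measures")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-respondent mean square
    ms_cols = ss_cols / (k - 1)              # between-instrument mean square
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Systematic differences between instruments (ms_cols) lower the coefficient.
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def icc_label(icc):
    """Interpretation bands as cited in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical (ICECAP-A, ICECAP-O) score pairs.
pairs = [(0.50, 0.60), (0.70, 0.80), (0.90, 1.00)]
value = icc_absolute_agreement(pairs)
print(round(value, 3), icc_label(value))
```

In this toy example the second score is always 0.1 higher than the first, so the coefficient falls below 1 even though the ranking of respondents is identical; this is the property that distinguishes absolute agreement from consistency.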

Determinants of the difference between ICECAP-A and ICECAP-O scores

We explore one-way associations of the difference between the ICECAP-A and ICECAP-O scores (ICECAP-O minus ICECAP-A) with age, EQ-5D-5L, EQ VAS and with the ICECAP-A and ICECAP-O scores, and illustrate these associations graphically on two-way scatter plots.

With logistic regression analysis, we further examine the determinants of falling outside the limits of agreement (i.e. the difference between the ICECAP-O and ICECAP-A scores being larger than the mean difference plus 1.96 times the standard deviation of the differences, or lower than the mean difference minus 1.96 times that standard deviation). These are the dots outside the horizontal lines on the Bland–Altman (BA) plot.
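The limits of agreement and the two binary outcomes used in the logistic regressions can be sketched as follows. The function names and the toy data are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption.

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Return (lower, upper): the mean difference minus/plus z times the SD."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def flag_outside_limits(a_scores, o_scores):
    """Flag respondents whose ICECAP-O minus ICECAP-A difference is outside the limits."""
    if len(a_scores) != len(o_scores) or len(a_scores) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [o - a for a, o in zip(a_scores, o_scores)]
    # Assumption: sample standard deviation of the differences.
    lower, upper = limits_of_agreement(statistics.mean(diffs), statistics.stdev(diffs))
    above = [d > upper for d in diffs]  # ICECAP-O markedly higher than ICECAP-A
    below = [d < lower for d in diffs]  # ICECAP-A markedly higher than ICECAP-O
    return lower, upper, above, below


# Limits implied by a mean difference of 0.024 and a standard deviation of 0.088.
lower, upper = limits_of_agreement(0.024, 0.088)
print(round(lower, 3), round(upper, 3))  # -0.148 0.196
```

The `above` and `below` flags correspond to the two sides of the BA plot and would serve as the dependent variables of two separate logistic regressions.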

Results

The sample

In the study sample of 2023 respondents, 711 respondents were between the ages of 50 and 70; hence they were asked to fill in both the ICECAP-A and ICECAP-O questionnaires. Among them, all but two respondents answered all questions on the ICECAP-A instrument, and another two respondents did not answer all questions on the ICECAP-O, which indicates good feasibility of both instruments. Altogether 707 respondents provided full answers to both questionnaires, and this subsample was considered for the analyses. Among them 46.3% were women, and the average age was 60.1 years (SD = 6.0) with 23.5% of the respondents being between 50 and 54 years of age, 22.2% between 55 and 59, 25.0% between 60 and 64 and 29.6% between 65 and 70. Of the respondents, 7.5% reported being in very good self-perceived health, while 44.0%, 38.9%, 8.8% and 0.8% reported good, fair, poor and very poor health status, respectively. A long-standing illness was reported by 43.7%; 3.5% considered themselves severely limited in their activities and 22.9% limited but not severely according to the GALI question. The average EQ-5D-5L index score of the respondents was 0.91 (SD 0.16), and the average of EQ VAS was 76.8 (SD 17.1).

Comparison of measurement properties

Discriminatory power

Table 1 shows the distribution of answers on the ICECAP-A and ICECAP-O items. The modal response for the ICECAP-A items was the second-best level (level 3) for all the items except for Attachment, where the most common answer was the top level (level 4) with 49.6% of the answers. For the ICECAP-O, the modal answer was the second-best level (level 3) across each of the five domains. On the ICECAP-A instrument, little or no capability (level 1 or 2) was most frequently reported on the Achievement domain (22.6%), followed by Stability (16.5%), Autonomy (13%) and Enjoyment (11.7%). The least problematic item was Attachment, where only 7.3% of the respondents reported little or no capability. On the ICECAP-O, respondents were slightly less likely to indicate little or no capability; the prevalence was highest in the Security domain (19.7%) and lowest in the Attachment domain (8.2%) (Table 1).

Table 1 ICECAP-A, ICECAP-O distribution of responses N = 707

Based on the distribution of answers, Shannon Evenness Indices (SEI) were calculated to compare the discriminatory power of the two instruments (Table 1). For ICECAP-A, SEIs range from 66.2% (Attachment) to 79.9% (Achievement). For ICECAP-O, SEIs range from 66.9% (Attachment) to 81.9% (Security). Considering the average of the SEIs for ICECAP-A (74.3%) and ICECAP-O (72.2%), ICECAP-A had slightly better discriminatory power than the ICECAP-O.

Finally, regarding ceiling effect, the share of respondents in full capability (best level marked on every item) was comparable for the ICECAP-A and ICECAP-O instruments (21.2% and 19.9%, respectively). Among them, 113 respondents (16.0%) had full capability on both measures. The share of respondents in sufficient capability (those who indicated at least level 3 on all items) was larger, but not significantly so, for the ICECAP-O than for the ICECAP-A instrument (71.3% vs. 67.8%). As many as 439 respondents (62.1%) had sufficient capability by both measures. Differences in the share of respondents in full and sufficient capability by socio-demographic groups are presented in Electronic Supplementary Material Table 1.

Construct validity

Table 2 presents summary statistics for the ICECAP-A and ICECAP-O scores by socio-demographic characteristics and health status. Both ICECAP-A and ICECAP-O scores differed significantly by age group, education, employment, marital status, income quintile and all three health measures. There were no significant differences in either ICECAP-A or ICECAP-O scores by gender or settlement type. Respondents living with someone had significantly higher ICECAP-A scores, while there was no significant difference in their ICECAP-O scores.

Table 2 Summary statistics for ICECAP-A and O scores by subgroups

According to the results of the multivariate regression analysis (Table 3, columns 1–2), respondents with a higher EQ-5D-5L index (indicating better HRQoL) had significantly higher ICECAP-A and ICECAP-O scores; however, the marginal effect of the EQ-5D-5L index score was slightly lower for the ICECAP-O than for the ICECAP-A instrument (0.414 vs. 0.467). Also, pensioners, disability pensioners and unemployed respondents had significantly lower ICECAP-A scores, but ICECAP-O scores were only associated with disability pensioner status. Respondents living in the capital had significantly lower ICECAP-A and ICECAP-O scores compared to respondents living in other towns or villages. Also, people in the lowest income third had significantly lower ICECAP-A and ICECAP-O scores than people in the highest income third.

Table 3 Results of the OLS and logistic regression analysis

Convergent validity

Correlations with other HRQoL and wellbeing measures were comparable across the two instruments (Table 4). Both ICECAP-A and ICECAP-O had the strongest correlation with the EQ-5D-5L index score (0.573 and 0.604), followed by the WHO-5 index (0.566 and 0.557) and the satisfaction with life VAS score (0.517 and 0.537).

Table 4 Pearson’s correlations between ICECAP-A and ICECAP-O scores with EQ-5D-5L, EQ VAS, WHO-5 and SWLS scores

We also calculated correlations between ICECAP-A and ICECAP-O questionnaire items. As expected, ICECAP-A enjoyment had the strongest correlation with ICECAP-O enjoyment (r = 0.670) among the ICECAP-O items (and vice versa) (Table 5). Also, ICECAP-A attachment had the strongest correlation with ICECAP-O attachment (r = 0.609). (However, correlation between ICECAP-O Attachment and ICECAP-O Enjoyment was stronger (r = 0.626) than this.) Also, correlation between the ICECAP-O control and ICECAP-A autonomy domains, which have the same wording, was indeed the highest among all pair-wise correlations (r = 0.720). Furthermore, both ICECAP-O security and role items had the strongest correlations with ICECAP-A achievement (r = 0.571 and 0.564, respectively) and stability domains (r = 0.566 and 0.502). ICECAP-A stability had the strongest correlations with ICECAP-O security (r = 0.566) and enjoyment (r = 0.560), and ICECAP-A achievement had correlation coefficients higher than 0.5 with all the ICECAP-O domains.

Table 5 Pearson’s correlations between ICECAP-A and ICECAP-O items

Agreement and comparison of ICECAP-A and ICECAP-O scores

The mean ICECAP-A and ICECAP-O scores of respondents were 0.85 (SD 0.15) and 0.87 (SD 0.12), respectively. The ICC was estimated at 0.876 (95% CI 0.844–0.900), which indicates good agreement between the two measures. For 368 (52.1%) respondents the ICECAP-O score was higher than the ICECAP-A score, for 114 (16.1%) the two scores were equal (in 113 of these cases both scores were 1) and for 225 (31.8%) the ICECAP-O score was lower than the ICECAP-A score. The minimum value of the ICECAP-O score (0) was lower than the minimum value of the ICECAP-A score (0.069), while the maximum value was 1 for both instruments.

Paired t test results showed that individuals’ ICECAP-O scores were significantly higher than ICECAP-A scores. This was the case in most subgroups (Table 2), except in age group 50–54, for respondents with other types of employment (e.g. homemaker), respondents living in civil partnership, respondents in the 5th (highest) income quintile, or respondents in very bad health status, where no significant differences were found between individuals’ ICECAP-A and ICECAP-O scores.

Determinants of the difference between ICECAP-A and ICECAP-O index scores

The distribution of the difference between ICECAP-A and ICECAP-O scores (ICECAP-O minus ICECAP-A), with a mean of 0.024 and a standard deviation of 0.088, is presented in a histogram (Electronic Supplementary Material Figs. 1 and 2). The BA plot (Electronic Supplementary Material Fig. 3) shows the difference between the ICECAP-A and ICECAP-O scores as a function of the average of the two scores. Altogether the difference fell outside the limits of agreement in 48 cases (37 on the positive side, and 11 on the negative side). The graph indicates that the difference decreases as the mean score increases. Positive differences for lower scores indicate that for lower capability ICECAP-O scores are generally higher than the ICECAP-A scores.

Two-way scatters (Electronic Supplementary Material Fig. 4) show that the difference significantly decreases with the increase of both ICECAP-A and ICECAP-O scores; however, the difference has a stronger association with the ICECAP-A score. Furthermore, the difference slightly increases with age and decreases with the increase of the EQ-5D-5L index and EQ VAS scores.

Multivariate regression models 3 and 4 (Table 3) indicate that ICECAP-A and ICECAP-O scores can be calculated from each other with good confidence. In model 3 (column 3), the ICECAP-A score increased with the increase of the ICECAP-O score (the marginal effect was 0.966). On the other hand, it was significantly lower if the respondent was between 55 and 64 years old, a pensioner, unemployed, or from the middle income third. In model 4 (column 4), the constant indicates that ICECAP-O scores are systematically higher than ICECAP-A scores by an average of 0.265. The ICECAP-O score significantly increased with the increase of the ICECAP-A and the EQ-5D-5L scores (marginal effects were 0.546 and 0.159, respectively). Furthermore, it was significantly higher than the ICECAP-A if the respondent was unemployed or from the middle income third, but lower if the respondent had primary education, lived alone or had another type of employment status (such as homemaker). In the age group 60–64, the ICECAP-A score was significantly lower than the ICECAP-O score by 0.02. The rather high R2 of the two models (0.69 and 0.71) indicates that the covariates explain most of the variance in the measures in both cases.

Logistic regression results on the difference falling above or below the limits of agreement (the mean difference ± 1.96 times the standard deviation) are presented in Table 3, columns 5 and 6. Results indicate that if someone has a higher ICECAP-A score (better capability), it is less likely that the ICECAP-O score is meaningfully higher than the ICECAP-A score, but at the same time it is more likely that the ICECAP-A score is meaningfully higher than the ICECAP-O score. Furthermore, if someone has better HRQoL (EQ-5D-5L), it is significantly less likely that ICECAP-A scores are higher than ICECAP-O scores. Respondents between the ages of 60 and 70 and respondents with tertiary education were less likely to have meaningfully higher ICECAP-A scores than ICECAP-O scores.

Discussion

This is the first study to compare measurement properties of the ICECAP-A and ICECAP-O instruments among the middle-aged/elderly population. Our findings confirmed that both instruments are valid measurement tools among the general population aged 50–70 years. In summary, the ICECAP-O scores were systematically higher than the ICECAP-A scores. The difference was driven mostly by respondents’ health status (EQ-5D-5L) and employment status rather than age. ICECAP-A and ICECAP-O scores could be calculated from each other with a good confidence (R2 = 0.69 and 0.71), especially when the health state and employment status of respondents are available.

We found several similarities between the two instruments. The measures showed similar construct validity (i.e. ICECAP-A and ICECAP-O scores differentiated between the same subgroups of respondents). Also, correlations with other HRQoL and wellbeing measures indicated similar convergent validity of the two instruments. We also found similar ceiling effects, as the share of respondents in full capability was comparable according to the two measures (21.2% and 19.9%). Overall, the agreement between ICECAP-A and ICECAP-O scores was good (ICC = 0.876).

Furthermore, strong correlations found between ICECAP-A and ICECAP-O items indicate similar concepts behind the questionnaire items. As expected, the ICECAP-A attachment, stability, enjoyment and autonomy items had the strongest correlation with the ICECAP-O attachment, security, enjoyment and control items, respectively. Nevertheless, we also found some interesting discrepancies. For example, ICECAP-O security had a slightly stronger correlation with ICECAP-A achievement than with ICECAP-A stability. The ICECAP-O security item covers concerns about the future, while ICECAP-A stability reflects security in the present, and future worries are not covered by any of the ICECAP-A items. Therefore, it makes sense that concerns about the future are better reflected by current achievements and progress than by security in the present. Furthermore, ICECAP-A achievement correlated more strongly with ICECAP-O enjoyment, control and security than with ICECAP-O role. This finding suggests that achievement and progress might result in fewer concerns about the future, and more feelings of independence, enjoyment and pleasure in the present. It also implies that achievement and progress are possibly not the things that make people feel valued.

Our analysis pointed out some further differences between the two instruments. Overall, (at least on this subsample of adults between the age of 50 and 70) ICECAP-A seems to be a slightly more sensitive measure in terms of discriminatory power than the ICECAP-O. Also, the share of respondents in sufficient capability was slightly higher according to the ICECAP-O instrument.

Regarding ICECAP scores, average ICECAP-O scores were systematically higher than ICECAP-A scores in most of the subgroups. This can be partly explained by the difference in tariff values between the ICECAP-O and the ICECAP-A (tariffs of the ICECAP-O being higher than those of the ICECAP-A for most domains and levels) (Electronic Supplementary Material Table 2). The difference between the ICECAP-O and ICECAP-A scores increased with decreasing capability and HRQoL and with increasing age. Multivariate regression results also indicated that employment and health status had a slightly larger marginal effect on the ICECAP-A score than on the ICECAP-O score. Pensioners and unemployed respondents had significantly lower ICECAP-A scores, but these statuses did not seem to be associated with the ICECAP-O scores. These results are not unexpected, given that the ICECAP-O is specifically designed to consider the preferences and values of elderly respondents, and it is likely that employment status is relatively less important for the elderly, while a worse health state might be more acceptable at older ages [40,41,42].

Findings related to the age of respondents require special attention, as the difference between the two measures is defined by the age of their target population. Our analysis pointed out that ICECAP scores were more sensitive to health and employment status than to age. The difference between ICECAP-O and ICECAP-A was significant above the age of 55, but not between 50 and 54. This means that people over 55 tended to have higher ICECAP-O scores than ICECAP-A scores (although this is true of most subgroups), and it seems that the difference slightly increased with age.

Implications of our findings

Our results also have some implications for the potential use of the ICECAP instruments in economic evaluations. As opposed to the ICECAP-O, the ICECAP-A has been developed for use among the entire adult population and is thus a preferred option for use in economic evaluation [8]. Our results imply that in economic evaluations, when two health states are compared, using the ICECAP-A might result in a larger difference in scores than using the ICECAP-O. This indicates that the ICECAP-A might magnify the benefit of health care interventions in terms of improvement in capability wellbeing compared to the scenario when the ICECAP-O is applied. On the other hand, using the ICECAP-O may potentially underestimate gains in capability wellbeing, especially among the working (working age) population, where health status can also affect people’s ability to work. Neither of the two instruments contains health as a separate conceptual attribute. However, health improves the capability to achieve desired functions such as meaningful relationships (ICECAP-A and ICECAP-O attachment), enjoyment of life (enjoyment) and independence (ICECAP-A autonomy and ICECAP-O control) [10]. Also, health strongly affects the ability to work among the working population, and working itself results in higher ICECAP-A scores, most probably because it gives people feelings of progress and achievement (ICECAP-A achievement).

Our findings may also help to provide guidance on the selection of the instruments in further research studies. Our results show that employment status is a crucial factor in driving the difference between ICECAP-A and ICECAP-O scores. In most cases, employment status is determined largely by age; thus using age as a simple cut-off for switching between the measures seems appropriate. Nevertheless, in some cases, it is the employment status of the target population, rather than age, which could better drive the choice between the instruments in age groups around the retirement age. For example, the ICECAP-A could be a preferred option among people, even at older ages, if they are still active in the labor market (or only temporarily absent from work), as it better reflects losses in wellbeing due to their decreased ability to work. On the other hand, in inactive population groups (e.g. disability pensioners), the ICECAP-O could be considered even at ages under 65. Given the potential increases in the retirement age in the future, recommending the use of the ICECAP-O above the retirement age (rather than above a specific age cut-off of 65 years), as a rule of thumb, seems to be a simple solution to consider in order to strengthen the validity of the instrument in the long term.

Limitations

Some limitations of our study should be mentioned. The age of the studied population is limited to 50–70 years; however, the investigation could be extended to higher ages as well, especially in countries where the retirement age and life expectancy are substantially higher than in Hungary. Due to the cross-sectional nature of the study, we could not assess and compare the responsiveness of the ICECAP instruments to changes in this age group, and we cannot examine any causality issues either. Furthermore, our study was carried out among the general population, so it is possible that studies focusing on specific patient populations would lead to different findings.

Conclusion

Our results confirmed the validity of both the ICECAP-A and ICECAP-O measures in the 50–70 age group, and the two measures show good agreement with each other. In general, the ICECAP-O results in higher scores than the ICECAP-A in the 50–70 age group, and the difference increases with declining capability and health status and with increasing age. Employment status has a larger impact on ICECAP-A scores than on ICECAP-O scores. Also, the difference between the two measures is determined mostly by health status (EQ-5D-5L) and employment status rather than age. We suggest that these results be considered in the choice of measures and the design of capability wellbeing studies. Further research should investigate whether the choice of instruments (for a population around the retirement age) should also be linked to the respondent’s employment status.