1 Introduction

Traditionally, psychological research focused on measuring and alleviating psychological symptoms, such as depression and anxiety. In the last two decades, however, there has been growing interest in positive functioning and well-being (Seligman, 2019). Research suggests that well-being and psychopathology are related, yet distinct dimensions of mental health (Keyes, 2005). Well-being has been shown to be related to lower medical consumption and improved mental and physical health, personal functioning, societal participation, productivity, and longevity (Chida & Steptoe, 2008; Hone et al., 2015; Howell et al., 2007; Lamers, Bolier, et al., 2012; Lamers, Glas, et al., 2012; Lyubomirsky et al., 2005). High levels of well-being seem to protect against mental disorders (Keyes et al., 2010; Schotanus-Dijkstra et al., 2016) and increase the likelihood of recovery from mental illness (Iasiello et al., 2019; Schotanus-Dijkstra et al., 2019). Conversely, people with low levels of well-being have been shown to be twice as likely to be depressed ten years later (Wood & Joseph, 2010).

Although thousands of studies on well-being have been conducted (Seligman, 2019), there is still considerable inconsistency in how well-being is conceptualized and measured. First, different types of well-being stem from two traditions: the hedonic and the eudaimonic. From the hedonic perspective, well-being is seen as pleasure and happiness and the absence of negative emotions. In the eudaimonic tradition, well-being is described as meaning in life, personal growth and self-actualization, as well as a commitment to goals and values that one shares with a social group (Delle Fave et al., 2011). Emotional and subjective well-being originated from the hedonic perspective, while psychological and social well-being emanate from the eudaimonic tradition (Ryan & Deci, 2001; Westerhof & Keyes, 2010). Second, within these traditions, well-being is measured in many different ways. For example, while some studies measure subjective well-being as positive versus negative emotions (Watson et al., 1988), others assess it as satisfaction with life (Pavot & Diener, 2009). Third, since the beginning of the field of positive psychology about 20 years ago (Seligman & Csikszentmihalyi, 2014), many different well-being scales have been developed. This complicates well-being research even further, because specific well-being domains are measured in widely varying ways. Previous meta-analyses mirror this variety of definitions and operationalizations of well-being. For example, a large meta-analysis on the effects of positive psychology interventions on well-being included more than twenty different subjective and psychological well-being measures (Carr et al., 2020). Different concepts such as hope, purpose in life, and psychological capital were all combined under the umbrella of psychological well-being.
A similar variety in included outcome measures is also found in other meta-analytic studies that focus on well-being (e.g., Chakhssi et al., 2018; Hendriks, Schotanus-Dijkstra, Hassankhan, de Jong, et al., 2020).

One of the most commonly used well-being instruments is the Mental Health Continuum (MHC) (Keyes, 2002), which is comprehensive, theory-driven, and covers both the hedonic and eudaimonic well-being traditions. This instrument measures emotional well-being from the hedonic tradition (feeling good), as well as social and psychological well-being from the eudaimonic perspective (functioning well). Emotional well-being is defined as being happy, interested in life, and satisfied with life. Social well-being is conceptualized with the five dimensions of social contribution, integration, actualization, acceptance, and coherence (Keyes, 1998). Psychological well-being is measured using the definition by Ryff (1989) with the six dimensions of self-acceptance, environmental mastery, positive relations with others, personal growth, autonomy, and purpose in life.

The total score of the MHC combines the different forms of well-being, while the subscale scores allow the distinction between emotional, social, and psychological well-being. To our knowledge, no one has yet systematically summarized the impact of psychological interventions on well-being through the use of one single comprehensive well-being measurement instrument. Therefore, the current meta-analysis aims to synthesize studies that examined the effects of psychological interventions on well-being using the MHC. This is relevant for several reasons.

First, previous meta-analyses (e.g., Bolier et al., 2013a; Chakhssi et al., 2018; Sin & Lyubomirsky, 2009) used a wide variety of instruments to assess well-being. For some of these instruments it is debatable whether they assess an integral part of well-being or only concepts broadly related to it, such as hope. This provides an ambiguous picture of the effectiveness of psychological interventions on well-being. Therefore, the current meta-analysis uses one comprehensive instrument that is theory-based, widely accepted and used, and that provides a clear and unambiguous picture of well-being. Second, no meta-analysis has yet given a full overview of well-being by including and differentiating between hedonic and eudaimonic well-being. Specifically, no one has yet meta-analytically examined social well-being, which is an important part of eudaimonic well-being. Third, previous meta-analyses mainly examined the effects of positive psychology interventions (PPIs) on well-being (Bolier et al., 2013a; Carr et al., 2020; Chakhssi et al., 2018; Hendriks, Schotanus-Dijkstra, Hassankhan, de Jong, et al., 2020; Hendriks et al., 2018; Koydemir et al., 2020; Sin & Lyubomirsky, 2009). However, well-being is not only targeted by positive psychology interventions, but also by other types of psychological interventions, including traditional clinical interventions such as cognitive-behavioral therapy (CBT) and health coaching interventions (Weiss et al., 2016). Therefore, all types of psychological interventions will be included in this meta-analysis.
Other meta-analyses focused only on specific parts of well-being, such as psychological well-being (Weiss et al., 2016), or on specific concepts or interventions related to well-being, such as kindness, optimism, posttraumatic growth, strengths, resilience, gratitude, and forgiveness (Akhtar & Barlow, 2018; Baskin & Enright, 2004; Curry et al., 2018; Davis et al., 2016; Dickens, 2017; Malouff & Schutte, 2017; Roepke, 2015; Schutte & Malouff, 2019; Wade et al., 2014).

The goal of the current study is to synthesize the effects of psychological interventions in improving well-being as measured with the MHC. Meta-analyses of randomized controlled trials (RCTs) of psychological interventions from all fields that included the MHC as an outcome will be conducted. This allows us to determine the effectiveness of psychological interventions in changing well-being as measured with one comprehensive and validated instrument. Furthermore, this enables us to examine whether psychological interventions have different effects on the three well-being types (i.e., emotional, psychological, and social well-being) and whether different types of interventions are more or less effective in improving well-being.

2 Method

The study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2015) and the Cochrane handbook for systematic reviews of interventions (Higgins et al., 2019). The study was registered on PROSPERO before the start of data collection (CRD42020162693) and all deviations were noted. All coded data and the R script for the meta-analyses are available on the OSF website (https://osf.io/372sr/).

2.1 Eligibility criteria

Eligibility criteria were determined before the search was conducted and were based on the PICOS criteria for the inclusion and exclusion of studies (i.e., population, intervention, comparator, outcome, study design). Both clinical and non-clinical populations were included, regardless of age. Clinical populations could include either psychiatric or somatic disorders.

The MHC had to be used as a primary or secondary outcome, either in its short format (MHC-SF) with 14 items (Keyes et al., 2008) or the longer version (MHC-LF) containing 40 items. Only randomized controlled trials (RCTs) of psychological interventions were included. Psychological interventions were defined as interventions that focus on psychological content, that apply an interpersonal process designed to bring about the modification of feelings, cognitions, attitudes and behaviors or a psychological target outcome, or that are designed to help people with a psychological issue (Hodges et al., 2011). Studies examining the effect of pharmacological treatments (e.g., medication studies) or physical activity (e.g., physical exercise or yoga) were excluded. Studies with inactive (no intervention, waiting list, or treatment as usual) or active control groups were included. Studies needed to include at least a posttest, but studies containing follow-up measurements were also included. To be included, articles had to be published in an English, Dutch or German peer-reviewed journal. Book chapters, study protocols, dissertations, and conference proceedings were excluded. Data needed to calculate effect sizes had to be available in the article or had to be provided upon request.

2.2 Search strategy

A systematic literature search was conducted in Scopus, PsycINFO, PubMed, and the Cochrane Controlled Register of Trials. The initial search was conducted on 15 November 2019 and was updated on 14 October 2020. Two information specialists advised the authors on the development of the search strategy, and all three authors conducted the search. Search terms referring to the MHC were used to search all sections of the articles. These were combined with terms on intervention and RCT to search article titles and abstracts. Thesaurus terms in PsycINFO and MeSH terms in PubMed were also added to the search. The search was limited to English, Dutch and German articles and to studies from 2002 or later, since the first studies about the MHC were published in that year (Keyes, 2002). If a study protocol was identified that included the MHC as an outcome, it was manually checked whether results of that study had already been published. Additionally, prior meta-analyses and systematic reviews were cross-checked (Bolier et al., 2013a; Brandel et al., 2017; Carr et al., 2020; Casellas‐Grau et al., 2014; Chakhssi et al., 2018; Curry et al., 2018; Geerling et al., 2020; Hendriks, Schotanus-Dijkstra, Hassankhan, de Jong, et al., 2020; Hendriks et al., 2018; Koydemir et al., 2020; Sin & Lyubomirsky, 2009; Weiss et al., 2016). The complete search strategy can be found in Table S1 in the Supplementary Material.

2.3 Selection of studies

Studies were screened for eligibility based on title and abstract in the first phase and on full text in the second phase. To standardize the selection process, the three authors discussed the first ten records from the search together and decided whether they should be included or excluded based on the previously defined eligibility criteria. Next, the authors independently screened the first 100 studies from the database search and interrater reliability was calculated, showing moderate agreement between the three reviewers (Fleiss' Kappa = 0.69) (McHugh, 2015). The remaining titles and abstracts were then split up between the authors and independently screened. Afterward, the remaining full texts were also split between the authors and independently screened. Any uncertainties regarding the eligibility of studies were discussed with the other two authors until consensus was reached.
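The Fleiss' Kappa value reported above follows a standard formula for agreement among a fixed number of raters. As an illustration only (not the authors' code; the function name is hypothetical), a minimal Python implementation could look like this:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories matrix of rating counts.

    counts[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(counts)                 # number of rated subjects
    n = sum(counts[0])              # raters per subject
    k = len(counts[0])              # number of categories
    total = N * n
    # Per-subject observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # Chance agreement from the marginal category proportions
    p = [sum(row[j] for row in counts) / total for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)
```

With three raters classifying records as include/exclude, each row would hold the vote counts for one record; perfect agreement yields a kappa of 1.0.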

2.4 Outcome

The outcome of the meta-analysis was well-being as measured by the MHC, which is a comprehensive, evidence-based, well-validated and widely used measure of well-being (Keyes, 2002, 2005, 2007; Keyes et al., 2008). Two versions of the MHC have been developed, the long version containing 40 items (MHC-LF) and the short version containing 14 items (MHC-SF). Both versions contain three subscales: emotional (MHC-LF: 7 items/MHC-SF: 3 items), social (15/5 items), and psychological (18/6 items) well-being. In the MHC-SF, the response options measure the frequency of each item: respondents indicate how often they experienced or felt a certain way during the past month, with answer options ranging from never, once or twice, about once a week, about 2 or 3 times a week, and almost every day to every day. Modified versions with fewer items, adapted item formulations, or different response formats were not included. Total and subscale scores of the MHC were included as outcomes in the current meta-analysis.
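As an illustration of how MHC-SF item responses map onto the three subscale scores and the total score, a minimal Python sketch is given below. It assumes the commonly used item order (items 1-3 emotional, 4-8 social, 9-14 psychological) and a 0 (never) to 5 (every day) coding; the function name is hypothetical:

```python
def mhc_sf_scores(responses):
    """Sum 14 MHC-SF item responses (0 = never ... 5 = every day) into
    subscale and total scores, assuming the usual item ordering:
    items 1-3 emotional, 4-8 social, 9-14 psychological well-being."""
    assert len(responses) == 14 and all(0 <= r <= 5 for r in responses)
    return {
        "emotional": sum(responses[0:3]),       # items 1-3
        "social": sum(responses[3:8]),          # items 4-8
        "psychological": sum(responses[8:14]),  # items 9-14
        "total": sum(responses),
    }
```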

In both the MHC-LF and MHC-SF, the estimates of internal consistency reliability were found to be high (> 0.80) for all three subscales (Keyes, 2005). The MHC-SF subscales have been found to have moderate test–retest reliability, as well as good convergent and discriminant validity (Lamers, Westerhof, Bohlmeijer, ten Klooster & Keyes, 2011).

2.5 Data extraction

Data was extracted for the following categories: (1) population and number of participants, (2) country, (3) intervention, (4) duration in weeks and number of sessions, (5) control group, (6) format and guidance, (7) follow-up, (8) mean age and SD or range, and (9) percentage of female participants. Included full texts were split up between the authors and data was extracted independently by the three authors using a pre-defined data extraction sheet. Afterward, the first author (JK) and last author (LAW) together checked the extracted data for all studies again, resolving disagreements by discussion. To calculate effect sizes, means and standard deviations were extracted for both the intervention and control group at posttest and, if applicable, at follow-up. Means and standard deviations were extracted for the total MHC scores as well as the subscales. A portion of the extracted means and standard deviations (20% per author) was cross-checked by the two other authors.

2.6 Quality assessment

The methodological quality of the included studies was rated using an assessment based on the Cochrane Collaboration's tool for assessing risk of bias (Higgins et al., 2011). Since not all items of the Cochrane risk of bias tool apply to psychological studies, a selection was made of criteria relevant to psychological RCTs. The following eight criteria were used to assess the quality of included studies: (1) randomization, (2) drop-out, (3) intention-to-treat analyses, (4) qualified professionals, (5) power analysis or sufficient sample size, (6) treatment integrity, (7) baseline comparability and/or adjustments made to correct for baseline imbalance, and (8) inclusion and exclusion criteria. Each criterion was scored 0 (absent) or 1 (present). The percentage of criteria scored 1 was calculated to determine study quality, which was classified as lower (less than 75% of criteria scored 1), fair (75% or more, but less than 100%), or good (100%). Certain criteria were not applicable to some studies; for example, for self-help interventions, the fourth criterion (qualified professionals) did not apply. In these cases, the item was scored NA and the percentage of criteria scored 1 was calculated over the remaining items. Included studies were evenly divided between the three authors, who independently scored the quality of studies, and 20% of the quality assessments were cross-checked by the other authors. Uncertainties in scoring were discussed by all three authors until consensus was reached. An overview of the items and the criteria used for scoring can be found in Table S2 in the Supplemental Material.
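The scoring rule described above, including the handling of non-applicable (NA) items, can be sketched as follows (an illustrative Python sketch, not the authors' actual code; the function name is hypothetical):

```python
def quality_rating(items):
    """Score an eight-item quality checklist.

    items: list with 1 (criterion present), 0 (absent), or None (not
    applicable). Returns (percentage of applicable criteria met, label),
    using the paper's cut-offs: <75% lower, 75-99% fair, 100% good.
    """
    applicable = [i for i in items if i is not None]
    pct = 100 * sum(applicable) / len(applicable)
    if pct == 100:
        label = "good"
    elif pct >= 75:
        label = "fair"
    else:
        label = "lower"
    return pct, label
```

For a self-help intervention, for instance, the fourth item would be passed as `None` and the percentage would be computed over the seven remaining criteria.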

2.7 Meta-analytic strategy

The calculation of effect sizes, the three-level meta-analyses, and the publication bias analyses were conducted in R (R Core Team, 2020) using the packages esc, dmetar (Harrer et al., 2019), and metafor (Viechtbauer, 2010). The standardized mean difference was used as the summary measure. Between-group effect sizes were calculated for posttest and, if applicable, for follow-up. If possible, effect sizes were calculated based on the intention-to-treat principle; if intention-to-treat results were not provided, findings from per-protocol analyses were used. To correct for potential small-sample bias, Hedges' g was used as the effect size. The corresponding sampling variance of g was calculated as the square of its standard error (Harrer et al., 2019). Some studies included multiple follow-up measurement points. In these cases, the first follow-up measurement (i.e., the one closest to the posttest) was used for the primary meta-analysis. Although this decision comes at the cost of relatively high heterogeneity in the length of the included follow-up periods, it also results in relatively little loss of data.
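The Hedges' g computation from group means, standard deviations, and sample sizes follows a standard formula; a minimal Python sketch is given below as an illustration (the paper used the R packages named above, not this code). It uses one common approximation for the sampling variance, which equals the squared standard error of g:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and its sampling variance for two independent groups.

    Computes the pooled-SD Cohen's d, then applies the small-sample
    correction factor J to obtain g; the variance is J^2 times a common
    approximation of the variance of d.
    """
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / s_pooled
    J = 1 - 3 / (4 * df - 1)               # small-sample bias correction
    g = J * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    var_g = J**2 * var_d                   # sampling variance = SE**2
    return g, var_g
```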

Separate meta-analyses were conducted for posttest and follow-up, as well as for the total scores of the MHC and the emotional, social, and psychological well-being subscales. Since it cannot be assumed that effect sizes from studies using non-active control groups and active control groups represent the same true effect, primary meta-analyses were reported separately for active and non-active control groups (Karlsson & Bergmark, 2015; Levack et al., 2019). Because of the multilevel structure of the data (i.e., multiple comparisons nested within RCTs), three-level models implemented in the metafor package (Viechtbauer, 2010) were used for all meta-analyses. Three-level models account for clustering of effect sizes within studies (level 2) while simultaneously modeling between-study effects (level 3). This makes it possible to model statistical dependence without knowing the correlations between dependent effect sizes. Furthermore, it allows each individual effect size to be included in the analyses instead of aggregating effect sizes from the same study, increasing the power and precision of the pooled estimate (Cheung, 2019; Konstantopoulos, 2011; Scammacca et al., 2014; Van den Noortgate et al., 2013). A traditional two-level model was also fitted and compared with the three-level model using a likelihood ratio test to determine whether the more complex model fitted the data significantly better. The three-level model indeed fitted the data significantly better (χ²(1) = 38.56, p < .001), further justifying the use of the more complex model. Additional meta-analyses with outliers excluded were conducted to examine the impact of outliers on the pooled effect. An effect size was classified as an outlier if its confidence interval did not overlap with the confidence interval of the pooled effect (Harrer et al., 2019). Effect sizes were interpreted as small (< 0.33), moderate (0.33–0.55) or large (> 0.56) (Lipsey & Wilson, 1993).
Heterogeneity of effect sizes was determined using Cochran's Q statistic and quantified using τ² and I² values (Cheung, 2014, 2019), which were reported separately for level 2 (within-study heterogeneity) and level 3 (between-study heterogeneity). The I² statistic indicates the proportion of the total variance across all included effect sizes that is not due to sampling error, with smaller values indicating little variation between trials. Values of 25%, 50%, and 75% or higher indicate small, moderate, and high heterogeneity, respectively. The Q statistic assesses whether effect sizes differ from each other more than would be expected by chance alone; significant Q values are indicative of heterogeneity in effect sizes (Higgins et al., 2019).
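For the classical (two-level) case, Q and I² can be computed directly from the effect sizes and their sampling variances; the three-level decomposition used in the paper partitions τ² further, but the basic quantities are the same. A minimal illustrative Python sketch (hypothetical function name, not the authors' code):

```python
def q_and_i2(effects, variances):
    """Cochran's Q and the I² statistic for a set of independent effect sizes.

    Q sums the inverse-variance-weighted squared deviations from the
    fixed-effect pooled estimate; I² = max(0, (Q - df) / Q) * 100.
    """
    w = [1 / v for v in variances]                      # inverse-variance weights
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    Q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    return Q, I2
```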

In addition, univariable meta-regression analyses were conducted using three-level models to examine multiple potential moderators of the pooled effect size. These analyses were conducted for the total and subscale scores of the MHC-SF at posttest. Due to the relatively small number of studies using active control groups, moderator analyses were limited to comparisons with non-active control groups. The following pre-specified moderators were examined: (1) population: non-clinical versus clinical, (2) duration: short (< 8 weeks) versus long (≥ 8 weeks), (3) delivery mode: web-based versus offline, and (4) study quality: lower versus fair versus good. In addition, post-hoc analyses were conducted for the following subgroups: (1) intervention type: ACT versus positive psychology versus mindfulness versus life-review, (2) guidance: guided versus unguided, and (3) format: individual versus group versus self-help. Wald-type tests of model coefficients were used to test whether significant differences between specific categories of moderators existed. If a moderator contained more than two categories, separate linear combinations of model coefficients were tested using the 'anova' function implemented in the metafor package (Viechtbauer, 2010).

Publication bias was assessed separately for studies using non-active and active control groups. Funnel plots and the precision-effect test and precision-effect estimate with standard error (PET-PEESE) were used to determine the risk of publication bias. In a funnel plot, the treatment effect is plotted against its standard error; a symmetric distribution of studies around the overall estimated effect is indicative of the absence of publication bias. Asymmetry of the funnel plot was statistically tested by including the inverse of the sample size as a covariate in the three-level models. A significant relationship between the inverse of the sample size and the effect sizes would be indicative of an asymmetric funnel plot (Oosterhoff et al., 2016; Peters et al., 2006). PET-PEESE (Stanley, 2008; Stanley & Doucouliagos, 2014) represents a combined approach. First, the standard error is included as a covariate in the three-level model (PET). The procedure for PEESE is very similar, except that the squared standard error is used as the covariate. The intercept of these models (β0) can be seen as the estimated true effect of the meta-analysis after correcting for small-study bias. If the intercept of the PET model is significantly larger than zero, it is recommended to use the PEESE intercept as an estimate of the true effect size (Harrer et al., 2019; Stanley & Doucouliagos, 2014). Two-tailed tests and 95% confidence intervals were used for all analyses.
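The core of PET-PEESE is a weighted regression of effect sizes on their standard errors (PET) or squared standard errors (PEESE), with the intercept serving as the bias-corrected estimate. The sketch below illustrates this in Python with simple weighted least squares; it deliberately ignores the three-level nesting the paper modeled in metafor, and the function name is hypothetical:

```python
import numpy as np

def pet_peese(effects, ses):
    """Minimal PET-PEESE sketch via inverse-variance weighted least squares.

    PET regresses effect sizes on standard errors, PEESE on squared standard
    errors; each intercept estimates the mean effect corrected for
    small-study bias. Returns (PET intercept, PEESE intercept).
    """
    y = np.asarray(effects, dtype=float)
    se = np.asarray(ses, dtype=float)
    w = 1.0 / se**2                         # inverse-variance weights

    def wls_intercept(x):
        X = np.column_stack([np.ones_like(x), x])   # [intercept, covariate]
        XtW = X.T * w                               # X^T W with diagonal W
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        return beta[0]

    return wls_intercept(se), wls_intercept(se**2)
```

When the effect sizes show no relation to their standard errors (no small-study bias), both intercepts simply recover the common effect.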

3 Results

3.1 Study selection

The initial database search produced 2143 records, of which 1778 articles remained after duplicates were removed. Five articles were included through other sources. After screening of titles and abstracts, 106 full-text articles were assessed for eligibility, of which 60 were excluded. Ultimately, 46 articles met the inclusion criteria and were included in the meta-analysis. Figure 1 shows the PRISMA flow diagram summarizing the screening and selection procedure.

Fig. 1 Flowchart of screening and study selection process

3.2 Study characteristics

The characteristics of the studies and populations, as well as of the intervention and control groups, are summarized in Table 1. All included studies used the MHC-SF, and all were published in English peer-reviewed journals between 2010 and 2020. Included studies were mainly conducted in Western countries, most often in the Netherlands (n = 19), the United States (n = 8), and Australia (n = 4). Only four studies were conducted in non-Western countries, namely Chile, Guatemala, Suriname, and South Africa. The average duration of the interventions was 7.73 weeks, and the average number of sessions was 7.46. Of all included studies, 18 used guided interventions and 28 were self-help interventions. In terms of follow-up, 29 studies included a follow-up measurement in addition to the post measurement, with an average duration to the first follow-up of 5.14 months. Of these, 12 studies used more than one follow-up measurement, and three used three or more follow-ups.

Table 1 Main characteristics of included studies (k = 46)

3.3 Population characteristics

In total, the included studies contained 7,618 participants, of which 4,055 were in the intervention conditions and 3,563 in the control conditions. Sample sizes ranged from 23 to 1,162 participants. At baseline, the mean age of participants was 37.02 years (SD = 8.66) and 73.77% of the participants were female. A wide variety of populations was included in the meta-analysis. Of all studies, 28 were conducted in non-clinical populations, most often among university students (n = 8) and adults from the general population (n = 5); only one study included pupils. The remaining 19 studies were conducted in clinical populations. Most of these included people with psychological problems (n = 15), such as depressive symptomatology (n = 8) and psychological distress (n = 5). Three studies included participants suffering from physical health problems, namely Parkinson's disease, chronic pain, and cancer. One study included people with mental and/or physical health problems.

3.4 Intervention and control group characteristics

An array of different psychological interventions was included, with the most prominent being ACT (n = 14), positive psychology (n = 11), and mindfulness-based interventions (n = 7). In addition, four studies examined the effect of life-review therapy. Eleven studies examined other types of interventions, for example, health promotion interventions (Addley et al., 2014; Bonthuys et al., 2011), relationship interventions (Halford et al., 2017), or music-based interventions (Gold et al., 2017; Hides et al., 2019). One study (Walker & Lampropoulos, 2014) included both CBT and positive psychology interventions as separate intervention groups and was the only study using traditional CBT.

Control groups were non-active in 32 studies, while nine studies used active control comparison groups. Expressive writing was used as an active control condition in three studies (Lamers et al., 2015; Pots et al., 2016; Trompetter et al., 2015). Other studies used, for example, book reading (Halford et al., 2017), a mindfulness workbook (Levin, Krafft, et al., 2020a), ecological momentary assessment (Levin et al., 2019), behavioral support (O’Connor et al., 2020), or problem-based care optimization (Weiss et al., 2020). Four studies (Bohlmeijer et al., 2020; Lamers et al., 2015; Pots et al., 2016; Trompetter et al., 2015) contained both an active and a non-active control group. For one study, it was not clear what the control group contained (Bonthuys et al., 2011). The non-active control groups were waiting-list (n = 27), no intervention (n = 4), and treatment as usual (n = 2).

3.5 Study quality

Of all included studies, 15 were rated as good, 17 as fair, and 14 as lower. On average, studies fulfilled 80.5% of the applicable criteria. The criteria fulfilled by most studies were the use of qualified professionals and the description of inclusion and exclusion criteria, fulfilled by 90% and 91.3% of the studies, respectively. Conducting a power analysis or having an adequate sample size, and testing treatment integrity, were fulfilled in only 65.9% and 63.3% of the studies, respectively. An overview of the quality scoring of included studies can be found in Table S3 of the Supplemental Material.

3.6 Meta-analyses

For total MHC-SF scores, 64 comparisons from 45 studies were analyzed at posttest. For each of the emotional, social, and psychological well-being subscales, 37 comparisons from 37 studies were analyzed at posttest. At follow-up, 33 comparisons were included for total scores and 22 comparisons for each of the emotional, social, and psychological well-being subscales. Table 2 summarizes the three-level models for the effects of the interventions on the total scores of the MHC-SF compared with non-active control groups, as well as the subscale scores at posttest and follow-up and the variance components of the three-level models. Table 3 summarizes the effects for interventions compared with active control groups. Posttest and follow-up findings are reported separately with and without outliers.

Table 2 Three-level random-effects meta-analyses of the effect of interventions on well-being measured with the MHC-SF (compared with non-active control groups)
Table 3 Three-level random-effects meta-analyses of the effect of interventions on well-being measured with the MHC-SF (compared with active control groups)

3.6.1 Post-intervention effects

For studies using non-active control groups, significant small effects were found for the total scores of the MHC-SF (β = 0.25, 95% CI: 0.14 to 0.37, p < 0.001), as well as for emotional (β = 0.27, 95% CI: 0.18 to 0.36, p < 0.001), social (β = 0.25, 95% CI: 0.17 to 0.33, p < 0.001) and psychological well-being (β = 0.30, 95% CI: 0.20 to 0.39, p < 0.001). Effects with outliers excluded were very similar in magnitude and also significant (β = 0.29, 95% CI: 0.23 to 0.36, p < 0.001). Similarly, significant effects were found for the subscales when outliers were excluded. When compared with active control groups, interventions were not effective in improving well-being at posttest.

Significant heterogeneity in effect sizes for studies with non-active control groups was found for the total scores of the MHC-SF with outliers included (Q = 199.43, df = 48, p < 0.01). Heterogeneity was 0.09 at level 3 (τ²(3)) and zero at level 2 (τ²(2)). The corresponding I² values were 75.14% and 0.00%, suggesting that about 75% of the variance is explained by differences between studies, none by differences within studies, and about 25% by sampling error. Similarly, significant heterogeneity was found for the three subscales, with I² at level 3 ranging from 49.01% to 55.65%. Not surprisingly, heterogeneity decreased substantially when outliers were excluded. Figures 2 and 3 show the forest plots for effects on total scores of the MHC-SF at posttest for studies using non-active and active control groups, respectively.

Fig. 2 Forest plot of included studies using non-active control groups

Fig. 3 Forest plot of included studies using active control groups

3.6.2 Follow-up effects

For studies using non-active control groups, significant small effects were found for the total scores of the MHC-SF at follow-up (β = 0.26, 95% CI: 0.13 to 0.40, p < 0.001). For the subscales, the largest effects were found for emotional (β = 0.38, 95% CI: 0.16 to 0.59, p < 0.01) and psychological well-being (β = 0.39, 95% CI: 0.23 to 0.56, p < 0.001). Smaller, but still significant effects were found for social well-being (β = 0.24, 95% CI: 0.10 to 0.38, p < 0.01). Results were similar with outliers excluded. When compared with active control groups, interventions were not effective in improving well-being at follow-up.

3.6.3 Subgroup analyses

The findings of the three-level meta-regression analyses including moderators are summarized in Table 4. Among all potential moderators, univariable meta-regression models indicated that only guidance and study quality moderated the effect of the interventions on total MHC-SF scores at posttest. Guided interventions had a significantly stronger effect than unguided ones (F1,43 = 5.05, p = 0.03). Also, studies of good quality had a significantly stronger effect than studies of lower quality (F1,42 = 4.15, p = 0.048).

Table 4 Univariable mixed-effect meta-regression models to examine moderators for the effects of the interventions on well-being (measured with the MHC-SF at posttest)

For the emotional, social, and psychological well-being subscales, effect sizes did not significantly differ based on population, intervention type, duration, guidance, delivery mode, or study quality, with one exception: offline studies were more effective in improving social well-being than web-based studies (F1,28 = 6.35, p = 0.02). Subgroup analyses for the subscales can be found in the Supplemental Material (Tables S4, S5, and S6).
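The logic of a subgroup comparison can be illustrated with a deliberately simplified sketch: pool each subgroup with inverse-variance weights and test the difference between the pooled estimates. This is an illustration only; the actual analyses used three-level mixed-effects meta-regression with F-tests, not the fixed-effect z-test shown here, and the data below are hypothetical.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooling (simplified relative to the
    three-level random-effects models used in the paper)."""
    w = [1.0 / v for v in variances]
    est = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return est, se

def subgroup_z_test(g1, v1, g2, v2):
    """z-test for the difference between two subgroup pooled effects."""
    e1, s1 = pooled_effect(g1, v1)
    e2, s2 = pooled_effect(g2, v2)
    z = (e1 - e2) / math.sqrt(s1 ** 2 + s2 ** 2)
    return e1, e2, z

# Hypothetical effect sizes and variances for two subgroups:
print(subgroup_z_test([0.45, 0.30], [0.02, 0.01], [0.15, 0.20], [0.02, 0.01]))
```

With few comparisons per subgroup, the standard errors of the pooled estimates stay large, which is exactly why the text cautions that power to detect subgroup differences was limited.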

3.7 Publication bias

Visual inspection of the funnel plots did not indicate skewness of the included effect sizes for the total scores of the MHC-SF, neither for studies using non-active nor for studies using active control groups (Figures S4 and S5 in the Supplemental Material). For studies using non-active control groups, the inverse of the sample size was significantly related to the observed effect sizes (p < 0.01); for studies using active control groups it was not (p = 0.35). For studies using non-active control groups, the PET intercept was significantly larger than zero (p < 0.001), so the PEESE intercept was used as an estimate of the true effect after correcting for small-study bias. The PEESE intercept was also significantly larger than zero (β0 = 0.35, p < 0.001) and somewhat larger than the effect size from our primary meta-analysis. For studies using active control groups, the PET intercept was not significantly larger than zero, and was also somewhat larger than the pooled effect from our primary meta-analysis (β0 = 0.13, p = 0.55). In sum, this indicates that our findings were not threatened by publication bias.
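The PET-PEESE procedure referred to above amounts to two weighted regressions: PET regresses effect sizes on their standard errors, PEESE on their variances, and the intercept serves as the bias-corrected effect estimate. The sketch below is a minimal illustration under simplifying assumptions (no multilevel structure, no significance testing of the intercept); the effect sizes and variances are hypothetical, not data from the included studies.

```python
import numpy as np

def pet_peese(effects, variances):
    """Illustrative PET-PEESE sketch: inverse-variance-weighted least
    squares of effect size on standard error (PET) or on sampling
    variance (PEESE); each intercept is a bias-corrected estimate."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    se = np.sqrt(variances)
    w = 1.0 / variances  # inverse-variance weights

    def wls_intercept(predictor):
        # Weighted least squares: solve (X'WX) beta = X'Wy for intercept.
        X = np.column_stack([np.ones_like(predictor), predictor])
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effects)
        return beta[0]

    return {"PET": wls_intercept(se), "PEESE": wls_intercept(variances)}

# Hypothetical effect sizes and sampling variances, purely for illustration:
g = [0.45, 0.30, 0.55, 0.20, 0.35, 0.25]
v = [0.040, 0.010, 0.060, 0.005, 0.020, 0.008]
print(pet_peese(g, v))
```

In practice, as in the text above, PEESE is consulted only when the PET intercept differs significantly from zero, since PET tends to overcorrect when a true non-zero effect exists.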

4 Discussion

The last two decades have seen a rapid increase in well-being studies and in studies using well-being as an outcome (Seligman, 2019). However, there is still considerable inconsistency across studies in the way well-being is conceptualized and measured. Therefore, the goal of this study was to estimate the effects of psychological interventions on well-being as assessed by one consistent, evidence-based instrument that measures well-being in a comprehensive way (MHC; Keyes, 2002). Small significant effects on well-being at posttest and follow-up were found for the total scores and for the emotional, social, and psychological well-being subscales.

4.1 Main findings

The finding that overall well-being significantly increased at posttest is in line with previous meta-analyses of psychological (Weiss et al., 2016) and positive psychology interventions (Bolier et al., 2013a; Carr et al., 2020; Chakhssi et al., 2018; Hendriks et al., 2020a; Hendriks et al., 2018; Koydemir et al., 2020; Sin & Lyubomirsky, 2009). However, when compared with non-active control groups, the size of the effect (0.25) was substantially smaller than in some previously published meta-analyses, which found effect sizes up to 0.61 for overall well-being (Sin & Lyubomirsky, 2009), 0.48 for emotional (Hendriks et al., 2018), and 0.44 for psychological well-being (Weiss et al., 2016). Other meta-analyses, however, reported smaller effects of 0.28 (Chakhssi et al., 2018) or 0.23 (Koydemir et al., 2020) for overall well-being at posttest, similar to the effects obtained in our study. Interestingly, the effect of the interventions seemed to disappear completely when only studies with active control groups were analyzed. This shows that the comparator used has a substantial impact on the overall effect size of a meta-analysis (Karlsson & Bergmark, 2015; Levack et al., 2019), and is in line with previous meta-analyses that found that pooled effects decrease substantially when the comparator includes active elements (e.g., Breedvelt et al., 2019; Carr et al., 2020).

The differences in effect sizes compared to previous meta-analyses have (at least) two possible explanations. First, well-being as measured with the MHC-SF might be a relatively stable construct. Support for this comes from a representative longitudinal study of 1,932 Dutch adults, which showed that all dimensions of the MHC-SF were consistent over a period of nine months and four timepoints (Lamers, Bolier, et al., 2012; Lamers, Glas, et al., 2012). Previously published meta-analyses on well-being included more specific outcomes, such as positive affect or hope (e.g., Bolier et al., 2013a, 2013b; Sin & Lyubomirsky, 2009). These specific outcomes might be more prone to change, especially since in many cases the interventions were designed to target them (e.g., a gratitude intervention to increase gratitude). This underlines the relevance of clear well-being definitions and of consensus on how well-being is measured, as different instruments from different traditions (Diener & Ryan, 2009; Ryff, 1989) may lead to different results. Second, differences in methodology might lead to substantial differences in effects. White et al. (2019) reanalyzed two of the most-cited meta-analyses on positive psychology interventions (Bolier et al., 2013a, 2013b; Sin & Lyubomirsky, 2009) and concluded that their effects are likely overestimated, mainly because these meta-analyses did not weight studies by sample size, used different in- and exclusion criteria, and did not adjust for small-sample bias (White et al., 2019). This shows that the methodologies deployed differ strongly between meta-analyses, which may have a substantial impact on the estimated effects. To make results more comparable across meta-analyses, we encourage future meta-analyses to adhere to tools such as the Cochrane guidelines or the PRISMA checklist and to be transparent about decisions taken in the course of the analyses.

Interestingly, no specific type of intervention was found to be more effective in improving overall well-being or specific dimensions of well-being. Mindfulness and ACT both had a significant effect on overall well-being. One might assume that positive psychology interventions are substantially more effective in improving well-being, as they are specifically designed to do so (Seligman & Csikszentmihalyi, 2014), but this was not the case in our meta-analysis. Instead, positive psychology interventions had the smallest, and a non-significant, effect on well-being, which contradicts previous meta-analyses on positive psychology interventions (e.g., Bolier et al., 2013a, 2013b; Sin & Lyubomirsky, 2009). This could partially be explained by several outliers in the positive psychology subgroup and by the lower quality of these studies. Still, our findings suggest that different interventions (not only those confined to positive psychology) can improve well-being, including ACT and mindfulness-based interventions. This is relevant, since it extends previous meta-analytic findings showing that ACT (A-tjak et al., 2015; Öst, 2014) and mindfulness (Hofmann et al., 2010; Khoury et al., 2013) are effective in alleviating psychological symptoms: our findings suggest that, in addition to alleviating symptoms, these interventions also have the potential to enhance overall well-being to varying degrees.

Only a few differences between subgroups were found, suggesting that effects on well-being are independent of most moderating variables included in our study. Self-help interventions did not show significantly smaller effects than in-person interventions, and effects of web-based interventions were comparable with those of offline interventions. Previous meta-analyses suggest that self-help interventions might be as effective in improving psychopathology as more time-intensive face-to-face interventions (Cavanagh et al., 2014; Cregg & Cheavens, 2021; Hirai & Clum, 2006) and that web-based interventions are effective in improving psychological problems (Chiesa & Serretti, 2009; Sevilla-Llewellyn-Jones et al., 2018). Our findings suggest that self-help and web-based interventions also have the potential to enhance well-being. Furthermore, no difference was found between non-clinical and clinical populations, suggesting that well-being can also be improved in the latter. This is important, since well-being has been shown to be an important predictor of recovery from psychological problems (Iasiello et al., 2019; Schotanus-Dijkstra et al., 2019) and to be associated with a decreased risk of future disorders (Lamers et al., 2015; Schotanus-Dijkstra, Ten Have, et al., 2017). Enhancing well-being in clinical or vulnerable groups might therefore have beneficial effects on recovery and on the recurrence of psychological problems.

Regarding the subgroup analyses, it should also be noted that the number of comparisons within some subgroups was relatively small and heterogeneity was high, which limits the power to detect statistically meaningful differences between subgroups (Cafri et al., 2010). Drawing final conclusions based on the results from the subgroup analyses would therefore be premature.

Similar effect sizes were found for the emotional, social, and psychological well-being subscales, with a slightly smaller effect size for social well-being at follow-up. This could be explained by the finding that the MHC-SF subscales, especially emotional and psychological well-being, are highly positively correlated, suggesting that if one improves, the others are likely to increase as well (Lamers et al., 2011). Previous meta-analyses (Bolier et al., 2013a, 2013b; Hendriks et al., 2018; Koydemir et al., 2020) also showed that both emotional and psychological well-being improved, not just one type of well-being. Our findings are in line with this and suggest that psychological interventions improve well-being comprehensively, with effects that do not seem to be unique to hedonic or eudaimonic well-being. Our study offers strong support for this conclusion, as the measurement of hedonic and eudaimonic well-being was standardized across the included studies through one consistent instrument. This finding is also important, as studies have shown that both hedonic and eudaimonic well-being are important predictors of mental and physical health (DuPont et al., 2020; Lamers, Bolier, et al., 2012; Lamers, Glas, et al., 2012; Martín-María et al., 2017; Wood & Joseph, 2010). Despite decreases in the effect sizes, effects on well-being were still significant at follow-up (with outliers included). This suggests that the improvement in well-being does not merely represent a short-term gain, but can also be sustained in the longer term.

4.2 Strengths and limitations

One strength of this meta-analysis is the use of a single, comprehensive, and well-validated well-being measure. This makes it possible to explicitly determine the effect on well-being measured in one consistent way. Another strength is the thorough execution of our study: pre-registration of the protocol, compliance with guidelines and best-practice recommendations, and transparency about the important steps taken in the course of the study and analyses.

However, this meta-analysis is not without limitations. First, for the analyses of follow-up effects, we opted for a pragmatic approach and simply included the first follow-up measurement in the analyses. This resulted in relatively high heterogeneity in the included follow-up intervals, which should be kept in mind when interpreting the results. This approach minimized the loss of data, as no study with a follow-up measurement had to be excluded from the analyses. Second, due to the limited number of studies in some subgroups, not all potentially interesting moderators could be tested. For example, age was found to be a significant moderator in a previous meta-analysis on positive psychology interventions (Carr et al., 2020). Third, we did not include studies that used only one or two (longer) versions of the subscales included in the MHC-SF, such as Ryff's Psychological Well-Being Scales (Ryff & Keyes, 1995) or the Social Well-being Scale (Keyes, 1998). Including those scales would have given us more data on the subscales, but we decided against it, as our goal was to use a single consistent outcome measure of well-being. Also, the MHC-SF was not the primary outcome in all included studies. Fourth, grey literature was not searched and only RCTs were included. This could introduce publication bias, for which we, however, found no evidence. Although it would be interesting to conduct a future review including all types of research designs that used the MHC-SF, only RCTs were included here, as they represent the gold standard of trials.

4.3 Implications

To our knowledge, this is the first meta-analysis examining the effect of psychological interventions on well-being using one consistent and comprehensive measurement instrument. Our findings suggest that emotional, psychological, and social well-being can be improved. This indicates that psychological interventions have a significant effect on overall well-being, although this effect might be smaller than previously reported. Our findings also suggest that the effect on well-being is rather universal and does not seem to depend on population, intervention type, or delivery mode. However, we did find a significantly stronger effect for guided than for non-guided interventions. Despite the relatively small effect size, comprehensively improving well-being at the public mental health level might still have a large-scale impact. In this context, our findings also suggest that self-help and web-based interventions are not less effective than in-person and offline interventions, respectively. Considering that such interventions can represent low-threshold, cost-effective, and easily distributable treatment options, they might be valuable means to improve well-being.

Also, the quality of the included studies was relatively high, with 15 studies rated as good. This is remarkable, considering that study quality was scored conservatively (i.e., studies had to score 1 on all items of the quality assessment to be rated as good) and that previous meta-analyses concluded that their included studies were of rather low quality. Despite this promising finding, two items of the quality assessment in particular were rated relatively low across studies: power calculation (65.9%) and treatment integrity (63.3%). Future studies are therefore encouraged to conduct proper power analyses and to check treatment integrity, for example by including an intervention logbook. Furthermore, our findings suggest that well-being as measured by the MHC-SF is sensitive to change, since significant improvements in overall well-being were found. This is important for researchers and practitioners who want to use the MHC-SF in their work.

5 Conclusion

The findings of this meta-analysis suggest that psychological interventions are effective in improving well-being, even when well-being is measured with one consistent and comprehensive instrument. Although the effects on well-being were smaller than in some previously published meta-analyses, the findings are promising and show that psychological interventions have the potential to improve well-being. The results also suggest that a variety of interventions can improve well-being, and that well-being can be improved through different formats and delivery modes and in both clinical and non-clinical populations.